What is index?
An index is a data structure that optimizes searching and accessing the data. It’s like an index at the back of a book. Indexing is one of the ways that will optimize your database searching and access is having indexes on the columns that you usually access the table using it.
What the DBMS will do when you ask for a specific row, it will go sequentially and check one by one row; “Is this exactly the row that I need?”, If yes return it, if no, keep searching until the end.
But, we have a better way to do that. An index, as we’ve mentioned, is a data structure, it won’t be obvious for you, but it’s stored inside the DBMS, most commonly as a B- tree.
By default, Most of the DBMS automatically create an index on primary and unique columns.
How do Indexes work?
Let’s assume that you have an index for a primary key. This will create an ordered list of primary key values in a separate table, each item has a pointer that points to the relative value in the original table.
So, whenever you want to access a table using the primary key, it will use binary search algorithm (takes time of O(LogN)) to access the required value in the Index table, and then, go to the relative value in the original table.
In addition, definitely, we can create another index on another column, even if it’s a non-primary column, like Order date, assuming that you usually access the table using that column (pic above).
The decision for choosing another column (besides the primary key) to be indexed can be delayed until the database has been used for a while. This is because we want to know how users are really using our database, and what kind of queries they’re running rather than how we hoped or thought.
In this picture, we can also create an index on a combination of columns, meaning if you often access the table using the Town, and LastName, you can create an index on both, the Town and LastName.
Now, the Index table will be sorted according to the Town, and for each value of the Town it will be sorted according to the LastName.
When you access the data, It’s more efficient to specify the columns in the right order as in the index definition. So, here, it should be Town, then LastName, and not the vice-versa.
Clustered & Non-Clustered Indexes
Some DBMS has different implementation of indexes. They use the idea of clustered & non-clustered index.
— Clustered Index
Every table can have one and only one clustered index. The most common clustered index in any database table is the primary key column.
The database will then order the data in the table based on the clustered index. The binary search algorithm will be used to get the required data in the table (already ordered).
— Non-Clustered Indexes
If you found yourself often also accessing the data using another column, we can create a secondary index; a non-clustered index.
Let’s say we want to create a secondary index for the last name, while the clustered index is the employee id. It’s created in a separate table, it has two columns, one for the last name, and one for the corresponding employee id.
The created table is now sorted by last name, the way that we can’t actually do in the employee table because we’re already sorted by the employee id.
It’s not as quick as using the clustered index. Why? We still need to read from the table created for secondary index then jump to the employee table to get to a specific employee. But, it’s much quicker than a full table scan.
A table can have only one clustered index, while it can have more than one non-clustered index.
Indexes are a great way to optimize the performance, but you don’t have to put the index on every column of the data table.
Yes, it’s much quicker than the sequential search. However, if your column changes a lot in your original table, it will take a long time to insert data into index table. Keep in mind that every update, or insert, or even delete has to reflect back on the Index table. In addition, since every index is a new table, it requires additional space.
The moral of the story, indexes will improve the performance for searching, use them on the columns that you will access a lot.
We can CREATE, ALTER, or DROP an index.
There are two ways to create an index, either when creating a table, or using the CREATE INDEX statement.
-- Create an index in CREATE TABLE (duplicate values are allowed) CREATE TABLE books ( title VARCHAR(255), INDEX index_name (title) ); -- Create an index in CREATE TABLE (duplicate values aren't allowed) CREATE TABLE books ( title VARCHAR(255), UNIQUE INDEX index_name (title) ); -- Create an index (duplicate values are allowed) CREATE INDEX index_name ON table_name (column_name); -- Create a unique index (duplicate values aren't allowed) CREATE UNIQUE INDEX index_name ON table_name (column_name);
The CREATE INDEX statement is mapped to an ALTER TABLE statement to create indexes (see ALTER).
If you want to create an index on a combination of columns, you can list the column names within the parentheses, separated by commas.
CREATE INDEX index_name ON table_name (column_A, column_B, ...);
The ALTER TABLE statement changes the structure of a table. One example is to create or destroy indexes.
-- Add an index to existing table ALTER TABLE
-- Add a unique (no duplicates) index to existing table ALTER TABLE
tbl_nameADD UNIQUE INDEX
-- Delete an index in an existing table ALTER TABLE
The DROP INDEX statement deletes an index a table.
The DROP INDEX statement is mapped to an ALTER TABLE statement to drop the index (see ALTER).