How to make queries faster with multicolumn indexes

January 19, 2023

This article introduces multicolumn indexes and explains how they can by used to make large queries run faster.

1. Example

 

In layman’s terms, a multicolumn index means an index created for multiple columns.

The main purpose of using a multicolumn index is to retrieve data faster from the table.

Keeping this in mind, when there is less data and you are trying to fetch it from the table, a sequential scan will be used instead of index scan, as the cost for doing a sequential scan will be less since the data count is lower.

Let's understand this by looking at an example.

 

Example

Create a table:

CREATE TABLE class_records(id INT,firstname VARCHAR(50),lastname VARCHAR(50));

 

Insert a few records like below, at least up to 500 records:

edb=# insert into class_records values (1,'harman','baweja');

INSERT 0 1

edb=# insert into class_records values (2,'hardik','pandya');

INSERT 0 1

edb=# insert into class_records values (3,'tom','lane');

INSERT 0 1

 

Before creating an index, let’s look at the explain plan for a query:

edb=# explain select * from class_records where lastname='lane';

                           QUERY PLAN                            

-----------------------------------------------------------------

 Seq Scan on class_records  (cost=0.00..180.00 rows=18 width=17)

   Filter: ((lastname)::text = 'lane'::text)

(2 rows)

 

Now, let's see what happens after creating the index.

Please note we will be creating indexes on the two columns lastname and firstname. We are presuming that lastname is widely used while fetching data from the class_records table. So we are prioritizing the lastname column over firstname here.  

edb=# create index class_records_index1 on class_records (lastname,firstname);

CREATE INDEX

 

From the explain plan, we can see that an index scan is now used instead of a sequential scan:

edb=# explain select * from class_records where lastname='lane';

                                     QUERY PLAN                                     

------------------------------------------------------------------------------------

 Bitmap Heap Scan on class_records  (cost=4.42..42.76 rows=18 width=17)

   Recheck Cond: ((lastname)::text = 'lane'::text)

   ->  Bitmap Index Scan on class_records_index1  (cost=0.00..4.42 rows=18 width=0)

         Index Cond: ((lastname)::text = 'lane'::text)

(4 rows)

 

Now, let's use both the columns, firstname as well as lastname, and let's observe the difference:

edb=# explain select * from class_records where lastname='lane' and firstname='tom';

                                        QUERY PLAN                                         

-------------------------------------------------------------------------------------------

 Index Scan using class_records_index1 on class_records  (cost=0.29..8.30 rows=1 width=17)

   Index Cond: (((lastname)::text = 'lane'::text) AND ((firstname)::text = 'tom'::text))

(2 rows)

 

If we observe the result above while using both the columns (lastname and firstname), in the exact sequence or the order in which we created the index the query takes less time for execution, whereas from the result below, with firstname only, it's clear that the index scan is not even used.

edb=# explain select * from class_records where firstname='tom';

                           QUERY PLAN                            

-----------------------------------------------------------------

 Seq Scan on class_records  (cost=0.00..180.04 rows=18 width=17)

   Filter: ((firstname)::text = 'tom'::text)

(2 rows)

 

NOTE: As of now, multicolumn index is supported with B-tree, GiST, GIN, and BRIN indexes. There is a hard limit of 32 columns, although you can change it by tweaking the file pg_config_manual.h.

 

 

Share this

Relevant Blogs

More Blogs