How to make queries faster with multicolumn indexes

Ranjeet Dhumal

January 19, 2023

This article introduces multicolumn indexes and explains how they can by used to make large queries run faster.

1. Example

In layman’s terms, a multicolumn index means an index created for multiple columns.

The main purpose of using a multicolumn index is to retrieve data faster from the table.

Keeping this in mind, when there is less data and you are trying to fetch it from the table, a sequential scan will be used instead of index scan, as the cost for doing a sequential scan will be less since the data count is lower.

Let's understand this by looking at an example.

Example

Create a table:

CREATE TABLE class_records(id INT,firstname VARCHAR(50),lastname VARCHAR(50));

Insert a few records like below, at least up to 500 records:

edb=# insert into class_records values (1,'harman','baweja');

INSERT 0 1

edb=# insert into class_records values (2,'hardik','pandya');

INSERT 0 1

edb=# insert into class_records values (3,'tom','lane');

INSERT 0 1

Before creating an index, let’s look at the explain plan for a query:

edb=# explain select * from class_records where lastname='lane';

                           QUERY PLAN                            

-----------------------------------------------------------------

 Seq Scan on class_records  (cost=0.00..180.00 rows=18 width=17)

   Filter: ((lastname)::text = 'lane'::text)

(2 rows)

Now, let's see what happens after creating the index.

Please note we will be creating indexes on the two columns lastname and firstname. We are presuming that lastname is widely used while fetching data from the class_records table. So we are prioritizing the lastname column over firstname here.

edb=# create index class_records_index1 on class_records (lastname,firstname);

CREATE INDEX

From the explain plan, we can see that an index scan is now used instead of a sequential scan:

edb=# explain select * from class_records where lastname='lane';

                                     QUERY PLAN                                     

------------------------------------------------------------------------------------

 Bitmap Heap Scan on class_records  (cost=4.42..42.76 rows=18 width=17)

   Recheck Cond: ((lastname)::text = 'lane'::text)

   ->  Bitmap Index Scan on class_records_index1  (cost=0.00..4.42 rows=18 width=0)

         Index Cond: ((lastname)::text = 'lane'::text)

(4 rows)

Now, let's use both the columns, firstname as well as lastname, and let's observe the difference:

edb=# explain select * from class_records where lastname='lane' and firstname='tom';

                                        QUERY PLAN                                         

-------------------------------------------------------------------------------------------

 Index Scan using class_records_index1 on class_records  (cost=0.29..8.30 rows=1 width=17)

   Index Cond: (((lastname)::text = 'lane'::text) AND ((firstname)::text = 'tom'::text))

(2 rows)

If we observe the result above while using both the columns (lastname and firstname), in the exact sequence or the order in which we created the index the query takes less time for execution, whereas from the result below, with firstname only, it's clear that the index scan is not even used.

edb=# explain select * from class_records where firstname='tom';

                           QUERY PLAN                            

-----------------------------------------------------------------

 Seq Scan on class_records  (cost=0.00..180.04 rows=18 width=17)

   Filter: ((firstname)::text = 'tom'::text)

(2 rows)

NOTE: As of now, multicolumn index is supported with B-tree, GiST, GIN, and BRIN indexes. There is a hard limit of 32 columns, although you can change it by tweaking the file pg_config_manual.h.

Resource Feature Callout 1

Resource Feature Callout 2

How to make queries faster with multicolumn indexes

Ranjeet Dhumal

Example

Relevant Blogs

EDB Tutorial: How to run a TPC-E-like benchmark easily

EDB Tutorial: How To Run a Complex Postgres Benchmark Easily - Master TPC-C in 3 Short Steps

EDB Tutorial: Achieving High Availability Using Enterprise Failover Manager

More Blogs

Why you should use Connection Pooling when setting Max_connections in Postgres

How to create a PostgreSQL database and users using psql and pgAdmin

Some numbers, oh, and pgAdmin 4 v1.6 was released.