In this tutorial, we'll use database indexes to speed up queries in a PostgreSQL database.
Introduction
Efficient data retrieval is essential for any web hosting provider managing large volumes of user and system information. Slow database queries can translate into sluggish website performance and diminished customer satisfaction. We’ll demonstrate how to harness PostgreSQL indexes to speed up queries dramatically.
We’ll walk through creating a test database, populating it with sample data, benchmarking query performance, and then applying indexes to optimize those queries. By the end of this guide, we’ll have a clear understanding of when and how to use indexes for maximum impact.
Prerequisites
Before starting, make sure the Ubuntu server is ready. The following components should be installed and configured:
- An Ubuntu 24.04 dedicated server or KVM VPS.
- A root user, or a normal user with administrative (sudo) privileges.
- PostgreSQL installed and running (a minimal installation sketch follows below).
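If PostgreSQL is not installed yet, a minimal setup on Ubuntu 24.04 looks like the following sketch. It installs the distribution's default PostgreSQL packages; adjust if you prefer the PGDG repository.
# Install PostgreSQL and the contrib modules (contrib includes pg_stat_statements)
sudo apt update
sudo apt install -y postgresql postgresql-contrib
# Confirm the service is running
sudo systemctl status postgresql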
Setting Up the Test Environment
First, let’s create a dedicated PostgreSQL database and a sample table to simulate a real-world workload.
# From the shell, connect to PostgreSQL as a superuser
psql -U postgres
-- Create a new test database
CREATE DATABASE hosting_perf_test;
-- Switch to the new database
\c hosting_perf_test
-- Create a sample table for hosting logs
CREATE TABLE access_logs (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
url TEXT NOT NULL,
status_code INT NOT NULL,
access_time TIMESTAMP NOT NULL
);
Next, we’ll insert a substantial amount of dummy data to mimic heavy usage:
-- Populate with 1 million rows
INSERT INTO access_logs (user_id, url, status_code, access_time)
SELECT
(random()*1000)::INT,
'/page/' || (random()*100)::INT,
CASE WHEN random() < 0.9 THEN 200 ELSE 404 END,
NOW() - ((random()*86400)::INT || ' seconds')::INTERVAL
FROM generate_series(1, 1000000);
This setup gives us a solid foundation for testing query performance under load.
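Optionally, we can confirm that the load completed and see how much disk the table occupies. The exact size reported will vary with your PostgreSQL version and settings.
-- Verify the row count and the table's on-disk footprint
SELECT COUNT(*) FROM access_logs;
SELECT pg_size_pretty(pg_total_relation_size('access_logs'));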
Benchmarking Baseline Query Performance
Before adding any indexes, we should measure how long our queries take. PostgreSQL’s EXPLAIN ANALYZE provides both the execution plan and exact timings.
-- Example query: count of 404 errors for a given user
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM access_logs
WHERE user_id = 42
AND status_code = 404;
In a fresh table without indexes, PostgreSQL performs a sequential scan, examining every row:
Seq Scan on access_logs (cost=0.00..12040.54 rows=8 width=0) (actual time=0.238..88.178 rows=34 loops=3)
Filter: ((user_id = 42) AND (status_code = 404))
Rows Removed by Filter: 333300
Planning Time: 0.188 ms
Execution Time: 126.120 ms
Note: Execution Time here reflects scanning 1 million rows each time—far too slow for production workloads.
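As a complementary quick check, psql’s built-in \timing toggle reports the client-side elapsed time of every statement, which is convenient for before/after comparisons:
-- Toggle client-side timing in psql
\timing on
SELECT COUNT(*) FROM access_logs WHERE user_id = 42 AND status_code = 404;
-- psql now prints "Time: ... ms" after each statement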
Introducing Indexes
Indexes act like a book’s index, allowing the database to jump directly to relevant rows. For our query, a composite index on (user_id, status_code) is ideal.
-- Create a composite B-tree index
CREATE INDEX idx_user_status ON access_logs (user_id, status_code);
PostgreSQL builds the index (a plain CREATE INDEX briefly locks the table against writes; on a busy production system, CREATE INDEX CONCURRENTLY avoids that). Once it completes, re-running our query yields:
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM access_logs
WHERE user_id = 42
AND status_code = 404;
Aggregate (cost=6.67..6.68 rows=1 width=8) (actual time=0.130..0.131 rows=1 loops=1)
-> Index Only Scan using idx_user_status on access_logs (cost=0.42..6.42 rows=100 width=0) (actual time=0.113..0.121 rows=101 loops=1)
Index Cond: ((user_id = 42) AND (status_code = 404))
Heap Fetches: 0
Planning Time: 0.263 ms
Execution Time: 0.152 ms
The execution time drops from 126.120 ms to just 0.152 ms, roughly an 800× improvement. The Index Only Scan with Heap Fetches: 0 also tells us the query was answered entirely from the index, without visiting the table at all. This dramatic gain illustrates the power of indexing when queries filter on indexed columns.
When to Use Composite vs. Single-Column Indexes
- Single-column indexes are ideal for queries filtering on one column.
- Composite indexes shine when queries filter on multiple columns in conjunction.
For instance, if we often query by access_time, a separate index helps:
CREATE INDEX idx_access_time ON access_logs (access_time);
A composite index on (user_id, access_time), on the other hand, serves queries that filter on user_id alone or on both columns, but it cannot be used efficiently when a query filters only on access_time:
CREATE INDEX idx_user_time ON access_logs (user_id, access_time);
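A quick way to see this leading-column rule in action is to compare plans for the two filters. Treat this as a sketch: the exact plan the planner picks depends on its statistics, and either index whose leading column is user_id may be chosen for the first query.
-- Can use idx_user_time (or idx_user_status): the leading column user_id is filtered
EXPLAIN SELECT * FROM access_logs WHERE user_id = 42;

-- Cannot use idx_user_time efficiently: the leading column is missing,
-- so the planner falls back to idx_access_time or a sequential scan
EXPLAIN SELECT * FROM access_logs WHERE access_time >= NOW() - INTERVAL '1 hour';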
Understanding query patterns is crucial: creating indexes without analyzing workloads leads to wasted disk space and slower writes.
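One way to spot indexes that never earn their keep is the pg_stat_user_indexes view: an idx_scan count that stays at zero after a representative period of traffic marks that index as a candidate for removal.
-- List indexes on access_logs and how often each has been scanned
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'access_logs'
ORDER BY idx_scan;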
Verifying Index Usage
Use EXPLAIN (without ANALYZE) to confirm which index PostgreSQL chooses:
EXPLAIN
SELECT *
FROM access_logs
WHERE access_time >= NOW() - INTERVAL '1 hour';
A plan showing an Index Scan or Bitmap Index Scan on idx_access_time confirms effective use. If PostgreSQL sticks with a sequential scan, consider:
- Statistics outdated? Run ANALYZE access_logs;
- Low selectivity? The index may not be selective enough to justify its use (a quick estimate is sketched below).
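To get a rough feel for selectivity, we can measure what fraction of the table the filter actually matches. With our generated data spread over the last 24 hours, roughly 1/24 of the rows fall inside the last hour, which is usually still index-friendly, but the same check quickly exposes filters that match most of the table.
-- Fraction of rows matching the one-hour filter (closer to 1 = poor candidate for an index)
SELECT count(*) FILTER (WHERE access_time >= NOW() - INTERVAL '1 hour')::numeric
       / count(*) AS fraction_matched
FROM access_logs;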
Best Practices for Index Management
- Monitor query performance regularly using pg_stat_statements (a minimal setup sketch follows this list).
- Maintain updated statistics with VACUUM ANALYZE.
- Avoid over-indexing—each index slows down INSERT/UPDATE/DELETE.
- Use partial indexes for highly skewed data:
CREATE INDEX idx_404_only ON access_logs (user_id)
WHERE status_code = 404;
- Consider expression indexes for queries that filter on computed expressions:
CREATE INDEX idx_lower_url ON access_logs (lower(url));
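To act on the pg_stat_statements recommendation above, the extension must be preloaded and created once per database; after that, a query like the following (column names as of PostgreSQL 13 and later) surfaces the statements that cost the most on average.
-- In postgresql.conf, then restart PostgreSQL:
--   shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Ten slowest statements by mean execution time
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;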
Conclusion
In this tutorial, we used database indexes to speed up queries. Effectively leveraging PostgreSQL indexes is one of the most impactful ways to accelerate database queries. By analyzing query patterns, creating targeted single-column or composite indexes, and monitoring index utilization, we ensure our hosting infrastructure delivers responsive performance.
As a web hosting provider, our commitment to fast, reliable service starts with a finely tuned database layer—ensuring our clients enjoy seamless experiences even under heavy traffic loads. Implement these indexing strategies today to give your applications the speed boost they deserve.