Column order matters in Clustered Indexes
I recently worked on a problem for a customer who experienced some performance issues with a job on their SQL Server database.
Because they use SQL Server Standard Edition, their solution included a home-rolled version of table partitioning, where two tables were created every week:
- one contained 10-second aggregate values, written in microbatches every 10 seconds
- the other contained 5-minute aggregate values, computed from the 10-second table every 5 minutes
All of these aggregates were based on raw measurements from their various systems, one measurement per device.
This was all well and good, but as more and more devices were added to the systems, eventually the 5-minute job started to struggle.
When the tables were created, the 5-minute job took a few seconds at most, but over the course of a week, the amount of data in the table caused the job to take over 30 minutes to complete.
The tables essentially looked something like this:
```sql
CREATE TABLE Table_10_Second (
    DeviceID INT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);

CREATE TABLE Table_5_Minute (
    DeviceID INT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    CONSTRAINT PK_Table_5_Minute PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);
```
Can you spot the issue?
Let’s look at how SQL Server stores the data.
Let’s say we have 50 devices, labelled 1-50.
By the end of the week starting on 2025-03-03, the Table_10_Second table would look something like this:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 1 | 2025-03-03 00:00:10 | 2 |
| 1 | 2025-03-03 00:00:20 | 3 |
| … | … | … |
| 1 | 2025-03-10 23:59:40 | 1 |
| 1 | 2025-03-10 23:59:50 | 2 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 2 | 2025-03-03 00:00:10 | 37.23490 |
| 2 | 2025-03-03 00:00:20 | 35.98943 |
| … | … | … |
| 2 | 2025-03-10 23:59:40 | 31.3453 |
| 2 | 2025-03-10 23:59:50 | 32.4304 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| 3 | 2025-03-03 00:00:10 | 20.855 |
| 3 | 2025-03-03 00:00:20 | 23.023 |
| … | … | … |
| 50 | 2025-03-10 23:59:30 | 0.4392343 |
| 50 | 2025-03-10 23:59:40 | 0.4543563 |
| 50 | 2025-03-10 23:59:50 | 0.4765345 |
To get the data from the last 5 minutes, SQL Server has to scan every single row and check whether its Timestamp falls within the last 5 minutes.
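The 5-minute job boils down to an aggregation query of roughly this shape (the full procedure is in the demo scripts further down; the window values here are illustrative):

```sql
DECLARE @time_from DATETIME2(3) = '2025-03-10 23:55:00';
DECLARE @time_to   DATETIME2(3) = '2025-03-11 00:00:00';

/* With the clustered index leading on DeviceID, the predicate on
   Timestamp cannot seek, so this scans the whole clustered index. */
SELECT DeviceID, SUM(Value) AS value_agg
FROM dbo.Table_10_Second
WHERE Timestamp >= @time_from
  AND Timestamp < @time_to
GROUP BY DeviceID;
```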
We can solve this by adding a nonclustered index on Timestamp, which lets the 5-minute job seek directly to the rows it needs:

```sql
CREATE INDEX IX_Timestamp_Value
ON Table_10_Second (Timestamp) INCLUDE (Value);
```
But we can actually do better! The above index only concerns itself with the reading part - we can speed up the inserts too!
In the Table_10_Second table, during the first 20 seconds of 2025-03-03, the contents of the table would evolve like this:
First insert:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| … | … | … |
After 10 seconds:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 1 | 2025-03-03 00:00:10 | 2 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 2 | 2025-03-03 00:00:10 | 37.23490 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| 3 | 2025-03-03 00:00:10 | 20.855 |
| … | … | … |
After 20 seconds:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 1 | 2025-03-03 00:00:10 | 2 |
| 1 | 2025-03-03 00:00:20 | 3 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 2 | 2025-03-03 00:00:10 | 37.23490 |
| 2 | 2025-03-03 00:00:20 | 35.98943 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| 3 | 2025-03-03 00:00:10 | 20.855 |
| 3 | 2025-03-03 00:00:20 | 23.023 |
| … | … | … |
Notice how the new data must also be inserted in DeviceID order: for every insert, SQL Server has to find the right page in the middle of the index and slot the row in, which takes time and can cause page splits.
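One way to observe the damage is the sys.dm_db_index_physical_stats DMV, which reports fragmentation and page fullness for the clustered index (index_id = 1); a table suffering from mid-index page splits typically shows high fragmentation and half-empty pages:

```sql
SELECT index_level,
       page_count,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Table_10_Second'), NULL, NULL, 'DETAILED')
WHERE index_id = 1;
```

Note that avg_page_space_used_in_percent is only populated in the SAMPLED and DETAILED modes.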
What if we just switch the order of the columns in the clustered index?
```sql
ALTER TABLE dbo.Table_10_Second
DROP CONSTRAINT PK_Table_10_Second;

ALTER TABLE dbo.Table_10_Second
ADD CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (Timestamp, DeviceID);

CREATE INDEX IX_DeviceID
ON dbo.Table_10_Second (DeviceID);
```
The nonclustered index on Timestamp is now redundant, since the data is already sorted by Timestamp. Instead, we add a nonclustered index on DeviceID, because the table is frequently queried for all the data of just a few devices. The same inserts as before now look like this:
First insert:
| Timestamp | DeviceID | Value |
|---|---|---|
| 2025-03-03 00:00:00 | 1 | 1 |
| 2025-03-03 00:00:00 | 2 | 36.32432 |
| 2025-03-03 00:00:00 | 3 | 15.340 |
| … | … | … |
| 2025-03-03 00:00:00 | 50 | 0.4392343 |
After 10 seconds:
| Timestamp | DeviceID | Value |
|---|---|---|
| 2025-03-03 00:00:00 | 1 | 1 |
| 2025-03-03 00:00:00 | 2 | 36.32432 |
| 2025-03-03 00:00:00 | 3 | 15.340 |
| … | … | … |
| 2025-03-03 00:00:00 | 50 | 0.4392343 |
| 2025-03-03 00:00:10 | 1 | 2 |
| 2025-03-03 00:00:10 | 2 | 37.23490 |
| 2025-03-03 00:00:10 | 3 | 20.855 |
| … | … | … |
| 2025-03-03 00:00:10 | 50 | 0.4543563 |
After 20 seconds:
| Timestamp | DeviceID | Value |
|---|---|---|
| 2025-03-03 00:00:00 | 1 | 1 |
| 2025-03-03 00:00:00 | 2 | 36.32432 |
| 2025-03-03 00:00:00 | 3 | 15.340 |
| … | … | … |
| 2025-03-03 00:00:00 | 50 | 0.4392343 |
| 2025-03-03 00:00:10 | 1 | 2 |
| 2025-03-03 00:00:10 | 2 | 37.23490 |
| 2025-03-03 00:00:10 | 3 | 20.855 |
| … | … | … |
| 2025-03-03 00:00:10 | 50 | 0.4543563 |
| 2025-03-03 00:00:20 | 1 | 3 |
| 2025-03-03 00:00:20 | 2 | 35.98943 |
| 2025-03-03 00:00:20 | 3 | 23.023 |
| … | … | … |
| 2025-03-03 00:00:20 | 50 | 0.4765345 |
What happened?
By aligning the clustered index with the order in which the data is generated, new rows are now appended at the end of the index instead of being inserted into the middle of existing pages, where there is a high risk of page splits as existing data gets shuffled around.
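You can quantify this with sys.dm_db_index_operational_stats, which tracks leaf-level page allocations for an index (allocations caused by page splits, including the benign "append a new page at the end" case):

```sql
SELECT leaf_insert_count,
       leaf_allocation_count /* leaf-level page allocations, i.e. page splits */
FROM sys.dm_db_index_operational_stats(
         DB_ID(), OBJECT_ID('dbo.Table_10_Second'), 1, NULL);
```

After the key swap, the remaining splits are cheap end-of-index allocations rather than splits that have to move existing rows around.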
After implementing this change, the 5-minute job consistently ran in under 10 seconds throughout the week, and the customer reported more consistent performance when inserting into the 10-second table as well.
Demo scripts
If you want to play around with the scenario yourself, here are some scripts to reproduce the behaviour:
Prepare environment
```sql
/* Generate a small numbers table */
CREATE TABLE dbo.Numbers (Int64 BIGINT PRIMARY KEY);

INSERT INTO dbo.Numbers (Int64)
SELECT TOP (50000) ROW_NUMBER() OVER (ORDER BY c1.object_id)
FROM sys.objects AS c1
CROSS JOIN sys.objects AS c2
CROSS JOIN sys.objects AS c3
CROSS JOIN sys.objects AS c4
CROSS JOIN sys.objects AS c5;
GO
```
```sql
DROP TABLE IF EXISTS dbo.Table_10_Second;
GO

CREATE TABLE Table_10_Second (
    DeviceID BIGINT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    Padding CHAR(972) DEFAULT (SPACE(0)), /* Ensure only 8 rows per page */
    CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);
GO

DROP TABLE IF EXISTS dbo.Table_5_Minute;
GO

CREATE TABLE Table_5_Minute (
    DeviceID INT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    CONSTRAINT PK_Table_5_Minute PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);
GO
```
```sql
CREATE OR ALTER PROCEDURE dbo.Generate_5_Minute_Values
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @time_from DATETIME2(3);
    DECLARE @time_to DATETIME2(3);

    SET @time_from = DATEADD(MINUTE, 5, (SELECT MAX(Timestamp) FROM dbo.Table_5_Minute));

    IF @time_from IS NULL
    BEGIN
        SET @time_from = (SELECT MIN(Timestamp) FROM dbo.Table_10_Second);
    END;

    SET @time_to = DATEADD(MINUTE, 5, @time_from);

    INSERT INTO dbo.Table_5_Minute (DeviceID, Timestamp, Value)
    SELECT DeviceID,
           @time_from,
           SUM(Value) AS value_agg
    FROM dbo.Table_10_Second
    WHERE Timestamp >= @time_from
      AND Timestamp < @time_to
    GROUP BY DeviceID;
END;
GO
```
```sql
/* Add a table to keep track of our progress.
   Seed it with a single row, so the UPDATE in the
   data generation script has a row to update. */
CREATE TABLE dbo.Progress (MaxTimestamp DATETIME2(3));
INSERT INTO dbo.Progress (MaxTimestamp) VALUES (NULL);
```
Generate data (with SQLQueryStress)
Let’s try to recreate the situation.
The following query generates some random data, corresponding to 5 minutes' worth of 10-second values for 1000 devices:

```sql
DECLARE @Date DATETIME2(3);
SET @Date = (SELECT MaxTimestamp FROM dbo.Progress);
IF @Date IS NULL
BEGIN
    SET @Date = '2025-03-17';
END;

INSERT INTO dbo.Table_10_Second (DeviceID, Timestamp, Value)
SELECT d.Int64 AS DeviceID,
       /* Spread the rows across 10-second intervals starting at @Date */
       DATEADD(SECOND, (n.Int64 - 1) * 10, @Date),
       RAND(CHECKSUM(NEWID()))
FROM dbo.Numbers AS d
CROSS JOIN dbo.Numbers AS n
WHERE d.Int64 > 0
  /* Let's say we have 1000 running devices */
  AND d.Int64 <= 1000
  /* Add 5 minutes' worth of rows (30 ten-second intervals) */
  AND n.Int64 <= 6 * 5;

EXEC dbo.Generate_5_Minute_Values;

UPDATE dbo.Progress
SET MaxTimestamp = DATEADD(MINUTE, 5, @Date);
GO
```
We can use SQLQueryStress (originally written by Adam Machanic) to simulate a workload on our test environment.
One 24-hour period consists of 288 5-minute intervals, so if we run the above query 288 times, we can simulate 24 hours of data generation.
To keep the baseline the same, I drop and recreate the tables between each SQLQueryStress run before applying the next fix to test:
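A reset between runs can look something like this (followed by re-running the CREATE statements from the "Prepare environment" section):

```sql
/* Reset between test runs: drop the data tables and clear the
   progress marker, then re-run the CREATE TABLE and
   CREATE PROCEDURE statements above. */
DROP TABLE IF EXISTS dbo.Table_10_Second;
DROP TABLE IF EXISTS dbo.Table_5_Minute;
UPDATE dbo.Progress SET MaxTimestamp = NULL;
```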
Fix 1
```sql
CREATE INDEX IX_Timestamp_Value
ON Table_10_Second (Timestamp) INCLUDE (Value);
```
Fix 2
```sql
ALTER TABLE dbo.Table_10_Second
DROP CONSTRAINT PK_Table_10_Second;

ALTER TABLE dbo.Table_10_Second
ADD CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (Timestamp, DeviceID);

ALTER TABLE dbo.Table_5_Minute
DROP CONSTRAINT PK_Table_5_Minute;

ALTER TABLE dbo.Table_5_Minute
ADD CONSTRAINT PK_Table_5_Minute PRIMARY KEY CLUSTERED (Timestamp, DeviceID);

/* Add indexes to help reads */
CREATE INDEX IX_DeviceID ON dbo.Table_10_Second (DeviceID);
CREATE INDEX IX_DeviceID ON dbo.Table_5_Minute (DeviceID);
```
Results
These are the results from our three tests:
| Metric | Base clustered index | Nonclustered index | Swap clustered index |
|---|---|---|---|
| Elapsed Time | 00:04:29.2804 | 00:01:38.7337 | 00:01:28.9010 |
| Number of Iterations | 288 | 288 | 288 |
| Number of Threads | 1 | 1 | 1 |
| Delay Between Queries (ms) | 50 | 50 | 50 |
| CPU Seconds/Iteration (Avg) | 1.5500 | 0.3329 | 0.3022 |
| Actual Seconds/Iteration (Avg) | 1.5818 | 0.3560 | 0.3190 |
| Client Seconds/Iteration (Avg) | 0.8786 | 0.2862 | 0.2519 |
| Logical Reads/Iteration (Avg) | 676,567.6563 | 149,431.8194 | 124,270.4236 |
Total runtime is never a great metric, but even so, it is clear from the table that simply adding the much-needed nonclustered index helps immensely.
I much prefer looking at logical reads, since this metric stays consistent across runs of the same workload. Here we can also clearly see the benefit of organizing the clustered index to align with how the data is generated.
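If you don't want to set up SQLQueryStress, you can compare logical reads for a single run directly in SSMS:

```sql
SET STATISTICS IO ON;
EXEC dbo.Generate_5_Minute_Values;
SET STATISTICS IO OFF;
/* The Messages tab then shows a line per table, like:
   Table 'Table_10_Second'. Scan count ..., logical reads ..., ... */
```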