Column order matters in Clustered Indexes
I recently worked on a problem for a customer who experienced some performance issues with a job on their SQL Server database.
Because they use SQL Server Standard Edition, their solution included a home-rolled version of table partitioning, where two tables were created every week:
- one contained 10-second aggregate values, written in microbatches every 10 seconds
- the other contained 5-minute aggregate values, computed from the 10-second table every 5 minutes
All of these aggregates were based on raw measurements from their various systems, one measurement per device.
This was all well and good, but as more and more devices were added to the systems, eventually the 5-minute job started to struggle.
When the tables were created, the 5-minute job took a few seconds at most, but over the course of a week, the amount of data in the table caused the job to take over 30 minutes to complete.
The tables essentially looked something like this:
```sql
CREATE TABLE Table_10_Second (
    DeviceID INT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);

CREATE TABLE Table_5_Minute (
    DeviceID INT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    CONSTRAINT PK_Table_5_Minute PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);
```
Can you spot the issue?
Let’s look at how SQL Server stores the data.
Let’s say we have 50 devices, labelled 1-50.
By the end of the week starting on 2025-03-03, the Table_10_Second table would look something like this:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 1 | 2025-03-03 00:00:10 | 2 |
| 1 | 2025-03-03 00:00:20 | 3 |
| … | … | … |
| 1 | 2025-03-10 23:59:40 | 1 |
| 1 | 2025-03-10 23:59:50 | 2 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 2 | 2025-03-03 00:00:10 | 37.23490 |
| 2 | 2025-03-03 00:00:20 | 35.98943 |
| … | … | … |
| 2 | 2025-03-10 23:59:40 | 31.3453 |
| 2 | 2025-03-10 23:59:50 | 32.4304 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| 3 | 2025-03-03 00:00:10 | 20.855 |
| 3 | 2025-03-03 00:00:20 | 23.023 |
| … | … | … |
| 50 | 2025-03-10 23:59:30 | 0.4392343 |
| 50 | 2025-03-10 23:59:40 | 0.4543563 |
| 50 | 2025-03-10 23:59:50 | 0.4765345 |
To get the data from the last 5 minutes, SQL Server has to scan every single row and check whether its Timestamp falls within the last 5 minutes.
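The 5-minute job boils down to an aggregation query of roughly this shape (the full procedure is in the demo scripts further down; the window values here are illustrative):

```sql
DECLARE @time_from DATETIME2(3) = '2025-03-10 23:55:00';
DECLARE @time_to   DATETIME2(3) = '2025-03-11 00:00:00';

/* With the clustered index leading on DeviceID, the predicate on
   Timestamp cannot seek, so this scans the whole clustered index. */
SELECT DeviceID, SUM(Value) AS value_agg
FROM dbo.Table_10_Second
WHERE Timestamp >= @time_from
  AND Timestamp < @time_to
GROUP BY DeviceID;
```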
We can solve this by adding a nonclustered index on Timestamp, which lets the 5-minute job seek directly to the rows it needs:

```sql
CREATE INDEX IX_Timestamp_Value
ON Table_10_Second (Timestamp) INCLUDE (Value);
```
But we can actually do better! The above index only concerns itself with the reading part - we can speed up the inserts too!
In the Table_10_Second table, during the first 20 seconds of 2025-03-03, the contents of the table would evolve like this:
First insert:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| … | … | … |
After 10 seconds:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 1 | 2025-03-03 00:00:10 | 2 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 2 | 2025-03-03 00:00:10 | 37.23490 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| 3 | 2025-03-03 00:00:10 | 20.855 |
| … | … | … |
After 20 seconds:
| DeviceID | Timestamp | Value |
|---|---|---|
| 1 | 2025-03-03 00:00:00 | 1 |
| 1 | 2025-03-03 00:00:10 | 2 |
| 1 | 2025-03-03 00:00:20 | 3 |
| 2 | 2025-03-03 00:00:00 | 36.32432 |
| 2 | 2025-03-03 00:00:10 | 37.23490 |
| 2 | 2025-03-03 00:00:20 | 35.98943 |
| 3 | 2025-03-03 00:00:00 | 15.340 |
| 3 | 2025-03-03 00:00:10 | 20.855 |
| 3 | 2025-03-03 00:00:20 | 23.023 |
| … | … | … |
Notice how the new data must also be inserted in DeviceID order: for every insert, SQL Server has to find the right page in the middle of the index and slot the row in, which takes time and can cause page splits.
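One way to observe the damage is the sys.dm_db_index_physical_stats DMV, which reports fragmentation and page fullness for the clustered index (index_id = 1); a table suffering from mid-index page splits typically shows high fragmentation and half-empty pages:

```sql
SELECT index_level,
       page_count,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Table_10_Second'), NULL, NULL, 'DETAILED')
WHERE index_id = 1;
```

Note that avg_page_space_used_in_percent is only populated in the SAMPLED and DETAILED modes.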
What if we just switch the order of the columns in the clustered index?
```sql
ALTER TABLE dbo.Table_10_Second
DROP CONSTRAINT PK_Table_10_Second;

ALTER TABLE dbo.Table_10_Second
ADD CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (Timestamp, DeviceID);

CREATE INDEX IX_DeviceID
ON dbo.Table_10_Second (DeviceID);
```
The nonclustered index on Timestamp is now redundant, since the data is already sorted by Timestamp. Instead, we add a nonclustered index on DeviceID, because the table is frequently queried for all the data of just a few devices. The same inserts as before now look like this:
First insert:
| Timestamp | DeviceID | Value |
|---|---|---|
| 2025-03-03 00:00:00 | 1 | 1 |
| 2025-03-03 00:00:00 | 2 | 36.32432 |
| 2025-03-03 00:00:00 | 3 | 15.340 |
| … | … | … |
| 2025-03-03 00:00:00 | 50 | 0.4392343 |
After 10 seconds:
| Timestamp | DeviceID | Value |
|---|---|---|
| 2025-03-03 00:00:00 | 1 | 1 |
| 2025-03-03 00:00:00 | 2 | 36.32432 |
| 2025-03-03 00:00:00 | 3 | 15.340 |
| … | … | … |
| 2025-03-03 00:00:00 | 50 | 0.4392343 |
| 2025-03-03 00:00:10 | 1 | 2 |
| 2025-03-03 00:00:10 | 2 | 37.23490 |
| 2025-03-03 00:00:10 | 3 | 20.855 |
| … | … | … |
| 2025-03-03 00:00:10 | 50 | 0.4543563 |
After 20 seconds:
| Timestamp | DeviceID | Value |
|---|---|---|
| 2025-03-03 00:00:00 | 1 | 1 |
| 2025-03-03 00:00:00 | 2 | 36.32432 |
| 2025-03-03 00:00:00 | 3 | 15.340 |
| … | … | … |
| 2025-03-03 00:00:00 | 50 | 0.4392343 |
| 2025-03-03 00:00:10 | 1 | 2 |
| 2025-03-03 00:00:10 | 2 | 37.23490 |
| 2025-03-03 00:00:10 | 3 | 20.855 |
| … | … | … |
| 2025-03-03 00:00:10 | 50 | 0.4543563 |
| 2025-03-03 00:00:20 | 1 | 3 |
| 2025-03-03 00:00:20 | 2 | 35.98943 |
| 2025-03-03 00:00:20 | 3 | 23.023 |
| … | … | … |
| 2025-03-03 00:00:20 | 50 | 0.4765345 |
What happened?
By aligning the clustered index with the order in which the data is generated, new rows are now appended at the end of the index instead of being inserted into the middle of existing pages, where there is a high risk of page splits as existing data gets shuffled around.
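You can quantify this with sys.dm_db_index_operational_stats, which tracks leaf-level page allocations for an index (allocations caused by page splits, including the benign "append a new page at the end" case):

```sql
SELECT leaf_insert_count,
       leaf_allocation_count /* leaf-level page allocations, i.e. page splits */
FROM sys.dm_db_index_operational_stats(
         DB_ID(), OBJECT_ID('dbo.Table_10_Second'), 1, NULL);
```

After the key swap, the remaining splits are cheap end-of-index allocations rather than splits that have to move existing rows around.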
After implementing this change, the 5-minute job consistently ran in under 10 seconds throughout the week, and the customer reported more consistent performance when inserting into the 10-second table as well.
Demo scripts
If you want to play around with the scenario yourself, here are some scripts to reproduce the behaviour:
Prepare environment
```sql
/* Generate a small numbers table */
CREATE TABLE dbo.Numbers (Int64 BIGINT PRIMARY KEY);

INSERT INTO dbo.Numbers (Int64)
SELECT TOP (50000) ROW_NUMBER() OVER (ORDER BY c1.object_id)
FROM sys.objects AS c1
CROSS JOIN sys.objects AS c2
CROSS JOIN sys.objects AS c3
CROSS JOIN sys.objects AS c4
CROSS JOIN sys.objects AS c5;
GO
```
```sql
DROP TABLE IF EXISTS dbo.Table_10_Second;
GO

CREATE TABLE Table_10_Second (
    DeviceID BIGINT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    Padding CHAR(972) DEFAULT (SPACE(0)), /* Ensure only 8 rows per page */
    CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);
GO

DROP TABLE IF EXISTS dbo.Table_5_Minute;
GO

CREATE TABLE Table_5_Minute (
    DeviceID INT,
    Timestamp DATETIME2(3),
    Value FLOAT,
    CONSTRAINT PK_Table_5_Minute PRIMARY KEY CLUSTERED (DeviceID, Timestamp)
);
GO
```
```sql
CREATE OR ALTER PROCEDURE dbo.Generate_5_Minute_Values
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @time_from DATETIME2(3);
    DECLARE @time_to DATETIME2(3);

    SET @time_from = DATEADD(MINUTE, 5, (SELECT MAX(Timestamp) FROM dbo.Table_5_Minute));

    IF @time_from IS NULL
    BEGIN
        SET @time_from = (SELECT MIN(Timestamp) FROM dbo.Table_10_Second);
    END;

    SET @time_to = DATEADD(MINUTE, 5, @time_from);

    INSERT INTO dbo.Table_5_Minute (DeviceID, Timestamp, Value)
    SELECT DeviceID,
           @time_from,
           SUM(Value) AS value_agg
    FROM dbo.Table_10_Second
    WHERE Timestamp >= @time_from
      AND Timestamp < @time_to
    GROUP BY DeviceID;
END;
GO
```
```sql
/* Add a table to keep track of our progress.
   Seed it with a single row, so the UPDATE in the
   data generation script has a row to update. */
CREATE TABLE dbo.Progress (MaxTimestamp DATETIME2(3));
INSERT INTO dbo.Progress (MaxTimestamp) VALUES (NULL);
```
Generate data (with SQLQueryStress)
Let’s try to recreate the situation.
The following query generates some random data, corresponding to 5 minutes' worth of 10-second values for 1000 devices:

```sql
DECLARE @Date DATETIME2(3);
SET @Date = (SELECT MaxTimestamp FROM dbo.Progress);
IF @Date IS NULL
BEGIN
    SET @Date = '2025-03-17';
END;

INSERT INTO dbo.Table_10_Second (DeviceID, Timestamp, Value)
SELECT d.Int64 AS DeviceID,
       /* Spread the rows across 10-second intervals starting at @Date */
       DATEADD(SECOND, (n.Int64 - 1) * 10, @Date),
       RAND(CHECKSUM(NEWID()))
FROM dbo.Numbers AS d
CROSS JOIN dbo.Numbers AS n
WHERE d.Int64 > 0
  /* Let's say we have 1000 running devices */
  AND d.Int64 <= 1000
  /* Add 5 minutes' worth of rows (30 ten-second intervals) */
  AND n.Int64 <= 6 * 5;

EXEC dbo.Generate_5_Minute_Values;

UPDATE dbo.Progress
SET MaxTimestamp = DATEADD(MINUTE, 5, @Date);
GO
```
We can use SQLQueryStress (originally written by Adam Machanic) to simulate a workload on our test environment.
One 24-hour period consists of 288 5-minute intervals, so if we run the above query 288 times, we can simulate 24 hours of data generation.
To keep the baseline the same, I drop and recreate the tables between each SQLQueryStress run before applying the next fix to test:
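A reset between runs can look something like this (followed by re-running the CREATE statements from the "Prepare environment" section):

```sql
/* Reset between test runs: drop the data tables and clear the
   progress marker, then re-run the CREATE TABLE and
   CREATE PROCEDURE statements above. */
DROP TABLE IF EXISTS dbo.Table_10_Second;
DROP TABLE IF EXISTS dbo.Table_5_Minute;
UPDATE dbo.Progress SET MaxTimestamp = NULL;
```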
Fix 1
```sql
CREATE INDEX IX_Timestamp_Value
ON Table_10_Second (Timestamp) INCLUDE (Value);
```
Fix 2
```sql
ALTER TABLE dbo.Table_10_Second
DROP CONSTRAINT PK_Table_10_Second;

ALTER TABLE dbo.Table_10_Second
ADD CONSTRAINT PK_Table_10_Second PRIMARY KEY CLUSTERED (Timestamp, DeviceID);

ALTER TABLE dbo.Table_5_Minute
DROP CONSTRAINT PK_Table_5_Minute;

ALTER TABLE dbo.Table_5_Minute
ADD CONSTRAINT PK_Table_5_Minute PRIMARY KEY CLUSTERED (Timestamp, DeviceID);

/* Add indexes to help reads */
CREATE INDEX IX_DeviceID ON dbo.Table_10_Second (DeviceID);
CREATE INDEX IX_DeviceID ON dbo.Table_5_Minute (DeviceID);
```
Results
These are the results from our three tests:
| Metric | Base clustered index | Nonclustered index | Swap clustered index |
|---|---|---|---|
| Elapsed Time | 00:04:29.2804 | 00:01:38.7337 | 00:01:28.9010 |
| Number of Iterations | 288 | 288 | 288 |
| Number of Threads | 1 | 1 | 1 |
| Delay Between Queries (ms) | 50 | 50 | 50 |
| CPU Seconds/Iteration (Avg) | 1.5500 | 0.3329 | 0.3022 |
| Actual Seconds/Iteration (Avg) | 1.5818 | 0.3560 | 0.3190 |
| Client Seconds/Iteration (Avg) | 0.8786 | 0.2862 | 0.2519 |
| Logical Reads/Iteration (Avg) | 676,567.6563 | 149,431.8194 | 124,270.4236 |
Total runtime is never a great metric, but even so, it is clear from the table that simply adding the much-needed nonclustered index helps immensely.
I much prefer looking at logical reads, since this metric stays consistent across runs of the same workload. Here we can also clearly see the benefit of organizing the clustered index to align with how the data is generated.
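If you don't want to set up SQLQueryStress, you can compare logical reads for a single run directly in SSMS:

```sql
SET STATISTICS IO ON;
EXEC dbo.Generate_5_Minute_Values;
SET STATISTICS IO OFF;
/* The Messages tab then shows a line per table, like:
   Table 'Table_10_Second'. Scan count ..., logical reads ..., ... */
```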