=========================
== Morten Ankerstjerne ==
=========================
Swords and SQL

Repeat last known value

ETL SQL Window functions

Demo scripts below.

Disclaimer: This technique is by no means something I have come up with, I’ve just needed it enough times that I figured I might as well put it into my own words, for future reference.

Background

Suppose you have a collection of time series data, recording some status or other measurement at a point in time.

In an OLTP system we probably don’t want to repeat the same value every time we take a measurement, or we may only record changes made to a ledger when they occur.

Read more...

Stop auto-shrinking your staging database

Data Warehouse ETL Microsoft SQL Server

Background

If you’ve worked with data warehouse development on Microsoft SQL Server, you have probably had a run-in with your SAN-admin at some point, complaining about the size of your databases, and asking if you can do some cleanup to free up disk space.

What often ends up happening is, you take look at the database files and realize there’s a lot of free space that could be released, especially if you have a database dedicated to staging transformed data, before loading it into data marts.

Read more...

Ambiguity in daylight saving time

Coordinated Universal Time Dates Datetime Daylight saving time DST Summer time SQL UTC

Demo scripts below.

Background

Most data professionals will have to deal with dates at one point or another. Most often, we don’t have the luxury of choosing the data types in the source systems we work with, so some interesting challenges may come up.

One such challenge is daylight saving time (or summer time, in Europe), when people turn their clocks back or forward an hour to have more hours of sunlight during our daily life (this is not universal, but is observed throughout most of Europe and North America, among other places).

Read more...

Column order matters in Clustered Indexes

Microsoft SQL Server Index tuning SQL

Demo scripts below.

I recently worked on a problem for a customer who experienced some performance issues with a job on their SQL Server database.

Because they use SQL Server Standard Edition, their solution included a home-rolled version of table partitioning, where two tables were created every week:

  • one contained 10-second aggregate values, written in microbatches every 10 seconds
  • the other similarly contained 5-minute values, written from the 10 second table every 5 minutes

All of these aggregates were based on raw measurements from their various systems, one measurement per device.

Read more...
1 of 1