Idempotent DB Update Scripts

An idempotent function gives the same result even if it is applied several times. That is exactly how a database update script should behave. It shouldn’t matter if it is run on or multiple times. The result should be the same.

A database update script should be made to first check the state of the database and then apply the changes needed. If the script is done this way, several operations can be combined into one script that works on several databases despite the databases being at different (possibly unknown) state to start with.

For the database schema itself I usually use Visual Studio 2010 database projects that handles updates automatically (in VS2012 the functionality has been changed significantly). Even with the schema updates handled automatically, there are always things that need manual handling. One common case is lookup tables that need initialization.

Lookup Table Init Script

I use a combination of a temp table and a MERGE clause to init lookup tables.

CREATE TABLE #Colours
(
  ColourId  INT NOT NULL,
  Name      NVARCHAR(10) NOT NULL
)
 
INSERT #Colours VALUES 
(1, N'Red'),
(2, N'Green'),
(3, N'Blue')
 
MERGE Colours dst
USING #Colours src
ON (src.ColourId = dst.ColourId)
WHEN MATCHED THEN
UPDATE SET dst.ColourId = src.ColourId
WHEN NOT MATCHED THEN
INSERT VALUES (src.ColourId, src.Name)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
 
DROP TABLE #Colours

I think that the temp table approach is great because it gives a clear overview in the script of what the final values will be. It also works regardless of what the current values are. Sometimes it is relevant to keep old values, which can be done by removing the last two lines of the MERGE clause. It is also possible to flag records as inactive instead of deleting them.

MERGE...
...
WHEN NOT MATCHED BY SOURCE THEN
SET dst.Active = 0;

Checking Current State

An idempotent script has to be able to check the current state and adopt its behaviour. The lookup table init script uses the MERGE clause for that, checking the actual values. In most cases it is possible to check the current state by inspecting the values of the table or through the sys meta data views.

If that’s not possible, a separate table can be used to log the scripts run. This method has the advantage of an easy way to check what scripts have been run. The disadvantage is that it violates the DRY Principle by keeping a separate log, which can get out of sync with the actual database schema. What happens when a script is partially run and then fails before writing the log entry? What will happen the next time the script is run?

This is where true idempotent script shines. Whenever there’s a doubt of the current state of the database the entire script can be run again, bringing the database to a known state.

  • Leave a Reply

    Your name as it will be displayed on the posted comment.
    Your e-mail address will not be published. It is only used if I want to get in touch during comment moderation.
    Your name will be a link to this address.
Software Development is a Job – Coding is a Passion

I'm Anders Abel, a systems architect and developer working for Kentor in Stockholm, Sweden.

profile for Anders Abel at Stack Overflow, Q&A for professional and enthusiast programmers

The complete code for all posts is available on GitHub.

Popular Posts

Archives

Series