An idempotent function gives the same result even if it is applied several times. That is exactly how a database update script should behave: it shouldn’t matter whether it is run once or multiple times, the result should be the same.
A database update script should first check the state of the database and then apply the changes needed. Written this way, several operations can be combined into one script that works on several databases, even when they start out in different (possibly unknown) states.
For the database schema itself I usually use Visual Studio 2010 database projects that handle updates automatically (in VS2012 the functionality has been changed significantly). Even with the schema updates handled automatically, there are always things that need manual handling. One common case is lookup tables that need initialization.
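As an illustration, here is a minimal sketch of an idempotent lookup-table initialization; the Colors table and its values are made up for the example, but a real script would follow the same check-then-apply pattern for every change it makes.
-- Sketch: initialize a hypothetical Colors lookup table idempotently.
-- Each insert first checks whether the row already exists, so the
-- script can be run any number of times with the same end result.
IF NOT EXISTS (SELECT * FROM Colors WHERE Name = 'Red')
    INSERT INTO Colors (Name) VALUES ('Red');
IF NOT EXISTS (SELECT * FROM Colors WHERE Name = 'Green')
    INSERT INTO Colors (Name) VALUES ('Green');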
OR-Mappers are great for making data access easier, but they are not an excuse for not learning SQL. As an application developer, I’m responsible for all the SQL queries issued by the application. If I use an ORM, it’s my responsibility to check the generated SQL.
I’ve already written another post named Always Check Generated SQL back in November last year, but this is not a duplicate. Back then I examined LINQ to SQL. In this post I test the same behaviour in Entity Framework (the code won’t be exactly the same, because my EF and LINQ to SQL test databases are not identical).
The behaviour that’s up for test is how LINQ’s group by clause is translated into SQL. The problem for the SQL generator is that LINQ is more powerful than SQL: a group by in LINQ produces nested collections, which SQL can’t handle.
from car in Cars
group car by car.Color into colors
select colors
The query produces nested collections (graphics generated in LINQPad). When faced with that query, LINQ to SQL creates a horrible SQL translation: first the colors are fetched, then the cars of each color are fetched in a separate query.
Entity Framework clearly has a better query generator. It creates one single query.
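The exact SQL that Entity Framework emits is longer than this, but the idea behind a single-query translation can be sketched by hand: return all the rows in one statement, ordered by the grouping key, so the client can partition them into groups without further round trips (a Cars table with a Color column is assumed, as in the LINQ query above).
-- Hand-written sketch, not the actual EF output: one round trip that
-- brings back every row ordered by the grouping key, ready to be
-- partitioned into per-color groups on the client.
SELECT *
FROM Cars
ORDER BY Color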
In the Prevent EF Migrations from Creating or Changing the Database post I showed how to prevent the application from automatically creating or updating the database. Instead I want the installation program to do that, and with a Web Setup Project for the installation, an MSI Custom Action is needed.
The actual work of updating the database is done by the migrate.exe tool. Getting the MSI package to run it properly turned out to be a bit of a challenge. I first included migrate.exe in the installation package to have it deployed to the bin directory together with the assemblies of the system. Web setup projects do support running an exe file as a custom action, but unfortunately I couldn’t get migrate.exe to work unless the working directory was set to the bin directory, and the working directory for custom actions is c:\windows\system32 by default. To handle that, a small VBScript was used.
When designing a database, I have a standard of always creating a clustered primary key of type INT IDENTITY(1,1) NOT NULL in each table.
CREATE TABLE Cars(
ID INT IDENTITY(1,1) NOT NULL
CONSTRAINT PK_Cars PRIMARY KEY CLUSTERED,
Brand NVARCHAR(20) NOT NULL,
RegistrationNumber NVARCHAR(10) NOT NULL,
MadeIn NVARCHAR(20) NOT NULL
)
It is simple to create, and for 90% of all tables it gives good performance and a small storage footprint. The 9% of tables that are link tables for many-to-many relationships get a composite primary key instead, made up of the primary keys of the two related tables. Those two simple rules handle 99% of the cases for primary keys. In this post I will go through some considerations for primary keys and show why an INT IDENTITY(1,1) NOT NULL field is almost always a good choice for a clustered primary key. The examples and technical details in this article are written for MS SQL Server, but the concepts should apply to other database engines as well.
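As a sketch of the second rule, a link table between the Cars table above and a hypothetical Drivers table could look something like this, with the composite primary key built from the two foreign key columns.
-- Sketch: many-to-many link table with a composite clustered primary key.
-- The Drivers table is hypothetical and only here to complete the example.
CREATE TABLE CarDrivers(
CarID INT NOT NULL
CONSTRAINT FK_CarDrivers_Cars FOREIGN KEY REFERENCES Cars(ID),
DriverID INT NOT NULL
CONSTRAINT FK_CarDrivers_Drivers FOREIGN KEY REFERENCES Drivers(ID),
CONSTRAINT PK_CarDrivers PRIMARY KEY CLUSTERED (CarID, DriverID)
)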
When maintaining a system that is in production there are often issues that need to be solved by running custom SQL scripts directly in the production database. There can be different reasons for this:
- Updating a lookup table to add a new option to dropdown boxes, where there is no user interface for handling the table.
- Adjusting data in a way not permitted by the normal work flow.
- Reverting an incorrect operation performed by a user.
Whenever it is time to run SQL directly in the live database I tend to become nervous. A single error and a lot of data could be lost or corrupted. Usually I make such an update in three steps, as sketched below:
- Find the data that should be changed.
- Create the update statement and validate it.
- Run the complete statement.
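As a sketch only, using the Cars table from earlier and a made-up registration number, the three steps could look like this; wrapping the update in a transaction is an extra safety net so a wrong row count can be rolled back.
-- 1. Find the data that should be changed (values are made up).
SELECT ID, MadeIn FROM Cars WHERE RegistrationNumber = 'ABC123'

-- 2. Create the update statement and validate it, here inside a
--    transaction so it can be undone if the row count looks wrong.
BEGIN TRANSACTION
UPDATE Cars SET MadeIn = 'Sweden' WHERE RegistrationNumber = 'ABC123'
SELECT @@ROWCOUNT AS RowsAffected

-- 3. Run the complete statement: commit when the row count matches the
--    expectation, otherwise roll back.
COMMIT TRANSACTION
-- ROLLBACK TRANSACTION  -- if anything looks wrong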