In my last post I advocated for using LINQ to SQL for data access. Today I am going to show an example of how the greater power of LINQ compared to SQL sometimes results in terrible performance when LINQ to SQL does it best to work around the limitations of SQL.
var q = from car in Cars where car.MadeIn == "Sweden" group car by car.Brand into brands select brands;
The question looks quite simple and straight forward, but has a hidden performance problem; it will send an extra SQL query to the server for each group.
Running the query in LINQPad gives the expected result: a nested list where each item corresponds to a brand and contains an inner list with the cars of that brand.
The query and the results It looks innocent enough. However it isn’t. Trying to write the same query in SQL reveals the problem:
SELECT * FROM Cars GROUP BY Brand
Trying to run this question gives an error:
Column ‘Cars.ID’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
This is a case where LINQ is more powerful than SQL. SQL can’t return nested sets like LINQ can. Still LINQ to SQL manages to produce the right output. How does LINQ to SQL translate the query? Using SQL Server Profiler (available on the Tools menu of Sql Management Studio) we can look at the generated queries. The first one looks just fine, except that it only returns the brands that are made in Sweden.
EXEC sp_executesql N'SELECT [t0].[Brand] AS [Key] FROM [Cars] AS [t0] WHERE [t0].[MadeIn] = @p0 GROUP BY [t0].[Brand]', N'@p0 nvarchar(4000)',@p0=N'Sweden'
The next query reveals the problem:
EXEC sp_executesql N'SELECT [t0].[ID], [t0].[Brand], [t0].[RegistrationNumber], [t0].[MadeIn] FROM [Cars] AS [t0] WHERE (@x1 = [t0].[Brand]) AND ([t0].[MadeIn] = @p0)', N'@p0 nvarchar(4000),@x1 nvarchar(4000)',@p0=N'Sweden',@x1=N'Saab'
Then there is one such statement for each brand returned by the first query. If there are hundreds or even thousands of groups there will be hundreds or thousands of database queries. With some latency between the application server and the database server this is a performance nightmare.
Luckily there is a simple solution to the problem:
var q1 = from car in Cars where car.MadeIn == "Sweden" select car; var q2 = from car in q1.ToList() group car by car.Brand into brands select brands;
First filter out the relevant records with one query. Then retrieve the results of that query to memory (that’s what
ToList() does). Finally do the grouping in memory, creating the nested list that SQL cannot express. Looking at the generated SQL query in SQL Server Profiler there is only one query now:
EXEC sp_executesql N'SELECT [t0].[ID], [t0].[Brand], [t0].[RegistrationNumber], [t0].[MadeIn] FROM [Cars] AS [t0] WHERE [t0].[MadeIn] = @p0',N'@p0 nvarchar(4000)',@p0=N'Sweden'
When I’m working on applications based on LINQ to SQL I often keep the SQL Profiler running, preferably on a secondary monitor. When I’m doing the last tests of a function before going on to the next task, I keep an eye on the profiler to make sure the number of queries isn’t exploding.
LINQ and LINQ to SQL are powerful tools in the .NET environment that can increase productivity a lot if used right. Using it right requires understanding of how it works under the hood. The limitations must be known and the most important workarounds must be understood. With a broad understanding of the technology, the full power of Language INtegrated Queries can be unleashed.