Always Check Generated SQL

In my last post I advocated for using LINQ to SQL for data access. Today I am going to show an example of how the greater power of LINQ compared to SQL sometimes results in terrible performance when LINQ to SQL does it best to work around the limitations of SQL.

var q = from car in Cars
        where car.MadeIn == "Sweden"
        group car by car.Brand into brands
        select brands;

The question looks quite simple and straight forward, but has a hidden performance problem; it will send an extra SQL query to the server for each group.

Running the query in LINQPad gives the expected result: a nested list where each item corresponds to a brand and contains an inner list with the cars of that brand.

The query and the results It looks innocent enough. However it isn’t. Trying to write the same query in SQL reveals the problem:

SELECT * FROM Cars
GROUP BY Brand

Trying to run this question gives an error:

Column ‘Cars.ID’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

This is a case where LINQ is more powerful than SQL. SQL can’t return nested sets like LINQ can. Still LINQ to SQL manages to produce the right output. How does LINQ to SQL translate the query? Using SQL Server Profiler (available on the Tools menu of Sql Management Studio) we can look at the generated queries. The first one looks just fine, except that it only returns the brands that are made in Sweden.

EXEC sp_executesql N'SELECT [t0].[Brand] AS [Key]
FROM [Cars] AS [t0]
WHERE [t0].[MadeIn] = @p0
GROUP BY [t0].[Brand]',
N'@p0 nvarchar(4000)',@p0=N'Sweden'

The next query reveals the problem:

EXEC sp_executesql N'SELECT [t0].[ID], [t0].[Brand],
[t0].[RegistrationNumber], [t0].[MadeIn]
FROM [Cars] AS [t0]
WHERE (@x1 = [t0].[Brand]) AND ([t0].[MadeIn] = @p0)',
N'@p0 nvarchar(4000),@x1 nvarchar(4000)',@p0=N'Sweden',@x1=N'Saab'

Then there is one such statement for each brand returned by the first query. If there are hundreds or even thousands of groups there will be hundreds or thousands of database queries. With some latency between the application server and the database server this is a performance nightmare.

Luckily there is a simple solution to the problem:

var q1 = from car in Cars
         where car.MadeIn == "Sweden"
         select car;
 
var q2 = from car in q1.ToList()
         group car by car.Brand into brands
         select brands;

First filter out the relevant records with one query. Then retrieve the results of that query to memory (that’s what ToList() does). Finally do the grouping in memory, creating the nested list that SQL cannot express. Looking at the generated SQL query in SQL Server Profiler there is only one query now:

EXEC sp_executesql N'SELECT [t0].[ID], [t0].[Brand],
[t0].[RegistrationNumber], [t0].[MadeIn]
FROM [Cars] AS [t0]
WHERE [t0].[MadeIn] = @p0',N'@p0 nvarchar(4000)',@p0=N'Sweden'

When I’m working on applications based on LINQ to SQL I often keep the SQL Profiler running, preferably on a secondary monitor. When I’m doing the last tests of a function before going on to the next task, I keep an eye on the profiler to make sure the number of queries isn’t exploding.

LINQ and LINQ to SQL are powerful tools in the .NET environment that can increase productivity a lot if used right. Using it right requires understanding of how it works under the hood. The limitations must be known and the most important workarounds must be understood. With a broad understanding of the technology, the full power of Language INtegrated Queries can be unleashed.

  • Leave a Reply

    Your name as it will be displayed on the posted comment.
    Your e-mail address will not be published. It is only used if I want to get in touch during comment moderation.
    Your name will be a link to this address.
Software Development is a Job – Coding is a Passion

I'm Anders Abel, a systems architect and developer working for Kentor in Stockholm, Sweden.

profile for Anders Abel at Stack Overflow, Q&A for professional and enthusiast programmers

The complete code for all posts is available on GitHub.

Popular Posts

Archives

Series

Powered by WordPress with the Passion for Coding theme.