Always Check Generated SQL

OR-Mappers are great for making data access easier, but they are not an excuse for learning SQL. As an application developer, I’m responsible for all the SQL queries issued by the application. If I use an ORM, it’s my responsibility to check the generated SQL.

I’ve already written another post named Always Check Generated SQL back in November last year, but this is not a duplicate. Back then I examined LINQ to SQL. In this post I test the same behaviour in Entity Framework (the code won’t be exactly the same, because my EF and LINQ to SQL test databases are not identical).

The behaviour that’s up to test is how LINQ’s group by clause is translated into SQL. The problem for the SQL generator is that LINQ is more powerful than SQL. When doing a group by in LINQ, nested collections are produced which SQL can’t handle.

from car in Cars
group car by car.Color into colors
select colors

The query produces nested collections (graphics generated in LINQPad). When faced with that query, LINQ to SQL creates a horrible SQL translation. First the colors are fetched, then the set of cars is fetched in a separate query for each car.
Entity Framework clearly has a better query generator. It creates one single query.

[Project2].[C1] AS [C1], 
[Project2].[Color] AS [Color], 
[Project2].[C2] AS [C2], 
[Project2].[CarId] AS [CarId], 
[Project2].[BrandId] AS [BrandId], 
[Project2].[RegistrationNumber] AS [RegistrationNumber], 
[Project2].[TopSpeed] AS [TopSpeed], 
[Project2].[Color1] AS [Color1]
	[Distinct1].[Color] AS [Color], 
	1 AS [C1], 
	[Extent2].[CarId] AS [CarId], 
	[Extent2].[BrandId] AS [BrandId], 
	[Extent2].[RegistrationNumber] AS [RegistrationNumber], 
	[Extent2].[TopSpeed] AS [TopSpeed], 
	[Extent2].[Color] AS [Color1], 
		[Extent1].[Color] AS [Color]
		FROM [dbo].[Cars] AS [Extent1] ) AS [Distinct1]
	LEFT OUTER JOIN [dbo].[Cars] AS [Extent2] ON [Distinct1].[Color] = [Extent2].[Color]
)  AS [Project2]
ORDER BY [Project2].[Color] ASC, [Project2].[C2] ASC

Running that query creates a flat result set, that EF obviously internally groups into the final result. That’s a clear win for Entity Framework over LINQ to SQL.

When it’s too much for Entity Framework

It is of course possible to create LINQ statements that become too much for Entity Framework too. While doing some test queries, I noticed that EF fails to eager loaded related entities if the results are grouped. Let’s first look at a working query, that utilizes eager load.

var cars = 
(from car in Cars.Include(c => c.Brand)
select car
var brands = cars.Select(car => car.Brand.Name).ToList();

First a query fetches cars and requests that the related brand information should be brought in with the same question. The ToList() call forces the question to be executed against the database. In the next statement, when the names of the cars’ brands are fetched, the required information is already prefetched. EF handles this situation excellently, hitting the database only once.

By introducing some more complexity, it is possible to bring EF to a failure too.

var carsByColor  = (from car in Cars.Include(c => c.Brand)
group car by car.Color into colors
select colors).ToList();
var brands = carsByColor.SelectMany(color => color.Select(car => car.Brand.Name)).ToList();

This generates three queries. The first one fetches the cars and then the two brands present in the result are fetched separately in one query each. To me, this was unexpected. EF handles the nested collections created by LINQ’s group by and it handles eager loading, but when the two cases are combined, it becomes too much.

This once again shows that while an ORM simplifies the data access, it is still necessary to always check generated SQL.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.