Don’t use Linq’s Join. Navigate!

One of the greatest benefits of LINQ to SQL and LINQ to Entities is navigation properties that allows queries across several tables, without the need to use explicit joins. Unfortunately LINQ queries are often written as a direct translation of a SQL query, without taking advantage of the richer features offered by LINQ to SQL and LINQ to Entities.

It is not uncommon to see code doing a join manually.

from p in ctx.Persons
join c in ctx.Cities
on p.BornIn equals c.CityID
select new
{
    p.FirstName,
    c.Name
};

I think that using navigation properties makes the code much easier to read.

from p in ctx.Persons
select new
{
    p.FirstName,
    p.BornInCity.Name
};

Explicit Joins Taking Over

The first result for the google search for “linq-to-sql join” shows how to do several types of joins, but never mentions navigation properties.

On Stack Overflow there are plenty of questions about joins. The sad thing is that most of them are answered with help on doing explicit joins. Navigation properties are hardly ever mentioned.

A More Convincing Example

If there still is anyone in doubt of the benefits of navigation properties I’ve created another example.

from p in ctx.Persons
where p.ID == personId
join bornIn in ctx.Cities
on p.BornIn equals bornIn.CityID
join livesIn in ctx.Cities
on p.LivesIn equals livesIn.CityID
join s in ctx.Sexes
on p.SexID equals s.ID
select new PersonInfo
{
    Name = p.FirstName + " " + p.LastName,
    BornIn = bornIn.Name,
    LivesIn = livesIn.Name,
    Gender = s.Name,
    CarsOwnedCount = ctx.Cars.Where(c => c.OwnerID == p.ID).Count()
}

Especially note the double join against the City table. I very much prefer the query with navigation properties.

from p in ctx.Persons
where p.ID == personId
select new PersonInfo
{
    Name = p.FirstName + " " + p.LastName,
    BornIn = p.BornInCity.Name,
    LivesIn = p.LivesInCity.Name,
    Gender = p.Sex.Name,
    CarsOwnedCount = p.Cars.Count(),
}

It is worth noting that both these queries generate the same SQL when executed (to be honest, the left and right side of three comparisons are swapped, so the text is actually different). There is no performance impact of using navigation properties.

Navigation Properties and Foreign Keys

As long as there are proper foreign key constraints in the database, the navigation properties will be created automatically. It is also possible to manually add them in the ORM designer. As with all LINQ to SQL usage I think that it is best to focus on getting the database right and have the code exactly reflect the database structure. With the relations properly specified as foreign keys the code can safely make assumptions about referential integrity between the tables.

Please Promote Navigation Properties

Please help me promote navigation properties and discourage use of explicit joins unless really needed (yes, there are cases where navigation properties are not suitable, but they are a small minority). If you are on Stack Overflow answering questions, please point out whenever navigation properties can be used instead of just providing the join.

41 comments

  1. In your first example, you haven’t been consistent in the properties BornIn and BornInCity. Is this deliberate? I would assume BornIn is an integer (unless you’ve overridden equality with an int), and BornInCity is a city object, in which case traversal wouldn’t be an option. I do agree though that traversal should be used as a preference, and use joins in more complicated scenarios.

    1. Both examples were run against the same Person table, which (simplified by removing INotifyPropertyChanged code etc.) looks something like this:

      [Table(Name="dbo.Persons")]
      public class Person
      {
       
        // Other fields goes here...
       
        [Column]
        public int BornIn { get; set; }
       
        [Association(Name="City_Person", ThisKey="BornIn", 
          OtherKey="CityID, IsForeignKey=true)]
        public City BornInCity { get; set; }  
      }

      As you’ve guessed BornIn is an int – it’s mapped against the actual foreign key field in the database. BornInCity is the navigation property built on top of the foreign key. It is a City object reference. Using the BornInCity navigation property makes LINQ to SQL take care of creating the join automatically.

      I don’t really understand what you mean with not being consistent but I hope the explanation above will help. If not, please try to describe a bit more detailed what you think is inconsistent.

      1. Thanks for clearing that up, I did not know that the Entity Framework creates both properties for you. Therefore it seemed like you were solving two separate problems (data linked by integer, and linked by object), rather than comparing two options to the same problem.

      2. Hi Anders, great article, very useful.
        I am using LINQ to entities for building a WCF RIA Service for a VS LightSwitch App, wich uses Entity Framework. Refactoring the lamda expression after reading your post, the navigation properties are not recognised by default. Do I have to create them explicitly, as in the above example for BornIn? If so, can you provide where I can learn more about [Association]?

        Thank you. I will definitely help you spread the word.

      3. I haven’t used WCF RIA Servies myself so I don’t know the details. From this Stack Overflow post it looks like you have to add some extra attributes to expose navigation properties over the RIA Service.

        I you need more assistance, I would suggest posting a question on Stack Overflow to get help from someone that knows RIA Services.

  2. Will this approach work in the same way for all eager, lazy and explicit loading? I have a sneaky feeling that it works only with lazy – and most web applications use POCO entities with eager loading.

    1. In the specific examples in this post, I’m only selecting primitive types (ints and strings) directly into an specific type (anonymous in the first example, PersonInfo in the second) so there is no related entities that could possibly be lazy loaded. All data is fetched at once.

      In a more general example, if we would first bring in complete Person objects and then use the BornIn navigation property to get the city object the city would be lazy loaded by default, but both LINQ to SQL and LINQ to Entities supports eager loading. In LINQ to SQL it’s done through the LoadOptions property on the data context. In EF it’s done by a call to Include() in the query itself.

  3. Here’s a real-life example that is currently in use on my website (njtheater.com):

    var productions = from prod in db.Productions
            where prod.StartDate  DateTime.Now 
            orderby prod.Play.Title 
            select new  
                   { 
                       Title = prod.Play.Title, 
                       Troupe = prod.Troupe.Name, 
                       Venue = prod.Venue.Name, 
                       City = prod.Venue.City, 
                       StartDate = prod.StartDate , 
                       EndDate = prod.EndDate
                   };

    That requires joining four table, all handled automatically by LINQ. I wrote about it a few years ago on my blog.

  4. You should also mention that the “Navigation” only is available if the tables are related in the data model and that you only can navigate up so to speak… i.e. they are actually EntitySet’s of the parent.

    If you want to “navigate” down, then you would need something like:
    CarsOwnedCount = ctx.Cars.Count() , as you wrote.

    But this creates an overhead if you don’t use the DataLoadOptions :)

    1. You’re right in that following a foreign key “backwards” will give many results. However, there will be no overhead if the Count() is called within the query. If I would have kept the entire list of cars the SQL performance would be horrible – as described in Always Check Generated SQL.

      Here’s the SQL generated for the last query:

      SELECT [t5].[VALUE] AS [Name], [t5].[Name] AS [BornIn], [t5].[Name2] AS [LivesIn], [t5].[Name3] AS [Gender], [t5].[value2] AS [CarsOwnedCount]
      FROM (
          SELECT ([t0].[FirstName] + @p0) + [t0].[LastName] AS [VALUE], [t1].[Name], [t2].[Name] AS [Name2], [t3].[Name] AS [Name3], (
              SELECT COUNT(*)
              FROM [dbo].[Cars] AS [t4]
              WHERE [t4].[OwnerID] = [t0].[ID]
              ) AS [value2], [t0].[ID]
          FROM [dbo].[Persons] AS [t0]
          INNER JOIN [dbo].[City] AS [t1] ON [t1].[CityID] = [t0].[BornIn]
          INNER JOIN [dbo].[City] AS [t2] ON [t2].[CityID] = [t0].[LivesIn]
          INNER JOIN [dbo].[Sexes] AS [t3] ON [t3].[ID] = [t0].[SexID]
          ) AS [t5]
      WHERE [t5].[ID] = @p1

      As you can see, it actually calculates the count directly in the query. There will be no lazy loading required in this case.

  5. This doesn’t work for tables where the foreign key may be null, any suggestions?

    1. I haven’t had any problems with nullable foreign keys, navigation properties work for them as well. There must be something different to how you use the navigation properties compared to how I do. To be able to help, I would need some example code together with some info on where it fails. For that kind of questions, Stack Overflow is better than comments here, so please post a question there with some example code.

      1. Thanks for the reply, Its sorted now The model was not updated from the db correctly.

  6. Thanks! This makes joins SO MUCH easier… just the thing I was looking for..

    Now I’m worried its TOO easy, I may forget the sql joins.

    I need to see the glass as half full :)

  7. I would like not to use explicit joins wherever it is possible, but I have problem with one to many relationships. Like in your example I would like to get Person and its car model

    from p in ctx.Persons
    join c in ctx.Cars on p.ID equals c.OwnerID 
    where p.ID == personId && c.Brand == brandName
    select new PersonInfo
    {
        Name = p.FirstName + " " + p.LastName,
        CarModel = c.Model
    }

    is it possible to write this query without this horrible join?

    1. Yes it is, but with LINQ it’s important to start with the right table, which isn’t the case in plain SQL. If you want a resulting object for each car in the database (which is what you will get from your query), then you have to start with the car:

      from c in ctx.Cars
      where c.PersonId == personId && c.Brand == brandName
      select new PersonInfo
      {
        Name = c.Person.FirstName + " " + c.Person.LastName,
        CarModel = c.Model
      }
  8. This is bad advice. No real world application has lazy loading enabled or uses eager loading if they care about performance and one of those is required in order to use navigation properties as you’re suggesting. Even the smallest of applications I create care about performance no matter how small the dataset. Loading potential thousands of related entities just for the ease of using navigation properties is bad advice and makes for lazy developers not to mention horrible performance.

    1. I think you miss the point. In the example in the post, The SQL is identical.

      It is worth noting that both these queries generate the same SQL when executed (to be honest, the left and right side of three comparisons are swapped, so the text is actually different). There is no performance impact of using navigation properties.

      There is no difference in performance. The performance characteristics is determined by the select statement/method. As long as the select requests exactly the data needed (no more, no less) it will be executed as one efficient query against the database.

      I’d suggest that you use a profiler to examine the actual SQL generated if you are in doubt. See my posts about Linq-to-SQL and Entity Framework for details.

    2. TheDude, you’re absolutely right, this causes terrible performance problems. I see code written by novices being plagued by this, performance issues due to unnecessary and convoluted sql generated by orms. I suggest everyone to craft their queries with care and use as few metafeatures as possible. ORMs generate SQL and that is terrible, case closed. There is a small subset for which to use navigation properties but generally avoid them. They may look magic and cool but think twice. I suggest novices to learn from someone with experience not toolheads. And learn SQL ffs!!

      1. ORM can generate as effective queries as manual SQL. You can use projections, effective joins, and so on. I know some developers don’t check what’s get loaded/generated (what select is there), this is the main reason of failure.

        Mapping entities via ORM leads then to code that has types checked. So IQueryable is something worth – especially when changes might occur in DB.

        I used SQL nicely to do stuff’s ORM can’t to get the best of the 2 worlds. Unlike you, i won’t argue against SQL, despite it is error prone due to “strings” and custom mapping code.

        Each tool is only as much good, as the developer it uses!

        BTW:
        500k select – ado.net adapter – around 1 sec, ado.net reader arounf 0,5 sec, EF6 without using properly 10 sec, and EF6 with proper setttings and use: 1.6sec (and this one had already converted the data to my structure). Yeah, it was slower, but i think data integrity, clean and manageable code matters more than 1-2 additional milliseconds to get a single row…

  9. Hi. Nice article. I have a question. How to use Navigation Properties in place od conditional joins. For example I ususaly have deleteflag in the tables and write SQLs like:

    SELECT * 
       FROM client c
          JOIN contract cnt on(c.client_id = cnt.client_id and cnt.deleteflag = 0)
     WHERE c.deleteflag = 0
    

    how to do the …and cnt.deleteflag = 0 part with Navigation Properties?

    1. If you really want an inner join as in your example, you would have to start with the contract when building the query:

      var contracts = from c in ctx.Contracts
                              where c.deleteflag == 0
                              select c;
       
      foreach(c in contracts)
      {
        var client = c.client;
      }

      You can also start with the client, but that will expand to a left outer join.

      foreach(c in ctx.Clients)
      {
        var contracts = c.Contracts.Where(cnt => cnt.deleteflag == 0)
      }

      Please note that both examples will rely heavily on lazy loading and have bad performance for large data volumes. In real life add a select to get only the properties you need. That will avoid the lazy loading.

  10. Thanks for your interesting article. In your examples you talk about a main table with one or several directly related tables. But what if have three tables A, B, C. A is related to B and B is related to C. For example i have a users table, a user’s groups table (user can belong to several groups), and a procedures groups table (There is several procedures for each group). I want to display the procedures for a given user. In this case is it mandatory to use joins ?

    1. A colleague gave me the solution. I share this solution as i hope it will help other people :

      List procedures = (ctx.Users.Single(u => u.UserID == userID)
          .Groups.SelectMany(g => g.Procedures)
          .Where(p => p.Version==1)).ToList();
      1. That’s a valid solution, but I would like to show another way of thinking that often makes LINQ queries more easy to create. When doing a traditional join in SQL (or in LINQ) it doesn’t matter what entity you start with, but when using navigation properties it does matter. It’s often easiest to start with the entity that is “most many”, which will let you avoid getting sub collections (which your colleague handled with SelectMany().

        List procedures = ctx.Procedures
            .Where(p => p.Version == 1 && p.Group.User.UserID == userId).ToList();
    2. Thanks for your help. The problem is that in ‘p.Groups’, i don’t have access to the property Users. In my entities schema, the Groups entity has a Users property, but in my Linq queries, i can access only to Groups. When i try to make Linq queries, i have access to the navigation properties of the ‘main table’ but i can’t navigate through tables.
      If i try to run you query, i’ve the following error : Error 1 ‘System.Collections.Generic.ICollection’ does not contain a definition for ‘Users’ and no extension method ‘Users’ accepting a first argument of type ‘System.Collections.Generic.ICollection’ could be found (are you missing a using directive or an assembly reference?)

      1. How about the following

        List procedures = ctx.Procedures
        .Where(p => p.Version == 1 && p.Groups.Any(g => g.User.UserID == userId)).ToList();

  11. Hi..
    I have been trying to implement left outer join on Employee and Commission Tables (with sum on bonus attribute of second table)..
    Any Idea how to implement it using navigation?

    1. If I understand what you want to achieve, you want a list of employees – both those with one or more commisions and those without. That’s actually the default behaviour for navigation properties, as long as you build the query based on the Employee table.

      For an example using the Cars database in this post, consider this query:

      Brands.Select(b => new 
      {
        b.Name,
        Count = b.Cars.Count(),
        TotalSeats = b.Cars.Sum(c => (int?)c.Seats) ?? 0
      });

      That will give the result:

      Volvo	2	12
      Saab	1	4
      Audi	1	2
      Ford	0	0
      

      As you can see, Ford is in the list too, even though the Ford brand has no associated cars.

      Finally, the (int?) cast done for sum is a workaround, which I’ve described in another blog post.

      1. Sorry for late.. Anders you got my point. As I am a beginner in WPF race, currently working on a real client project using WPF with MVC architectural pattern. What I am creating a list of Employees, obviously that can be printed, used DataGrid to display like DataGridName.ItemSource= QueryResult. and on .XAML I explicitly binded each column of grid to table attributes(basically tables in my DB are actually classes in my project).

        Query implemented:

        public static IEnumerable EmpListToPrint()
        {
          dc = new SolarDatabaseDataContext();
          var result = dc.Employees.Select(b => new
          {
            b.EmpName,
            b.Salary,
            Bonus = b.CommissionDetails.Sum(c => (int?)(int.Parse(c.CommissionRs))) ?? 0,
            b.Deduction,
          });
          return result;
        }
         
        DataGridName.ItemSource = result;

        The thing is still query not working as it gives me exception as
        {SQL Server Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, , >= or sub query used as an expression. }

        Any idea how to deal with that?

      2. You should check the actual SQL that gets executed and see if you can figure out what’s wrong. Try using SQL Server Profiler to see what actual SQL is run.

  12. Problem solved after trying again and again.. Dear Anders Abel I have a request to made… can you post a full tutorial on Linq to Sql Joins with navigation and with combined most common aggregate functions that will be a real bonus for me any many of others who have difficulty in using joins and want to do things in simple navigation pattern.. thanks in advance .. If not possible then kindly post here one to one examples.. I’ll do it my self..

  13. Why does querying a navigation property seem to result in the enumerated items being reset to null?

    I’ve a property that checks to see if any child items – the navigation property is a list of parts – have been used or not. However, if you do a Parts.Where(function(p) p.IsUsed).Any and trace it, the list which is populated seems full of null objects.

    1. Can’t say anything about that unless I have more info such as whether this is linq-to-sql or linq-to-ef. Please post a question on Stack Overflow instead.

    2. It’s likely the debugger enumerating all those members. If you avoid the debugger and use old-fashioned “print” statements, you would probably see the list as your program sees it. I was going through the same headaches while trying to troubleshoot a problem with lazily loading object properties. I eventually figured out that it wasn’t my code forcing the elements to be evaluated and initializing them, but the debugger’s enumeration of the object accessing the properties’ backing fields, which initialized them.

  14. Hi Anders,

    first of all, thank you for this article. Its really as you pointed out: If one googles for joining the most common SO answers dealing with join-query-improvements. From now on I will try to contribute to the “Nav-Prop-Promo” ;)

    But I think, you should talk to Julie Lerman regarding this issue.. She has made almost every Pluralsight-Course for EF and she strongly recommends the usage of FK-Properties (which is the first step to not use Navigation properties I think). Check https://app.pluralsight.com/player?course=entity-framework-6-getting-started&author=julie-lerman&name=entity-framework-6-getting-started-m2&clip=5&mode=live (starting at sec 45) and https://app.pluralsight.com/player?course=entity-framework-6-getting-started&author=julie-lerman&name=entity-framework-6-getting-started-m2&clip=5&mode=live (starting at 3:15).

    Pluralsight is a very important resource and a trusted source for me. I think I am not the only developer who uses Pluralsight for learning. Her voice is much louder than mine, so for promo you may ask her too.. :)

    1. Thanks for bringing Julie’s course up. As far as I understand it, she’s talking about some cases where having only a navigation property in the class is a problem, especially when saving/updating entities. Using navigation properties in that case would force you to load the referenced entity from the database – just to be able to reference it. That’s a lot of wasted database queries. Having access to the foreign key instead can increase performance a lot.

      But when reading and joining data, I don’t see any contradiction between this post and her course.

  15. Hello Anders! Nice advice. I love you point of view, and I would like to apply it, but I find myself spoiled too often by the fact that I don’t know how to start a query.

    For instance, I have just asked this in Stack Overflow: https://stackoverflow.com/questions/47642935/complex-query-with-filters-in-entity-framework-core

    Another question is, how to proceed when the query involves filtering over navigation properties? for instance: get the cars of a user which their company is not from Japan. How could we write that query?

    Big thanks!!

  16. You are incorrct about joins.
    Joins keep the releationships in the business layer where it belongs!
    You start messing with navigation props or adding more in the data layer you affect the whole app.
    Also Joins give you flatter results with less performance impact if you do need all the navigvation prop data and its esp performance limiting with many columns and lots of data. Please dont talk about things you are clueless about and dont make bold statements without thinking!

    1. You are confusing data access with data manipulation. Joins take advantage of the associations you make between data in the database – that’s the data layer. A business layer is for manipulating and deriving more complex data structures or enforcing conditions among multiple data values. You lose performance by moving unnecessary things into the business layer.
      Perhaps you should heed your own advice and be careful of making bold statements before thinking.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.