Don’t use Linq’s Join. Navigate!

One of the greatest benefits of LINQ to SQL and LINQ to Entities is navigation properties that allows queries across several tables, without the need to use explicit joins. Unfortunately LINQ queries are often written as a direct translation of a SQL query, without taking advantage of the richer features offered by LINQ to SQL and LINQ to Entities.

It is not uncommon to see code doing a join manually.

from p in ctx.Persons
join c in ctx.Cities
on p.BornIn equals c.CityID
select new
{
    p.FirstName,
    c.Name
};

I think that using navigation properties makes the code much easier to read.

from p in ctx.Persons
select new
{
    p.FirstName,
    p.BornInCity.Name
};

Explicit Joins Taking Over

The first result for the google search for “linq-to-sql join” shows how to do several types of joins, but never mentions navigation properties.

On Stack Overflow there are plenty of questions about joins. The sad thing is that most of them are answered with help on doing explicit joins. Navigation properties are hardly ever mentioned.

A More Convincing Example

If there still is anyone in doubt of the benefits of navigation properties I’ve created another example.

from p in ctx.Persons
where p.ID == personId
join bornIn in ctx.Cities
on p.BornIn equals bornIn.CityID
join livesIn in ctx.Cities
on p.LivesIn equals livesIn.CityID
join s in ctx.Sexes
on p.SexID equals s.ID
select new PersonInfo
{
    Name = p.FirstName + " " + p.LastName,
    BornIn = bornIn.Name,
    LivesIn = livesIn.Name,
    Gender = s.Name,
    CarsOwnedCount = ctx.Cars.Where(c => c.OwnerID == p.ID).Count()
}

Especially note the double join against the City table. I very much prefer the query with navigation properties.

from p in ctx.Persons
where p.ID == personId
select new PersonInfo
{
    Name = p.FirstName + " " + p.LastName,
    BornIn = p.BornInCity.Name,
    LivesIn = p.LivesInCity.Name,
    Gender = p.Sex.Name,
    CarsOwnedCount = p.Cars.Count(),
}

It is worth noting that both these queries generate the same SQL when executed (to be honest, the left and right side of three comparisons are swapped, so the text is actually different). There is no performance impact of using navigation properties.

Navigation Properties and Foreign Keys

As long as there are proper foreign key constraints in the database, the navigation properties will be created automatically. It is also possible to manually add them in the ORM designer. As with all LINQ to SQL usage I think that it is best to focus on getting the database right and have the code exactly reflect the database structure. With the relations properly specified as foreign keys the code can safely make assumptions about referential integrity between the tables.

Please Promote Navigation Properties

Please help me promote navigation properties and discourage use of explicit joins unless really needed (yes, there are cases where navigation properties are not suitable, but they are a small minority). If you are on Stack Overflow answering questions, please point out whenever navigation properties can be used instead of just providing the join.

  • Lee on 2012-06-14

    In your first example, you haven’t been consistent in the properties BornIn and BornInCity. Is this deliberate? I would assume BornIn is an integer (unless you’ve overridden equality with an int), and BornInCity is a city object, in which case traversal wouldn’t be an option. I do agree though that traversal should be used as a preference, and use joins in more complicated scenarios.

    • Anders Abel on 2012-06-15

      Both examples were run against the same Person table, which (simplified by removing INotifyPropertyChanged code etc.) looks something like this:

      [Table(Name="dbo.Persons")]
      public class Person
      {
       
        // Other fields goes here...
       
        [Column]
        public int BornIn { get; set; }
       
        [Association(Name="City_Person", ThisKey="BornIn", 
          OtherKey="CityID, IsForeignKey=true)]
        public City BornInCity { get; set; }  
      }

      As you’ve guessed BornIn is an int – it’s mapped against the actual foreign key field in the database. BornInCity is the navigation property built on top of the foreign key. It is a City object reference. Using the BornInCity navigation property makes LINQ to SQL take care of creating the join automatically.

      I don’t really understand what you mean with not being consistent but I hope the explanation above will help. If not, please try to describe a bit more detailed what you think is inconsistent.

      • Lee on 2012-06-18

        Thanks for clearing that up, I did not know that the Entity Framework creates both properties for you. Therefore it seemed like you were solving two separate problems (data linked by integer, and linked by object), rather than comparing two options to the same problem.

      • Nicolas Lope de Barrios on 2014-03-21

        Hi Anders, great article, very useful.
        I am using LINQ to entities for building a WCF RIA Service for a VS LightSwitch App, wich uses Entity Framework. Refactoring the lamda expression after reading your post, the navigation properties are not recognised by default. Do I have to create them explicitly, as in the above example for BornIn? If so, can you provide where I can learn more about [Association]?

        Thank you. I will definitely help you spread the word.

      • Anders Abel on 2014-03-22

        I haven’t used WCF RIA Servies myself so I don’t know the details. From this Stack Overflow post it looks like you have to add some extra attributes to expose navigation properties over the RIA Service.

        I you need more assistance, I would suggest posting a question on Stack Overflow to get help from someone that knows RIA Services.

  • Sander on 2012-06-27

    Will this approach work in the same way for all eager, lazy and explicit loading? I have a sneaky feeling that it works only with lazy – and most web applications use POCO entities with eager loading.

    • Anders Abel on 2012-06-27

      In the specific examples in this post, I’m only selecting primitive types (ints and strings) directly into an specific type (anonymous in the first example, PersonInfo in the second) so there is no related entities that could possibly be lazy loaded. All data is fetched at once.

      In a more general example, if we would first bring in complete Person objects and then use the BornIn navigation property to get the city object the city would be lazy loaded by default, but both LINQ to SQL and LINQ to Entities supports eager loading. In LINQ to SQL it’s done through the LoadOptions property on the data context. In EF it’s done by a call to Include() in the query itself.

  • James Curran on 2012-06-27

    Here’s a real-life example that is currently in use on my website (njtheater.com):

    var productions = from prod in db.Productions
            where prod.StartDate  DateTime.Now 
            orderby prod.Play.Title 
            select new  
                   { 
                       Title = prod.Play.Title, 
                       Troupe = prod.Troupe.Name, 
                       Venue = prod.Venue.Name, 
                       City = prod.Venue.City, 
                       StartDate = prod.StartDate , 
                       EndDate = prod.EndDate
                   };

    That requires joining four table, all handled automatically by LINQ. I wrote about it a few years ago on my blog.

  • Janus007 on 2012-07-05

    You should also mention that the “Navigation” only is available if the tables are related in the data model and that you only can navigate up so to speak… i.e. they are actually EntitySet’s of the parent.

    If you want to “navigate” down, then you would need something like:
    CarsOwnedCount = ctx.Cars.Count() , as you wrote.

    But this creates an overhead if you don’t use the DataLoadOptions :)

    • Anders Abel on 2012-07-05

      You’re right in that following a foreign key “backwards” will give many results. However, there will be no overhead if the Count() is called within the query. If I would have kept the entire list of cars the SQL performance would be horrible – as described in Always Check Generated SQL.

      Here’s the SQL generated for the last query:

      SELECT [t5].[VALUE] AS [Name], [t5].[Name] AS [BornIn], [t5].[Name2] AS [LivesIn], [t5].[Name3] AS [Gender], [t5].[value2] AS [CarsOwnedCount]
      FROM (
          SELECT ([t0].[FirstName] + @p0) + [t0].[LastName] AS [VALUE], [t1].[Name], [t2].[Name] AS [Name2], [t3].[Name] AS [Name3], (
              SELECT COUNT(*)
              FROM [dbo].[Cars] AS [t4]
              WHERE [t4].[OwnerID] = [t0].[ID]
              ) AS [value2], [t0].[ID]
          FROM [dbo].[Persons] AS [t0]
          INNER JOIN [dbo].[City] AS [t1] ON [t1].[CityID] = [t0].[BornIn]
          INNER JOIN [dbo].[City] AS [t2] ON [t2].[CityID] = [t0].[LivesIn]
          INNER JOIN [dbo].[Sexes] AS [t3] ON [t3].[ID] = [t0].[SexID]
          ) AS [t5]
      WHERE [t5].[ID] = @p1

      As you can see, it actually calculates the count directly in the query. There will be no lazy loading required in this case.

  • DaveB on 2013-03-07

    This doesn’t work for tables where the foreign key may be null, any suggestions?

    • Anders Abel on 2013-03-08

      I haven’t had any problems with nullable foreign keys, navigation properties work for them as well. There must be something different to how you use the navigation properties compared to how I do. To be able to help, I would need some example code together with some info on where it fails. For that kind of questions, Stack Overflow is better than comments here, so please post a question there with some example code.

      • DaveB on 2013-03-11

        Thanks for the reply, Its sorted now The model was not updated from the db correctly.

  • Effahid on 2013-07-13

    Thanks! This makes joins SO MUCH easier… just the thing I was looking for..

    Now I’m worried its TOO easy, I may forget the sql joins.

    I need to see the glass as half full :)

  • gorczas on 2013-12-10

    I would like not to use explicit joins wherever it is possible, but I have problem with one to many relationships. Like in your example I would like to get Person and its car model

    from p in ctx.Persons
    join c in ctx.Cars on p.ID equals c.OwnerID 
    where p.ID == personId && c.Brand == brandName
    select new PersonInfo
    {
        Name = p.FirstName + " " + p.LastName,
        CarModel = c.Model
    }

    is it possible to write this query without this horrible join?

    • Anders Abel on 2013-12-11

      Yes it is, but with LINQ it’s important to start with the right table, which isn’t the case in plain SQL. If you want a resulting object for each car in the database (which is what you will get from your query), then you have to start with the car:

      from c in ctx.Cars
      where c.PersonId == personId && c.Brand == brandName
      select new PersonInfo
      {
        Name = c.Person.FirstName + " " + c.Person.LastName,
        CarModel = c.Model
      }

  • TheDude on 2014-01-10

    This is bad advice. No real world application has lazy loading enabled or uses eager loading if they care about performance and one of those is required in order to use navigation properties as you’re suggesting. Even the smallest of applications I create care about performance no matter how small the dataset. Loading potential thousands of related entities just for the ease of using navigation properties is bad advice and makes for lazy developers not to mention horrible performance.

    • Anders Abel on 2014-01-10

      I think you miss the point. In the example in the post, The SQL is identical.

      It is worth noting that both these queries generate the same SQL when executed (to be honest, the left and right side of three comparisons are swapped, so the text is actually different). There is no performance impact of using navigation properties.

      There is no difference in performance. The performance characteristics is determined by the select statement/method. As long as the select requests exactly the data needed (no more, no less) it will be executed as one efficient query against the database.

      I’d suggest that you use a profiler to examine the actual SQL generated if you are in doubt. See my posts about Linq-to-SQL and Entity Framework for details.

    • nick cave on 2016-10-31

      TheDude, you’re absolutely right, this causes terrible performance problems. I see code written by novices being plagued by this, performance issues due to unnecessary and convoluted sql generated by orms. I suggest everyone to craft their queries with care and use as few metafeatures as possible. ORMs generate SQL and that is terrible, case closed. There is a small subset for which to use navigation properties but generally avoid them. They may look magic and cool but think twice. I suggest novices to learn from someone with experience not toolheads. And learn SQL ffs!!

      • b on 2016-11-09

        ORM can generate as effective queries as manual SQL. You can use projections, effective joins, and so on. I know some developers don’t check what’s get loaded/generated (what select is there), this is the main reason of failure.

        Mapping entities via ORM leads then to code that has types checked. So IQueryable is something worth – especially when changes might occur in DB.

        I used SQL nicely to do stuff’s ORM can’t to get the best of the 2 worlds. Unlike you, i won’t argue against SQL, despite it is error prone due to “strings” and custom mapping code.

        Each tool is only as much good, as the developer it uses!

        BTW:
        500k select – ado.net adapter – around 1 sec, ado.net reader arounf 0,5 sec, EF6 without using properly 10 sec, and EF6 with proper setttings and use: 1.6sec (and this one had already converted the data to my structure). Yeah, it was slower, but i think data integrity, clean and manageable code matters more than 1-2 additional milliseconds to get a single row…

  • Evgeni on 2014-02-03

    Hi. Nice article. I have a question. How to use Navigation Properties in place od conditional joins. For example I ususaly have deleteflag in the tables and write SQLs like:

    SELECT * 
       FROM client c
          JOIN contract cnt on(c.client_id = cnt.client_id and cnt.deleteflag = 0)
     WHERE c.deleteflag = 0
    

    how to do the …and cnt.deleteflag = 0 part with Navigation Properties?

    • Anders Abel on 2014-02-04

      If you really want an inner join as in your example, you would have to start with the contract when building the query:

      var contracts = from c in ctx.Contracts
                              where c.deleteflag == 0
                              select c;
       
      foreach(c in contracts)
      {
        var client = c.client;
      }

      You can also start with the client, but that will expand to a left outer join.

      foreach(c in ctx.Clients)
      {
        var contracts = c.Contracts.Where(cnt => cnt.deleteflag == 0)
      }

      Please note that both examples will rely heavily on lazy loading and have bad performance for large data volumes. In real life add a select to get only the properties you need. That will avoid the lazy loading.

  • Philippe on 2014-04-02

    Thanks for your interesting article. In your examples you talk about a main table with one or several directly related tables. But what if have three tables A, B, C. A is related to B and B is related to C. For example i have a users table, a user’s groups table (user can belong to several groups), and a procedures groups table (There is several procedures for each group). I want to display the procedures for a given user. In this case is it mandatory to use joins ?

    • Philippe on 2014-04-02

      A colleague gave me the solution. I share this solution as i hope it will help other people :

      List procedures = (ctx.Users.Single(u => u.UserID == userID)
          .Groups.SelectMany(g => g.Procedures)
          .Where(p => p.Version==1)).ToList();

      • Anders Abel on 2014-04-02

        That’s a valid solution, but I would like to show another way of thinking that often makes LINQ queries more easy to create. When doing a traditional join in SQL (or in LINQ) it doesn’t matter what entity you start with, but when using navigation properties it does matter. It’s often easiest to start with the entity that is “most many”, which will let you avoid getting sub collections (which your colleague handled with SelectMany().

        List procedures = ctx.Procedures
            .Where(p => p.Version == 1 && p.Group.User.UserID == userId).ToList();

    • Philippe on 2014-04-03

      Thanks for your help. The problem is that in ‘p.Groups’, i don’t have access to the property Users. In my entities schema, the Groups entity has a Users property, but in my Linq queries, i can access only to Groups. When i try to make Linq queries, i have access to the navigation properties of the ‘main table’ but i can’t navigate through tables.
      If i try to run you query, i’ve the following error : Error 1 ‘System.Collections.Generic.ICollection’ does not contain a definition for ‘Users’ and no extension method ‘Users’ accepting a first argument of type ‘System.Collections.Generic.ICollection’ could be found (are you missing a using directive or an assembly reference?)

      • Bob Vale on 2016-05-11

        How about the following

        List procedures = ctx.Procedures
        .Where(p => p.Version == 1 && p.Groups.Any(g => g.User.UserID == userId)).ToList();

  • Dango on 2015-12-14

    Hi..
    I have been trying to implement left outer join on Employee and Commission Tables (with sum on bonus attribute of second table)..
    Any Idea how to implement it using navigation?

    • Anders Abel on 2015-12-14

      If I understand what you want to achieve, you want a list of employees – both those with one or more commisions and those without. That’s actually the default behaviour for navigation properties, as long as you build the query based on the Employee table.

      For an example using the Cars database in this post, consider this query:

      Brands.Select(b => new 
      {
        b.Name,
        Count = b.Cars.Count(),
        TotalSeats = b.Cars.Sum(c => (int?)c.Seats) ?? 0
      });

      That will give the result:

      Volvo	2	12
      Saab	1	4
      Audi	1	2
      Ford	0	0
      

      As you can see, Ford is in the list too, even though the Ford brand has no associated cars.

      Finally, the (int?) cast done for sum is a workaround, which I’ve described in another blog post.

      • Dango on 2015-12-18

        Sorry for late.. Anders you got my point. As I am a beginner in WPF race, currently working on a real client project using WPF with MVC architectural pattern. What I am creating a list of Employees, obviously that can be printed, used DataGrid to display like DataGridName.ItemSource= QueryResult. and on .XAML I explicitly binded each column of grid to table attributes(basically tables in my DB are actually classes in my project).

        Query implemented:

        public static IEnumerable EmpListToPrint()
        {
          dc = new SolarDatabaseDataContext();
          var result = dc.Employees.Select(b => new
          {
            b.EmpName,
            b.Salary,
            Bonus = b.CommissionDetails.Sum(c => (int?)(int.Parse(c.CommissionRs))) ?? 0,
            b.Deduction,
          });
          return result;
        }
         
        DataGridName.ItemSource = result;

        The thing is still query not working as it gives me exception as
        {SQL Server Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, , >= or sub query used as an expression. }

        Any idea how to deal with that?

      • Anders Abel on 2015-12-19

        You should check the actual SQL that gets executed and see if you can figure out what’s wrong. Try using SQL Server Profiler to see what actual SQL is run.

  • Dango on 2015-12-20

    Problem solved after trying again and again.. Dear Anders Abel I have a request to made… can you post a full tutorial on Linq to Sql Joins with navigation and with combined most common aggregate functions that will be a real bonus for me any many of others who have difficulty in using joins and want to do things in simple navigation pattern.. thanks in advance .. If not possible then kindly post here one to one examples.. I’ll do it my self..

  • Richard on 2016-06-10

    Why does querying a navigation property seem to result in the enumerated items being reset to null?

    I’ve a property that checks to see if any child items – the navigation property is a list of parts – have been used or not. However, if you do a Parts.Where(function(p) p.IsUsed).Any and trace it, the list which is populated seems full of null objects.

    • Anders Abel on 2016-06-14

      Can’t say anything about that unless I have more info such as whether this is linq-to-sql or linq-to-ef. Please post a question on Stack Overflow instead.

  • Leave a Reply

    Your name as it will be displayed on the posted comment.
    Your e-mail address will not be published. It is only used if I want to get in touch during comment moderation.
    Your name will be a link to this address.
Software Development is a Job – Coding is a Passion

I'm Anders Abel, a systems architect and developer working for Kentor in Stockholm, Sweden.

profile for Anders Abel at Stack Overflow, Q&A for professional and enthusiast programmers

Code for most posts is available on my GitHub account.

Popular Posts

Archives

Series

Powered by WordPress with the Passion for Coding theme.