Code CSS

Thursday, February 23, 2012

Avoid NULL checks on outer joins in LINQ to SQL

LINQ to SQL sometimes behaves differently than LINQ to objects. I found a good example of this when converting some large T-SQL functions over to LINQ to SQL. Consider this LINQ example run against AdventureWorks:

var EmpResult = from e in Employees
    join c in Contacts on e.ContactID equals c.ContactID into ContactsJoin
    from c in ContactsJoin.DefaultIfEmpty()
    select new
    {
        e.EmployeeID,
        FirstName = c == null ? null : c.FirstName
    };        

When querying against objects, the null check for c when setting FirstName is required, because it is being linked to the Contacts list via what is essentially a LEFT OUTER JOIN. This means all employees will be returned, but in cases where there is no corresponding record in the Contacts table, c will be NULL. If c is NULL, the c.FirstName statement will give you a null reference exception when LINQ tries to enumerate the results. In LINQPad, you can see that the LINQ to SQL that is generated from the above query is this:

SELECT [t0].[EmployeeID], 
    (CASE 
        WHEN [t2].[test] IS NULL THEN NULL
        ELSE CONVERT(NVarChar(MAX),[t2].[FirstName])
     END) AS [FirstName]
FROM [HumanResources].[Employee] AS [t0]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t1].[ContactID], [t1].[FirstName]
    FROM [Person].[Contact] AS [t1]
    ) AS [t2] ON [t0].[ContactID] = [t2].[ContactID]

This runs in SQL Server with the expected results, but you may notice the odd "SELECT 1 as [test]" subquery in the T-SQL that is being used to test whether the outer join is returning a null or not. The SQL server query parser is smart enough to figure out that the subquery is basically joining to the Contact table and adding in a scalar computation, but it still isn't as efficient as the query could be. However, if we rewrite the LINQ query like this:

var EmpResult = from e in Employees
    join c in Contacts on e.ContactID equals c.ContactID into ContactsJoin
    from c in ContactsJoin.DefaultIfEmpty()
    select new
    {
        e.EmployeeID,
        c.FirstName
    };    

This LINQ query could fail at runtime if we were querying objects, but when using LINQ to SQL this is exactly what we want. The generated SQL becomes:

SELECT [t0].[EmployeeID], [t1].[FirstName] AS [FirstName]
FROM [HumanResources].[Employee] AS [t0]
LEFT OUTER JOIN [Person].[Contact] AS [t1] ON [t0].[ContactID] = [t1].[ContactID]

This is certainly easier to read and it actually performs better in SQL server. It is hard to see in a small example like this, and the query plans are almost identical, but for a very large query with several outer joins, this can double or triple performance in some cases over the previous version of the query.

No comments:

Post a Comment