Code CSS

Thursday, February 23, 2012

Avoid NULL checks on outer joins in LINQ to SQL

LINQ to SQL sometimes behaves differently than LINQ to objects. I found a good example of this when converting some large T-SQL functions over to LINQ to SQL. Consider this LINQ example run against AdventureWorks:

var EmpResult = from e in Employees
    join c in Contacts on e.ContactID equals c.ContactID into ContactsJoin
    from c in ContactsJoin.DefaultIfEmpty()
    select new
    {
        e.EmployeeID,
        FirstName = c == null ? null : c.FirstName
    };        

When querying against objects, the null check for c when setting FirstName is required, because it is being linked to the Contacts list via what is essentially a LEFT OUTER JOIN. This means all employees will be returned, but in cases where there is no corresponding record in the Contacts table, c will be NULL. If c is NULL, the c.FirstName statement will give you a null reference exception when LINQ tries to enumerate the results. In LINQPad, you can see that the LINQ to SQL that is generated from the above query is this:

SELECT [t0].[EmployeeID], 
    (CASE 
        WHEN [t2].[test] IS NULL THEN NULL
        ELSE CONVERT(NVarChar(MAX),[t2].[FirstName])
     END) AS [FirstName]
FROM [HumanResources].[Employee] AS [t0]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t1].[ContactID], [t1].[FirstName]
    FROM [Person].[Contact] AS [t1]
    ) AS [t2] ON [t0].[ContactID] = [t2].[ContactID]

This runs in SQL Server with the expected results, but you may notice the odd "SELECT 1 as [test]" subquery in the T-SQL that is being used to test whether the outer join is returning a null or not. The SQL server query parser is smart enough to figure out that the subquery is basically joining to the Contact table and adding in a scalar computation, but it still isn't as efficient as the query could be. However, if we rewrite the LINQ query like this:

var EmpResult = from e in Employees
    join c in Contacts on e.ContactID equals c.ContactID into ContactsJoin
    from c in ContactsJoin.DefaultIfEmpty()
    select new
    {
        e.EmployeeID,
        c.FirstName
    };    

This LINQ query could fail at runtime if we were querying objects, but when using LINQ to SQL this is exactly what we want. The generated SQL becomes:

SELECT [t0].[EmployeeID], [t1].[FirstName] AS [FirstName]
FROM [HumanResources].[Employee] AS [t0]
LEFT OUTER JOIN [Person].[Contact] AS [t1] ON [t0].[ContactID] = [t1].[ContactID]

This is certainly easier to read and it actually performs better in SQL server. It is hard to see in a small example like this, and the query plans are almost identical, but for a very large query with several outer joins, this can double or triple performance in some cases over the previous version of the query.

Thursday, February 16, 2012

GroupBy with Dynamic LINQ

Dynamic LINQ using System.Linq.Dynamic

First of all, when people talk about "Dynamic LINQ" there are any number of things they could be referring to. I want to mention a tidbit about the DynamicQuery sample that is delivered with Visual Studio. In Visual Studio 2010, the sample is in the following directory:

\Program Files (x86)\Microsoft Visual Studio 10.0\Samples\

Somewhere beneath that directory you will find the CSharpSamples.zip file, which contains a LinqSamples\DynamicQuery path with the dynamic querying capability I'm referring to.  I first learned about this from Scott Guthrie's blog.

The DynamicQuery example contains the Dynamic.cs file, which is a set of extension methods and an expression language that allow you to easily manipulate LINQ queries at runtime. This is incredibly powerful but is sparsely documented and introduces a few problems I'll discuss in later posts.

Group By with Dynamic LINQ

I had a really hard time finding examples of how to group results using Dynamic LINQ, so here's a simple example I came up with:

List<Data> datalist = new List<Data>();
datalist.Add(
    new Data() { Num1 = 1, Num2 = 2, Str1 = "A", Str2 = "Red" });
datalist.Add(
    new Data() { Num1 = 2, Num2 = 5, Str1 = "A", Str2 = "Blue" });
datalist.Add(
    new Data() { Num1 = 3, Num2 = 8, Str1 = "A", Str2 = "Red" });
datalist.Add(
    new Data() { Num1 = 4, Num2 = 12, Str1 = "B", Str2 = "Red" });

var test = datalist.AsQueryable()
    .GroupBy("new(Str1,Str2)", "new(Num2,Num1)");
    .Select("new(Key.Str1, Key.Str2, Average(Num2) as SumNum, Max(Num1) as MaxNum)");

foreach (object o in test)
    Console.WriteLine(o.ToString());
In the example, the "Data" class is just a class with three properties, Num1, Num2, Str1, and Str2 to simulate a table of data. I couldn't find documentation on the GroupBy method anywhere, so I hope this helps someone else.

Hello World

If you've found yourself here, you probably either know me or are searching for the obscure answer to some software development problem that has been bugging you for a few hours at least. This is why I decided to create this little blog, because I so frequently find myself scouring the web for examples and hints on how to do things that it sometimes feels like nobody else has done before.

In reality, these are things that people have done but probably not had the time or inclination to share, and I certaintly can't complain if I don't share them either once I figure it out. So, this will be my repository of tips and tricks that I generally spent hours or days trying to figure out. I hope this is helpful to some other developers out there, and at worst it will supplement my miserable memory when people ask me how to do something.

Enjoy!