Code CSS

Tuesday, June 12, 2012

SQL Server Trace Flag 4199

You know, sometimes as a software developer I'm forced to do things that feel wrong but necessary, like cobble together some kludge that barely works to meet an insane schedule that someone else came up with. I work at a fairly small-time operation as software companies go, though, so I thought that might be just a side effect of our small team and underdeveloped processes. Surely the big guys do it better, right?

Well I came across something a while back in SQL Server that makes me wonder: trace flag 4199. This obscure flag is set to "off" by default (as are all trace flags) but by turning it on, you enable numerous bug fixes to in the query optimizer. If I understand it correctly, this means no matter what patch or service pack you have added to SQL Server 2008 R2, you still don't get any bug fixes in the query optimizer without turning on trace flag 4199. Presumably this is so that a patch will not start producing different query plans for an established and highly-tuned system, thus causing a nasty performance problem. I get why they did this, but it sure seems like it would be more widely advertised. Frankly, I would have made the default to enable all fixes and provide a flag to turn off the fixes if you found that they affect performance. It seems like the vast majority of SQL Server users and administrators would want all the bug fixes they can get, right?

Anyway, check out the details on trace flag 4199 here, and if you are having trouble with some query plans, it might make sense for you to turn it on:
http://support.microsoft.com/kb/974006

In my case it didn't really help or hurt for most test cases, but it certainly DID create different query plans in some cases, and most of those that were different were better than the old ones.

UPDATE 1/2/2013:
Trace flag 4199 continues to exist in SQL Server 2012, but since I published this post I have encountered a query that actually won't run when it is turned on. One of my larger queries would simple run all day filling up tempdb (over 100GB before I cancelled it) when this flag was turned on in SQL 2012. It wouldn't even generate an estimated query plan in SSMS! Applying SP1 allowed me to view the query plan again, but the query still would fill up tempdb. I had to turn off trace flag 4199 to make the query run again.

There is certainly a lot of room for improvement in my query but this doesn't give me a lot of faith in the "bug fixes" that trace flag 4199 is supposed to include. It seems it probably creates as many bugs as it fixes. It honestly gives me the feeling that SQL Server query optimizer has turned into such a beast that the developers at Microsoft aren't confident in their ability to fix problems in it without creating massive new problems.

Wednesday, March 28, 2012

Converting IQueryable to IQueryable<T> with Dynamic LINQ

In a previous post I talked about using Dynamic LINQ. While extremely powerful, the results of all Dynamic LINQ queries are IQueryable, which loses the benefits of working with a strongly typed IQueryable<T>. Even worse, the elements in an IQueryable result of a Dynamic LINQ query are generated on the fly and derived from DynamicClass included in the sample, so you can't even build a static converter to get query results into your own type. I struggled with this off and on for weeks, because I needed a way to return an IQueryable<T> from a function that used Dynamic LINQ without enumerating the results.

I finally found the solution in a wonderful enhancement of Microsoft's Dynamic LINQ sample. Krzysztof Olczyk has worked some magic to add several handy features to Dynamic.cs, one of which allows the casting of query results back to a known type. So with his version of Dynamic.cs, you can do this:

IQueryable<MyType> Results = MyResults.Select<MyType>("new @out (Prop1, Prop2)");

This is great because it bridges the gap between the performance and flexibility of Dynamic LINQ and the type-safety of standard LINQ, and you get the best of both worlds!

UPDATE: The relevant code for the Select<T> function is below, but I highly recommend you check out the other enhancements he as added as well.

        private static IQueryable Select(Type type, IQueryable source, string selector, object[] values) {
            if (source == null) throw new ArgumentNullException("source");
            if (selector == null) throw new ArgumentNullException("selector");
            LambdaExpression lambda = DynamicExpression.ParseLambda(source.ElementType, type, selector, values);
            return source.Provider.CreateQuery(
                Expression.Call(
                    typeof(Queryable), "Select",
                    new Type[] { source.ElementType, lambda.Body.Type },
                    source.Expression, Expression.Quote(lambda)));
        }
                
                public static IQueryable Select(this IQueryable source, string selector, params object[] values) {
                        return Select(null, source, selector, values);
                }
                
                public static IQueryable<T> Select<T>(this IQueryable source, string selector, params object[] values) {
                        return Select(typeof(T), source, selector, values) as IQueryable<T>;
                }

Seek predicates on multi-column indexes

While trying to improve performance of some of my SQL server queries, I ran across an excellent article by Craig Freedman that explains how SQL server creates seek predicates when using an index seek to find rows in a table.

The key point I took away from this (and that I didn't know before) is that a multi-column index won't utilize the second column unless the first is searched as an equality. In my current project, we have two dates to represent when a row was added to the table and when it is removed. This is a great system for keeping unlimited change tracking and history data while keeping everything in one table, but it makes searches slow because only the first date is used as an index seek predicate. Most of our queries are something like this:

select FAID from ESP_FAs where Admit <= '3/28/2011 12:00:00' AND Retire > '3/28/2011 12:00:00'

This will get all the rows that exist on the specified date and haven't been deleted. The problem is that based on Craig's article above, only the first date is used in the seek predicate if Admit and Retire are both columns in an index. The retire clause becomes a residual predicate, so having it in the multi-column index isn't really helping us much.

So far the best solution I've found for this is to make sure that the first column included in a multi-column index that is not being compared to with equality is the most limiting. This reduces the number of rows SQL server has to sort through with the residual predicate.

Thursday, February 23, 2012

Avoid NULL checks on outer joins in LINQ to SQL

LINQ to SQL sometimes behaves differently than LINQ to objects. I found a good example of this when converting some large T-SQL functions over to LINQ to SQL. Consider this LINQ example run against AdventureWorks:

var EmpResult = from e in Employees
    join c in Contacts on e.ContactID equals c.ContactID into ContactsJoin
    from c in ContactsJoin.DefaultIfEmpty()
    select new
    {
        e.EmployeeID,
        FirstName = c == null ? null : c.FirstName
    };        

When querying against objects, the null check for c when setting FirstName is required, because it is being linked to the Contacts list via what is essentially a LEFT OUTER JOIN. This means all employees will be returned, but in cases where there is no corresponding record in the Contacts table, c will be NULL. If c is NULL, the c.FirstName statement will give you a null reference exception when LINQ tries to enumerate the results. In LINQPad, you can see that the LINQ to SQL that is generated from the above query is this:

SELECT [t0].[EmployeeID], 
    (CASE 
        WHEN [t2].[test] IS NULL THEN NULL
        ELSE CONVERT(NVarChar(MAX),[t2].[FirstName])
     END) AS [FirstName]
FROM [HumanResources].[Employee] AS [t0]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t1].[ContactID], [t1].[FirstName]
    FROM [Person].[Contact] AS [t1]
    ) AS [t2] ON [t0].[ContactID] = [t2].[ContactID]

This runs in SQL Server with the expected results, but you may notice the odd "SELECT 1 as [test]" subquery in the T-SQL that is being used to test whether the outer join is returning a null or not. The SQL server query parser is smart enough to figure out that the subquery is basically joining to the Contact table and adding in a scalar computation, but it still isn't as efficient as the query could be. However, if we rewrite the LINQ query like this:

var EmpResult = from e in Employees
    join c in Contacts on e.ContactID equals c.ContactID into ContactsJoin
    from c in ContactsJoin.DefaultIfEmpty()
    select new
    {
        e.EmployeeID,
        c.FirstName
    };    

This LINQ query could fail at runtime if we were querying objects, but when using LINQ to SQL this is exactly what we want. The generated SQL becomes:

SELECT [t0].[EmployeeID], [t1].[FirstName] AS [FirstName]
FROM [HumanResources].[Employee] AS [t0]
LEFT OUTER JOIN [Person].[Contact] AS [t1] ON [t0].[ContactID] = [t1].[ContactID]

This is certainly easier to read and it actually performs better in SQL server. It is hard to see in a small example like this, and the query plans are almost identical, but for a very large query with several outer joins, this can double or triple performance in some cases over the previous version of the query.

Thursday, February 16, 2012

GroupBy with Dynamic LINQ

Dynamic LINQ using System.Linq.Dynamic

First of all, when people talk about "Dynamic LINQ" there are any number of things they could be referring to. I want to mention a tidbit about the DynamicQuery sample that is delivered with Visual Studio. In Visual Studio 2010, the sample is in the following directory:

\Program Files (x86)\Microsoft Visual Studio 10.0\Samples\

Somewhere beneath that directory you will find the CSharpSamples.zip file, which contains a LinqSamples\DynamicQuery path with the dynamic querying capability I'm referring to.  I first learned about this from Scott Guthrie's blog.

The DynamicQuery example contains the Dynamic.cs file, which is a set of extension methods and an expression language that allow you to easily manipulate LINQ queries at runtime. This is incredibly powerful but is sparsely documented and introduces a few problems I'll discuss in later posts.

Group By with Dynamic LINQ

I had a really hard time finding examples of how to group results using Dynamic LINQ, so here's a simple example I came up with:

List<Data> datalist = new List<Data>();
datalist.Add(
    new Data() { Num1 = 1, Num2 = 2, Str1 = "A", Str2 = "Red" });
datalist.Add(
    new Data() { Num1 = 2, Num2 = 5, Str1 = "A", Str2 = "Blue" });
datalist.Add(
    new Data() { Num1 = 3, Num2 = 8, Str1 = "A", Str2 = "Red" });
datalist.Add(
    new Data() { Num1 = 4, Num2 = 12, Str1 = "B", Str2 = "Red" });

var test = datalist.AsQueryable()
    .GroupBy("new(Str1,Str2)", "new(Num2,Num1)");
    .Select("new(Key.Str1, Key.Str2, Average(Num2) as SumNum, Max(Num1) as MaxNum)");

foreach (object o in test)
    Console.WriteLine(o.ToString());
In the example, the "Data" class is just a class with three properties, Num1, Num2, Str1, and Str2 to simulate a table of data. I couldn't find documentation on the GroupBy method anywhere, so I hope this helps someone else.

Hello World

If you've found yourself here, you probably either know me or are searching for the obscure answer to some software development problem that has been bugging you for a few hours at least. This is why I decided to create this little blog, because I so frequently find myself scouring the web for examples and hints on how to do things that it sometimes feels like nobody else has done before.

In reality, these are things that people have done but probably not had the time or inclination to share, and I certaintly can't complain if I don't share them either once I figure it out. So, this will be my repository of tips and tricks that I generally spent hours or days trying to figure out. I hope this is helpful to some other developers out there, and at worst it will supplement my miserable memory when people ask me how to do something.

Enjoy!