LINQ is getting an upgrade with .NET 4.0: new features make it easy to execute queries in parallel. That gives us another reason to drop those odious for and foreach statements in favor of LINQ. .NET 4.0 introduces IParallelEnumerable, which executes queries against it in the most parallel way it can. (In the final .NET 4.0 release this type shipped as ParallelQuery<T>.) In addition, AsParallel() is a new extension method on IEnumerable that returns your IEnumerable as an IParallelEnumerable. This lets existing LINQ queries and existing data sources be quickly and easily transformed into well-oiled query machines.
var numbers = new List<int> {1,2,3,4};
var squares = numbers.AsParallel().Select(n => n*n);
When this query is evaluated, it will spread the work across all available processor cores. This particular example is far too small to justify parallel processing, and parallelism brings some potential pain of its own. When the query is finally evaluated, squares could contain any permutation of {1,4,9,16}, and the order may differ from run to run. If your results absolutely must come back in the same order a sequential query would produce, you can use the Ordered() method (renamed AsOrdered() in the final .NET 4.0 release).
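The sketch below compares the unordered and ordered forms of the squares query. It uses the names from the final .NET 4.0 release, AsParallel() and AsOrdered(), rather than the pre-release Ordered(). The class and method names are hypothetical and exist only for illustration. Because the unordered result can arrive in any order, the sketch sorts it before printing, so both lines should print 1,4,9,16.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingDemo
{
    // Unordered PLINQ: the squares may come back in any order.
    public static List<int> UnorderedSquares(IEnumerable<int> source)
    {
        return source.AsParallel().Select(n => n * n).ToList();
    }

    // AsOrdered() asks PLINQ to preserve source order in the output,
    // at the cost of some extra buffering when results are merged.
    public static List<int> OrderedSquares(IEnumerable<int> source)
    {
        return source.AsParallel().AsOrdered().Select(n => n * n).ToList();
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };

        // Order can vary between runs, so sort before printing.
        var unordered = UnorderedSquares(numbers);
        unordered.Sort();
        Console.WriteLine(string.Join(",", unordered));

        // Same order as the sequential query on every run.
        Console.WriteLine(string.Join(",", OrderedSquares(numbers)));
    }
}
```

Only use AsOrdered() when order actually matters. Preserving order forces PLINQ to merge results back into sequence, which eats into the parallel speedup.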
For benchmarking, I created two queries. The first is a standard sequential LINQ query. The second uses ParallelEnumerable.Range to generate the numbers, which makes the whole query run in parallel. Both call Count() to force the query to execute.
int upper = 1000000;
Query 1:
var numbers = Enumerable.Range(0, upper).Select(n => (double)n).
    Where(n => n % 2 == 0).
    Select(n => n * (n / 2));
var count = numbers.Count();
Query 2:
var numbers = ParallelEnumerable.Range(0, upper).Select(n => (double)n).
    Where(n => n % 2 == 0).
    Select(n => n * (n / 2));
var count = numbers.Count();
In my testing on a two-core machine (admittedly not exhaustive), Query 1 took 77 seconds on average and Query 2 took 56 seconds, roughly 27% less time. The only change was the source of the range, so the gain comes almost for free.
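If you want to repeat a comparison like this on your own hardware, the sketch below is a minimal Stopwatch harness. It times both queries from above and averages several runs. The run count, the discarded warm-up run, and all class and method names are assumptions made for illustration; they are not the setup behind the numbers above. Expect timings to depend heavily on core count and machine load.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class QueryBenchmark
{
    // Query 1: plain sequential LINQ to Objects.
    public static int SequentialCount(int upper)
    {
        return Enumerable.Range(0, upper)
            .Select(n => (double)n)
            .Where(n => n % 2 == 0)
            .Select(n => n * (n / 2))
            .Count();
    }

    // Query 2: the same pipeline, but the source is ParallelEnumerable.Range.
    public static int ParallelCount(int upper)
    {
        return ParallelEnumerable.Range(0, upper)
            .Select(n => (double)n)
            .Where(n => n % 2 == 0)
            .Select(n => n * (n / 2))
            .Count();
    }

    // Runs the query `runs` times and returns the mean elapsed time in milliseconds.
    // One extra warm-up run is discarded so JIT compilation is not measured.
    public static double AverageMilliseconds(Func<int, int> query, int upper, int runs)
    {
        query(upper); // warm-up, not timed
        var watch = new Stopwatch();
        double total = 0;
        for (int i = 0; i < runs; i++)
        {
            watch.Restart();
            query(upper);
            watch.Stop();
            total += watch.Elapsed.TotalMilliseconds;
        }
        return total / runs;
    }

    public static void Main()
    {
        int upper = 1000000;
        Console.WriteLine("Cores: {0}", Environment.ProcessorCount);
        Console.WriteLine("Sequential: {0:F1} ms", AverageMilliseconds(SequentialCount, upper, 10));
        Console.WriteLine("Parallel:   {0:F1} ms", AverageMilliseconds(ParallelCount, upper, 10));
    }
}
```

Both queries should return the same count: 500,000 for upper = 1000000, since half of the numbers from 0 to 999,999 are even. Checking that the counts match is a cheap way to confirm that the parallel version is still computing the same thing.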
Danger!
There is a big "gotcha" with these parallel statements: because a parallel expression is so easy to create, it is also very easy to write one that won't work as intended. The rule is: do not write queries that change shared state. In general your queries should not change any state at all, and if they modify shared state you will get unpredictable behavior and possibly thrown exceptions. For example:
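private bool testValue;
private void SharedStateTest()
{
    // Every element's selector overwrites the same field, from whichever thread processes it.
    var test = ParallelEnumerable.Range(1, 1001).Select(n => testValue = (n > 995));
    test.Count(); // forces the query to execute
    Console.WriteLine(testValue);
}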
Written as a sequential query, this would always print "True", because the last element processed is 1001. As a PLINQ query, however, it is non-deterministic and frequently prints "False": the elements are split across threads, and whichever thread happens to write testValue last wins. Your code will be more readable and easier to maintain if you never change state inside LINQ expressions, and you will avoid race conditions like this one.
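One way to follow the rule is to make the query return the value instead of assigning it. The sketch below shows this for the example above. It also shows a second, hypothetical case not taken from the example: collecting matching items, which people sometimes do by calling Add on a shared List<T> (not thread-safe) inside a query. The sketch uses the released .NET 4.0 AsOrdered(), and the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NoSharedState
{
    // The query produces the value instead of writing to a field.
    // AsOrdered() makes Last() mean "last in source order" (n = 1001),
    // so the result matches what the sequential version prints.
    public static bool LastIsAbove995()
    {
        return ParallelEnumerable.Range(1, 1001)
            .AsOrdered()
            .Select(n => n > 995)
            .Last();
    }

    // Hypothetical collection case: rather than calling Add on a shared
    // List<T> from inside the query, let PLINQ build the list itself.
    public static List<int> ValuesAbove995()
    {
        return ParallelEnumerable.Range(1, 1001)
            .Where(n => n > 995)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(LastIsAbove995());       // True on every run
        Console.WriteLine(ValuesAbove995().Count); // 6 (996 through 1001)
    }
}
```

ValuesAbove995 does not use AsOrdered(), so the six values may appear in any order, but the list always contains all six. That is exactly the guarantee a shared List<T> mutated from several threads cannot give.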
