Five months ago, I wrote my C# Asynchronous Programming series, and part of that was an article about Common Mistakes in Asynchronous Programming with .NET. As it turns out, I missed a really common mistake in that article.
```
public async Task RunAsync()
{
    foreach (var x in new[] { 1, 2, 3 })
    {
        await DoSomethingAsync(x);
    }
}
```
Whenever I see an `await` within a `foreach` (or other looping construct), I tend to get suspicious. That’s because a lot of the time, an external service (Web API, Redis, or whatever) is called repeatedly with different parameters. Imagine that `DoSomethingAsync()` in the above code performs an HTTP GET request, passing in `x` as a querystring parameter.
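Such an implementation might look something like this (the URL is of course hypothetical):

```
private static readonly HttpClient httpClient = new HttpClient();

private async Task DoSomethingAsync(int x)
{
    // Hypothetical external call, with x passed as a querystring parameter.
    var response = await httpClient.GetAsync($"https://example.com/api/items?x={x}");
    response.EnsureSuccessStatusCode();
}
```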
The above could potentially be optimised to run the requests in parallel, as described in Patterns for Asynchronous Composite Tasks in C#. But since each asynchronous call is being `await`ed, this has the effect of waiting for each request-response cycle to complete before starting the next one.
To illustrate the point, we can implement `DoSomethingAsync()` as a simple delay:
```
private async Task DoSomethingAsync(int x)
{
    Console.WriteLine($"Doing {x}... ({DateTime.Now:hh:mm:ss})");
    await Task.Delay(2000);
    Console.WriteLine($"{x} done. ({DateTime.Now:hh:mm:ss})");
}
```
Let’s run that:
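Since each operation is awaited before the next one starts, the output looks something like this (timestamps illustrative):

```
Doing 1... (08:00:00)
1 done. (08:00:02)
Doing 2... (08:00:02)
2 done. (08:00:04)
Doing 3... (08:00:04)
3 done. (08:00:06)
```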
That’s six seconds just to run three two-second delays, which did not depend on each other and which thus could have been run in parallel. In fact, let’s now change the code to do that:
```
public async Task RunAsync()
{
    var tasks = new List<Task>();

    foreach (var x in new[] { 1, 2, 3 })
    {
        var task = DoSomethingAsync(x);
        tasks.Add(task);
    }

    await Task.WhenAll(tasks);
}
```
…and run it again:
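This time, all three operations start right away, and the output looks something like this (timestamps illustrative):

```
Doing 1... (08:00:00)
Doing 2... (08:00:00)
Doing 3... (08:00:00)
2 done. (08:00:02)
1 done. (08:00:02)
3 done. (08:00:02)
```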
That’s 2-3 seconds, which is much better. Note though that the operations have completed in a different order from that in which they were started; this is why it’s important that they don’t depend on each other.
Do you think that’s a lot of code? No problem. We can make it more concise, with some help from LINQ.
```
public async Task RunAsync()
{
    var tasks = new[] { 1, 2, 3 }.Select(DoSomethingAsync);
    await Task.WhenAll(tasks);
}
```
Having said all this, it is not good to be hasty and declare war against all `await`s in `foreach`es, because there are indeed legitimate cases for that. One example is when you have a list of commands which conform to the same interface, and they must be executed in sequence. This is perfectly fine:
```
public interface ICommand
{
    Task ExecuteAsync();
}

public async Task ExecuteAsync(List<ICommand> commands)
{
    foreach (var command in commands)
    {
        await command.ExecuteAsync();
    }
}
```
When order of operations is important, running this sort of scenario in parallel can yield unexpected results.
My hope is that this will at least help to quickly identify potential performance bottlenecks due to an improper use of asynchrony. `await` in `foreach` should be eyed suspiciously as a potential code smell, but as with everything else, there is no hard and fast rule and it is perfectly fine if used correctly.
I would have thought making many HTTP requests in parallel would be a risky business. Wouldn’t it overwhelm your available bandwidth? Likewise for writing many files in parallel, bottlenecking I/O. For these reasons I have always thought it safest in these two scenarios to do things in sequence!
It’s an interesting point that you raise. I’ve always been operating in a context where bandwidth wasn’t really an issue, especially if your requests are relatively small and your load isn’t extremely huge. When you’re trying to scale a system, performance usually becomes a problem long before bandwidth does.
Web servers are designed to handle a large number of requests in parallel, and I’m pretty sure the filesystem will likewise handle reasonable load. Having said that, in my experience I’ve seen there’s some limit beyond which parallelisation (even the async I/O kind) won’t give any further benefit, so whenever I need to process hundreds or thousands of items in parallel, I’ve found myself running batches of 50 or so in parallel at a time.
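As a rough sketch of that batching approach (ProcessAsync is a hypothetical per-item operation, and Chunk() requires .NET 6 or later):

```
public async Task ProcessInBatchesAsync(IEnumerable<int> items)
{
    // Run up to 50 operations in parallel, and wait for each batch
    // to complete before starting the next one.
    foreach (var batch in items.Chunk(50))
    {
        await Task.WhenAll(batch.Select(ProcessAsync));
    }
}
```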
Also, if you have control over the receiving end, it’s good to provide an endpoint that can process multiple items in a single request, rather than just one item. That will save a lot of bandwidth in terms of roundtrips as well as HTTP header data per request.
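For example, the client side of such an endpoint might look something like this (the URL and the batch endpoint itself are hypothetical; PostAsJsonAsync comes from System.Net.Http.Json):

```
private static readonly HttpClient httpClient = new HttpClient();

// One POST carrying all the items, instead of one request per item.
private async Task DoManyThingsAsync(int[] xs)
{
    var response = await httpClient.PostAsJsonAsync("https://example.com/api/items/batch", xs);
    response.EnsureSuccessStatusCode();
}
```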
I struggled with a nested foreach that had a list of results. I eventually wrote this, which works, but seems a little crazy to me!
So if you take something like:
```
List<FooBarResult> results = new List<FooBarResult>(); // FooBarResult is an assumed element type

foreach (var foo in fooList)
{
    foreach (var bar in barList)
    {
        var subResults = await GetFooBarResult(foo, bar);
        results.AddRange(subResults);
    }
}

return results;
```
it can become:
```
var tasks = fooList.SelectMany(x => barList.Select(async (y) => await GetFooBarResult(x, y)));
var results = await Task.WhenAll(tasks);
return results.SelectMany(r => r).ToList();
```
Obviously only desirable when you want to run GetFooBarResult in parallel.
Yes, something like that should work. You probably don’t even need the inner async/await pair, so I think it can be simplified further.
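For instance, the inner lambda can return the task directly:

```
var tasks = fooList.SelectMany(x => barList.Select(y => GetFooBarResult(x, y)));
var results = await Task.WhenAll(tasks);
return results.SelectMany(r => r).ToList();
```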
Great thank you 🙂
Thanks for this helpful tip!
Is it completely fine to use nested await Task.WhenAll?
I’ve been experiencing some TaskCanceledExceptions resulting in inconsistent REST API calls with each task.
For example:
```
public async Task ExecuteParentAsync()
{
    var tasks = new[] { 1, 2, 3 }.Select(RunChildAsync);
    await Task.WhenAll(tasks);
}

public async Task RunChildAsync(int x)
{
    var tasks = new[] { 1, 2, 3 }.Select(DoSomethingAsync);
    await Task.WhenAll(tasks);
}
```
May I know your thoughts/recommendations on this type of scenario, especially when dealing with exceptions?
Thanks
Yes, you can do that, as long as you understand the relationship between the parent and child tasks, and which can run in parallel and which need to run in sequence.
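As for exceptions: when multiple tasks fail, awaiting Task.WhenAll() rethrows only the first exception; the full set is exposed on the combined task itself. A minimal sketch:

```
var tasks = new[] { 1, 2, 3 }.Select(DoSomethingAsync).ToList();
var whenAll = Task.WhenAll(tasks);

try
{
    await whenAll;
}
catch (Exception)
{
    // await rethrows only the first exception; the AggregateException on the
    // combined task holds all of them (it is null if the tasks were only cancelled).
    if (whenAll.Exception != null)
    {
        foreach (var inner in whenAll.Exception.InnerExceptions)
        {
            Console.WriteLine(inner.Message);
        }
    }
}
```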
Also I recommend not running too many tasks in parallel at once – I’ve seen performance degradation beyond a certain threshold. It’s better to run batches and wait for one to complete before proceeding with the next.
An await in a loop is no problem at all.
Async and parallel are two different things, and I believe you are trying to optimise one with the other, but miss the context.
There can be very good reasons for having awaits in a loop. And also, async and await are not about performance. Parallelism is.
In a loop, the continuation will wait until the task is returned, so there is no problem having an await inside a loop here.
So in your first example, the loop will not continue before the await DoSomethingAsync(x) returns.
I’m afraid you are the one who has missed the point. Yes, await in a foreach has legitimate uses (as I specified towards the end), but it is unnecessary in situations where the requests can be fired in parallel and aggregated at the end.
Running operations in batches is a poor way to achieve throttling. A single long-running operation may prolong the completion of a whole batch for no good reason. The correct way to do it is to have a number of worker tasks (equal to the desired maximum degree of parallelism) that constantly dequeue and process the items of your workload. This is how it is implemented in one of the best tools for the job, the TPL Dataflow library.
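For example, a minimal sketch with an ActionBlock, assuming a hypothetical per-item ProcessAsync:

```
using System.Threading.Tasks.Dataflow;

public async Task ProcessAllAsync(IEnumerable<int> items)
{
    // Four worker tasks constantly dequeue and process items from the block's input queue.
    var block = new ActionBlock<int>(
        async x => await ProcessAsync(x), // ProcessAsync is a hypothetical per-item operation
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

    foreach (var item in items)
    {
        block.Post(item);
    }

    block.Complete();       // signal that no more items will be posted
    await block.Completion; // wait for the workers to drain the queue
}
```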
I had the same problem with my monitoring project, which retrieves data every 2 seconds from 100 devices over Modbus/TCP.
This article helped me a lot
thank you.