By Darthg8r


2012-07-19 15:47:16 8 Comments

In a metro app, I need to execute a number of WCF calls. There are a significant number of calls to be made, so I need to do them in a parallel loop. The problem is that the parallel loop exits before the WCF calls are all complete.

How would you refactor this to work as expected?

var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customers = new  System.Collections.Concurrent.BlockingCollection<Customer>();

Parallel.ForEach(ids, async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
});

foreach ( var customer in customers )
{
    Console.WriteLine(customer.ID);
}

Console.ReadKey();

9 comments

@Vitaliy Ulantikov 2017-11-17 15:38:22

After introducing a bunch of helper methods, you will be able run parallel queries with this simple syntax:

const int DegreeOfParallelism = 10;
IEnumerable<double> result = await Enumerable.Range(0, 1000000)
    .Split(DegreeOfParallelism)
    .SelectManyAsync(async i => await CalculateAsync(i).ConfigureAwait(false))
    .ConfigureAwait(false);

What happens here is: we split source collection into 10 chunks (.Split(DegreeOfParallelism)), then run 10 tasks each processing its items one by one (.SelectManyAsync(...)) and merge those back into a single list.

Worth mentioning there is a simpler approach:

double[] result2 = await Enumerable.Range(0, 1000000)
    .Select(async i => await CalculateAsync(i).ConfigureAwait(false))
    .WhenAll()
    .ConfigureAwait(false);

But it needs a precaution: if you have a source collection that is too big, it will schedule a Task for every item right away, which may cause significant performance hits.

Extension methods used in examples above look as follows:

public static class CollectionExtensions
{
    /// <summary>
    /// Splits collection into number of collections of nearly equal size.
    /// </summary>
    public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, int slicesCount)
    {
        if (slicesCount <= 0) throw new ArgumentOutOfRangeException(nameof(slicesCount));

        List<T> source = src.ToList();
        var sourceIndex = 0;
        for (var targetIndex = 0; targetIndex < slicesCount; targetIndex++)
        {
            var list = new List<T>();
            int itemsLeft = source.Count - targetIndex;
            while (slicesCount * list.Count < itemsLeft)
            {
                list.Add(source[sourceIndex++]);
            }

            yield return list;
        }
    }

    /// <summary>
    /// Takes collection of collections, projects those in parallel and merges results.
    /// </summary>
    public static async Task<IEnumerable<TResult>> SelectManyAsync<T, TResult>(
        this IEnumerable<IEnumerable<T>> source,
        Func<T, Task<TResult>> func)
    {
        List<TResult>[] slices = await source
            .Select(async slice => await slice.SelectListAsync(func).ConfigureAwait(false))
            .WhenAll()
            .ConfigureAwait(false);
        return slices.SelectMany(s => s);
    }

    /// <summary>Runs selector and awaits results.</summary>
    public static async Task<List<TResult>> SelectListAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector)
    {
        List<TResult> result = new List<TResult>();
        foreach (TSource source1 in source)
        {
            TResult result1 = await selector(source1).ConfigureAwait(false);
            result.Add(result1);
        }
        return result;
    }

    /// <summary>Wraps tasks with Task.WhenAll.</summary>
    public static Task<TResult[]> WhenAll<TResult>(this IEnumerable<Task<TResult>> source)
    {
        return Task.WhenAll<TResult>(source);
    }
}

@Jay Shah 2018-05-09 22:46:16

An extension method for this which makes use of SemaphoreSlim and also allows to set maximum degree of parallelism

    /// <summary>
    /// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
    /// </summary>
    /// <typeparam name="T">Type of IEnumerable</typeparam>
    /// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
    /// <param name="action">an async <see cref="Action" /> to execute</param>
    /// <param name="maxDegreeOfParallelism">Optional, An integer that represents the maximum degree of parallelism,
    /// Must be grater than 0</param>
    /// <returns>A Task representing an async operation</returns>
    /// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
    public static async Task ForEachAsyncConcurrent<T>(
        this IEnumerable<T> enumerable,
        Func<T, Task> action,
        int? maxDegreeOfParallelism = null)
    {
        if (maxDegreeOfParallelism.HasValue)
        {
            using (var semaphoreSlim = new SemaphoreSlim(
                maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
            {
                var tasksWithThrottler = new List<Task>();

                foreach (var item in enumerable)
                {
                    // Increment the number of currently running tasks and wait if they are more than limit.
                    await semaphoreSlim.WaitAsync();

                    tasksWithThrottler.Add(Task.Run(async () =>
                    {
                        await action(item).ContinueWith(res =>
                        {
                            // action is completed, so decrement the number of currently running tasks
                            semaphoreSlim.Release();
                        });
                    }));
                }

                // Wait for all tasks to complete.
                await Task.WhenAll(tasksWithThrottler.ToArray());
            }
        }
        else
        {
            await Task.WhenAll(enumerable.Select(item => action(item)));
        }
    }

Sample Usage:

await enumerable.ForEachAsyncConcurrent(
    async item =>
    {
        await SomeAsyncMethod(item);
    },
    5);

@ofcoursedude 2014-11-18 11:55:43

Wrap the Parallel.Foreach into a Task.Run() and instead of the await keyword use [yourasyncmethod].Result

(you need to do the Task.Run thing to not block the UI thread)

Something like this:

var yourForeachTask = Task.Run(() =>
        {
            Parallel.ForEach(ids, i =>
            {
                ICustomerRepo repo = new CustomerRepo();
                var cust = repo.GetCustomer(i).Result;
                customers.Add(cust);
            });
        });
await yourForeachTask;

@ygoe 2015-06-17 18:22:41

What's the problem with this? I'd have done it exactly like this. Let Parallel.ForEach do the parallel work, which blocks until all are done, and then push the whole thing to a background thread to have a responsive UI. Any issues with that? Maybe that's one sleeping thread too much, but it's short, readable code.

@Gusdor 2016-03-30 13:31:43

@LonelyPixel My only issue is that it calls Task.Run when TaskCompletionSource is preferable.

@Seafish 2016-07-13 14:34:08

@Gusdor Curious - why is TaskCompletionSource preferable?

@Gusdor 2016-07-13 15:48:47

@Seafish A good question that I wish I could answer. Must have been a rough day :D

@ygoe 2017-03-01 20:21:17

Just a short update. I was looking for exactly this now, scrolled down to find the simplest solution and found my own comment again. I used exactly this code and it works as expected. It only assumes that there is a Sync version of the original Async calls within the loop. await can be moved in the front to save the extra variable name.

@svick 2012-07-19 16:32:41

The whole idea behind Parallel.ForEach() is that you have a set of threads and each thread processes part of the collection. As you noticed, this doesn't work with async-await, where you want to release the thread for the duration of the async call.

You could “fix” that by blocking the ForEach() threads, but that defeats the whole point of async-await.

What you could do is to use TPL Dataflow instead of Parallel.ForEach(), which supports asynchronous Tasks well.

Specifically, your code could be written using a TransformBlock that transforms each id into a Customer using the async lambda. This block can be configured to execute in parallel. You would link that block to an ActionBlock that writes each Customer to the console. After you set up the block network, you can Post() each id to the TransformBlock.

In code:

var ids = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };

var getCustomerBlock = new TransformBlock<string, Customer>(
    async i =>
    {
        ICustomerRepo repo = new CustomerRepo();
        return await repo.GetCustomer(i);
    }, new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });
var writeCustomerBlock = new ActionBlock<Customer>(c => Console.WriteLine(c.ID));
getCustomerBlock.LinkTo(
    writeCustomerBlock, new DataflowLinkOptions
    {
        PropagateCompletion = true
    });

foreach (var id in ids)
    getCustomerBlock.Post(id);

getCustomerBlock.Complete();
writeCustomerBlock.Completion.Wait();

Although you probably want to limit the parallelism of the TransformBlock to some small constant. Also, you could limit the capacity of the TransformBlock and add the items to it asynchronously using SendAsync(), for example if the collection is too big.

As an added benefit when compared to your code (if it worked) is that the writing will start as soon as a single item is finished, and not wait until all of the processing is finished.

@Norman H 2013-09-13 11:04:54

A very brief overview of async, reactive extensions, TPL and TPL DataFlow - vantsuyoshi.wordpress.com/2012/01/05/… for those like myself who might need some clarity.

@JasonLind 2015-12-16 22:23:26

I'm pretty sure this answer does NOT parallelize the processing. I believe you need to do a Parallel.ForEach over the ids and post those to the getCustomerBlock. At least that's what I found when I tested this suggestion.

@svick 2015-12-16 22:26:02

@JasonLind It really does. Using Parallel.ForEach() to Post() items in parallel shouldn't have any real effect.

@JasonLind 2015-12-16 22:35:02

@svick Ok I found it, The ActionBlock also needs to be in Parallel. I was doing it slightly differently, I didn't need a transform so I just used a bufferblock and did my work in the ActionBlock. I got confused from another answer on the interwebs.

@JasonLind 2015-12-16 22:49:07

By which I mean specifying MaxDegreeOfParallelism on the ActionBlock like you do on the TransformBlock in your example

@Liam - Reinstate Monica 2018-07-13 08:25:58

You need to include the System.Threading.Tasks.Dataflow NuGet package to run this

@Serge Semenov 2017-06-19 20:28:54

You can save effort with the new AsyncEnumerator NuGet Package, which didn't exist 4 years ago when the question was originally posted. It allows you to control the degree of parallelism:

using System.Collections.Async;
...

await ids.ParallelForEachAsync(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
},
maxDegreeOfParallelism: 10);

Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.

@MiFreidgeim SO-stop being evil 2017-12-16 22:55:04

Sergey, you should disclose that you are an author of the library

@Serge Semenov 2018-02-24 18:01:23

ok, added the disclaimer. I'm not seeking any benefit from advertising it, just want to help people ;)

@Corniel Nobel 2018-06-30 10:02:11

Your library isn't compatible with .NET Core.

@Serge Semenov 2018-06-30 15:34:13

@CornielNobel, it is compatible with .NET Core - the source code on GitHub has a test coverage for both .NET Framework and .NET Core.

@Serge Semenov 2018-06-30 16:54:32

I bet that was AsyncEnumerable instead of AsyncEnumerator :)

@WBuck 2019-10-09 11:49:23

@SergeSemenov I've used your library a lot for its AsyncStreams and I've got to say it's excellent. Can't recommend this library enough.

@Serge Semenov 2019-10-09 16:17:27

I'm glad that you found it helpful @WBuck!

@Teoman shipahi 2016-11-30 16:30:38

I am a little late to party but you may want to consider using GetAwaiter.GetResult() to run your async code in sync context but as paralled as below;

 Parallel.ForEach(ids, i =>
{
    ICustomerRepo repo = new CustomerRepo();
    // Run this in thread which Parallel library occupied.
    var cust = repo.GetCustomer(i).GetAwaiter().GetResult();
    customers.Add(cust);
});

@John Gietzen 2014-12-05 21:48:55

This should be pretty efficient, and easier than getting the whole TPL Dataflow working:

var customers = await ids.SelectAsync(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    return await repo.GetCustomer(i);
});

...

public static async Task<IList<TResult>> SelectAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector, int maxDegreesOfParallelism = 4)
{
    var results = new List<TResult>();

    var activeTasks = new HashSet<Task<TResult>>();
    foreach (var item in source)
    {
        activeTasks.Add(selector(item));
        if (activeTasks.Count >= maxDegreesOfParallelism)
        {
            var completed = await Task.WhenAny(activeTasks);
            activeTasks.Remove(completed);
            results.Add(completed.Result);
        }
    }

    results.AddRange(await Task.WhenAll(activeTasks));
    return results;
}

@Paccc 2014-12-14 04:02:25

Shouldn't the usage example use await like: var customers = await ids.SelectAsync(async i => { ... });?

@Ohad Schneider 2014-09-16 19:37:14

Using DataFlow as svick suggested may be overkill, and Stephen's answer does not provide the means to control the concurrency of the operation. However, that can be achieved rather simply:

public static async Task RunWithMaxDegreeOfConcurrency<T>(
     int maxDegreeOfConcurrency, IEnumerable<T> collection, Func<T, Task> taskFactory)
{
    var activeTasks = new List<Task>(maxDegreeOfConcurrency);
    foreach (var task in collection.Select(taskFactory))
    {
        activeTasks.Add(task);
        if (activeTasks.Count == maxDegreeOfConcurrency)
        {
            await Task.WhenAny(activeTasks.ToArray());
            //observe exceptions here
            activeTasks.RemoveAll(t => t.IsCompleted); 
        }
    }
    await Task.WhenAll(activeTasks.ToArray()).ContinueWith(t => 
    {
        //observe exceptions in a manner consistent with the above   
    });
}

The ToArray() calls can be optimized by using an array instead of a list and replacing completed tasks, but I doubt it would make much of a difference in most scenarios. Sample usage per the OP's question:

RunWithMaxDegreeOfConcurrency(10, ids, async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
});

EDIT Fellow SO user and TPL wiz Eli Arbel pointed me to a related article from Stephen Toub. As usual, his implementation is both elegant and efficient:

public static Task ForEachAsync<T>(
      this IEnumerable<T> source, int dop, Func<T, Task> body) 
{ 
    return Task.WhenAll( 
        from partition in Partitioner.Create(source).GetPartitions(dop) 
        select Task.Run(async delegate { 
            using (partition) 
                while (partition.MoveNext()) 
                    await body(partition.Current).ContinueWith(t => 
                          {
                              //observe exceptions
                          });

        })); 
}

@Stefanvds 2015-07-30 07:29:48

Love the Eli Arbel option. Have a few follow-up questions: I'd love to keep track of progress. I added a ref int done to the method, and then ContinueWith done++ but "Cannot use ref or out parameter inside an anonymous method, lambda expression, or query expression... any idea how to track progress?

@Stefanvds 2015-07-30 07:43:38

nevermind, i can just stick done++ in the foreachasync code

@Shaamaan 2015-11-13 15:04:34

I found the ForEachAsync code doesn't work as expected. At least, not always, for some reason or other (I cannot currently explain what's going on). With a dop = 5 I'd get different results when calling the code (I should always get the same - the data isn't changed)! Beware!

@xx1xx 2016-09-23 01:16:49

Eli Arbel's option is nice to read, but the RunWithMaxDegreeOfConcurrency implementation will run faster. This is because Eli's splits the tasks upfront, so if some tasks run for longer in 1 partition, you still have to wait for the slowest partition to finishing running before the whole process is completed. RunWithMaxDegreeOfConcurrency runs in a chain fashion, so it should be the fastest to complete. Because the tasks are not partitioned upfront.

@Ohad Schneider 2016-10-01 23:07:35

@RichardPierre actually this overload of Partitioner.Create uses chunk partitioning, which provides elements dynamically to the different tasks so the scenario you described will not take place. Also note that static (pre-determined) partitioning may be faster in some cases due to less overhead (specifically synchronization). For more information see: msdn.microsoft.com/en-us/library/dd997411(v=vs.110).aspx.

@Terry 2016-10-10 22:05:28

@OhadSchneider In the // observe exceptions, if that throws an exception, will it bubble up to the caller? For example, if I wanted the entire enumerable to stop processing/fail if any part of it failed?

@Ohad Schneider 2016-10-11 15:14:48

@Terry it will bubble up to the caller in the sense that the top-most task (created by Task.WhenAll) will contain the exception (inside an AggregateException), and consequentially if said caller used await, an exception would be thrown in the call site. However, Task.WhenAll will still wait for all tasks to complete, and GetPartitions will dynamically allocate elements when partition.MoveNext is called until no more elements are left to process. This means that unless you add your own mechanism for stopping the processing (e.g. CancellationToken) it won't happen on its own.

@Mark Gibbons 2017-08-28 01:03:33

Following on from what @OhadSchneider said above, it is important to note this implementation will tend to slow down over time and get closer and closer to running synchronously because you are going to be limited by the slowest running Task in the partition.

@Mark Gibbons 2017-08-28 03:33:30

@OhadSchneider let's say most of the time the tasks take roughly the same amount of time to complete - 1 second. But occasionally it takes 5 seconds. If you choose a degree of parallelism of 4, it will execute 4 of those tasks at the same time, and they should be finished about a second later. But that occasional 5 second task will mean that it won't start another 3 tasks (for the 3 that completed after 1 second) until that 5 second task is finished. So depending on how often a single task runs much longer than the others, you will be blocking execution for the longest running task.

@Ohad Schneider 2017-08-28 17:23:38

@gibbocool I'm still not sure I follow. Suppose you have a total of 7 tasks, with the parameters you specified in your comment. Further suppose that the first batch takes the occasional 5 second task, and three 1 second tasks. After about a second, the 5-second task will still be executing whereas the three 1-second tasks will be finished. At this point the remaining three 1-second tasks will start executing (they would be supplied by the partitioner to the three "free" threads) .

@MiFreidgeim SO-stop being evil 2017-12-19 03:27:19

How can I observe exceptions in ForEachAsync? If body function failed for customer Id , can I log error, specifying, that it happened for particular Id?

@Ohad Schneider 2017-12-26 15:22:03

@MichaelFreidgeim Not sure what you mean. You can add whatever code you want where it says //observe exceptions, including exception handling code (check t.Exception). I assume your customer ID would be embedded in partition.Current.

@MiFreidgeim SO-stop being evil 2017-12-26 22:20:46

But t.Exception doesn’t include partition.Current nor customer Id, so I can’t determine, which id caused exception

@Ohad Schneider 2017-12-27 11:21:33

@MichaelFreidgeim you can do something like var current = partition.Current before await body and then use current in the continuation (ContinueWith(t => { ... }).

@MiFreidgeim SO-stop being evil 2017-12-29 07:34:32

Thanks, I’ve asked subsequent question stackoverflow.com/questions/48019075/…, will appreciate your opinion.

@Stephen Cleary 2012-07-19 16:47:05

svick's answer is (as usual) excellent.

However, I find Dataflow to be more useful when you actually have large amounts of data to transfer. Or when you need an async-compatible queue.

In your case, a simpler solution is to just use the async-style parallelism:

var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };

var customerTasks = ids.Select(i =>
  {
    ICustomerRepo repo = new CustomerRepo();
    return repo.GetCustomer(i);
  });
var customers = await Task.WhenAll(customerTasks);

foreach (var customer in customers)
{
  Console.WriteLine(customer.ID);
}

Console.ReadKey();

@svick 2012-07-19 16:50:56

If you wanted to manually limit parallelism (which you most likely do in this case), doing it this way would be more complicated.

@svick 2012-07-19 16:53:10

But you're right that Dataflow can be quite complicated (for example when compared with Parallel.ForEach()). But I think it's currently the best option to do almost any async work with collections.

@James Manning 2012-07-19 18:36:34

@svick - IMHO it's a function of whether you're trying to set up 'streams' of processing (use Dataflow) or just parallelizing the processing of a collection (Parallel.ForEach, or create collections of Tasks and WhenAll on them as Stephen does here). If you're not setting up blocks to run for an extended period of time, Dataflow feels like overkill IMHO. :)

@James Manning 2012-07-19 23:29:53

ParallelOptions lets you limit parallelism, FWIW, in case others run across this thread and don't already know about it

@Ohad Schneider 2014-09-15 21:17:42

@JamesManning how is ParallelOptions going to help? It's only applicable to Parallel.For/ForEach/Invoke, which as the OP established are of no use here.

@Shyju 2016-05-04 18:02:12

@StephenCleary If the GetCustomer method is returning a Task<T>, Should one be using Select(async i => { await repo.GetCustomer(i);}); ?

@Stephen Cleary 2016-05-04 20:07:51

@Shyju: No, you should use Select(i => repo.GetCustomer(i)).

@Shyju 2016-05-04 20:08:26

Thanks. Just wanted to confirm.

@batmaci 2016-12-06 13:47:18

@StephenCleary why shouldnt we use async if it supports async? I thought you should go async all the way. does it not apply in this case? I have an old function just reviewed i was doing it with async and Its been working just fine all the time.

@Stephen Cleary 2016-12-06 15:03:10

@batmaci: Parallel.ForEach doesn't support async.

@batmaci 2016-12-06 15:07:31

@StephenCleary ah, I thought your comment is for using WhenAll

@MikeT 2019-09-04 12:15:24

if you care about degree of Parallelism then you can do ids.AsParallel().WithDegreeOfParallelism(15).Select(async i => await Task);

@Stephen Cleary 2019-09-04 17:24:08

@MikeT: That will not work as expected. PLINQ doesn't understand asynchronous tasks, so that code will parallelize only the starting of the async lambda.

Related Questions

Sponsored Content

17 Answered Questions

[SOLVED] Using async/await with a forEach loop

5 Answered Questions

[SOLVED] Parallel foreach with asynchronous lambda

4 Answered Questions

[SOLVED] How can I limit Parallel.ForEach?

21 Answered Questions

[SOLVED] How and when to use ‘async’ and ‘await’

5 Answered Questions

[SOLVED] Using async/await for multiple tasks

2 Answered Questions

[SOLVED] Parallel execution of a loop that uses async

2 Answered Questions

[SOLVED] Track progress when using Parallel.ForEach

Sponsored Content