Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataLoader & scoped dependencies concurrent access with parallel execution strategy #3680

Open
br3aker opened this issue Jul 31, 2023 · 6 comments · May be fixed by #3866
Open

DataLoader & scoped dependencies concurrent access with parallel execution strategy #3680

br3aker opened this issue Jul 31, 2023 · 6 comments · May be fixed by #3866
Assignees
Labels
data loader Relates to GraphQL data loaders
Milestone

Comments

@br3aker
Copy link

br3aker commented Jul 31, 2023

Description

DataLoader does not work as expected with scoped dependencies. Documentation provides rather misleading example when scoped dependency can be accessed from the DataLoader, i.e. database access with Entity Framework.

Current server implementation can guarantee single thread access if and only if:

  1. loaderKey is set, i.e. only one dataloader exists for such access
  2. maxBatchSize property is less or equal to dataloader 'load requests', i.e. only one dataloader exists for such access

.Resolve().WithScope() can't help here as their scope is created for entire node execution time, problem is that IDataLoaderResult are resolved after regular node execution -> scope is destroyed -> ObjectDisposedException or any other problems with dependencies from disposed scope.

We also can't use 'root' scope for entire HTTP call, IDataLoaderResult are resolved asynchronously, this call is not awaited at the execution strategy level -> concurrent access exception from Entity Framework DbContext, for example.

The more of a hack than a solution that comes to mind is manually creating scope before the actual resolver execution and disposing it in the dataloader Then chained call. Something like this:

Simple schema for reproductino
public static class FieldBuilderExtensions
{
    public static FieldBuilder<TSourceType, IDataLoaderResult<TReturnType>> ResolveWithDataLoader<TSourceType, TReturnType, T1>(
        this FieldBuilder<TSourceType, IDataLoaderResult<TReturnType>> builder,
        Func<IResolveFieldContext<TSourceType>, DataLoaderContext, T1, IDataLoaderResult<TReturnType>> func)
    {
        Func<IResolveFieldContext<TSourceType>, IDataLoaderResult<TReturnType>> resolver = context =>
        {
            IServiceScope? scope = default;
            try
            {
                scope = context.RequestServices.CreateScope();
                var dataLoader = scope.ServiceProvider
                    .GetRequiredService<IDataLoaderContextAccessor>()
                    .Context ?? throw new InvalidOperationException("Unable to get DataLoader context");

                var funcResult = func(
                    context,
                    dataLoader,
                    scope.ServiceProvider.GetRequiredService<T1>());
                return funcResult
                    .Then(result =>
                    {
                        scope.Dispose();
                        return result;
                    });
            }
            catch (Exception)
            {
                scope?.Dispose();
                throw;
            }
        };

        return builder.Resolve(resolver);
    }
}

Steps to reproduce

Simple schema for reproduction
public class Query : ObjectGraphType<object>
{
    public Query(IDataLoaderContextAccessor accessor, Defer<ScopedDependency> rootScopeData)
    {
        Name = "Query";

        Field<ListGraphType<IntGraphType>>("ints")
            .Argument<string>("id")
            .Resolve(context =>
            {
                var loader = accessor.Context.GetOrAddCollectionBatchLoader<string, int>(
                    "ints",
                    rootScopeData.Value.GetByIdBatchedAsync,
                    maxBatchSize: 2);
                return loader.LoadAsync(context.GetArgument<string>("id"));
            });

        Field<ListGraphType<IntGraphType>>("ints_batchkey")
            .Argument<string>("id")
            .Argument<string>("batchKey")
            .Resolve(context =>
            {
                var loader = accessor.Context.GetOrAddCollectionBatchLoader<string, int>(
                    context.GetArgument<string>("batchKey"),
                    rootScopeData.Value.GetByIdBatchedAsync);
                return loader.LoadAsync(context.GetArgument<string>("id"));
            });
    }
}

public class ScopedDependency : IDisposable
{
    private bool _disposed;

    public async Task<ILookup<string, int>> GetByIdBatchedAsync(IEnumerable<string> ids)
    {
        Console.WriteLine($"GetByIdBatchedAsync start at thread {Environment.CurrentManagedThreadId}");
        await Task.Delay(1000);
        WarnIfUsedDisposed();

        Console.WriteLine($"GetByIdBatchedAsync end at thread {Environment.CurrentManagedThreadId}");
        return ids.ToLookup(x => x, int.Parse);
    }

    public void Dispose()
    {
        _disposed = true;
    }

    private void WarnIfUsedDisposed()
    {
        if (_disposed)
        {
            ObjectDisposedException.ThrowIf(_disposed, this);
        }
    }
}

Expected result

No concurrent access to a single scoped dependency - serial IDataLoaderResult resolution.
OR
Separate scope dependency copy for each dataloader instance - a hook to 'dispose' resolvers which create scope or return IDataLoaderResult.

Actual result

Concurrent scoped dependency access from multiple threads due to asynchronous IDataLoader value resolution

@br3aker
Copy link
Author

br3aker commented Jul 31, 2023

Now I'm actually not so sure about asynchronous dataloader resolution due to this comment:

/// <summary>
/// An abstract asynchronous function to load the values for a given list of keys.
/// None of the keys will be <see langword="null"/>.
/// </summary>
/// <remarks>
/// This may be called on multiple threads if IDataLoader.LoadAsync is called on multiple threads.
/// It will never be called for the same list of items.
/// </remarks>

@Shane32
Copy link
Member

Shane32 commented Jul 31, 2023

If you configure the schema to use serial execution, none of the data loaders will run in parallel with any other code (or each other). See this link:

services.AddGraphQL(b => b
    .AddExecutionStrategy<SerialExecutionStrategy>(GraphQLParser.AST.OperationType.Query)

This is the simplest answer, and you won't have any issues with EF, either within any field resolver or within any data loader. I recommend this if your project uses Entity Framework. If you want to continue using a parallel execution strategy, then as you have experienced, support for service scope is a bit of a manual process. The data loader is going to run outside of the field resolver context, so .WithScoped() will dispose of the service scope before the data loader's fetch function runs. You will need to create a service scope within the data loader 'fetch' delegate, and if using .Then(), you'll also need to create a service scope within the 'Then' delegate.

Note that there is no guarantee that the .Then portion of the code will execute -- either due to an exception occurring or the client disconnecting. So disposing of the service scope within a .Then block is discouraged, although technically the garbage collector should clean up lost service scope references eventually.

@Shane32
Copy link
Member

Shane32 commented Jul 31, 2023

Feel free to provide a PR to update our documentation. All of the docs' source files are here:

https://github.com/graphql-dotnet/graphql-dotnet/tree/master/docs2/site/docs

@Shane32
Copy link
Member

Shane32 commented Jul 31, 2023

No concurrent access to a single scoped dependency - serial IDataLoaderResult resolution.

Provided for by SerialExecutionStrategy

Separate scope dependency copy for each dataloader instance - a hook to 'dispose' resolvers which create scope or return IDataLoaderResult.

No existing mechanism. The execution strategy is not aware of the fact that the resolver has created a service scope (not even when .WithScope() is used), and so there is no practical way to hold the scope and dispose of it later. It simply runs the configured delegate. The data loader design that uses the IDataContextAccessor will never favorably work in this scenario because it uses local variables from the field resolver. So when the scope is disposed, it's going to reference a disposed scope. Finally, the data loader could be registered within DI (as described in the documentation) -- but with .WithScope() it would be pulled from the scope of the field resolver, creating the same problem with the fetch delegate. Registering the data loader within DI as a singleton leads to out-of-memory errors unless caching is turned off.

If it were me, and I wanted parallel execution with scoped data loaders, I'd make a wrapper class like this:

class ScopedBatchDataLoader<TFetcher, TKey, T> : BatchDataLoader<TKey, T>
    where TFetcher : IFetcher<TKey, T>
{ /* ... */ }

interface IFetcher<TKey, T>
{
    Task<IDictionary<TKey, T>> FetchAsync(IEnumerable<TKey> keys, CancellationToken token);
}

and the fetch delegate would do this:

await using var scope = serviceProvider.CreateAsyncScope();
var service = scope.GetRequiredService<TFetcher>();
return await service.FetchAsync(keys, token);

Then I'd register data loader instances as singletons (being sure to have caching turned off), and since they are now singletons, they can be injected directly into the graph type if desired. I wouldn't use the IDataContextAccessor.

@Shane32 Shane32 added the data loader Relates to GraphQL data loaders label Jul 31, 2023
@Shane32
Copy link
Member

Shane32 commented Jul 31, 2023

We also can't use 'root' scope for entire HTTP call, IDataLoaderResult are resolved asynchronously, this call is not awaited at the execution strategy level -> concurrent access exception from Entity Framework DbContext, for example.

With a parallel execution strategy, this analysis is correct. Parallel execution strategy assumes that you are only using singletons or are managing service scopes as-needed, including within data loader fetch delegates.

@br3aker
Copy link
Author

br3aker commented Jul 31, 2023

If you configure the schema to use serial execution, none of the data loaders will run in parallel with any other code

A big no-no - what's the point then? :)

If it were me, and I wanted parallel execution with scoped data loaders, I'd make a wrapper class like this:

What do you mean by 'IDataLoaderContextAccessor will never favorably work in this scenario because it uses local variables from the field resolver'?
As far as I understand, IDataLoaderContextAccessor provides a 'scoped' (per request) storage for data loaders, what's bad in implementation like this:

Field<ListGraphType<IntGraphType>>("ints_batchkey")
    .Argument<string>("id")
    .Argument<string>("batchKey")
    .Resolve(context =>
    {
        var loader = accessor.Context.GetOrAddCollectionBatchLoader<string, int>(
            context.GetArgument<string>("batchKey"),
            async keys =>
            {
                using var scope = context.RequestServices.CreateScope();
                var service = scope.ServiceProvider.GetRequiredService<ScopedDependency>();
                return await service.GetByIdBatchedAsync(keys);
            });

        return loader.LoadAsync(context.GetArgument<string>("id"));
    });

The only problem I see is that this code relies on DI infrastructure. Solution with singleton dataloader instance which fetches data using scoped service(s) defined at the DI level is a lot better, thanks!

I have a question though, why can't IFetcher<TKey, T> be a part of Microsoft.DI project? It simply abstracts DataLoader from specific service(s) implementation, sounds like a generic interface suitable for such project.
I'm certanly planning to participate in some way, Dataloader docs might be a good place to start.

@Shane32 Shane32 linked a pull request Dec 31, 2023 that will close this issue
@Shane32 Shane32 added this to the 8.0 milestone Jan 1, 2024
@Shane32 Shane32 self-assigned this Jan 1, 2024
@Shane32 Shane32 linked a pull request Jan 1, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data loader Relates to GraphQL data loaders
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants