Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Optimizer Annotate not working when used in nested query #549

Open
jaydensmith opened this issue Jun 11, 2024 · 11 comments
Open

Query Optimizer Annotate not working when used in nested query #549

jaydensmith opened this issue Jun 11, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@jaydensmith
Copy link
Contributor

jaydensmith commented Jun 11, 2024

Describe the Bug

Consider a schema like the following.

@strawberry.django.type(models.Product)
class ProductType(relay.Node):
    id: auto
    name: auto
    price: auto
    tax_rate: auto
    group: 'GroupType'


@strawberry.django.type(models.Group)
class GroupType(relay.Node):
    id: auto
    name: auto
    product_count: int = strawberry.django.field(
        annotate=Count('products')
    )

Querying the list of groups including the productCount field works as expected, but when querying the group like the following, there is a database error Cannot resolve keyword 'products' into field. Choices are: name, price, tax_rate, group_id, id:

query GetItems {
    products {
        edges {
            node {
                id
                name
                price
                taxRate
                group {
                    id
                    name
                    productCount
                }
            }
        }
    }
}

System Information

  • Operating system: MacOS
  • Strawberry version (if applicable): 0.234.2

Upvote & Fund

  • We're using Polar.sh so you can upvote and help fund this issue.
  • We receive the funding once the issue is completed & confirmed by you.
  • Thank you in advance for helping prioritize & fund our backlog.
Fund with Polar
@jaydensmith jaydensmith added the bug Something isn't working label Jun 11, 2024
@bellini666
Copy link
Member

Thanks for the report and the example! Will take a look at that during the week :)

For curiosity, did that work before the nested optimizer feature from https://github.com/strawberry-graphql/strawberry-django/releases/tag/v0.44.0 or the issue predates that?

@jaydensmith
Copy link
Contributor Author

Thanks for the report and the example! Will take a look at that during the week :)

For curiosity, did that work before the nested optimizer feature from https://github.com/strawberry-graphql/strawberry-django/releases/tag/v0.44.0 or the issue predates that?

Thanks @bellini666, I do not believe so. I updated to 0.44.0 to see if that fixed the issue. I first noticed the issue on 0.43.0.

@bellini666
Copy link
Member

@jaydensmith can you test with 0.41.1 to see if the issue still happens? Because https://github.com/strawberry-graphql/strawberry-django/releases/tag/v0.42.0 changed the way we collect sub-fields, so I want to make sure that it is not a regression from that release as well

@jaydensmith
Copy link
Contributor Author

@bellini666 no worries, I just tested and the issue is present on that version too.

@bellini666
Copy link
Member

@bellini666 no worries, I just tested and the issue is present on that version too.

Ahh good to know that it is not a regression, it is probably a corner case that was not considered when annotate was implemented :)

Will take a look at that soon in the next couple of days.

@bellini666
Copy link
Member

Hey @jaydensmith ,

Was investigating this and I think what is going to...

The issue is the join in the group. What the optimizer is trying to do in your case is:

Product.objects.all().select_related(
    "group",
).annotate(
    products_count=Count("group__products"),
)

If you look at that queryset, there's no way currently for Django to annotate products_count inside product.group object, the annotation will end up inside product.

A way to workaround this would be to convert the select related to a prefetch related when it is nested and contains an annotate config. That way it would look like this:

Product.objects.all().prefetch_related(
    Prefetch(
        "group",
        Group.objects.all().annotate(
            products_count=Count("products"),
        )
    )
)

The downside of this is that instead of a simple join, now we need an extra query. Which is not a big deal, but something to be aware of

I'll need some more time on how to do that properly, as it will require some more complex changes to the optimizer.

In the meantime, your best bet is to use data loaders to solve situations like this.

Let me know if you have any doubts/issues regarding that

@diesieben07
Copy link

It would be great if in the meantime the Optimizer could throw a descriptive error message when this happens instead of producing an invalid QuerySet.

@bellini666
Copy link
Member

It would be great if in the meantime the Optimizer could throw a descriptive error message when this happens instead of producing an invalid QuerySet.

Great callout!

@jaydensmith
Copy link
Contributor Author

Hey @jaydensmith ,

Was investigating this and I think what is going to...

The issue is the join in the group. What the optimizer is trying to do in your case is:

Product.objects.all().select_related(
    "group",
).annotate(
    products_count=Count("group__products"),
)

If you look at that queryset, there's no way currently for Django to annotate products_count inside product.group object, the annotation will end up inside product.

A way to workaround this would be to convert the select related to a prefetch related when it is nested and contains an annotate config. That way it would look like this:

Product.objects.all().prefetch_related(
    Prefetch(
        "group",
        Group.objects.all().annotate(
            products_count=Count("products"),
        )
    )
)

The downside of this is that instead of a simple join, now we need an extra query. Which is not a big deal, but something to be aware of

I'll need some more time on how to do that properly, as it will require some more complex changes to the optimizer.

In the meantime, your best bet is to use data loaders to solve situations like this.

Let me know if you have any doubts/issues regarding that

I'm not sure if there's a way to resolve the main issue, but my problem with the prefetch related solution is that it makes the types a lot less versatile if I have to consider how the data is fetched whenever nesting it on another type. If there was a way to make the annotate function work, even if it resulted in another query, that would be preferable to me.

@diesieben07
Copy link

I think what @bellini666 was saying is that the optimizer should automatically convert a select_related into a prefetch_related if it detects this situation. For you as the user it would be transparent, the optimizer would issue the best query possible given the circumstances.

@bellini666
Copy link
Member

I'm not sure if there's a way to resolve the main issue, but my problem with the prefetch related solution is that it makes the types a lot less versatile if I have to consider how the data is fetched whenever nesting it on another type. If there was a way to make the annotate function work, even if it resulted in another query, that would be preferable to me.

The idea is exactly what @diesieben07 said

In the meantime, I think data loaders would be the way to go for you. E.g.

@strawberry.django.type(models.Group)
class GroupType(relay.Node):
    id: auto
    name: auto

    @strawberry_django.field(only=["pk"])
    async def product_count(self, root: Group, info:Info):
        return await info.context.data_loaders.load_product_count(root.pk)

# and the dataloader definition
async def load_products_count(pks: list[int]) -> list[int]:
    count_dict = {}

    async for group in Group.objects.filter(
        pk__in=pks,
    ).only(
        "pk",
    ).annotate(
        product_count=Count('products'),
    ):
        count_dict[group.pk] = group.product_count

    return [count_dict.get(pk, 0) for pk in pks]

If you need help with that, feel free to ping me in our discord channel and I can help you with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants