
DescribeTable call hidden in read_items #2396

Open
mooreniemi opened this issue Jul 18, 2023 · 3 comments

Comments

@mooreniemi

I'm using read_items, but inside that call another API call is made. Can this be avoided?

table_key_schema = get_table(table_name=table_name, boto3_session=boto3_session).key_schema

where get_table resolves to:

dynamodb_table = dynamodb_resource.Table(table_name)

When I enable debug logging (boto3.set_stream_logger(name='botocore')) and call table.key_schema, I see:

2023-07-18 09:19:42,633 botocore.hooks [DEBUG] Event request-created.dynamodb.DescribeTable: calling handler <function add_retry_headers at 0x1a316ec20>

This means every read_items call is doing this extra lookup. Could an option be added to provide the key_schema explicitly? Only so many items can be provided to read_items (100 max), so it's guaranteed that someone using this library will make this extra call on 1% of their requests; if they're doing massive read traffic in the millions, this adds up.
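In the meantime, one way to avoid repeating the lookup is to memoize the schema per table name at the application level. Below is a minimal sketch of that idea; the get_key_schema wrapper and its stubbed return value are hypothetical illustrations, not awswrangler or boto3 API:

```python
from functools import lru_cache

# Hypothetical sketch: memoize the key-schema lookup so DescribeTable
# is issued at most once per table name per process.
describe_calls = []  # records how many (stubbed) lookups actually run


@lru_cache(maxsize=None)
def get_key_schema(table_name):
    # Real code would call
    #   boto3.client("dynamodb").describe_table(TableName=table_name)
    # and read response["Table"]["KeySchema"]; here we stub the result.
    describe_calls.append(table_name)
    return ({"AttributeName": "pk", "KeyType": "HASH"},)


schema_first = get_key_schema("orders")   # triggers the (stubbed) lookup
schema_second = get_key_schema("orders")  # served from cache, no new call
```

With a cache like this, only the first read of each table pays the DescribeTable round trip, regardless of how many read calls follow.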

@LeonLuttenberger
Contributor

Hey,

More than 100 items can be returned by read_items, so this extra call to DescribeTable shouldn't happen that frequently.

However, I do see the case for allowing a customer to provide key_schema manually. I will discuss this with the rest of the team.

Best regards,
Leon

@jaidisido
Contributor

@mooreniemi can you expand on this please:

Only so many items can be provided to read_items (100 max)

As Leon mentioned, read_items is not limited in size and returns as much as the call reads.

@a-slice-of-py
Contributor

a-slice-of-py commented Jul 28, 2023

> @mooreniemi can you expand on this please:
>
> Only so many items can be provided to read_items (100 max)
>
> as Leon mentioned read_items is not limited in size and returns as much as the call reads

Jump in just to (hopefully) clarify what @mooreniemi is referring to.

The usage of wr.dynamodb.read_items is actually bounded by the intrinsic limit of the underlying boto3 client if and only if it is explicitly provided with a list of partition/sort values containing more than 100 items, as stated in the BatchGetItem documentation:

A single operation can retrieve up to 16 MB of data, which can contain as many as 100 items.

A possible workaround is to implement a sort of manual pagination, and this might be addressed directly in aws-sdk-pandas itself; something like:

import itertools

import awswrangler as wr

CHUNK_SIZE = 100  # BatchGetItem returns at most 100 items per call

partition_values = ...  # list with more than 100 items
counter = 0
items = []

for _ in itertools.count():
    _partition_values = partition_values[counter : counter + CHUNK_SIZE]
    if not _partition_values:
        break
    items.extend(
        wr.dynamodb.read_items(
            table_name=table_name,
            partition_values=_partition_values,
        )
    )
    counter += CHUNK_SIZE
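The manual pagination above can also be factored into a small reusable helper. This is plain Python, independent of awswrangler, and the names are illustrative:

```python
def chunked(values, size=100):
    """Yield successive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start : start + size]


# Each chunk would then be passed to wr.dynamodb.read_items in turn;
# here we just inspect the batch sizes produced for 250 values.
batch_sizes = [len(batch) for batch in chunked(list(range(250)))]
```

This keeps every underlying batch within DynamoDB's 100-item BatchGetItem limit while the caller still works with a single flat list of values.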
