Data is not being totally written with append using awswrangler to_deltalake with multiple lambdas running in parallel #2771

Open
camposvinicius opened this issue Apr 11, 2024 · 1 comment
Labels
bug Something isn't working


camposvinicius commented Apr 11, 2024

Describe the bug

We created an empty Delta table with PySpark so that several Lambda functions running in parallel can append to it using awswrangler's to_deltalake method. CloudWatch shows no errors for any invocation, yet only some of the data ends up in the table; the rest is silently missing.

How to Reproduce

import awswrangler as wr

# `data` is the pandas DataFrame to append (defined elsewhere).
wr.s3.to_deltalake(
    df=data,
    path="s3://bucket/delta",
    index=False,
    partition_cols=["a", "b"],
    overwrite_schema=False,
    s3_additional_kwargs={
        "AWS_ACCESS_KEY_ID": "...",
        "AWS_SECRET_ACCESS_KEY": "...",
        "AWS_REGION": "eu-west-1",
    },
    s3_allow_unsafe_rename=True,
    mode="append",
)

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.8

AWS SDK for pandas version

3.6.0

Additional context

No response

@camposvinicius camposvinicius added the bug Something isn't working label Apr 11, 2024
@camposvinicius camposvinicius changed the title Data is not being written with append using awswrangler delta with multiple lambdas running in parallel Data is not being totally written with append using awswrangler delta with multiple lambdas running in parallel Apr 11, 2024
@camposvinicius camposvinicius changed the title Data is not being totally written with append using awswrangler delta with multiple lambdas running in parallel Data is not being totally written with append using awswrangler to_deltalake with multiple lambdas running in parallel Apr 11, 2024
@LeonLuttenberger
Contributor

Hey,

When s3_allow_unsafe_rename is set to True, consistency is not enforced between simultaneous write operations, so concurrent appends can overwrite each other's commits without raising an error. To use the locking mechanism, create a DynamoDB table and pass its name via the lock_dynamodb_table argument. More details can be found in the to_deltalake documentation.
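
To illustrate why unsynchronized writers can lose data without any error, here is a toy model in plain Python. It is only a sketch and not how the underlying Delta Lake library is implemented: ObjectStore, put, and run_two_writers are hypothetical names, and the model assumes that each commit is a numbered log entry and that two writers read the log before either one commits.

```python
from typing import Dict, List


class ObjectStore:
    """Toy stand-in for a bucket holding numbered commit entries (hypothetical)."""

    def __init__(self) -> None:
        self.commits: Dict[int, List[str]] = {}

    def latest_version(self) -> int:
        # -1 means the log is still empty.
        return max(self.commits, default=-1)

    def put(self, version: int, files: List[str], if_absent: bool) -> bool:
        """Write a commit; with if_absent=True, refuse to overwrite an existing one."""
        if if_absent and version in self.commits:
            return False
        self.commits[version] = files
        return True


def run_two_writers(if_absent: bool) -> List[str]:
    store = ObjectStore()
    # Both writers read the log before either one commits,
    # so both pick the same next version number.
    version_a = store.latest_version() + 1
    version_b = store.latest_version() + 1
    store.put(version_a, ["part-a.parquet"], if_absent)
    if not store.put(version_b, ["part-b.parquet"], if_absent):
        # Conflict detected: re-read the log and retry on the next version.
        store.put(store.latest_version() + 1, ["part-b.parquet"], if_absent)
    # Readers only see data files that some commit references.
    return sorted(f for files in store.commits.values() for f in files)


unsafe = run_two_writers(if_absent=False)
safe = run_two_writers(if_absent=True)
print(unsafe)  # ['part-b.parquet']
print(safe)    # ['part-a.parquet', 'part-b.parquet']
```

In the unsafe run the second writer replaces the first writer's commit, so the first data file is no longer referenced and nothing fails. In the safe run the conflict is detected and the second writer retries, which is the role a lock plays for concurrent appends.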

Best regards,
Leon
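
A minimal sketch of the suggested change, assuming a DynamoDB table already exists for locking. The table name "delta_log_lock" and the helpers build_append_kwargs and append_with_lock are hypothetical; the key schema the table needs depends on the installed deltalake version, so check the to_deltalake documentation before creating it.

```python
from typing import Any, Dict, Sequence


def build_append_kwargs(
    path: str, partition_cols: Sequence[str], lock_table: str
) -> Dict[str, Any]:
    """Build to_deltalake keyword arguments for a locked append (hypothetical helper)."""
    return {
        "path": path,
        "index": False,
        "mode": "append",
        "partition_cols": list(partition_cols),
        # Name of an existing DynamoDB table used as the locking provider.
        "lock_dynamodb_table": lock_table,
        # Leave the unsafe rename off so concurrent writers are coordinated.
        "s3_allow_unsafe_rename": False,
    }


def append_with_lock(
    df: Any, path: str, partition_cols: Sequence[str], lock_table: str
) -> None:
    """Append a pandas DataFrame to a Delta table using the DynamoDB lock."""
    import awswrangler as wr  # imported lazily; requires awswrangler with deltalake support

    wr.s3.to_deltalake(df=df, **build_append_kwargs(path, partition_cols, lock_table))


kwargs = build_append_kwargs("s3://bucket/delta", ["a", "b"], "delta_log_lock")
print(kwargs["lock_dynamodb_table"])
```

Each Lambda would then call append_with_lock(data, "s3://bucket/delta", ["a", "b"], "delta_log_lock") instead of passing s3_allow_unsafe_rename=True. The Lambda execution role presumably also needs read and write access to the lock table.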
