taxi-trip-execute.sh has poor performance and is duplicated six times #486

Open
raykrueger opened this issue Apr 4, 2024 · 0 comments
Labels
enhancement New feature or request


@raykrueger
Contributor

Description

The spark-k8s-operator examples rely on the taxi-trip-execute.sh script to populate test data in an S3 bucket. The current version of this script downloads a 36 MB file locally, duplicates it 100 times (3.6 GB in total), and then uploads those copies to an S3 bucket. Run from an EC2 instance, this takes about 2 minutes. Run from a laptop on Wi-Fi, it takes far longer, because all 3.6 GB has to go over the local connection.

Rather than syncing all the copies from local to S3, we can upload the file from local to S3 once, then run background S3-to-S3 copies. This takes about 40 seconds and does not use the local Wi-Fi network at all.
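A minimal sketch of the proposed approach, written in Bash to match the existing script. It is not taken from the repository: the function name `upload_and_fan_out`, its arguments, the `DRY_RUN` switch, the `-N` naming scheme for the copies, and the default of 10 parallel copies are all hypothetical. It uploads the file once with `aws s3 cp`, then runs `aws s3 cp` between two `s3://` URIs, which the AWS CLI performs as a copy inside S3, so the object data does not pass through the local machine.

```shell
#!/usr/bin/env bash
# Sketch only: upload one local file to S3, then fan it out with S3-to-S3 copies.
# Hypothetical interface (not from the original script):
#   upload_and_fan_out LOCAL_FILE BUCKET PREFIX COUNT [JOBS]
# Set DRY_RUN=1 to print the aws commands instead of running them.
set -euo pipefail

upload_and_fan_out() {
  local file="$1" bucket="$2" prefix="$3" count="$4" jobs="${5:-10}"
  local name base ext src
  name="$(basename "$file")"
  base="${name%.*}"   # assumes the file name has an extension, e.g. data.parquet
  ext="${name##*.}"
  src="s3://${bucket}/${prefix}/${name}"

  # In dry-run mode, prefix every aws call with echo.
  local -a aws=(aws)
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    aws=(echo aws)
  fi

  # Step 1: a single upload over the local network.
  "${aws[@]}" s3 cp "$file" "$src"

  # Step 2: COUNT copies from S3 to S3, at most JOBS at a time.
  # The object data stays inside S3; xargs exits non-zero if any copy fails.
  seq 1 "$count" |
    xargs -P "$jobs" -I{} "${aws[@]}" s3 cp "$src" "s3://${bucket}/${prefix}/${base}-{}.${ext}"
}
```

For example, `upload_and_fan_out ./data.parquet "$S3_BUCKET" taxi-trip/input 100` (hypothetical file name and prefix) would upload one copy and then create `data-1.parquet` through `data-100.parquet` inside the bucket. Capping concurrency with `xargs -P` keeps the number of simultaneous `aws` processes bounded instead of starting all 100 at once, and because `xargs` returns a non-zero status when any copy fails, `set -e` stops the script rather than silently continuing.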

Additionally, there are currently 6 duplicate copies of this script.

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.
@raykrueger raykrueger changed the title taxi-trip-execute.sh has poor performance and is duplicated many times taxi-trip-execute.sh has poor performance and is 6 many times Apr 4, 2024
@raykrueger raykrueger changed the title taxi-trip-execute.sh has poor performance and is 6 many times taxi-trip-execute.sh has poor performance and is duplicated six times Apr 4, 2024
@vara-bonthu vara-bonthu added the enhancement New feature or request label Apr 14, 2024