AzureBlobTarget: use file basename for the temp download_file_location #3009

Open
wants to merge 3 commits into base: master

vbarbaresi
Contributor

Motivation and context

When the target blob is inside a path (for instance "path/to/target") and we use download_when_reading=True,
the intermediate directories don't exist under the temporary download location, so reading the target fails with:

  File "src/luigi/luigi/contrib/azureblob.py", line 194, in __enter__
    self.client.download_as_file(self.container, self.blob, self.download_file_location)
  File "src/luigi/luigi/contrib/azureblob.py", line 101, in download_as_file
    return self.connection.get_blob_to_path(container, blob, location)
  File "src/luigi/venv/lib/python3.6/site-packages/azure/storage/blob/baseblobservice.py", line 1765, in get_blob_to_path
    with open(file_path, open_mode) as stream:
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/7q/l5knvjqx3pg569hwsdrzjw480000gn/T/2020-10-12 14:20:01.950869689rh5q2/path/to/movie-cheesy.txt'

Description

Use os.path.basename(blob) to keep only the blob name as the file name instead of the full path (it's written in a temporary directory anyway).
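
The failure and the fix described above can be reproduced with the standard library alone. This is a minimal sketch, not the actual patch: `temp_download_location` and `try_write` are hypothetical helpers introduced for illustration, and no Azure client is involved. It assumes the download location is built by joining a temporary directory with the blob name.

```python
import os
import tempfile


def temp_download_location(tmp_dir, blob, use_basename=True):
    """Hypothetical helper: build the local path a blob would be downloaded to."""
    # With use_basename=True only the last path component of the blob is kept.
    name = os.path.basename(blob) if use_basename else blob
    return os.path.join(tmp_dir, name)


def try_write(path):
    """Return True if the file can be created, False if a parent directory is missing."""
    try:
        with open(path, "wb") as stream:
            stream.write(b"data")
        return True
    except FileNotFoundError:
        return False


blob = "path/to/movie-cheesy.txt"
with tempfile.TemporaryDirectory() as tmp_dir:
    # Old behavior: the full blob path is appended, but tmp_dir/path/to does not exist.
    full = temp_download_location(tmp_dir, blob, use_basename=False)
    # Proposed behavior: only the basename is appended, so the file sits directly in tmp_dir.
    flat = temp_download_location(tmp_dir, blob, use_basename=True)
    full_ok = try_write(full)
    flat_ok = try_write(flat)
```

Here `full_ok` ends up false because `open` cannot create the missing parent directories, while `flat_ok` is true. Creating the directories with `os.makedirs` would be an alternative, but since the file only lives in a temporary directory, dropping the directory part is the smaller change.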

I changed a test task target to "path/to/movie-cheesy.txt" instead of "movie-cheesy.txt".
I added a test that reproduces the issue.

dlstadther
dlstadther previously approved these changes Oct 13, 2020
@vbarbaresi
Contributor Author

CI failed with:

azure.common.AzureConflictHttpError: Conflict

<?xml version="1.0" encoding="utf-8"?><Error><Code>LeaseIdMismatchWithLeaseOperation</Code><Message>The lease ID specified did not match the lease ID for the blob/container.</Message></Error>

I don't know why; it happens during a file upload.
By the way, I noticed that task outputs in the luigi-test container are persistent. So after the tests have run once, the test tasks won't re-run because their outputs already exist.

I also noticed that numpy is not installed by tox, so this error shows up during the tests:
ModuleNotFoundError: No module named 'numpy'
But it's unrelated to the failed test

@vbarbaresi
Contributor Author

gentle bump on this, would it be possible to re-run CI? I'd like to see if the failing test still happens before trying to solve it
Unfortunately I don't have access to the blob storage container used in the CI, so it will be hard to debug.
Tests passed on my own container

@dlstadther
Collaborator

restarted ci tests

@vbarbaresi
Contributor Author

thanks, I was able to reproduce the failing test using Azurite locally and will work on a fix.
It seems to come from some weird behavior on Azurite when reuploading the same file. It doesn't happen on a real Azure container

@vbarbaresi
Contributor Author

I reproduced the issue with Azurite. The failure seems to be a lease bug that is triggered when re-using the same blob name.

I upgraded the Azurite Docker image in the CI scripts from the deprecated https://github.com/arafato/azurite to the official https://github.com/azure/azurite, and the issue is gone.

@stale

stale bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

@stale stale bot added the wontfix label Jan 9, 2022