S3 scan range behaviour is inconsistent #16649

Open
sashahilton00 opened this issue Feb 17, 2023 · 1 comment
Comments

@sashahilton00

The scan range behaviour in MinIO is not consistent with the official S3 behaviour. Scan range support was implemented in PR #14546. The discrepancy was flagged during the review here and acknowledged, but it was left unchanged for an undisclosed reason. As it stands, the implementation reads only up to the end byte specified in the request.

AWS documentation of the feature is available here.

Expected Behavior

From AWS docs:

An Amazon S3 Select scan range request runs across the byte range that you specify. A record that starts within the scan range specified but extends beyond the scan range will be processed by the query.

Current Behavior

MinIO returns exactly the bytes specified in the scan range, so a record that straddles the end byte is returned truncated.

Instead, if the read starts part way through a record, it should read up to the next delimiter, discard that partial record, and then return every record that starts in the remaining bytes of the range. If the final byte of the range is a delimiter, it should return at that point; otherwise it should continue scanning until the next delimiter is reached and return after that.
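
To make the expected semantics concrete, here is a minimal in-memory sketch in Go. It is not MinIO's code and not a proposed patch: `scanRecords` is a hypothetical helper, it works on a byte slice rather than a stream, it assumes a single-byte record delimiter, and it assumes the end offset is inclusive. It ignores quoted fields that contain the delimiter.

```go
package main

import (
	"bytes"
	"fmt"
)

// scanRecords is a hypothetical helper for illustration only.
// It returns every record whose first byte lies in [start, end]
// (end treated as inclusive, which is an assumption of this sketch),
// reading past end when needed to finish the last record.
func scanRecords(data []byte, start, end int, delim byte) []string {
	if start < 0 {
		start = 0
	}
	if end >= len(data) {
		end = len(data) - 1
	}
	pos := start
	// If the range starts part way through a record, skip to just past
	// the next delimiter and discard the partial record.
	if pos > 0 && pos <= len(data) && data[pos-1] != delim {
		i := bytes.IndexByte(data[pos:], delim)
		if i < 0 {
			return nil // no complete record starts after start
		}
		pos += i + 1
	}
	var records []string
	// Emit every record that starts at or before end, even if it
	// finishes beyond end.
	for pos <= end {
		i := bytes.IndexByte(data[pos:], delim)
		if i < 0 {
			// Last record has no trailing delimiter.
			records = append(records, string(data[pos:]))
			break
		}
		records = append(records, string(data[pos:pos+i]))
		pos += i + 1
	}
	return records
}

func main() {
	data := []byte("a,1\nbb,2\nccc,3\n")
	// End byte 5 falls in the middle of "bb,2"; the record is still returned whole.
	fmt.Println(scanRecords(data, 0, 5, '\n'))
	// Start byte 6 falls in the middle of "bb,2"; that partial record is discarded.
	fmt.Println(scanRecords(data, 6, 14, '\n'))
}
```

With these semantics, the adjacent ranges 0-5 and 6-14 together yield each record exactly once, which is what makes chunked processing of a large CSV file workable.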

Possible Solution

The changes would need to be made in internal/s3select/select.go.

Steps to Reproduce (for bugs)

Make any select-object-content request with a scan range where the end byte is in the middle of a record, and observe the partial response.

Context

This is more or less a requirement for processing large CSV files in chunks. The incorrect implementation makes the select feature materially less useful.

Regression

No

@stale

stale bot commented Mar 25, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 15 days if no further activity occurs. Thank you for your contributions.
