Activating MINIO_API_SELECT_PARQUET not works #19721

Closed
masalinas opened this issue May 10, 2024 · 4 comments
Labels: community, incomplete info

masalinas commented May 10, 2024

I have an active MinIO tenant. I set the environment variable MINIO_API_SELECT_PARQUET from the MinIO Operator (see the screenshot) so that I can read Parquet files. But when I try to query the file from a Python sample, MinIO responds with:

An error occurred (InternalError) when calling the SelectObjectContent operation (reached max retries: 4): We encountered an internal error, please try again.: cause(parquet format parsing not enabled on server)

[Screenshot 2024-05-10 at 23:12:05]

This is the Python code:

import boto3

s3 = boto3.client('s3',
                  endpoint_url='https://localhost:9000',
                  aws_access_key_id='BsvW9jlpYX8TvD9F',
                  aws_secret_access_key='HrGdJapKsXbKEcXABWNQ2CO15v3y9MMk',
                  verify=False,
                  region_name='us-east-1')

r = s3.select_object_content(
    Bucket='uniovi',
    Key='sample.parquet',
    ExpressionType='SQL',
    Expression="select * from s3object",
    InputSerialization={'Parquet': {}},
    OutputSerialization={'CSV': {}},
)

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print(statsDetails['BytesScanned'])
        print("Stats details bytesProcessed: ")
        print(statsDetails['BytesProcessed'])
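
A minimal sketch of how a client could recognize this specific failure instead of treating it as a generic InternalError. It only inspects the error text quoted in this issue; the helper name `is_parquet_disabled_error` is hypothetical and not part of boto3 or MinIO.

```python
# Cause string copied from the error message reported in this issue.
PARQUET_DISABLED_CAUSE = "parquet format parsing not enabled on server"


def is_parquet_disabled_error(message: str) -> bool:
    """Return True if the error text carries the 'Parquet not enabled' cause."""
    return PARQUET_DISABLED_CAUSE in message.lower()


# Error text as reported above.
message = (
    "An error occurred (InternalError) when calling the SelectObjectContent "
    "operation (reached max retries: 4): We encountered an internal error, "
    "please try again.: cause(parquet format parsing not enabled on server)"
)

if is_parquet_disabled_error(message):
    print("Parquet support for S3 Select appears to be disabled on the server.")
```

With boto3, the message would typically come from `str(exc)` on a caught `botocore.exceptions.ClientError`, so the check can sit in an `except` block around `select_object_content`.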

Expected Behavior

Read Parquet files.

Current Behavior

I cannot read Parquet files.

Your Environment

  • Version used (minio --version): MinIO Operator 5.0.12
  • Server setup and configuration: minikube 0.041
  • Operating System and version (uname -a): macOS Sonoma
  • Python Boto3 dependency

jiuker commented May 12, 2024

Please post your pod YAML here. @masalinas

masalinas commented

@jiuker I used the Helm chart to deploy the MinIO Operator in minikube like this:

$ helm install \
  --namespace minio-operator \
  --create-namespace \
  operator minio-operator/operator

The problem is that when I created the MinIO tenant from the Operator console, I didn't add the MINIO_API_SELECT_PARQUET environment variable, as shown here:

[Screenshot 2024-05-12 at 20:56:52]

Later I edited the tenant and added this environment variable as shown below, but the tenant did not change. Maybe the only way to enable Parquet is to remove the tenant and recreate it with this environment variable. What do you think?

[Screenshot 2024-05-12 at 20:58:17]
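
One way to confirm whether the edit actually reached the Tenant resource is to inspect its spec. The sketch below works on an already parsed Tenant manifest (a plain dict) and assumes the variable is stored under `spec.env` as a list of name/value entries; the tenant name and the value `on` are illustrative assumptions, not taken from this issue.

```python
from typing import Any, Optional

ENV_NAME = "MINIO_API_SELECT_PARQUET"


def tenant_env_value(tenant: dict[str, Any], name: str = ENV_NAME) -> Optional[str]:
    """Return the value of `name` from the tenant's spec.env list, or None if absent."""
    # Assumption: environment variables live under spec.env as {"name", "value"} entries.
    for entry in tenant.get("spec", {}).get("env") or []:
        if entry.get("name") == name:
            return entry.get("value")
    return None


# Hypothetical manifest, e.g. the result of parsing the tenant YAML.
tenant = {
    "apiVersion": "minio.min.io/v2",
    "kind": "Tenant",
    "metadata": {"name": "example-tenant"},  # hypothetical name
    "spec": {"env": [{"name": "MINIO_API_SELECT_PARQUET", "value": "on"}]},  # assumed value
}

value = tenant_env_value(tenant)
print(f"{ENV_NAME} in tenant spec: {value!r}")
```

If this returns None after editing the tenant in the console, the change was probably never saved to the resource, which would be a different problem from the pods not restarting with the new variable.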

harshavardhana commented

Can you provide your tenant YAML via kubectl describe pods, and also exec into the pod and share the output of env | grep -i minio_?

Please provide these details, closing this issue until then.
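
For anyone gathering the requested details, here is a rough sketch of checking the captured `env | grep -i minio_` output for the variable in question. The sample output is invented for illustration, and the helper name `parse_minio_env` is hypothetical.

```python
def parse_minio_env(env_output: str) -> dict[str, str]:
    """Parse `env`-style NAME=VALUE lines, keeping only MINIO_* variables."""
    result: dict[str, str] = {}
    for line in env_output.splitlines():
        name, sep, value = line.strip().partition("=")
        # Skip lines without '=' and variables that are not MinIO settings.
        if sep and name.upper().startswith("MINIO_"):
            result[name] = value
    return result


# Hypothetical output captured from inside the MinIO pod.
sample_output = """\
MINIO_ROOT_USER=example-user
MINIO_API_SELECT_PARQUET=on
"""

env = parse_minio_env(sample_output)
if "MINIO_API_SELECT_PARQUET" in env:
    print("MINIO_API_SELECT_PARQUET =", env["MINIO_API_SELECT_PARQUET"])
else:
    print("MINIO_API_SELECT_PARQUET is not set in the pod environment.")
```

If the variable is missing from the pod environment, the server process never saw it, which would be consistent with the "not enabled on server" error.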

harshavardhana added the incomplete info label and removed the triage label on May 14, 2024