Unable to load S3 #453

AllamSudhakara · 2021-09-27T19:52:35Z

I have a very simple configuration file and job file in which I select 20 rows from a Hadoop system using the Hive catalog and push them to an S3 bucket. The job populates the data frame but does not create a file in S3. Could you please check the following and tell me what I am doing wrong? Thanks in advance for the help.

Command
spark-submit --conf spark.sql.catalogImplementation=hive --conf spark.hadoop.dfs.nameservices=mycluster --conf spark.hadoop.fs.s3a.fast.upload=True --conf spark.hadoop.fs.s3a.path.style.access=True --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem --conf spark.hadoop.fs.s3a.access.key= --conf spark.hadoop.fs.s3a.secret.key=<DEV-SECRET_KEY> --class com.yotpo.metorikku.Metorikku /home/myEdgenodePath/metorikku_2.11.jar -c /myHadoopFS/job-StraightLoad.yml

Job

metrics:
  - /myHadoopFA/metric-StraightLoad.yml

variables:
  StartDate: 2021-09-01
  EndDate: 2021-09-07
  TrimmedDateFormat: yyyy-mm-dd

output:
  file:
    dir: s3a://dev-files-exchange/output

Metric

steps:
  - dataFrameName: MYMonthly
    sql:
      select * from mySchema.my_aggregate where exp_dt = ${EndDate} LIMIT 20
    ignoreOnFailures: false

output:
  - dataFrameName: MYMonthly
    outputType: Parquet
    outputOptions:
      saveMode: Overwrite
      path: MYMonthly.parquet

lucabem · 2022-08-10T19:20:04Z

Hi @AllamSudhakara:

I am currently using Metorikku and I am able to write Parquet files to S3. I am using this output configuration:

  - dataFrameName: df_name
    outputType: File
    format: parquet
    outputOptions:
      saveMode: Overwrite
      path: s3a://<s3_bucket_name>/path/to/file

It looks like you are not building the path correctly.
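
Applied to the metric file from the original question, the same pattern might look like the sketch below. It reuses the asker's `dataFrameName` and bucket; the full `s3a://` path under `output/` is an assumption about where the file should land, and the fragment has not been tested against that setup.

```yaml
output:
  - dataFrameName: MYMonthly
    outputType: File
    format: parquet
    outputOptions:
      saveMode: Overwrite
      # Assumed target: the bucket and directory from the job file, spelled out in full
      path: s3a://dev-files-exchange/output/MYMonthly.parquet
```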

AllamSudhakara · 2022-08-10T21:18:12Z

Hi Luis, thanks for the reply. Do you know of any pipeline-builder GUI based on Metorikku that generates the .yml file, runs the pipeline, visualizes progress, and reports any errors? Please provide some details on whether YotpoLtd has one and can supply it for a license fee. It would be great if this GUI could read from enterprise metadata so that data scientists and analysts can build pipelines and progressively consolidate the enterprise data assets. Regards, Sudhakar

