
envOverrides have no effect #362

Open

sbernauer opened this issue Feb 23, 2024 · 0 comments

Affected Stackable version

0.0.0-dev

Affected Apache Spark-on-Kubernetes version

3.5.0

Current and expected behavior

Setting envOverrides has no effect: the variables never show up in the job, driver, or executor pods. I currently need to resort to podOverrides as a workaround (full reproducer under Additional context, with a quick in-job check sketched after it).

Possible solution

Set them :) The operator should merge each envOverrides entry into the env of the containers it builds for the job, driver, and executor roles.
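
For illustration only, a minimal sketch of the intended merge semantics (the operator itself is written in Rust; this Python stand-in and the helper name apply_env_overrides are hypothetical, not the operator's actual API):

def apply_env_overrides(container_env, overrides):
    """Merge envOverrides into a container's env list; overrides win on name clashes."""
    merged = {entry["name"]: entry for entry in container_env}
    for name, value in overrides.items():
        merged[name] = {"name": name, "value": value}
    return list(merged.values())

# The role-level envOverrides from the reproducer below:
print(apply_env_overrides([], {"KERBEROS_REALM": "KNAB.COM"}))
# -> [{'name': 'KERBEROS_REALM', 'value': 'KNAB.COM'}]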

Additional context

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: access-hdfs
spec:
  sparkImage:
    productVersion: 3.5.0
  mode: cluster
  mainApplicationFile: local:///stackable/spark/jobs/access-hdfs.py
  deps:
    packages:
      - org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3
  sparkConf:
    spark.driver.extraClassPath: /stackable/config/hdfs
    spark.executor.extraClassPath: /stackable/config/hdfs
    spark.hadoop.hive.metastore.kerberos.principal: hive/[email protected]
    spark.hadoop.hive.metastore.sasl.enabled: "true"
    spark.kerberos.keytab: /stackable/kerberos/keytab
    spark.kerberos.principal: spark/[email protected]
    spark.sql.catalog.lakehouse: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.lakehouse.type: hive
    spark.sql.catalog.lakehouse.uri: thrift://hive-iceberg:9083
    spark.sql.defaultCatalog: lakehouse
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
  job:
    config:
      volumeMounts: &volumeMounts
        - name: script
          mountPath: /stackable/spark/jobs
        - name: hdfs-config
          mountPath: /stackable/config/hdfs
        - name: kerberos
          mountPath: /stackable/kerberos
        # Yes, I'm too lazy to fiddle around with JVM arguments... (-Djava.security.krb5.conf=/example/path/krb5.conf)
        - name: kerberos
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
    envOverrides: &envOverrides
      KERBEROS_REALM: KNAB.COM
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark-submit
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  driver:
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my Laptop
        cpu:
          min: 100m
    envOverrides: *envOverrides
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  executor:
    replicas: 1
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my Laptop
        cpu:
          min: 250m
    envOverrides: *envOverrides
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  volumes:
    - name: script
      configMap:
        name: access-hdfs-script
    - name: hdfs-config
      configMap:
        name: hdfs
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              secrets.stackable.tech/class: kerberos
              secrets.stackable.tech/kerberos.service.names: spark
              secrets.stackable.tech/scope: service=spark
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "1"
            storageClassName: secrets.stackable.tech
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: access-hdfs-script
data:
  access-hdfs.py: |
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, LongType, ShortType, FloatType, DoubleType, BooleanType, TimestampType, MapType, ArrayType
    from pyspark.sql.functions import col, from_json, expr

    spark = SparkSession.builder.appName("access-hdfs").getOrCreate()

    spark.sql("show catalogs").show()
    spark.sql("show tables in lakehouse.default").show()

    spark.sql("SELECT * FROM lakehouse.customer_analytics.customers").show()

Environment

No response

Would you like to work on fixing this bug?

None
