
Container resources are not passed through properly #363

Closed
sbernauer opened this issue Feb 23, 2024 · 9 comments · Fixed by #408
@sbernauer
Member

Affected Stackable version

0.0.0-dev

Affected Apache Spark-on-Kubernetes version

3.5.0

Current and expected behavior

I configure

  driver:
    config:
      resources:
        cpu:
          min: 100m

so that the driver does not reserve a whole core (the default is request = limit = 1; IMHO we should lower that to a 250m request and a 1-core limit) and I can get the stack running on my laptop.

However this is ignored and the driver still has

    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1" # <<< 1 instead of expected 100m
        memory: 1Gi

The same applies to the executor; I have not checked the spark-submit Pod.
I also cannot work around this with podOverrides, as Spark overwrites whatever is in the pod template.

Possible solution

Properly propagate resources.
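For illustration only (this is the expected result, not actual operator output): with the `cpu.min: 100m` override above and the default limit of 1 left in place, the driver container should end up with resources along these lines:

```yaml
resources:
  requests:
    cpu: 100m        # taken from the role's resources.cpu.min
    memory: 1Gi
  limits:
    cpu: "1"         # default limit, unchanged
    memory: 1Gi
```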

Additional context

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: access-hdfs
spec:
  sparkImage:
    productVersion: 3.5.0
  mode: cluster
  mainApplicationFile: local:///stackable/spark/jobs/access-hdfs.py
  deps:
    packages:
      - org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3
  sparkConf:
    spark.driver.extraClassPath: /stackable/config/hdfs
    spark.executor.extraClassPath: /stackable/config/hdfs
    spark.hadoop.hive.metastore.kerberos.principal: hive/[email protected]
    spark.hadoop.hive.metastore.sasl.enabled: "true"
    spark.kerberos.keytab: /stackable/kerberos/keytab
    spark.kerberos.principal: spark/[email protected]
    spark.sql.catalog.lakehouse: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.lakehouse.type: hive
    spark.sql.catalog.lakehouse.uri: thrift://hive-iceberg:9083
    spark.sql.defaultCatalog: lakehouse
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
  job:
    config:
      volumeMounts: &volumeMounts
        - name: script
          mountPath: /stackable/spark/jobs
        - name: hdfs-config
          mountPath: /stackable/config/hdfs
        - name: kerberos
          mountPath: /stackable/kerberos
        # Yes, I'm too lazy to fiddle around with JVM arguments... (-Djava.security.krb5.conf=/example/path/krb5.conf)
        - name: kerberos
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
    envOverrides: &envOverrides
      KERBEROS_REALM: KNAB.COM
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark-submit
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  driver:
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my Laptop
        cpu:
          min: 100m
    envOverrides: *envOverrides
    podOverrides: &podOverrides
      spec:
        containers:
          - name: spark
            # As the envOverrides are not working
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  executor:
    replicas: 1
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my Laptop
        cpu:
          min: 250m
    envOverrides: *envOverrides
    podOverrides: *podOverrides
  volumes:
    - name: script
      configMap:
        name: access-hdfs-script
    - name: hdfs-config
      configMap:
        name: hdfs
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              secrets.stackable.tech/class: kerberos
              secrets.stackable.tech/kerberos.service.names: spark
              secrets.stackable.tech/scope: service=spark
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "1"
            storageClassName: secrets.stackable.tech
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: access-hdfs-script
data:
  access-hdfs.py: |
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, LongType, ShortType, FloatType, DoubleType, BooleanType, TimestampType, MapType, ArrayType
    from pyspark.sql.functions import col, from_json, expr

    spark = SparkSession.builder.appName("access-hdfs").getOrCreate()

    spark.sql("show catalogs").show()
    spark.sql("show tables in lakehouse.default").show()

    spark.sql("SELECT * FROM lakehouse.customer_analytics.customers").show()

Environment

No response

Would you like to work on fixing this bug?

None

@vasili439

Any updates on this? I have the same issue: resources set in the SparkApplication manifest are simply ignored.

@adwk67
Member

adwk67 commented May 21, 2024

We haven't planned this yet. Please feel free to add any information in the comments that will help us to prioritize this (e.g. how critical it is).

@vasili439

Hi Andrew, thanks for the prompt reply. For us, correct resource allocation is critical: we run a number of continuous jobs (up to 10-20) with fairly light load (100-200m would be enough for them), but currently a full CPU is allocated per pod, which makes our AWS bill grow faster.

@adwk67
Member

adwk67 commented May 22, 2024

Hello Vasily, this should be fixed now. The resource settings are passed through as-is to the pods, although a rounded value is still used for the level of parallelism.
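To illustrate the rounding mentioned above (this is a hypothetical sketch, not the operator's actual code): a fractional Kubernetes CPU quantity cannot map to a partial Spark core, so the parallelism value has to be rounded to a whole number of cores, for example like this:

```python
import math

def parallelism_cores(cpu_quantity: str) -> int:
    """Illustrative only: round a Kubernetes CPU quantity up to a whole
    number of cores. Handles millicore form ("250m") and plain core
    counts ("1", "0.5")."""
    if cpu_quantity.endswith("m"):
        millicores = int(cpu_quantity[:-1])
    else:
        millicores = int(float(cpu_quantity) * 1000)
    # At least one core, rounding fractional requests up.
    return max(1, math.ceil(millicores / 1000))

print(parallelism_cores("100m"))   # 1
print(parallelism_cores("2500m"))  # 3
```

So a pod may request only 100m of CPU while Spark itself still sees one core of parallelism.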

@vasili439

Thanks! When are you planning to make the next release?

@adwk67
Member

adwk67 commented May 22, 2024

The next release is scheduled for sometime in July, but the nightly/dev versions are available already.

@vasili439

Do you have Docker images for the nightly/dev versions?

@adwk67
Member

adwk67 commented May 22, 2024

Yes: please refer to this link for installing operators, and this one for the products. If you install the nightly operator, it will by default install the product in the nightly version.

@vasili439

Confirmed, the issue is fixed. Thanks!
