
Container resources are not passed through properly #363

Closed
sbernauer opened this issue Feb 23, 2024 · 9 comments · Fixed by #408
@sbernauer
Member

Affected Stackable version

0.0.0-dev

Affected Apache Spark-on-Kubernetes version

3.5.0

Current and expected behavior

I configure

  driver:
    config:
      resources:
        cpu:
          min: 100m

so that the driver does not reserve a whole core (the default is request = limit = 1; IMHO we should lower that to a 250m request and a 1-core limit) and I can get the stack running on my laptop.

However this is ignored and the driver still has

    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1" # <<< 1 instead of expected 100m
        memory: 1Gi

The same applies to the executor; I have not checked the spark-submit Pod.
I also cannot work around this with podOverrides, as Spark overwrites whatever is in the pod template.

Possible solution

Properly propagate resources.
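For illustration only (this is the expected result, not actual operator output): with the `cpu.min: 100m` override above and the default limit of 1 left in place, the driver container should end up with resources along these lines:

```yaml
resources:
  requests:
    cpu: 100m        # taken from the role's resources.cpu.min
    memory: 1Gi
  limits:
    cpu: "1"         # default limit, unchanged
    memory: 1Gi
```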

Additional context

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: access-hdfs
spec:
  sparkImage:
    productVersion: 3.5.0
  mode: cluster
  mainApplicationFile: local:///stackable/spark/jobs/access-hdfs.py
  deps:
    packages:
      - org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3
  sparkConf:
    spark.driver.extraClassPath: /stackable/config/hdfs
    spark.executor.extraClassPath: /stackable/config/hdfs
    spark.hadoop.hive.metastore.kerberos.principal: hive/[email protected]
    spark.hadoop.hive.metastore.sasl.enabled: "true"
    spark.kerberos.keytab: /stackable/kerberos/keytab
    spark.kerberos.principal: spark/[email protected]
    spark.sql.catalog.lakehouse: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.lakehouse.type: hive
    spark.sql.catalog.lakehouse.uri: thrift://hive-iceberg:9083
    spark.sql.defaultCatalog: lakehouse
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
  job:
    config:
      volumeMounts: &volumeMounts
        - name: script
          mountPath: /stackable/spark/jobs
        - name: hdfs-config
          mountPath: /stackable/config/hdfs
        - name: kerberos
          mountPath: /stackable/kerberos
        # Yes, I'm too lazy to fiddle around with JVM arguments... (-Djava.security.krb5.conf=/example/path/krb5.conf)
        - name: kerberos
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
    envOverrides: &envOverrides
      KERBEROS_REALM: KNAB.COM
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark-submit
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  driver:
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my Laptop
        cpu:
          min: 100m
    envOverrides: *envOverrides
    podOverrides: &podOverrides
      spec:
        containers:
          - name: spark
            # As the envOverrides are not working
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  executor:
    replicas: 1
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my Laptop
        cpu:
          min: 250m
    envOverrides: *envOverrides
    podOverrides: *podOverrides
  volumes:
    - name: script
      configMap:
        name: access-hdfs-script
    - name: hdfs-config
      configMap:
        name: hdfs
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              secrets.stackable.tech/class: kerberos
              secrets.stackable.tech/kerberos.service.names: spark
              secrets.stackable.tech/scope: service=spark
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "1"
            storageClassName: secrets.stackable.tech
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: access-hdfs-script
data:
  access-hdfs.py: |
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, LongType, ShortType, FloatType, DoubleType, BooleanType, TimestampType, MapType, ArrayType
    from pyspark.sql.functions import col, from_json, expr

    spark = SparkSession.builder.appName("access-hdfs").getOrCreate()

    spark.sql("show catalogs").show()
    spark.sql("show tables in lakehouse.default").show()

    spark.sql("SELECT * FROM lakehouse.customer_analytics.customers").show()

Environment

No response

Would you like to work on fixing this bug?

None

@vasili439

Any updates on this? I have the same issue: resources set in the SparkApplication manifest are simply ignored.

@adwk67
Member

adwk67 commented May 21, 2024

We haven't planned this yet. Please feel free to add any information in the comments that will help us to prioritize this (e.g. how critical it is).

@vasili439

Hi Andrew, thanks for the prompt reply. For us, correct resource allocation is critical: we run a number of continuous jobs (up to 10-20) with fairly light load (100-200m would be enough for them), but currently a full CPU is allocated per pod, which makes our AWS bill grow faster.

@adwk67
Member

adwk67 commented May 22, 2024

Hello Vasily, this should be fixed now. The resource settings are passed through as-is to the pods, although a rounded value is still used for the level of parallelism.
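To illustrate the rounding mentioned above (this is a hypothetical sketch, not the operator's actual code): a fractional Kubernetes CPU quantity cannot map to a partial Spark core, so the parallelism value has to be rounded to a whole number of cores, for example like this:

```python
import math

def parallelism_cores(cpu_quantity: str) -> int:
    """Illustrative only: round a Kubernetes CPU quantity up to a whole
    number of cores. Handles millicore form ("250m") and plain core
    counts ("1", "0.5")."""
    if cpu_quantity.endswith("m"):
        millicores = int(cpu_quantity[:-1])
    else:
        millicores = int(float(cpu_quantity) * 1000)
    # At least one core, rounding fractional requests up.
    return max(1, math.ceil(millicores / 1000))

print(parallelism_cores("100m"))   # 1
print(parallelism_cores("2500m"))  # 3
```

So a pod may request only 100m of CPU while Spark itself still sees one core of parallelism.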

@vasili439

Thanks! When are you planning to make the next release?

@adwk67
Member

adwk67 commented May 22, 2024

The next release is scheduled for sometime in July, but the nightly/dev versions are available already.

@vasili439

Do you have Docker images for the nightly/dev versions?

@adwk67
Member

adwk67 commented May 22, 2024

Yes: please refer to this link for installing operators, and this one for the products. If you install the nightly operator, it will by default install the product in the nightly version.

@vasili439

Confirmed, the issue is fixed. Thanks!
