Implement job restart policies #157

Open · razvan opened this issue on Oct 11, 2022 · 3 comments

Comments

@razvan (Member) commented on Oct 11, 2022

Description

Some Spark jobs are idempotent and can safely be resubmitted, especially when they fail due to external constraints (such as an endpoint not yet being ready).

The user should be able to configure such jobs with a restart policy.
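
For illustration, a minimal sketch of what such an opt-in could look like on a SparkApplication (the restartPolicy block and its values are placeholders, not a final API):

```yaml
# Hypothetical sketch only: the restartPolicy block below is illustrative,
# not an implemented field of the SparkApplication CRD.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: idempotent-batch-job
spec:
  mainApplicationFile: s3a://my-bucket/job.py
  # Opt-in: the user asserts that resubmitting this job is safe.
  restartPolicy:
    type: OnFailure   # e.g. Never | OnFailure
    maxRetries: 3
```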

@sbernauer (Member) commented

At my previous company I simply ran the Spark driver Pod within a Deployment and health-checked port 4040. That worked pretty well.
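
Roughly this shape (image and submit command are placeholders):

```yaml
# Sketch of the pattern described above: the driver runs in a Deployment
# and the Spark UI port (4040) doubles as the liveness check.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-driver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-driver
  template:
    metadata:
      labels:
        app: spark-driver
    spec:
      containers:
        - name: driver
          image: my-spark-image:latest   # placeholder
          command: ["/opt/spark/bin/spark-submit"]
          args: ["--deploy-mode", "client", "local:///app/job.jar"]
          livenessProbe:
            tcpSocket:
              port: 4040                 # Spark UI; needs spark.ui.enabled=true
            initialDelaySeconds: 30
            periodSeconds: 10
```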

@razvan (Member, Author) commented on Oct 11, 2022

That requires `spark.ui.enabled` to be `true`, right? Anyway, establishing the status of the job (for batch jobs) is not the problem; that is already solved. The user has to specify whether restarting the job is safe or not.

For streaming jobs, we might indeed want to use a Deployment instead of a Job for the spark-submit Pod.
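
For the batch case, a user-facing restart policy could map onto the knobs Kubernetes already exposes on the Job that runs spark-submit; a minimal sketch, assuming a straightforward pass-through:

```yaml
# Sketch: Kubernetes-native restart knobs that a user-facing restart
# policy could translate to. Image and command are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-submit-job
spec:
  backoffLimit: 3                  # retry the Pod up to 3 times on failure
  template:
    spec:
      restartPolicy: OnFailure     # or Never for non-idempotent jobs
      containers:
        - name: spark-submit
          image: my-spark-image:latest   # placeholder
          command: ["/opt/spark/bin/spark-submit"]
```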

@sbernauer (Member) commented

Yes, it does. I mentioned it for the sake of completeness, and our approach is likely better :)
