
Random "Could not chdir to home directory". #2636

Open
jiucenglou opened this issue Mar 29, 2024 · 3 comments


jiucenglou commented Mar 29, 2024

On PBS 18.1.2, many jobs run without error, but some randomly fail with "Could not chdir to home directory" in the error file. So far, every failed job has run fine when resubmitted. Could you suggest what might cause this and how to fix it? Many thanks!

PS: There are a couple of occurrences of "Could not chdir to home directory" in src/resmom/start_exec.c, but I am still unsure how to diagnose the cause and fix it.
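One way to narrow this down is to record *why* the change into the home directory fails (missing path, permission denied, stale mount, and so on), since the message itself does not say. Below is a minimal Python sketch, assuming you can run it as the job's user on the execution host around the time failures occur. The helpers `probe_chdir` and `probe_user_home` are hypothetical, not part of OpenPBS, and the idea that the home directory is only briefly unreachable (for example a network-mounted home) is a guess, not something established in this thread.

```python
import errno
import os
import pwd


def probe_chdir(path):
    """Try to chdir into `path`; return (ok, errno_name) and restore the original cwd."""
    original = os.getcwd()
    try:
        os.chdir(path)
        return True, None
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as 'ENOENT' or 'EACCES'.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.chdir(original)


def probe_user_home(user=None):
    """Look up a user's home directory in the passwd database and probe it."""
    entry = pwd.getpwnam(user) if user else pwd.getpwuid(os.getuid())
    ok, err = probe_chdir(entry.pw_dir)
    return entry.pw_dir, ok, err
```

Calling `probe_user_home()` periodically (or at the start of a job) and logging the returned errno name whenever `ok` is False may show whether failures line up with a particular host or time.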

@zhenrong-wang

Hi @jiucenglou

I am not an OpenPBS developer, but I went through the logic quickly. It seems that if you set sandbox=PRIVATE, OpenPBS does not change to the home directory; instead, it changes to the job directory.

Maybe you can try this out.
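To try this, the sandbox attribute can be requested at submission time with `-W sandbox=PRIVATE`, either on the `qsub` command line or as a `#PBS -W sandbox=PRIVATE` directive in the job script. The Python sketch below builds and optionally runs that command; `build_qsub_command`, `submit`, and `job.sh` are hypothetical names, and the accepted values (`HOME`, `PRIVATE`) should be double-checked against the qsub documentation for your PBS version.

```python
import subprocess


def build_qsub_command(script_path, sandbox="PRIVATE", extra_args=()):
    """Return the argv for submitting `script_path` with the given sandbox mode."""
    if sandbox not in ("HOME", "PRIVATE"):
        raise ValueError(f"unexpected sandbox value: {sandbox!r}")
    # -W passes an additional job attribute, here sandbox=<value>.
    return ["qsub", "-W", f"sandbox={sandbox}", *extra_args, script_path]


def submit(script_path, sandbox="PRIVATE"):
    """Run qsub and return the job id it prints on stdout."""
    result = subprocess.run(
        build_qsub_command(script_path, sandbox),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only prints the command; call submit("job.sh") to actually submit.
    print(build_qsub_command("job.sh"))
```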

@jiucenglou

> sandbox=PRIVATE

Thank you! I will try sandbox=PRIVATE when the error shows up again. Since the error occurs randomly, I will see what happens over the next one or two months.

jiucenglou commented Apr 2, 2024

> sandbox=PRIVATE

I just saw the error again and have started trying sandbox=PRIVATE.

PS: (Fortunately) this time I was also running qstat -F json at almost the same moment, and the command returned corrupted JSON that jq refused to parse (at other times jq parses it fine). What could cause this, and how can I diagnose and work around it? For example, is it possible to use qsub to ask PBS to "retry one more time" after such a "freeze/choke"?
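As a client-side workaround only (it does not explain the corruption, and it is not a PBS feature), a script that consumes `qstat -F json` can retry when the output fails to parse. A minimal Python sketch follows; `parse_with_retry` and `qstat_json` are hypothetical helpers, and the retry count and delay are arbitrary choices.

```python
import json
import subprocess
import time


def parse_with_retry(fetch, attempts=3, delay=2.0):
    """Call fetch() until its output parses as JSON; re-raise the last error otherwise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        raw = fetch()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            # exc.pos is the offset where the output stopped being valid JSON.
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # wait before asking qstat again
    raise last_error


def qstat_json():
    """Return the raw stdout of `qstat -F json`, the command used in this thread."""
    result = subprocess.run(
        ["qstat", "-F", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage would be `jobs = parse_with_retry(qstat_json)`. Saving the raw output whenever parsing fails would also give you a concrete sample of the corrupted JSON to attach to this issue.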
