Training an ML model and pushing it back to the repository #26183
-
Hey! I’m currently facing an issue I can’t seem to get around. I have a GitHub Action that installs the associated repository and the required dependencies, pulls some training data from other sources, and then trains the repository’s ML model. After training, the model files are overwritten, and my intention is to push the new model back to the repo. If I train the model for fewer than 100 epochs (less than 1 hour), the action works fine: it trains the model and then pushes the newly trained model data back to the repository. If I train for more than 100 epochs, then when I reach the push step it says:
I find it really weird: same code, same model, just a different number of epochs (and thus runtime), and I get this error. Any ideas? Thanks in advance!
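For reference, here is a minimal sketch of the kind of workflow described above. This is not the poster's actual file. The script names (`fetch_data.py`, `train.py`), the `--epochs` flag, the `requirements.txt` file, the `model/` output directory, the `main` branch, and the commit identity are all hypothetical placeholders. The relevant detail is that the push at the end relies on the token that `actions/checkout` stored at the start of the job.

```yaml
name: train-model

on:
  push:
    branches: [main]   # assumed default branch name

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      # By default, checkout stores the job's GITHUB_TOKEN in the local
      # git config so that later git commands can reuse it.
      - uses: actions/checkout@v2

      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'

      # Hypothetical dependency file and scripts.
      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Fetch training data
        run: python fetch_data.py

      # Long-running step: with enough epochs this runs past one hour.
      - name: Train model
        run: python train.py --epochs 150

      # Commit the overwritten model files and push with the stored token.
      - name: Commit and push model
        run: |
          git config user.name "CI Bot"              # placeholder identity
          git config user.email "ci-bot@example.com"
          git add model/
          git commit -m "Update trained model"
          git push
```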
-
Hey @amorenocb, you are not hitting the job timeout; rather, the VM you are running on must be forgetting the git credentials after an hour. Can you provide the yml file you are using for the workflow (making sure to omit any sensitive info)?
-
Hey @logankilpatrick! Thanks for your reply. I have the same feeling, but I haven’t been able to find any reference to “GitHub runner forgetting credentials”. Below is my yaml file.
-
Hey @logankilpatrick, thanks for your reply! I have the same feeling: somehow the GH runner is forgetting the credentials after a while. I’ll attach the yml file below.
Thanks again!
-
Hi @amorenocb, this is by design: the access token expires after 60 minutes. GitHub fetches a token for each job before the job begins. When a workflow run or its jobs are queued for more than one hour, the token may expire before the job starts. You can find more details here and here. Thanks.
-
Hi @weide-zhou! Is there any way to get around this? Maybe by using a personal access token (PAT) to pull and then push?
-
If there is a way to get around this, it’s using your PAT! I would give it a shot.
-
Hi @amorenocb, I checked on my side: I used a PAT and added the parameter ‘persist-credentials: false’ to actions/checkout@v2, and the push completed successfully. Code sample below:
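A minimal sketch of a checkout step configured this way. It illustrates the description above and is not the original sample. With `persist-credentials: false`, `actions/checkout` does not store the job's token in the local git config. This means a later push has no stored credentials and must supply the PAT explicitly, for example through the remote URL.

```yaml
# Check out the repository without storing the job's GITHUB_TOKEN in
# .git/config; later git commands must then provide credentials
# themselves (e.g. a remote URL that contains the PAT).
- uses: actions/checkout@v2
  with:
    persist-credentials: false
```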
-
Hey! I managed to get it working by running the git pull/push commands manually with the PAT. I’ll post it here in case it’s of any use to someone in the future 🙂
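A minimal sketch of what a manual pull/push step with a PAT might look like. This is not the poster's actual code, and it rests on several assumptions. The PAT is stored as a repository secret named `PAT`, which is a hypothetical name. The trained files live in `model/`, which is also hypothetical. The checkout step used `persist-credentials: false`. The `x-access-token:<token>@github.com` URL form is one common way to embed a token in an HTTPS remote.

```yaml
# Assumes actions/checkout ran with persist-credentials: false, so the
# job's original GITHUB_TOKEN is not stored in the git config.
- name: Commit, pull, and push the trained model
  env:
    PAT: ${{ secrets.PAT }}   # hypothetical secret name
  run: |
    # Remote URL that authenticates with the PAT instead of GITHUB_TOKEN
    REMOTE="https://x-access-token:${PAT}@github.com/${GITHUB_REPOSITORY}.git"
    # Branch name derived from the ref that triggered the run
    BRANCH="${GITHUB_REF#refs/heads/}"
    git config user.name "CI Bot"               # placeholder identity
    git config user.email "ci-bot@example.com"
    git add model/                              # hypothetical output directory
    git commit -m "Update trained model"
    # Replay the new commit on top of anything pushed while training ran
    git pull --rebase "$REMOTE" "$BRANCH"
    git push "$REMOTE" "HEAD:$BRANCH"
```

The `git pull --rebase` before pushing is a design choice for long training runs. If someone pushed to the branch during the hour or more of training, the push would otherwise be rejected as non-fast-forward.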
-
Hey @weide-zhou, it works! However, I noticed that now that I no longer use the GitHub token, pushing the trained model triggers the action again and again. Is there a way to pass a parameter to the push command so that it does not trigger the action again? It won’t be a problem in practice, because this action will be triggered by a schedule, not a push, but it would still be nice to know. Thank you for your response!
-
Hi @amorenocb, glad to know it works for you! Regarding “Is there a way to specify a parameter with the push command so it does not trigger the action again?”: sorry, no. Only the GitHub token avoids the infinite loop for the push event; you can find the answer in this link. Committing with a PAT will always trigger the event, so using a schedule event instead is a good option. Could you please mark the answer? It will help others who encounter the same problem. Thanks.
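A minimal sketch of the suggested switch from a push trigger to a schedule. Because the workflow no longer listens for `push`, a commit pushed with the PAT cannot start it again. The cron expression (daily at 02:00 UTC) is an arbitrary example, and `workflow_dispatch` is an optional addition for starting runs manually.

```yaml
on:
  # Scheduled trigger; GitHub Actions evaluates cron expressions in UTC.
  schedule:
    - cron: '0 2 * * *'
  # Optional: allow manual runs from the Actions tab.
  workflow_dispatch:
```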
-
I think so; I read about it, and there are a lot of possibilities.