moon2 : the test is finished but the session (and corresponding pod) still present #354

Open
MohamedBenighil opened this issue Oct 26, 2022 · 16 comments


MohamedBenighil commented Oct 26, 2022

Hello,

I deployed moon2 on my Azure Kubernetes cluster (AKS), and the QA team launches the same tests every 20 minutes. Most of the time everything works fine.

However, sometimes a test finishes but the corresponding session in the Moon UI and its pod get stuck and are never deleted. Do you have any idea why?

Here is some information that may help.

  1. Notice the creation date of the pods (it looks like one of the Moon pods was restarted):

[Screenshot: freezeWatchMoon]

  2. The stuck session shows a blinking VNC view for a while, then it gets disconnected:

[Screenshot: 2]

and the logs of vnc-server container of the stuck pod in question are:

...
27/10/2022 07:53:34 webSocketsHandshake: unknown connection error
27/10/2022 07:53:34 Client 10.244.17.82 gone
27/10/2022 07:53:34 Statistics             events    Transmit/ RawEquiv ( saved)
27/10/2022 07:53:34  TOTALS              :      0 |         0/        0 (  0.0%)
27/10/2022 07:53:34 Statistics             events    Received/ RawEquiv ( saved)
27/10/2022 07:53:34  TOTALS              :      0 |         0/        0 (  0.0%)
27/10/2022 07:53:36 Got connection from client 10.244.17.82
27/10/2022 07:53:36   0 other clients
...
  3. Some logs returned by the tests launched by the QA team:

2022-10-25T22:41:55.2526026Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631A.tmp\Creation_Suppression_Compte_Ctm_224037_UTC\Creation_Suppression_Compte_Ctm_224037_UTC.har
2022-10-25T22:41:55.2526623Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631A.tmp\Creation_Suppression_Compte_Ctm_224037_UTC\Creation_Suppression_Compte_Ctm_224037_UTC.html
2022-10-25T22:41:55.2527212Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631A.tmp\Creation_Suppression_Compte_Ctm_224037_UTC\f6b4fe07-823b-43f4-b1f7-a918dbb5b292.png
2022-10-25T22:41:55.2554747Z Creation_Suppression_Compte_Ctm_224037_UTC <-> Duration:48811.6822ms
2022-10-25T22:41:55.2555331Z Creation_Suppression_Compte_Ctm_224037_UTC:SendReport :: Ok : Time:9928.7751ms
2022-10-25T22:41:55.8570602Z Creation_Suppression_Compte_Ctr_224037_UTC -- List Last : C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC
2022-10-25T22:41:55.8572101Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\05dda6c4-d1be-4bf4-be1f-830bab2dcaf1.png
2022-10-25T22:41:55.8572890Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\26967f59-e69c-4991-9c32-3d3289be27dd.png
2022-10-25T22:41:55.8573464Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\52183439-fe68-48f8-b9a7-dd9807cdcc09.png
2022-10-25T22:41:55.8574018Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\858c54c1-e843-45f6-ac06-e647fea4692f.png
2022-10-25T22:41:55.8574577Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\86ba40b7-b7b7-457b-be3c-dd91f60e9f9b.png
2022-10-25T22:41:55.8575161Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\9b9d7a60-7ed0-4a76-aa9e-b25c391a50ac.png
2022-10-25T22:41:55.8576952Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\be4e361b-84e3-40a4-8841-7d86d327ca6c.png
2022-10-25T22:41:55.8577551Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\Creation_Suppression_Compte_Ctr_224037_UTC.har
2022-10-25T22:41:55.8578130Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\Creation_Suppression_Compte_Ctr_224037_UTC.html
2022-10-25T22:41:55.8578719Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\f3005773-e4d2-4c9d-ad27-37ba5cddfe96.png
2022-10-25T22:41:55.8579268Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631D.tmp\Creation_Suppression_Compte_Ctr_224037_UTC\f896c908-5af5-45c2-ad32-70b8628655b1.png
2022-10-25T22:41:55.8658145Z Creation_Suppression_Compte_Ctr_224037_UTC <-> Duration:50991.8919ms
2022-10-25T22:41:55.8658970Z Creation_Suppression_Compte_Ctr_224037_UTC:SendReport :: Ok : Time:8634.8487ms
2022-10-25T22:43:06.8127636Z Microsoft.Playwright.PlaywrightException: WebSocket error: read ECONNRESET
2022-10-25T22:43:06.8130498Z =========================== logs ===========================
2022-10-25T22:43:06.8132940Z <ws connecting> ws://xxx.xxx.xxx.xxx:4444/playwright/chromium/playwright-1.27.1?enableVNC=true&enableVideo=true&headless=false&videoName=Creation_Suppression_Compte_CksCulture_224039_UTC.mp4&name=Creation_Suppression_Compte_CksCulture_224039_UTC&sessionTimeout=10m
2022-10-25T22:43:06.8134057Z <ws error> error read ECONNRESET
2022-10-25T22:43:06.8135301Z <ws connect error> ws://xxx.xxx.xxx.xxx:4444/playwright/chromium/playwright-1.27.1?enableVNC=true&enableVideo=true&headless=false&videoName=Creation_Suppression_Compte_CksCulture_224039_UTC.mp4&name=Creation_Suppression_Compte_CksCulture_224039_UTC&sessionTimeout=10m read ECONNRESET
2022-10-25T22:43:06.8138326Z <ws disconnected> ws://xxx.xxx.xxx.xxx:4444/playwright/chromium/playwright-1.27.1?enableVNC=true&enableVideo=true&headless=false&videoName=Creation_Suppression_Compte_CksCulture_224039_UTC.mp4&name=Creation_Suppression_Compte_CksCulture_224039_UTC&sessionTimeout=10m code=1006 reason=
2022-10-25T22:43:06.8139666Z ============================================================
2022-10-25T22:43:06.8140222Z    at Microsoft.Playwright.Transport.Connection.InnerSendMessageToServerAsync[T](String guid, String method, Object args) in /_/src/Playwright/Transport/Connection.cs:line 163
2022-10-25T22:43:06.8140759Z    at Microsoft.Playwright.Transport.Connection.WrapApiCallAsync[T](Func`1 action, Boolean isInternal)
2022-10-25T22:43:06.8141323Z    at Microsoft.Playwright.Core.BrowserType.ConnectAsync(String wsEndpoint, BrowserTypeConnectOptions options) in /_/src/Playwright/Core/BrowserType.cs:line 161
2022-10-25T22:43:06.8142119Z    at Edenred.France.Automation.MyEdenred.PagesObject.SettingsPlaywrightMoon.BrowserManager.CreateAsync(Reporting reporting) in D:\a\1\s\Edenred.France.Automation.MyEdenred.PagesObject\SettingsPlaywrightMoon\BrowserManager.cs:line 43
2022-10-25T22:43:06.8262057Z [xUnit.net 00:02:31.59]     Edenred.France.Automation.MyEdenred.Xunit.TestsMoon.Creation_Suppression_Compte_CksCulture.CreationSuppressionCompteCksCulture [FAIL]
2022-10-25T22:43:08.2894976Z   Failed Edenred.France.Automation.MyEdenred.Xunit.TestsMoon.Creation_Suppression_Compte_CksCulture.CreationSuppressionCompteCksCulture [2 m 27 s]
2022-10-25T22:43:08.2895812Z   Error Message:
2022-10-25T22:43:08.2896332Z    Microsoft.Playwright.PlaywrightException : WebSocket error: read ECONNRESET
2022-10-25T22:43:08.2896852Z =========================== logs ===========================
2022-10-25T22:43:08.2897758Z <ws connecting> ws://xxx.xxx.xxx.xxx:4444/playwright/chromium/playwright-1.27.1?enableVNC=true&enableVideo=true&headless=false&videoName=Creation_Suppression_Compte_CksCulture_224039_UTC.mp4&name=Creation_Suppression_Compte_CksCulture_224039_UTC&sessionTimeout=10m
2022-10-25T22:43:08.2898426Z <ws error> error read ECONNRESET
2022-10-25T22:43:08.2899142Z <ws connect error> ws://xxx.xxx.xxx.xxx:4444/playwright/chromium/playwright-1.27.1?enableVNC=true&enableVideo=true&headless=false&videoName=Creation_Suppression_Compte_CksCulture_224039_UTC.mp4&name=Creation_Suppression_Compte_CksCulture_224039_UTC&sessionTimeout=10m read ECONNRESET
2022-10-25T22:43:08.2900249Z <ws disconnected> ws://xxx.xxx.xxx.xxx:4444/playwright/chromium/playwright-1.27.1?enableVNC=true&enableVideo=true&headless=false&videoName=Creation_Suppression_Compte_CksCulture_224039_UTC.mp4&name=Creation_Suppression_Compte_CksCulture_224039_UTC&sessionTimeout=10m code=1006 reason=
2022-10-25T22:43:08.2901238Z ============================================================
2022-10-25T22:43:08.2901473Z   Stack Trace:
2022-10-25T22:43:08.2901902Z      at Microsoft.Playwright.Transport.Connection.InnerSendMessageToServerAsync[T](String guid, String method, Object args) in /_/src/Playwright/Transport/Connection.cs:line 163
2022-10-25T22:43:08.2902488Z    at Microsoft.Playwright.Transport.Connection.WrapApiCallAsync[T](Func`1 action, Boolean isInternal)
2022-10-25T22:43:08.2903247Z    at Microsoft.Playwright.Core.BrowserType.ConnectAsync(String wsEndpoint, BrowserTypeConnectOptions options) in /_/src/Playwright/Core/BrowserType.cs:line 161
2022-10-25T22:43:08.2904031Z    at Edenred.France.Automation.MyEdenred.PagesObject.SettingsPlaywrightMoon.BrowserManager.CreateAsync(Reporting reporting) in D:\a\1\s\Edenred.France.Automation.MyEdenred.PagesObject\SettingsPlaywrightMoon\BrowserManager.cs:line 43
2022-10-25T22:43:08.2904960Z    at Edenred.France.Automation.MyEdenred.Xunit.TestsMoon.Creation_Suppression_Compte_CksCulture.CreationSuppressionCompteCksCulture() in D:\a\1\s\Edenred.France.Automation.MyEdenred.Playwright.Xunit\TestsMoon\Creation_Suppression_Compte_CksCulture.cs:line 46
2022-10-25T22:43:08.2905621Z --- End of stack trace from previous location ---
2022-10-25T22:43:10.0452047Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:EndTest :: Exception
2022-10-25T22:43:11.0581150Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Page.CloseAsync :: Start
2022-10-25T22:43:11.0586059Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Page.CloseAsync :: Ok : Time:0.4599ms
2022-10-25T22:43:11.0586900Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Context.CloseAsync :: Start
2022-10-25T22:43:11.0589391Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Context.CloseAsync :: OK : Time:0.3217ms
2022-10-25T22:43:11.0590221Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Browser.CloseAsync :: Start
2022-10-25T22:43:11.0592817Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Browser.CloseAsync :: Ok : Time:0.275ms
2022-10-25T22:43:11.0593883Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Context.DisposeAsync :: Start
2022-10-25T22:43:11.0595566Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Context.DisposeAsync :: Ok : Time:0.3196ms
2022-10-25T22:43:11.0596711Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Browser.DisposeAsync :: Start
2022-10-25T22:43:11.0601016Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Browser.DisposeAsync :: Ok : Time:0.2056ms
2022-10-25T22:43:11.0602143Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Playwright.Dispose :: Start
2022-10-25T22:43:11.0735176Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:Playwright.Dispose :: Ok : Time:13.6181ms
2022-10-25T22:43:11.0736099Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC:SendReport :: Start
2022-10-25T22:43:11.0736976Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC -- List First : C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC
2022-10-25T22:43:11.0738087Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\19e06821-9feb-4b5e-8396-ef79fe641899.png
2022-10-25T22:43:11.0739164Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\48264f2c-2baf-4a87-a43d-9b4d3b51341e.png
2022-10-25T22:43:11.0740252Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\5ee9d7ba-78a2-45cf-8eb9-92ef81c706be.png
2022-10-25T22:43:11.0741239Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\6b97710a-8879-4174-bd43-bb675ea21fb9.png
2022-10-25T22:43:11.0742375Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\6d88ad58-3f88-4c89-bc01-bae1ba7d0ba3.png
2022-10-25T22:43:11.0743466Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\7608ce7b-10a6-4d6b-96a9-6ab93a55c2b4.png
2022-10-25T22:43:11.0744501Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\cdc552a8-8962-4680-be18-93793550a91c.png
2022-10-25T22:43:11.0745478Z C:\Users\VssAdministrator\AppData\Local\Temp\tmp631C.tmp\Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC\fc3a6b66-3a75-4922-8a4d-c334c81f5959.png
2022-10-25T22:43:12.3065984Z Navigation_Ctr_Ajout_CksZenith_Ctm_224037_UTC -- List Last : 

Please let me know if you need more information.

Thank you for your help

@MohamedBenighil MohamedBenighil changed the title moon2 : the test is finished but the session (end corresponding pod) still present moon2 : the test is finished but the session (and corresponding pod) still present Oct 26, 2022
@vania-pooh
Member

@MohamedBenighil The Playwright pod is automatically deleted when the respective web-socket connection is closed. Make sure your process actually closes this connection.

@MohamedBenighil
Author

MohamedBenighil commented Oct 28, 2022

@vania-pooh Actually, I discussed this with the QA team, and they said the process already closes the connection. This is done by the following piece of code (at the end of each process/test):

await Page?.CloseAsync();
await Context?.CloseAsync();
await Browser?.CloseAsync();
if (Context != null)
    await Context.DisposeAsync();
if (Browser != null)
    await Browser.DisposeAsync();
if (Playwright != null)
    Playwright.Dispose();

Note that our QA team uses NuGet with C# and Playwright 1.27.1 to develop those processes/tests.

I would also like to highlight something that may help:

  • We have 8 tests that have been launched every 20 minutes since Tuesday the 25th at 3 pm. All 8 tests worked fine until 00:40 (of the same day), when one of the sessions (and its pod) froze, and it is still frozen today. So the same process worked fine before and after 00:40, and the corresponding session froze ONLY at that time.

Do you have any idea why?

If you need more information, please let me know.

Thank you for your help

@vania-pooh
Member

vania-pooh commented Oct 28, 2022

@MohamedBenighil one more possible reason could be a Moon pod restart because of some maintenance in the Kubernetes cluster. Make sure that the number of restarts shown by kubectl get po -n moon for the Moon pods is zero.
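
A restart count of zero does not rule out that a pod was deleted and recreated, since a recreated pod starts with a fresh counter. The following is a minimal sketch for checking both values, assuming the output of `kubectl get po -n moon -o json` has already been loaded into a dict. The function names and the cut-off time are illustrative and not part of Moon.

```python
from datetime import datetime, timezone


def parse_k8s_time(value):
    """Parse a Kubernetes timestamp such as 2022-10-26T00:41:00Z (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pod_summary(pod_list):
    """Return name, total container restarts and creation time for each pod."""
    rows = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        rows.append({
            "name": pod["metadata"]["name"],
            "restarts": sum(s.get("restartCount", 0) for s in statuses),
            "created": parse_k8s_time(pod["metadata"]["creationTimestamp"]),
        })
    return rows


def recreated_after(rows, moment):
    """Names of pods created after `moment`, whatever their restart count is."""
    return [row["name"] for row in rows if row["created"] > moment]
```

Passing the time at which the test runs started as `moment` lists the pods that appeared afterwards, which is the case the RESTARTS column alone cannot show.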

@MohamedBenighil
Author

MohamedBenighil commented Oct 28, 2022

@vania-pooh In fact, I noticed that one of the Moon pods got killed and recreated (look at the AGE column in the first screenshot above).

PS: I am using Kubernetes on Azure (AKS).

So, what should I do to avoid the frozen session/pod? And why does moon2 not terminate the previous session after it restarts?

Thank you

@vania-pooh
Member

@MohamedBenighil Moon has no state, i.e. no list of sessions is stored in Moon memory. In the case of Playwright / Puppeteer, the Moon pod simply tracks the web-socket connection state and deletes the browser pod when the connection is closed. When Moon is killed from the outside, this information is lost and the pod will never be deleted. This is why it is important to make sure that Moon is never restarted (which is usually the case when configured correctly).
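
Because nothing tracks a browser pod once the Moon pod that owned its connection is gone, leftover pods have to be found from the outside. The following is a minimal sketch, assuming browser pods can be recognised by a name prefix and that the pod list comes from `kubectl get po -n moon -o json`. The prefix, the age threshold, and the function names are hypothetical and not taken from Moon.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse a Kubernetes timestamp such as 2022-10-26T00:40:00Z (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_browser_pods(pod_list, now, max_age, name_prefix):
    """Names of pods matching `name_prefix` that are older than `max_age`.

    `max_age` should be well above the longest expected test run, so that
    sessions still in use are not reported.
    """
    stale = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        if not meta["name"].startswith(name_prefix):
            continue  # skip Moon's own pods and anything unrelated
        age = now - parse_k8s_time(meta["creationTimestamp"])
        if age > max_age:
            stale.append(meta["name"])
    return stale
```

The function only reports names; each one can be reviewed and then removed by hand with `kubectl delete pod <name> -n moon`.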

@MohamedBenighil
Author

MohamedBenighil commented Oct 28, 2022

@vania-pooh So, to make sure Moon is never restarted, should I ONLY change restartPolicy: Always to restartPolicy: Never in the moon Deployment object (i.e. kubectl -nmoon edit deployment moon)? Or are there additional parameters to set?

@vania-pooh
Member

@MohamedBenighil this is not needed. You just need to make sure Moon is not restarted from the outside or restarted because of OOM.

@MohamedBenighil
Author

MohamedBenighil commented Oct 29, 2022

@vania-pooh The problem is that I have no idea why it restarts. The moon2 pod is killed and recreated, so when I run kubectl get po -nmoon the RESTARTS column shows 0, and I cannot check the logs of the previous pod since it is lost. Do you have a tip for dealing with this situation (since moon2 is stateless and the Playwright pod is still present), other than deploying a logging system?

@aandryashin
Member

aandryashin commented Oct 30, 2022 via email

@MohamedBenighil
Author

MohamedBenighil commented Oct 30, 2022

@aandryashin As I described in my last comment, kubectl logs my-pod --previous does not work since the previous pod was killed (i.e. it was recreated, BUT the RESTARTS column shows 0 when you run kubectl get po -nmoon).

@aandryashin
Member

aandryashin commented Oct 30, 2022 via email

@MohamedBenighil
Author

MohamedBenighil commented Nov 4, 2022

@vania-pooh I discovered why the moon2 pod was killed: it is due to the downscale of my AKS cluster (Azure Kubernetes).

I enabled the autoscaling feature on my AKS cluster (from min=2 to max=10 nodes), so during the downscale process the worker node hosting the moon2 pod(s) gets deleted by AKS. The moon2 pod is therefore killed as well and recreated on another available worker. Once the new moon2 pod is created, control of the web-socket connection is lost since the new pod has no state recording the history, hence the Playwright pod is never deleted. This is what you said in this comment.

I would like to know how I can configure moon2 so that it is not killed during the downscale process of my k8s cluster.

PS: I deployed it using Helm.

Thank you in advance for your help
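
One option commonly used with the Kubernetes cluster autoscaler is the `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` pod annotation, which asks the autoscaler not to remove a node that runs such a pod. The following is only a sketch under assumptions: it has not been checked against the moon2 chart, and it takes the namespace `moon` and the Deployment name `moon` from the commands used earlier in this thread. It builds a merge patch for the pod template and returns the `kubectl patch` command as a string instead of running it.

```python
import json
import shlex

# Annotation understood by the Kubernetes cluster autoscaler.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def safe_to_evict_patch():
    """Merge patch that adds the annotation to the Deployment's pod template."""
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {SAFE_TO_EVICT: "false"}}
            }
        }
    }


def kubectl_patch_command(namespace, deployment):
    """Build (but do not run) the kubectl command that applies the patch."""
    patch = json.dumps(safe_to_evict_patch())
    return "kubectl -n {} patch deployment {} --type merge -p {}".format(
        shlex.quote(namespace), shlex.quote(deployment), shlex.quote(patch)
    )
```

For example, `print(kubectl_patch_command("moon", "moon"))` shows the command to review before running it. Changing the pod template triggers one rollout of the Moon pods, and a later Helm upgrade may overwrite a manual patch, so the same annotation would ideally be set through the chart if it offers a way to do so.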

@aandryashin
Member

aandryashin commented Nov 4, 2022 via email

@MohamedBenighil
Author

MohamedBenighil commented Nov 7, 2022

@aandryashin when you say "you have to increase graceful shutdown period up to 6 min", do you mean the terminationGracePeriodSeconds property? If yes, I guess it is already set to 6 min by default here in the moon2 chart. If no, please let me know how.

@aandryashin
Member

aandryashin commented Nov 7, 2022 via email

@MohamedBenighil
Author

MohamedBenighil commented Nov 8, 2022

Regarding your statement "i meant node should send kill signal to processes only after pods graceful shutdown period":

  1. Could you please tell me how I can do that? Should I add some YAML to the moon2 chart's values.yml? If yes, which parameter(s) should I set?
  2. Or something else?

I also confirm that moon2 is not running on a spot-instance worker. It was killed due to the downscale process, because the autoscaling feature is enabled on my cluster (AKS).
