Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agones-{extensions,allocator}: Make servers context aware #3845

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

zmerlynn
Copy link
Collaborator

@zmerlynn zmerlynn commented May 31, 2024

  • adds an httpserver utility package to handle the Run function that controller/extensions use. Make that context aware using the same method as https.Run:
    go func() {
    <-ctx.Done()
    s.tls.Close() // nolint: errcheck,gosec
    }()
  • also plumbs context-awareness through the allocator run{Mux,REST,GRPC} functions.
  • adds a gRPC health server to the allocator, calls .Shutdown() on it during graceful termination - this seems to push the client off correctly.

Along the way, basically re-wrote the TestAllocatorAfterDeleteReplica test:

  • moved the GetAllocatorClient up because it's fairly disruptive (new certs, etc.), so I wanted the pod stability checks to happen after
  • changed agones-allocator stability checks to just grab the Deployment and wait for Replicas != AvailableReplicas.
  • changed the test to actually disrupt the Pods in the middle of the test (vs before it just nukes the pods then starts the test.

Tested with e2e in a loop.

Towards #3853

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: f71cd44c-de3c-458c-9dd6-aa9f3dd9b3ee

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@zmerlynn
Copy link
Collaborator Author

Hmm, upon further testing this still has a minor flake. Let me play with it a bit more, moving to draft.

@zmerlynn zmerlynn marked this pull request as draft May 31, 2024 16:25
@github-actions github-actions bot added the size/L label Jun 5, 2024
@zmerlynn zmerlynn marked this pull request as ready for review June 5, 2024 18:05
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 6355c010-ac46-4c04-92cf-5fee63fbdc1f

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: cb48cb0a-06e7-485a-87ca-f0a79097eb43

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3845/head:pr_3845 && git checkout pr_3845
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.42.0-dev-2d2e2e5-amd64

@@ -150,7 +150,7 @@ func main() {
agonesInformerFactory := externalversions.NewSharedInformerFactory(agonesClient, defaultResync)
kubeInformerFactory := informers.NewSharedInformerFactory(kubeClient, defaultResync)

server := &httpServer{}
server := &httpserver.Server{Logger: logger}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
server := &httpserver.Server{Logger: logger}
server := &http.Server{Logger: logger}

That wouldn't be redundant 😃

cmd/allocator/main.go Outdated Show resolved Hide resolved

// Package httpserver implements an http server that conforms to the
// controller runner interface.
package httpserver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package httpserver
package http

I'd be tempted to make the package http since we also have a https - not sure the server part is required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really didn't want to shadow a super commonly important base library. Calling this http would result in it getting imported as agoneshttp or something pretty often, since I would also probably not rename base http.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: ec4f214a-d587-406d-8ee0-20be8a5a2b15

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3845/head:pr_3845 && git checkout pr_3845
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.42.0-dev-acc03f2-amd64


go func() {
go func() {
<-ctx.Done()
grpcServer.Stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be: https://pkg.go.dev/google.golang.org/grpc#Server.GracefulStop (or maybe used in lieu of a lot of this code?)

Which might help with your multi-context issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this, and I have a version in a branch where I did exactly that. It made things worse.

Let me just go back to the drawing board here, but I think it may involve rewriting a lot of the handling basically everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no, it doesn't solve the two-context issue. The problem is that here (and in httpserver/https), we use a context to signal shutdown, but the service handler still needs a forward context to get work done to shut down. Let me see if I can just rework the listen handlers more.

Copy link
Member

@markmandel markmandel Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see your point. I think you are right that we'll want two contexts, but what we could do is something akin to:

<- listenerCtx.Done()
// graceful shutdown both the http and grpc listeners. 
grpcServer.GracefulStop()
server.Shutdown()
cancelInfrormer() // cancel the worker + Informer context, since now we know we aren't accepting any more connections.

This might require pulling the grpc + http listeners into a larger struct that has a Run(ctx) operation on it, and stops when the ctx is <- ctx.Done(), to provide a larger data structure to operate on (and then it could take a context, and manage it's own context for the workers and informers).

Did that make sense?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we could do something similar to the runner slices and have shutdown slices, perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we could do something similar to the runner slices and have shutdown slices, perhaps?

Seems like you are rebuilding Contexts 😄

cmd/allocator/main.go Outdated Show resolved Hide resolved
…ealth check

* adds an `httpserver` utility package to handle the `Run` function
that controller/extensions use. Make that context aware using the same
method as https.Run:
https://github.com/googleforgames/agones/blob/dfa414e5e4da37798833bbf8c33919acb5f3c2ea/pkg/util/https/server.go#L127-L130

* also plumbs context-awareness through the allocator
run{Mux,REST,GRPC} functions.

* adds a gRPC health server to the allocator, calls .Shutdown() on it
during graceful termination - this seems to push the client off correctly.

Tested with e2e in a loop.

Towards googleforgames#3853
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 4e70c09c-b912-48d4-91fb-6af56029445b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3845/head:pr_3845 && git checkout pr_3845
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.42.0-dev-9d246a3-amd64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants