Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nhost (Next) JWT Token expire and apollo/nhostNext js client stops working #2681

Open
AlexanderMoroz opened this issue Apr 26, 2024 · 19 comments
Assignees
Labels
bug Something isn't working

Comments

@AlexanderMoroz
Copy link

AlexanderMoroz commented Apr 26, 2024

Describe the bug
Hi everyone. I've decided to open this bug once more because we have still issues related to JWT expiration and session refresh with next/apollo/nhost.

prev bug #2348

I've recorded a pretty detailed loom about this:
https://www.loom.com/share/d5f716675c924192b3f3cedc3fa32e0e

Basically, if we get an error in nhost-apollo client cannot read property type of undefined which we get when we got JWT token expired error message from hasura, apollo client and nhost native client stops sending any request at all. What is more annoying that there is no error and page refresh doesn't help at all. User have to sign-out at least. The only request which is successful after page load is request to v1/token.

We've recently updated to the latest versions of nhost and apollo libs:

"@apollo/client": "3.9.11",
"@nhost/apollo": "^6.2.1",
"@nhost/nextjs": "^2.1.10",
"@nhost/nhost-js": "^3.0.11",
"@nhost/react": "^3.4.1",
"@nhost/react-apollo": "^11.0.1",

I'd like to provide any possible help to solve this issue, because it's quite crucial for our application.

Looking forward to your response.
Thanks in advance

@AlexanderMoroz AlexanderMoroz added the bug Something isn't working label Apr 26, 2024
@onehassan
Copy link
Contributor

Thank you for providing a detailed issue. We will take a look at this as as soon as possible.

@mmmoli
Copy link

mmmoli commented Apr 26, 2024

Thanks @onehassan

This is crazy critical for us because it keeps triggering on prod every 10mins and we have no way to fix.

@dbarrosop
Copy link
Contributor

We are having troubles replicating and we don't know of any other user having this issue. All the reports we have heard so far from the users affected by the previous issue is that the issue is no longer a problem so we will need help finding a way to replicate. However, before that, would you mind checking your session settings (settings->Authentication->Session)? Which values do you have?

@AlexanderMoroz
Copy link
Author

@dbarrosop could you give me a hint we're are this settings exactly? Not quite sure I got what you meant

@dbarrosop
Copy link
Contributor

dbarrosop commented Apr 29, 2024

settings->Authentication->Session

Screenshot 2024-04-29 at 13 33 58

@AlexanderMoroz
Copy link
Author

AlexanderMoroz commented Apr 29, 2024

Hmmm.. this is not available during local development, but this is our settings from production:

image

But I've continue my debugging and found something interesting regarding this stale state of application. At least why apollo is not doing any request. We have a bit weird inconsistent state of our access token

I was mainly looking at this file:
https://github.com/nhost/nhost/blob/main/integrations/apollo/src/index.ts and how it works on my local machine.

So, problem:

apolloLink calls await getAuthHeaders()

 const httpLink = setContext(async (_, { headers }) => {
    return {
      headers: {
        ...headers,
        ...(await getAuthHeaders())
      }
    }
  }).concat(createHttpLink({ uri }))

getAuthHeaders under the hood calls await awaitValidTokenOrNull() which calls isTokenValidOrNull recursively

const awaitValidTokenOrNull = () => {
    if (isTokenValidOrNull()) {
      return Promise.resolve()
    }

    const waitForValidToken = () => {
      if (isTokenValidOrNull()) {
        return Promise.resolve(true)
      }
      return new Promise((resolve) => {
        setTimeout(() => waitForValidToken().then(resolve), 100)
      })
    }

    return waitForValidToken()
  }

but token which we have is never null or never valid. Specifically because JWT is not valid during bug is present.
If we take a look at values:
image

Our token expires in future (time of screenshoot is 14.15 local).
But if we decode JWT it's in the past
image
image

So as a result isTokenValidOrNull() is always false and waitForValidToken always return a promise which is replaced by another recursively

So I def continue debugging this and try to understand where is this inconsistency coming from (I assume this might be because we can use two nhost clients next and vanila (because of monorepo and some parts are need to be reused outside of next js)

But would be great to hear any feedback from you. Maybe it make sense to at least reset auth state if JWT is in the past or not valid, because otherwise you have infinite loop

@AlexanderMoroz
Copy link
Author

@dbarrosop @onehassan any updates?

@dbarrosop
Copy link
Contributor

No, I think first step will be for you to provide us with the means to reproduce. As mentioned previously, to the best of our knowledge this issue was fixed and it looks like this issue might be related to your setup so we need help reproducing (we also have a monorepo as you can see in our nhost/nhost repo and we don't have this issue)

@dbarrosop
Copy link
Contributor

Alternatively, a PR with the fix will be more than welcomed :)

@AlexanderMoroz
Copy link
Author

Ok, I'll try to update previous repo I sent you to reproduce this error. Or if I understand how exactly I can fix it I'll just create a PR

@mmmoli
Copy link

mmmoli commented May 1, 2024

Disappointed with the response from @dbarrosop

Reads like, "not our problem until you can prove that it is".

You're always going to be bombarded with issues that feel isolated like this, with every customer fighting for your attention.

From our pov, our entire stack relies on nhost being solid regardless of where the bug lives.

It's not solid.

@AlexanderMoroz has done a detailed investigation. We lack the knowledge of nhost internals to go much further and since we haven't received any further direction.

Instead, you propose we do more work to prove to you this is a bug?

Sure, ok.

Remind me why we should make a PR again? After all, it's our problem, right?

;)

@dbarrosop
Copy link
Contributor

dbarrosop commented May 3, 2024

Reads like, "not our problem until you can prove that it is".

I am sorry if it came across that way, it certainly wasn't my intention.

You're always going to be bombarded with issues that feel isolated like this, with every customer fighting for your attention.

Agreed. That's why we need to prioritize based on:

a. how many users are affected
b. reproducibility (hard to fix an issue if you can't replicate it or don't know the conditions that trigger it)
c. severity

This looks quite severe but, unfortunately, we don't know how to reproduce/trigger this issue. The excellent analysis done by @AlexanderMoroz seems to indicate there is something special in this setup so we need help replicating these conditions so we can look into it.

Instead, you propose we do more work to prove to you this is a bug?

No, nobody is saying you need to prove this is a bug. All I said is we need a way to reproduce it. It is very difficult to fix a problem if we don't understand that conditions that trigger it.

Remind me why we should make a PR again? After all, it's our problem, right?

Nobody said that. In all my communications all I have said over and over is that we need help reproducing. I personally think that an issue that invalidates the session every 10 minutes is a pretty serious issue and if you are running into it other users may run into it, so it is in our best interest to find a solution. But again, we are running blind here.

@mmmoli
Copy link

mmmoli commented May 5, 2024

@dbarrosop I do get it. It's a balance and it's on us to give you something more to work with.

Thank you for acknowledging the effort @AlexanderMoroz already put into this. We'll try to find the time to probe further.

@onehassan
Copy link
Contributor

It is expected that requests will stop firing when the JWT has expired, because otherwise you'll receive a 401 response.
Could you provide further insights into your setup and the initialization of the Nhost clients?.
Timing also looks weird to me because the expiration is set to 15mns but you mention that it happens consistently every 10mns!
Any details that would help me reproduce this using a minimal setup would be very helpful.

@chiemerieokorie
Copy link

chiemerieokorie commented May 10, 2024

Can confirm this issue on my end -- been trying to fix it for weeks.

Packages:

    "next": "14.1.3",
    
    "@nhost/apollo": "^6.2.1",
    "@nhost/nextjs": "^2.1.10",
    "@nhost/nhost-js": "^3.0.11",
    "@nhost/react": "^3.4.1",
    "@nhost/react-apollo": "^11.0.1",

@dbarrosop
Copy link
Contributor

dbarrosop commented May 10, 2024

Could you provide more details about your setup and the conditions in which it triggers? Any help/info is welcomed as we are having issues reproducing this issue.

Thanks

@AlexanderMoroz
Copy link
Author

AlexanderMoroz commented May 10, 2024

Guys, next week I'll have some extra time to sort this out, so hopefully will give more info how to reproduce it.

My gut feelings tell's me this happens when you have few (we have two) separate nhost client instances.
In our case we have one which we use outside of react-cycle:

image

and one which is in react created by same function in NhostApolloProvider as part of your nhost/react-apollo package

image

at the second image we use nhost client which was created from nhost/nextjs package.
at the first image it's from nhost/nhostjs package

image

I'll try to give you more info next week.

@dbarrosop
Copy link
Contributor

dbarrosop commented May 13, 2024

A couple of questions:

  1. Do they connect to the same backend? I guess they do but better to double check.
  2. Are both clients expected to run on the same domain/browser? Mostly wondering if both clients might be writing/reading the same session

I have a couple of ideas we could try to get more datapoints:

  1. Is it possible to consolidate the clients or remove one of them? Even if it's just for testing in development or similar. This would confirm your theory, which I am inclined to agree on as there might be some race condition as both clients interact with the same session, potentially triggering at different times and stepping on each other.
  2. If (1) isn't possible maybe setting a VERY HIGH refreshIntervalTime (i.e. 2 hours) in one of the clients and see if the problem goes away (or rather, takes 2h to reproduce).
  3. We could also try setting autoSignIn and autoRefreshToken to false on one of the clients. This might actually make the problem go away if it is an issue with both writing to the same session.

@AlexanderMoroz
Copy link
Author

AlexanderMoroz commented May 13, 2024

hey!

So, answers to the first questions.

  1. Yep, they do connect to the same nhost backend instance
  2. Yep, they're running in the same browser/client.

About ideas:

  1. Hard to achieve but I'll try. That was my plan actually.

Right now we have a quick fix which is a bit ugly but works perfectly as workaround:

we run this function for each nhost client instance as soon as we create it:

import { NhostClient } from '@nhost/nhost-js';
import { wait } from '@htch/utils/ts';
import { jwtDecode } from 'jwt-decode';

export function listenToJWT(nhost: NhostClient) {
  const run: () => Promise<void> = async () => {
    const interpreter = nhost?.auth.client.interpreter;
    if (!interpreter) {
      await wait(1000);
      return run();
    }

    interpreter?.onTransition(async (state, event) => {
      if (['SIGNOUT', 'SIGNED_IN', 'TOKEN_CHANGED'].includes(event.type)) {
        if (!state.context.accessToken.value) return;
        const jwt = jwtDecode(state.context.accessToken.value);
        if (!jwt.exp) return;
        if (jwt.exp * 1000 < Date.now()) {
          console.warn(
            'refreshing session, JWT is incorrect: ',
            new Date(jwt.exp * 1000).toString()
          );

          await nhost.auth.refreshSession();
        }
      }
    });
  };

  run();
}

this is definitely not a fix, but it works as a temporary solution

I'll try your hints one by one and come back with any feedback I have

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants