
"500 Internal Server Error" because of "Duplicate entry" after Upgrade to PI-3.9.1 #3810

Open
beckerr-rzht opened this issue Nov 11, 2023 · 4 comments
Assignees
Labels
Type: Possible bug (Suspected bug by user)

Comments

@beckerr-rzht

We operate PrivacyIdea in production with a MariaDB Galera cluster consisting of 3 nodes.
Each database node is assigned a PI node, which communicates with the local MariaDB node (on localhost).
One of the nodes is selected as the master via keepalived.

This worked without any problems up to PI 3.8.1, especially when the master was changed via keepalived.

However, after upgrading from PI-3.8.1 to PI-3.9.1, I now see many "duplicate key" errors.
Example:

pymysql.err.IntegrityError: (1062, "Duplicate entry '34685581' for key 'PRIMARY'")

The above exception was the direct cause of the following exception:
--
sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1062, "Duplicate entry '34685581' for key 'PRIMARY'")
[SQL: INSERT INTO usercache (id, username, used_login, resolver, user_id, timestamp) VALUES (%(id)s, %(username)s, %(used_login)s, %(resolver)s, %(user_id)s, %(timestamp)s)]
[parameters: {'id': 34685581, 'username': 'XXXX', 'used_login': 'XXXX', 'resolver': 'LDAP', 'user_id': 'X-Y-Y-Y-X', 'timestamp': datetime.datetime(2023, 11, 10, 22, 47, 46, 752497)}]

This should actually be prevented by the Galera cluster, as its auto-increment configuration is correct:
https://mariadb.com/kb/en/auto_increment/#replication

SHOW VARIABLES LIKE 'auto_inc%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| auto_increment_increment | 3     |
| auto_increment_offset    | 1 (for node 1), 2 (for node 2), 3 (for node 3) |
+--------------------------+-------+
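
With this configuration the AUTO_INCREMENT columns on the three nodes hand out disjoint id ranges, for example:

node 1: 1, 4, 7, 10, ...
node 2: 2, 5, 8, 11, ...
node 3: 3, 6, 9, 12, ...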

I have compared dumps of the PI database, see that many SEQUENCEs have been added, and suspect that this is the cause of the problem.

Example:

...
CREATE SEQUENCE `usercache_seq` start with 34685579 minvalue 1 maxvalue 9223372036854775806 increment by 0 cache 1000 nocycle ENGINE=InnoDB;
SELECT SETVAL(`usercache_seq`, 34688579, 0);
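
For reference, a quick check to see whether such a sequence has fallen behind the ids already present in the table (table and sequence names taken from the dump above; note that NEXTVAL consumes a value):

SELECT MAX(id) FROM usercache;
SELECT NEXTVAL(usercache_seq);
-- if the sequence is behind, it can be pushed past the highest id (the value below is only illustrative)
SELECT SETVAL(usercache_seq, 40000000, 0);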

Since we first encountered the problem in the production environment, I have this urgent question:

Is there a quick fix or workaround that gets around this problem?

@beckerr-rzht added the Type: Possible bug (Suspected bug by user) label on Nov 11, 2023
@beckerr-rzht
Author

It seems to be related to this commit:
e5e0027

And as I understand it, this commit should actually fix exactly such problems. Strange.

@plettich
Member

Hi @beckerr-rzht
Since we also support PostgreSQL (which has always used sequences), we changed all the primary key columns to sequences in our database abstraction layer (SQLAlchemy). For databases which don't support sequences (MySQL), this is translated to an AUTO_INCREMENT column.
Since MariaDB 10.6.1, sequences are supported there as well, so SQLAlchemy uses them too (that is why we had to add all the missing sequences).
According to https://mariadb.com/kb/en/create-sequence/#arguments-to-create, a sequence should use the auto_increment_increment value if it is created with an INCREMENT of 0 (which we do).
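
For what it's worth, the effective step of such a sequence can be observed directly on a MariaDB node with a throwaway sequence (the name test_seq is just an example):

CREATE SEQUENCE test_seq START WITH 1 INCREMENT BY 0 CACHE 1000 NOCYCLE;
SELECT NEXTVAL(test_seq);
SELECT NEXTVAL(test_seq);
-- the difference between the two values should match auto_increment_increment
DROP SEQUENCE test_seq;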

Do you always have only one PI node writing to the DB? So this error happens during a failover?

@beckerr-rzht
Author

Good questions.
I would like to go into a little more detail and explain the cluster and the complete update process:

Cluster

There are three VMs, each with an IDP (Shibboleth), PI, and a MariaDB Galera node (1:10.3.38-0ubuntu0.20.04.1).
Each IDP instance (Shibboleth) talks to the local PI instance, which in turn talks to the local Galera cluster node.
Normally VM1 is designated as the master by keepalived.

Before the update to PI 3.9.1:

VM1 (Master) IDP1 -> PI1 (3.8.1) -> MARIADB-CLUSTER-NODE1
VM2          IDP2 -> PI2 (3.8.1) -> MARIADB-CLUSTER-NODE2
VM3          IDP3 -> PI3 (3.8.1) -> MARIADB-CLUSTER-NODE3

Updating PI

Our seamless and quite well-tested update procedure currently takes about 5 minutes and looks like this:

  1. Execute privacyidea-pi-update on VM3 and let it alter the database

  2. Execute privacyidea-pi-update on VM2 and let it alter the database (*)

  3. Switch the keepalived master over from VM1 to VM2:

    VM1          IDP1 -> PI1 (3.8.1) -> MARIADB-CLUSTER-NODE1
    VM2 (Master) IDP2 -> PI2 (3.9.1) -> MARIADB-CLUSTER-NODE2
    VM3          IDP3 -> PI3 (3.9.1) -> MARIADB-CLUSTER-NODE3
    
  4. Execute privacyidea-pi-update on VM1 and let it alter the database (*)

  5. Switch the keepalived master back from VM2 to VM1

    (*) In steps 2 and 4, no changes should be made to the database, as the schema was already altered in step 1 (but in fact this is only my assumption; see the check sketched below).
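
A possible way to verify this assumption is to compare the schema migration revision before and after steps 2 and 4 (assuming the default Alembic version table that privacyIDEA's migrations use):

SELECT version_num FROM alembic_version;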

Result

VM1 (Master) IDP1 -> PI1 (3.9.1) -> MARIADB-CLUSTER-NODE1
VM2          IDP2 -> PI2 (3.9.1) -> MARIADB-CLUSTER-NODE2
VM3          IDP3 -> PI3 (3.9.1) -> MARIADB-CLUSTER-NODE3

Errors

The first errors occurred in this state immediately after the update to PI 3.9.1.

usercache

sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1062, "Duplicate entry '34685581' for key 'PRIMARY'")
[SQL: INSERT INTO usercache (id, username, used_login, resolver, user_id, timestamp) VALUES (%(id)s, %(username)s, %(used_login)s, %(resolver)s, %(user_id)s, %(timestamp)s)]
[parameters: {'id': 34685581, 'username': 'XXXX', 'used_login': 'XXXX', 'resolver': 'XXXX', 'user_id': 'XXXX', 'timestamp': datetime.datetime(2023, 11, 10, 22, 47, 46, 752497)}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

The PI-API reports this error as HTTP 500:

2023-11-10 22:47:55,123 - 10.249.131.222 - ERROR [de.zedat.fudis.shibboleth.idp.plugin.authn.fudiscr.impl.predicate.impl.UserHasAnyTokenPredicate:69] - Predicate UserHasAnyTokenPredicate: Backend error for user 'XXXX', predicate returns true.
de.zedat.fudis.shibboleth.idp.plugin.authn.fudiscr.impl.client.BackendException: org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 INTERNAL SERVER ERROR: "{"detail": null, "id": 1, "jsonrpc": "2.0", "result": {"error": {"code": -500, "message": "500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."}, "status": false}, "time": 1699652875.044161, "version": "privacyIDEA 3.9", "signature": "XXXX"}"

To get rid of these errors, I switched off the user cache and emptied the usercache table.
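
For reference, emptying the table amounts to a plain DELETE (or TRUNCATE); the cache itself is switched off via the user cache expiration setting in the privacyIDEA system configuration (as far as I know, setting it to 0 disables the cache):

DELETE FROM usercache;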

privacyidea.lib.auditmodules.sqlaudit:328

The next morning I continued testing and connected my browser (via an entry in the /etc/hosts file) to VM3.
From this point on, data from both PI1 and PI3 was written to the database.
I have used this setup many times before and never ran into errors, because both PI and MariaDB Galera allow concurrent writes.

This morning, however, I received errors of this type:

[2023-11-11 09:48:26,875][1436434][140022750586624][ERROR][privacyidea.lib.auditmodules.sqlaudit:328] exception IntegrityError('(pymysql.err.IntegrityError) (1062, "Duplicate entry \'36912748\' for key \'PRIMARY\'")')
[2023-11-11 09:48:26,876][1436434][140022750586624][ERROR][privacyidea.lib.auditmodules.sqlaudit:329] DATA: {'startdate': datetime.datetime(2023, 11, 11, 9, 48, 26, 148965), 'success': True, 'serial': None, 'user': 'XXXX', 'realm': 'XXXX', 'resolver': 'XXXX', 'token_type': None, 'client': '127.0.0.1', 'client_user_agent': None, 'privacyidea_server': 'idp-1', 'action': 'GET /token/', 'action_detail': '', 'thread_id': '140022750586624', 'info': 'realm: None', 'administrator': 'XXXX', 'policies': 'XXXX'}

A little later I created this issue.
Since then I have not done any failover via keepalived or connected my browser to VM3.

@nilsbehlen
Member

Hey @plettich, did we fix this already? Maybe in 3.9.3 or 3.9.2?
