Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurent Requests for same User #4

Open
Akasch opened this issue Apr 19, 2020 · 5 comments
Open

Concurent Requests for same User #4

Akasch opened this issue Apr 19, 2020 · 5 comments

Comments

@Akasch
Copy link

Akasch commented Apr 19, 2020

At the moment it is possible that the code is concurrently updating the same user in parallel in response to either multiple requests with the same user or by running a request and the cron update at the same time.

I think nothing bad happens from it beside unnecessary requests to the OSM API.

It could be prevented by usage of transactions over the whole process for each user and SELECT FOR UPDATE on the changesets_walker_state entry for the user.

@westnordost
Copy link
Member

What do you mean with transaction over the whole process?

Anyway, regarding the cron, the update_incomplete_statistics.php will only touch users that have not been updated for 30 seconds.

@Akasch
Copy link
Author

Akasch commented Apr 21, 2020

I mean something like the following (very ugly pseudocode, mixing SQL and php:

[...]
"START TRANSACTION"
$status = "SELECT * FROM changesets_walker_state WHERE user_id=$user_id FOR UPDATE SKIP LOCKED"; // The skip locked bit seems to be relatively new in mysql, sice 8.0
$ok = True;
if (len($status) > 0 && $status['last_update']  < now() -30 && ...) {
    $ok = $changesets_walker->analyzeUser($status, $user_id);
}
if $ok {
    "COMMIT"
} else {
    "ROLLBACK"
    return error();
}
return get_user_statistics();

In this scenario the Database enforces that only one process can update a user at every given time. As mysql cant handle nested transactions the existing transactions would have to be rewritten using checkpoints with "SAVEPOINT name" and "ROLLBACK TO name"

The problem I meant with the update_incomplete_statistics.php is that while it is running the user could send a new request (or multiple of them) and it would be processed in parallel doing unnecessary work.

@Akasch
Copy link
Author

Akasch commented Apr 21, 2020

The code above never analse users not in the database as it is the same as a skip over an user in processing, so that have to be checked first. An alternative would be to just let the transaction wait on the lock to be released (remove SKIP LOCKED) but this would lead to many php processes waiting for the analyse step to be finished in the worst case, and if the OSM API is very slow to respond it could take some time and wherefore many waiting processes potentially.

@westnordost
Copy link
Member

// The skip locked bit seems to be relatively new in mysql, sice 8.0

Hmm I only have an older version available.

The problem I meant with the update_incomplete_statistics.php is that while it is running the user could send a new request (or multiple of them) and it would be processed in parallel doing unnecessary work.

Ok maybe it is possible to set a user-specific mutex/synchronized/semaphore in PHP? I did a short search and found so many completely different approaches (some only work on Linux, some are deprecated/removed, some are dependent on other libraries) that it overwhelmed me.

@mnalis
Copy link
Member

mnalis commented Feb 25, 2022

I don't see why SKIP LOCKED would be needed (or even correct) here @Akasch, shouldn't FOR UPDATE be enough?
e.g. https://stackoverflow.com/a/27868682/2600099

(Of course one has to be having InnoDB tables, and not myISAM)

Locking at SQL server level is most centralizes, as your data itself is there. There are of course many other ways if you insist on client-side locking. Most portable is flock, e.g. https://stackoverflow.com/a/55399220/2600099 but that assume single local filesystem access (so no NFS, SMBFS etc)

There are other, more performant ways, like the use local IPC methods like semaphores, but that also assume running GNU/Linux (or some other POSIX machine) and have their own caveats).

TL;DR: unless you plan to change the database, I'd stick with mysql innodb transactions + select for update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants