[WIP] scheduler for running operations subsequently #1095

Hoxo · 2023-07-31T11:56:02Z

No description provided.

github-actions · 2023-07-31T11:58:31Z

Unit Test Results

26 tests −321: 23 ✔️ −318, 0 💤 −6, 2 ❌ +2, 1 🔥 +1
  6 suites −63
  6 files −63
15s ⏱️ −15m 22s

For more details on these failures and errors, see this check.

Results for commit 9559e47. ± Comparison against base commit d516052.

This pull request removes 340 and adds 19 tests. Note that renamed tests count towards both.

ai.lzy.allocator.test.AdminDaoTest ‑ emptyOnStart
ai.lzy.allocator.test.AdminDaoTest ‑ jupyterLab
ai.lzy.allocator.test.AdminDaoTest ‑ sync
ai.lzy.allocator.test.AdminDaoTest ‑ workers
ai.lzy.allocator.test.AllocatorAdminServiceTest ‑ adminAccess
ai.lzy.allocator.test.AllocatorAdminServiceTest ‑ noAccess
ai.lzy.allocator.test.AllocatorServiceCacheLimitsTest ‑ noLimits
ai.lzy.allocator.test.AllocatorServiceCacheLimitsTest ‑ poolLimit
ai.lzy.allocator.test.AllocatorServiceCacheLimitsTest ‑ userLimitMultipleSessions
ai.lzy.allocator.test.AllocatorServiceCacheLimitsTest ‑ userLimitSingleSession
…

ai.lzy.longrunning.OperationTaskDaoImplTest ‑ create
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ delete
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ deleteUnknown
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ getUnknown
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ lockPendingBatch
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ lockPendingBatchWithAllRunning
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ multiCreate
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ update
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ updateLease
ai.lzy.longrunning.OperationTaskDaoImplTest ‑ updateLeaseUnknown
…

♻️ This comment has been updated with latest results.

imakunin · 2023-07-31T13:40:01Z

lzy/long-running/src/main/java/ai/lzy/longrunning/task/dao/OperationTaskDaoImpl.java

+ @Override
+ public OperationTask get(long id, @Nullable TransactionHandle tx) throws SQLException {
+ return DbOperation.execute(tx, storage, c -> {
+ try (PreparedStatement ps = c.prepareStatement(SELECT_QUERY)) {


Shouldn't we add FOR UPDATE if tx is not null?

In the common scenario we do the following:

    var tx = start_tx();
    var some_state = dao.get(tx);
    ... some business logic ...
    dao.update(new_state, tx);  <-- simple UPDATE, not CAS
    tx.commit();

If we do a simple UPDATE in a tx, then we should add FOR UPDATE to our SELECT query.

Not sure it's necessary, because we don't have long-lasting transactions that require read-and-update. It may be useful, of course, if we want to ensure that an operation_task hasn't been updated by another instance (in the case of parallel execution, which is not desirable). I'll revise the code and think about this problem.
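A minimal sketch of the suggestion in this thread, assuming the lock is added simply by appending FOR UPDATE to the query text. The class name, the `selectQuery` helper and the shape of `SELECT_QUERY` are hypothetical; the real query is not shown in the diff.

```java
// Hypothetical helper, not part of the PR: chooses the SELECT text used by get().
final class SelectQuerySketch {
    // Assumed shape of SELECT_QUERY; the real column list is not visible in the diff.
    static final String SELECT_QUERY = "SELECT * FROM operation_task WHERE id = ?";

    // Appends FOR UPDATE only when the read runs inside a caller-owned transaction,
    // so that a later plain UPDATE in the same transaction cannot interleave with
    // a concurrent writer of the same row.
    static String selectQuery(boolean insideTransaction) {
        return insideTransaction ? SELECT_QUERY + " FOR UPDATE" : SELECT_QUERY;
    }
}
```

In `get(long id, TransactionHandle tx)` the flag would be `tx != null`; without a transaction the row lock would be released as soon as the statement completes, so the plain query is kept.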

imakunin · 2023-08-01T09:00:43Z

lzy/allocator/src/main/java/ai/lzy/allocator/task/MountDynamicDiskResolver.java

+
+ public MountDynamicDiskResolver(VmDao vmDao, DynamicMountDao dynamicMountDao, AllocationContext allocationContext,
+ OperationTaskDao operationTaskDao, OperationTaskScheduler taskScheduler,
+ Duration leaseDuration)


leaseDuration is not a bean

Correct, but it's just an example. It still needs fixes for circular dependencies and for this configuration.

Hoxo · 2023-08-01T10:44:12Z

lzy/allocator/src/main/resources/db/allocator/migrations/V10__operation_task.sql

@@ -0,0 +1,21 @@
+CREATE TYPE task_status AS ENUM ('PENDING', 'RUNNING', 'FAILED', 'FINISHED', 'STALE');
+
+CREATE TYPE task_type AS ENUM ('UNMOUNT', 'MOUNT');


There should be one type per action, so this enum can be extended in future migrations to support new types of actions.

Hoxo · 2023-08-01T11:15:43Z

lzy/allocator/src/main/resources/db/allocator/migrations/V10__operation_task.sql

+
+CREATE TYPE task_type AS ENUM ('UNMOUNT', 'MOUNT');
+
+CREATE TABLE IF NOT EXISTS operation_task(


The main idea is to use the DB as the single source of truth for the order of task execution.
Here's a short explanation of the operation_task fields:

id - a bigserial, so it is generated when the task is inserted. This is the main way to express ordering among tasks (see entity_id).

name - for debugging and readability purposes.

entity_id - groups tasks by a user-generated text id. Tasks with the same entity_id are executed sequentially in ascending order of the id field, so the task with the smaller id is executed first. Tasks with different entity_id values can be executed in parallel.

type - needed to match a task to its code representation.

status - the status of a task.

created_at, updated_at - self-explanatory, for debugging purposes.

metadata - JSON that keeps task arguments and other useful information about the task. The content of this field is defined by the user and is parsed mainly depending on the type.

operation_id - the operation linked to a task; it contains all details about execution. There should be a (0-1) <-> 1 relation between a task and an operation.

worker_id - name of the instance that captured a task. This is needed to ensure that a task is executed only once.

lease_till - deadline for a scheduler instance to execute this task. The scheduler instance should update the lease_till field. If the instance dies, or is otherwise unable to finish the task, another scheduler instance can "capture" the task once its lease_till deadline has expired and replace the worker_id field.
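The ordering and lease rules above can be sketched with an in-memory model. This is an illustration only, not the DAO from the PR: `TaskRow`, `LeaseSketch`, `canCapture` and `capture` are hypothetical names, statuses are plain strings, and it assumes that tasks in a final status (FINISHED, FAILED, STALE) do not block later tasks of the same entity_id.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for an operation_task row.
final class TaskRow {
    final long id;
    final String entityId;
    String status;      // "PENDING", "RUNNING", "FINISHED", "FAILED" or "STALE"
    String workerId;    // null until an instance captures the task
    Instant leaseTill;  // null until an instance captures the task

    TaskRow(long id, String entityId, String status, String workerId, Instant leaseTill) {
        this.id = id;
        this.entityId = entityId;
        this.status = status;
        this.workerId = workerId;
        this.leaseTill = leaseTill;
    }
}

final class LeaseSketch {
    // A task may be captured if it is pending, or running with an expired lease.
    static boolean canCapture(TaskRow t, Instant now) {
        if ("PENDING".equals(t.status)) {
            return true;
        }
        return "RUNNING".equals(t.status) && t.leaseTill != null && t.leaseTill.isBefore(now);
    }

    // For every entity_id, looks only at the unfinished task with the smallest id.
    // If that task can be captured, it is assigned to workerId with a fresh lease;
    // otherwise the whole entity is skipped, which keeps its tasks sequential.
    static List<TaskRow> capture(List<TaskRow> rows, String workerId, Instant now, Duration lease) {
        List<TaskRow> sorted = new ArrayList<>(rows);
        sorted.sort((TaskRow a, TaskRow b) -> Long.compare(a.id, b.id));
        Set<String> seenEntities = new HashSet<>();
        List<TaskRow> captured = new ArrayList<>();
        for (TaskRow t : sorted) {
            boolean unfinished = "PENDING".equals(t.status) || "RUNNING".equals(t.status);
            if (!unfinished || !seenEntities.add(t.entityId)) {
                continue; // final status, or not the first unfinished task of its entity
            }
            if (canCapture(t, now)) {
                t.status = "RUNNING";
                t.workerId = workerId;
                t.leaseTill = now.plus(lease);
                captured.add(t);
            }
        }
        return captured;
    }
}
```

In the real implementation this selection would presumably happen in a single SQL statement so that two scheduler instances cannot capture the same row; the sketch only shows which rows qualify.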

Hoxo · 2023-08-01T11:17:26Z

lzy/long-running/src/main/java/ai/lzy/longrunning/task/DispatchingOperationTaskResolver.java

+import java.util.Map;
+import java.util.stream.Collectors;
+
+public class DispatchingOperationTaskResolver implements OperationTaskResolver {


A task resolver that accepts a list of resolvers for different task types and chooses the resolver by task type.
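A simplified sketch of the dispatching idea, assuming each typed resolver reports the single task type it handles. `TypedResolverSketch` and `DispatchingResolverSketch` are hypothetical stand-ins; the PR's actual `OperationTaskResolver` interface is not shown here, and its methods likely differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified resolver contract (not the PR's interface).
interface TypedResolverSketch {
    String type();                    // task type this resolver handles, e.g. "MOUNT"
    String resolve(String taskName);  // stands in for building the action for a task
}

final class DispatchingResolverSketch {
    private final Map<String, TypedResolverSketch> byType = new HashMap<>();

    // Indexes the resolvers by type and rejects two resolvers for the same type.
    DispatchingResolverSketch(List<TypedResolverSketch> resolvers) {
        for (TypedResolverSketch r : resolvers) {
            if (byType.put(r.type(), r) != null) {
                throw new IllegalArgumentException("Duplicate resolver for type " + r.type());
            }
        }
    }

    // Delegates to the resolver registered for the task's type.
    String resolve(String taskType, String taskName) {
        TypedResolverSketch r = byType.get(taskType);
        if (r == null) {
            throw new IllegalStateException("No resolver for type " + taskType);
        }
        return r.resolve(taskName);
    }
}
```

Failing fast on duplicate or missing types is a design choice of this sketch, not something the PR states.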

Hoxo · 2023-08-01T11:20:00Z

lzy/long-running/src/main/java/ai/lzy/longrunning/task/OpTaskAwareAction.java

+
+import static ai.lzy.model.db.DbHelper.withRetries;
+
+public abstract class OpTaskAwareAction extends OperationRunnerBase {


A type of action that is connected to a task. All subclasses of this class will be executed by the task scheduler.

Hoxo · 2023-08-01T11:20:47Z

lzy/long-running/src/main/java/ai/lzy/longrunning/task/OpTaskAwareAction.java

+ }
+
+ @Override
+ protected void beforeStep() {


A new step in operation execution that updates the task lease deadline.

Hoxo · 2023-08-01T11:23:22Z

lzy/long-running/src/main/java/ai/lzy/longrunning/task/OpTaskAwareAction.java

+ }
+
+ @Override
+ protected void notifyFinished() {


The task should be moved to a final status when the operation finishes.

Hoxo · 2023-08-01T11:28:23Z

lzy/long-running/src/main/java/ai/lzy/longrunning/task/OperationTask.java

+ metadata, operationId, null, null);
+ }
+
+ public enum Status {


Assumed workflow:

┌─────────┐     ┌─────────┐      ┌──────────┐
│ PENDING ├─────► RUNNING ├──────► FINISHED │
└────┬────┘     └────┬────┘      └──────────┘
     │               │
     │               │
 ┌───▼───┐       ┌───▼────┐
 │ STALE │       │ FAILED │
 └───────┘       └────────┘
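The workflow above could be encoded as a transition table. This is a sketch under the assumption that the diagram lists every allowed transition; `TaskStatusSketch`, `next`, `canMoveTo` and `isFinal` are hypothetical names and are not part of `OperationTask.Status` in the PR.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical mirror of the status enum with the transitions from the diagram.
enum TaskStatusSketch {
    PENDING, RUNNING, FINISHED, FAILED, STALE;

    // Statuses reachable in one step from this one.
    Set<TaskStatusSketch> next() {
        switch (this) {
            case PENDING:
                return EnumSet.of(RUNNING, STALE);
            case RUNNING:
                return EnumSet.of(FINISHED, FAILED);
            default:
                return EnumSet.noneOf(TaskStatusSketch.class);
        }
    }

    boolean canMoveTo(TaskStatusSketch target) {
        return next().contains(target);
    }

    // FINISHED, FAILED and STALE have no outgoing transitions.
    boolean isFinal() {
        return next().isEmpty();
    }
}
```

A check such as `current.canMoveTo(target)` before a status update would reject, for example, a move from FINISHED back to RUNNING.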

Hoxo · 2023-08-01T11:29:22Z

lzy/long-running/src/main/java/ai/lzy/longrunning/task/OperationTaskResolver.java

+
+import java.sql.SQLException;
+
+public interface OperationTaskResolver {


A component that matches a task from the DB to its code representation and creates that representation.

Hoxo added 9 commits July 31, 2023 00:19

task queue draft

f14b23e

dao task queue test

d7a2b24

new naming

0afbb71

inc migration

0d7e4e5

and task quota per instance

2c601d8

add tests

486360b

renaming

d55a53e

few more tests

5d0a2b2

example

4c9049c

imakunin reviewed Jul 31, 2023


Merge branch 'master' into hoxo/serialize-mount-operations

9559e47

imakunin reviewed Aug 1, 2023


Hoxo commented Aug 1, 2023
