HITL - Rearrange session handling #1965

0mdc · 2024-05-19T15:50:55Z

Motivation and Context

This changeset adds session handling in rearrange_v2.

A session is defined as a sequence of episodes to be done by a fixed set of users.
The session is the unit of work that is collected during a HITL experiment.

Additions:

Session management.
Episode loading from connection parameters.
Load screen.
Start screen.
End session screen.

state_machine_2-2024-05-19_09.44.03.mp4

How it works

The state machine is extended as follows:

Reset (initial state):

Kicks all users.
Previous session is deleted.
Changes to Lobby when all users are disconnected.

Lobby:

Blank screen.
New connections are only allowed during this state.
Changes to Start Session when max_user_count users are connected.

Start Session:

Parses the episodes field from the users' connection parameters.
- This comes from the URL parameters in WebGL, or a config file on Android.
The Session object is created.
Switches to Load Episode when successful. Cancels session otherwise.

Load Episode:

Increment the session episode index.
If the next episode exists:
- Load it.
- Show a loading screen.
- Switch to Start Screen.
If the last episode is completed, change to End Session.
If any user disconnects, cancel the session.

Start Screen:

Show a top-down view of the scene with a "Start" button.
Change to RearrangeV2 when all users have pressed the button.
If any user disconnects or the state times out, cancel the session.

RearrangeV2:

HITL application.
Change to Load Episode when all users have signaled that the episode is completed.
If any user disconnects or is AFK, cancel the session.

End Session:

Show a "Session ended" screen.
[Upcoming PR] Upload the session data to S3.
Switch to Reset when upload is finished.
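
The transitions listed above can be summarized as a lookup table. This is only an illustrative sketch, not the code in this PR: the State enum, the event names, and the run() helper are hypothetical, and CANCEL_SESSION is treated as an end point here because the description does not say what follows a cancelled session.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class State(Enum):
    RESET = auto()
    LOBBY = auto()
    START_SESSION = auto()
    LOAD_EPISODE = auto()
    START_SCREEN = auto()
    REARRANGE_V2 = auto()
    END_SESSION = auto()
    CANCEL_SESSION = auto()  # What happens after cancellation is not modeled here.


# (current state, event) -> next state. Event names are made up for illustration.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.RESET, "all_users_disconnected"): State.LOBBY,
    (State.LOBBY, "max_users_connected"): State.START_SESSION,
    (State.START_SESSION, "session_created"): State.LOAD_EPISODE,
    (State.START_SESSION, "session_failed"): State.CANCEL_SESSION,
    (State.LOAD_EPISODE, "episode_loaded"): State.START_SCREEN,
    (State.LOAD_EPISODE, "no_more_episodes"): State.END_SESSION,
    (State.LOAD_EPISODE, "user_disconnected"): State.CANCEL_SESSION,
    (State.START_SCREEN, "all_users_pressed_start"): State.REARRANGE_V2,
    (State.START_SCREEN, "user_disconnected"): State.CANCEL_SESSION,
    (State.START_SCREEN, "timeout"): State.CANCEL_SESSION,
    (State.REARRANGE_V2, "episode_completed"): State.LOAD_EPISODE,
    (State.REARRANGE_V2, "user_disconnected"): State.CANCEL_SESSION,
    (State.REARRANGE_V2, "user_afk"): State.CANCEL_SESSION,
    (State.END_SESSION, "upload_finished"): State.RESET,
}


def next_state(state: State, event: str) -> State:
    """Return the next state, or stay in place if the event is not handled."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str], state: State = State.RESET) -> State:
    """Feed a sequence of events through the table and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, a one-episode session that completes normally ends back in Reset, while a disconnect during the start screen ends in the cancelled state.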

How Has This Been Tested

Tested in multiplayer HITL application.

Types of changes

[Development]

Checklist

My code follows the code style of this project.
I have updated the documentation if required.
I have read the CONTRIBUTING document.
I have completed my CLA (see CONTRIBUTING)
I have added tests to cover my changes if required.

0mdc

Comments for reviewers.

0mdc · 2024-05-19T15:52:24Z

examples/hitl/rearrange_v2/app_state_start_screen.py

+ destination_mask=Mask.from_index(user_index),
+ )
+
+ # Server-only: Press numeric keys to start episode on behalf of users.


This is for easing local testing.

0mdc · 2024-05-19T15:55:38Z

examples/hitl/rearrange_v2/app_state_start_session.py

+ Attempt to get episodes from client connection parameters.
+ Episode IDs are indices within the episode sets.
+
+ Format: {lower_bound_inclusive}-{upper_bound_exclusive} (e.g. "100-110").


Note that the current episode sets we use have IDs that match their index within the episode set (e.g. the ID of the first episode is a "0" string).
This approach is error-prone and sloppy.

This will change in the future so that a list of episode IDs can be specified instead of a range of episode indices.

Where will these sets need to come from? Is this an artifact of supporting how episode-sets were created in the first place and stored?

This is an artifact of how episode sets were created.

Where will these sets need to come from?

They would be generated by the episode generator, possibly in json format, and fed to whichever system is responsible for generating HITL tasks (e.g. Psiturk).

Right now, we are limited to grouping episodes contiguously to satisfy the range-based system. This is error-prone and inflexible. For example, if we want N sets to start with a "practice episode", we need to copy the tutorial N times so that episode ranges stay contiguous.
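
For reference, the range format discussed in this thread ("{lower_bound_inclusive}-{upper_bound_exclusive}") can be resolved roughly as follows. This is a minimal sketch and not the function added in this PR: parse_episode_range is a hypothetical name, and returning the indices as strings relies on the observation above that episode IDs currently match their index within the episode set.

```python
from typing import List


def parse_episode_range(episodes_param: str) -> List[str]:
    """Turn a string such as "100-110" into a list of episode IDs.

    The lower bound is inclusive and the upper bound is exclusive.
    Raises ValueError if the string is malformed or the range is empty.
    """
    parts = episodes_param.strip().split("-")
    if len(parts) != 2:
        raise ValueError(f"Expected 'lower-upper', got '{episodes_param}'.")
    # int() raises ValueError on empty or non-numeric input.
    lower, upper = int(parts[0]), int(parts[1])
    if lower < 0 or upper <= lower:
        raise ValueError(f"Invalid episode range '{episodes_param}'.")
    # Assumption: episode IDs are the stringified indices within the episode set.
    return [str(index) for index in range(lower, upper)]
```

With this sketch, "100-110" resolves to ten IDs, "100" through "109". A list-based format, as proposed above, would replace the range() call with the explicit IDs supplied by the task generator.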

0mdc · 2024-05-19T16:01:01Z

examples/hitl/rearrange_v2/app_state_load_episode.py

+ # Wait for clients to signal that content finished loading on their end.
+ # HACK: The server isn't immediately aware that clients are loading. For now, we simply skip some frames.
+ # TODO: Use the keyframe ID from 'ClientMessageManager.set_server_keyframe_id()' to find the when the loading state is up-to-date.
+ if self._frame_number > 20:


This hack will be addressed in a future PR.

The correct approach to handle this is to look at the frame number when the new scene keyframe is sent, and wait for all clients to echo back this frame number.

This feature is supported but coupled with other systems. See ClientHelper.

This is to ensure that both clients are looking at the same state, right?

This is to ensure that the server doesn't progress to the start screen until all clients have finished downloading the content.
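
The echo-based approach described in this thread could look roughly like the sketch below. It is not the existing ClientHelper logic, and it does not call any habitat-hitl API: LoadingBarrier and its methods are hypothetical names, and the sketch assumes each client reports the ID of the latest server keyframe it has processed.

```python
from typing import Dict, Iterable, Optional


class LoadingBarrier:
    """Tracks whether every client has echoed back the keyframe that carried the new scene."""

    def __init__(self, user_indices: Iterable[int]) -> None:
        # Latest keyframe ID acknowledged by each user (-1 means nothing yet).
        self._last_acked: Dict[int, int] = {user: -1 for user in user_indices}
        self._scene_keyframe_id: Optional[int] = None

    def on_scene_keyframe_sent(self, keyframe_id: int) -> None:
        """Record the ID of the keyframe in which the new scene was sent."""
        self._scene_keyframe_id = keyframe_id

    def on_client_ack(self, user_index: int, keyframe_id: int) -> None:
        """Record the latest keyframe ID echoed back by a client."""
        previous = self._last_acked.get(user_index, -1)
        self._last_acked[user_index] = max(previous, keyframe_id)

    def all_clients_caught_up(self) -> bool:
        """True once every client has acknowledged the scene keyframe or a later one."""
        if self._scene_keyframe_id is None:
            return False
        return all(
            acked >= self._scene_keyframe_id for acked in self._last_acked.values()
        )
```

A state like Load Episode would then poll all_clients_caught_up() instead of comparing the frame number against a fixed constant. Checking the clients' actual loading status once they have caught up is left out of this sketch.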

zephirefaith

Nits for readability. Let's add explicit issues with the BE flag for the few places I flagged. Generally, this looks good.

Thanks for your hard work on this!

zephirefaith · 2024-05-20T15:55:33Z

examples/hitl/rearrange_v2/app_states.py

+def create_app_state_start_screen(
+ app_service: AppService, app_data: AppData, session: Session
+) -> AppStateBase:
+ from app_state_start_screen import AppStateStartScreen
+
+ return AppStateStartScreen(app_service, app_data, session)


I like this way of organizing and calling code.
There are a few misdirections at the beginning, but once I understood how the code is laid out, it was easy to follow from there on.

zephirefaith · 2024-05-20T16:01:13Z

examples/hitl/rearrange_v2/rearrange_v2.py

 def get_next_state(self) -> Optional[AppStateBase]:
- # If cancelled, skip upload and clean-up.
- if self._cancel or self._is_episode_finished():
- return create_app_state_reset(self._app_service, self._app_data)
+ if self._cancel:
+ return create_app_state_cancel_session(
+ self._app_service,
+ self._app_data,
+ self._session,
+ "User disconnected",
+ )
+ elif self._is_episode_finished():
+ return create_app_state_load_episode(
+ self._app_service, self._app_data, self._session
+ )


OK so rearrange-state exclusively feeds into either loading of next episode or canceling sessions.

Does it matter if the session is canceled due to completion or inaction? Probably handled in app_state_end_session?

In these cases, users are kicked, which causes the "User disconnected" error.

Not ideal, but could be improved with some refactoring.

zephirefaith · 2024-05-20T16:04:23Z

examples/hitl/rearrange_v2/session.py

+ episode_ids: List[str],
+ connection_records: Dict[int, ConnectionRecord],
+ ):
+ self.success = False


Is this per episode? What does success mean for a session?
If it is not per session, I suggest renaming it for clarity, e.g. self.last_episode_success.

zephirefaith · 2024-05-20T16:32:24Z

examples/hitl/rearrange_v2/app_state_start_screen.py

+ # If all users pressed the "Start" button, begin the session.
+ ready_to_start = True
+ for user_ready in self._ready_to_start:
+ ready_to_start &= user_ready
+ if ready_to_start or SKIP_START_SCREEN:
+ return create_app_state_rearrange(
+ self._app_service, self._app_data, self._session
+ )
+
+ return None


Assuming this keeps getting called in an input--get-next-state--execute loop?

Correct, this is handled by the state machine.
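
As a rough illustration of that loop, the sketch below polls the current state every frame and swaps it out when get_next_state() returns something other than None. The class and function names are hypothetical stand-ins, not the AppStateBase or state machine code from this PR, and the update-then-query order is an assumption.

```python
from typing import List, Optional


class SketchState:
    """Hypothetical stand-in for an app state: runs for a fixed number of frames."""

    def __init__(
        self, name: str, frames: int, next_state: Optional["SketchState"] = None
    ) -> None:
        self.name = name
        self._frames_left = frames
        self._next_state = next_state

    def update(self) -> None:
        """Per-frame work. A real state would process input and simulate here."""
        self._frames_left -= 1

    def get_next_state(self) -> Optional["SketchState"]:
        """Return None to stay in this state, or the state to switch to."""
        return self._next_state if self._frames_left <= 0 else None


def run_state_machine(initial: SketchState, max_frames: int) -> List[str]:
    """Drive the states for max_frames frames and return the names visited."""
    state = initial
    visited = [state.name]
    for _ in range(max_frames):
        state.update()
        next_state = state.get_next_state()
        if next_state is not None:
            state = next_state
            visited.append(state.name)
    return visited
```

Returning None on most frames is what keeps the start screen active until every user has pressed "Start".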

* Add session management.
* Formatting changes.
* Add clarifications to episode resolution.
* Document temporary hack to check for client-side loading status.
* Review pass - variable renaming and camera matrix caching.

0mdc added 4 commits May 19, 2024 09:46

Add session management.

22e51de

Formatting changes.

fd126c3

Add clarifications to episode resolution.

0ec3939

Document temporary hack to check for client-side loading status.

ab29747

facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label May 19, 2024

0mdc commented May 19, 2024

0mdc requested review from jturner65, aclegg3, zephirefaith and Ram81 May 19, 2024 16:02

0mdc mentioned this pull request May 20, 2024

HITL - Data collection #1967

Merged

5 tasks

zephirefaith approved these changes May 20, 2024

Review pass - variable renaming and camera matrix caching.

60596bc

0mdc merged commit 11ec7bc into main May 20, 2024
3 of 4 checks passed

0mdc deleted the 0mdc/hitl_session branch May 20, 2024 18:58
