
Adding spoken dialogue state tracking recipes for Spoken MultiWoz and SpokenWoz datasets #2424

Open · wants to merge 53 commits into base: develop

Conversation

@LucasDruart commented on Feb 21, 2024

What does this PR do?

This PR adds spoken dialogue state tracking recipes for two datasets: Spoken MultiWoz and SpokenWoz. I have tried my best to conform to SpeechBrain's conventions; please let me know if there are any I missed. I also do not know whether and how recipes should be tested.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@asumagic self-requested a review on Mar 5, 2024
@asumagic self-assigned this on Mar 5, 2024
@asumagic commented on Mar 5, 2024

#2441 also adds a T5 HuggingFace lobe as far as I can tell; we will see how we deal with this.

EDIT: Nevermind, it's not the same thing, thanks Adel.

@LucasDruart commented:

Hey @asumagic,
Thanks for taking a look at this PR. I have seen that the pre-commit pipeline fails because the SpokenWoz data preparation function is too complicated. I believe the complexity is measured in the number of lines of code, but I have trouble simplifying it, since the formatter sometimes reflows a single statement onto many lines, for instance line 146.

@asumagic commented on Mar 5, 2024

I think the linter measures cyclomatic complexity rather than the plain number of lines of code, so fixing formatting would probably not help. In this case, it could make sense to extract the whole `for dialog_id, dialog_info in annotations.items():` loop into its own function.
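
A minimal sketch of that kind of refactor, assuming hypothetical names (`_prepare_dialog`, a `log` field holding the turns) that will differ from the actual SpokenWoz preparation code:

```python
def _prepare_dialog(dialog_id, dialog_info):
    """Build the manifest entries for a single dialogue (hypothetical helper)."""
    entries = {}
    for turn_id, turn in enumerate(dialog_info.get("log", [])):
        # Per-turn branching lives here, keeping the caller's complexity low.
        entries[f"{dialog_id}_{turn_id}"] = {"transcript": turn.get("text", "")}
    return entries


def prepare_spokenwoz(annotations):
    """Top-level preparation: one helper call per dialogue."""
    manifest = {}
    for dialog_id, dialog_info in annotations.items():
        manifest.update(_prepare_dialog(dialog_id, dialog_info))
    return manifest
```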

As for the other pre-commit style check failures, you can fix them locally by running `pre-commit run -a` (pypi).

As for the other failing tests, the issue seems to be that recipe tests should be added. Recipe tests use a small subset of the data, launch a dummy training for a fixed number of epochs, and then assert that some metric threshold is met.
(They are not run in CI; SpeechBrain developers run them locally.)
As far as I can tell, a line should be added to tests/recipes/MultiWOZ.csv and a tests/recipes/SpokenWOZ.csv file should be created. Recipe tests are further explained in tests/recipes.

@LucasDruart commented:

Hi @asumagic,
I fixed the complexity and trailing-whitespace failures and added a test for each recipe. I had to make the last commit without the pre-commit verification because the large-file check failed on the sample audio (tests/samples/DST/SpokenWoz/audio_5700_train_dev/MUL1011.wav) required for the test. Let me know if there is anything else I can do.

@asumagic commented on Mar 6, 2024

Now, some tests fail because:

  • There are some missing docstrings.
  • We have a check script that tries to ensure that training scripts use all of the YAML keys correctly. Because run.py and model.py are split, it fails to find the keys that are actually used in model.py rather than run.py. Usually we just have a single, merged train.py, which should probably be done here; a toy illustration of the failure mode follows below. (AFAIK Brain is not really intended for inference, only training/evaluation. EDIT: to clarify, that is not what this PR does; what the PR does is OK IMO.)
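
This is not the actual SpeechBrain consistency script, just a sketch of the idea with hypothetical key names, to show why a per-file scan misses keys that only appear in a second file:

```python
# Toy sketch only: a key counts as "used" if it appears in the scanned scripts.
import re
from pathlib import Path

def seemingly_unused_keys(yaml_keys, script_paths):
    """Return YAML keys that appear in none of the scanned script files."""
    text = "".join(Path(p).read_text() for p in script_paths)
    return {k for k in yaml_keys if not re.search(rf"\b{re.escape(k)}\b", text)}

# Scanning only run.py wrongly flags keys that are referenced solely in
# model.py; scanning a single merged train.py (or both files) avoids this.
```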

@asumagic commented on Mar 6, 2024

BTW, I am still in the process of reviewing, but it does take some time.

@LucasDruart commented:

I added the missing docstrings and the missing tests.
Regarding the single train.py file: initially I wanted to factorise the model between both datasets, hence the separation of the data pipeline from the model, but I could not manage to import it properly. Maybe you will have a solution for this when you reach that point in the review (a generic workaround is sketched below).
Sorry for the (many) changes while you are reviewing.
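
One generic Python workaround for importing a module shared between two recipe folders; this is only a hedged sketch with a hypothetical common/model.py layout and may not match SpeechBrain's recipe conventions:

```python
# Hypothetical layout: shared code lives in recipes/common/model.py and each
# recipe's run.py (at recipes/<Dataset>/<task>/run.py) adds that folder to
# sys.path before importing it.
import sys
from pathlib import Path

COMMON_DIR = Path(__file__).resolve().parents[2] / "common"  # .../recipes/common
sys.path.append(str(COMMON_DIR))

from model import DSTModel  # hypothetical shared module
```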

@asumagic left a review comment:

I tried to review as much as I could, but the review remained less in-depth than I would have liked, since I am not yet very familiar with this task. I have successfully run the dataset download steps and the data preparation, and ran some steps of training, so this part works OK.
I also made a number of changes, as you have probably noticed; let me know if they seem sane to you. The tests now pass.

There are some things I couldn't test and run (inference), as I don't think you have provided pre-trained models. If you are able to upload them, we could host them on the Dropbox and have them linked.

I would like other reviewers to also double-check the training script + metrics for correctness.

As for the duplication across training scripts (and hparams), I agree it isn't pretty, but I think this is just a flaw we have right now, where very similar models result in duplication across recipes. IMO this can be dealt with later if we figure out how to deduplicate dataset-agnostic things better.

Resolved (outdated) review threads on:
  • speechbrain/decoders/seq2seq.py
  • recipes/MultiWOZ/dialogue_state_tracking/README.md
  • speechbrain/utils/evaluate_dialogue_state_tracking.py (6 threads)
@LucasDruart commented:

Sorry for the delay, and thank you for your review and changes, which make a lot of sense! Regarding the other points:

  • I have not provided checkpoints but can provide them if needed. However, that requires re-running the whole training to reach the reported performance.
  • I corrected the few naming issues and the missing documentation; I hope it's okay now.
  • I moved the Joint-Goal Accuracy tracker (used during training to keep a running JGA without storing per-turn values) to metric_stats.py; a rough sketch of the idea is given below.
  • I left the common part of the Dialogue State Tracking evaluation in utils so that it can be imported from the recipes.
  • I moved the dataset-specific evaluations to their recipe's meta folder.

Let me know what else I can do to go forward with this PR.
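
For reference, a minimal sketch of such a running Joint-Goal Accuracy tracker; the class name and interface are assumptions for illustration, not the actual metric_stats.py implementation:

```python
class RunningJGA:
    """Running Joint-Goal Accuracy: a turn is correct only if the full
    predicted dialogue state matches the reference exactly. Only two
    counters are kept, so no per-turn values are stored."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def append(self, predicted_state: dict, reference_state: dict):
        # States are flat slot-value maps, e.g. {"hotel-area": "north"}.
        self.correct += int(predicted_state == reference_state)
        self.total += 1

    def summarize(self) -> float:
        return self.correct / self.total if self.total else 0.0


# One fully correct state out of two turns -> JGA = 0.5
tracker = RunningJGA()
tracker.append({"hotel-area": "north"}, {"hotel-area": "north"})
tracker.append({"train-day": "monday"}, {"train-day": "tuesday"})
print(tracker.summarize())  # 0.5
```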

LucasDruart and others added 23 commits April 16, 2024 15:48
Still need a further commit to download the test_manifest from dropbox once
I can reupload it.
The dataset download script was removed and merged into the README, as we always avoid having shell scripts in the repo (might be worth simplifying the steps down the line and avoiding the destructive renames, etc.).
The extract_audio script was renamed to explicitly refer to the DSTC11 version.