
Adding spoken dialogue state tracking recipes for Spoken MultiWoz and SpokenWoz datasets #2424

Open · wants to merge 53 commits into base: develop

Conversation

@LucasDruart commented on Feb 21, 2024

What does this PR do?

This PR adds spoken dialogue state tracking recipes for two datasets: Spoken MultiWoz and SpokenWoz. I have tried my best to conform to SpeechBrain's conventions; please let me know if there are any I missed. I also do not know whether and how recipes should be tested.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@asumagic self-requested a review on Mar 5, 2024
@asumagic self-assigned this on Mar 5, 2024
@asumagic commented on Mar 5, 2024

#2441 also adds a T5 HuggingFace lobe as far as I can tell; we will see how we deal with this.

EDIT: Nevermind, it's not the same thing, thanks Adel.

@LucasDruart commented:

Hey @asumagic,
Thanks for taking a look at this PR. I have seen that the pre-commit pipeline fails because the SpokenWoz data preparation function is too complicated. I believe the complexity is measured in the number of lines of code, but I have trouble simplifying it, since the formatter sometimes reflows a single statement onto many lines, for instance line 146.

@asumagic commented on Mar 5, 2024

I think the linter measures cyclomatic complexity rather than the plain number of lines of code, so fixing formatting would probably not help. In this case, it could make sense to extract the whole `for dialog_id, dialog_info in annotations.items():` loop into its own function.
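
A minimal sketch of that kind of refactor, assuming hypothetical names (`_prepare_dialog`, a `log` field holding the turns) that will differ from the actual SpokenWoz preparation code:

```python
def _prepare_dialog(dialog_id, dialog_info):
    """Build the manifest entries for a single dialogue (hypothetical helper)."""
    entries = {}
    for turn_id, turn in enumerate(dialog_info.get("log", [])):
        # Per-turn branching lives here, keeping the caller's complexity low.
        entries[f"{dialog_id}_{turn_id}"] = {"transcript": turn.get("text", "")}
    return entries


def prepare_spokenwoz(annotations):
    """Top-level preparation: one helper call per dialogue."""
    manifest = {}
    for dialog_id, dialog_info in annotations.items():
        manifest.update(_prepare_dialog(dialog_id, dialog_info))
    return manifest
```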

As for the other pre-commit style check failures, you can fix them locally by running `pre-commit run -a` (pypi).

As for the other failing tests, the issue seems to be that recipe tests should be added. Recipe tests use a small subset of the data, launch a dummy training for a fixed number of epochs, and then assert that some metric threshold is met.
(They are not run in CI; SpeechBrain developers run them locally.)
As far as I can tell, a line should be added to tests/recipes/MultiWOZ.csv and a tests/recipes/SpokenWOZ.csv file should be created. Recipe tests are further explained in tests/recipes.

@LucasDruart commented:

Hi @asumagic,
I fixed the complexity and trailing-whitespace failures and added a test for each recipe. I had to make the last commit without the pre-commit verification because the large-file check failed on the sample audio (tests/samples/DST/SpokenWoz/audio_5700_train_dev/MUL1011.wav) required for the test. Let me know if there is anything else I can do.

@asumagic commented on Mar 6, 2024

Now, some tests fail because:

  • There are some missing docstrings.
  • We have a check script that tries to ensure that training scripts use all of the YAML keys correctly. Because run.py and model.py are split, it fails to find the keys that are actually used in model.py rather than run.py. Usually we just have a single, merged train.py, which should probably be done here; a toy illustration of the failure mode follows below. (AFAIK Brain is not really intended for inference, only training/evaluation. EDIT: to clarify, that is not what this PR does; what the PR does is OK IMO.)
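
This is not the actual SpeechBrain consistency script, just a sketch of the idea with hypothetical key names, to show why a per-file scan misses keys that only appear in a second file:

```python
# Toy sketch only: a key counts as "used" if it appears in the scanned scripts.
import re
from pathlib import Path

def seemingly_unused_keys(yaml_keys, script_paths):
    """Return YAML keys that appear in none of the scanned script files."""
    text = "".join(Path(p).read_text() for p in script_paths)
    return {k for k in yaml_keys if not re.search(rf"\b{re.escape(k)}\b", text)}

# Scanning only run.py wrongly flags keys that are referenced solely in
# model.py; scanning a single merged train.py (or both files) avoids this.
```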

@asumagic commented on Mar 6, 2024

BTW, I am still in the process of reviewing, but it does take some time.

@LucasDruart commented:

I added the missing docstrings and the missing tests.
Regarding the single train.py file: initially I wanted to factorise the model between both datasets, hence the separation of the data pipeline from the model, but I could not manage to import it properly. Maybe you will have a solution for this when you reach that point in the review (a generic workaround is sketched below).
Sorry for the (many) changes while you are reviewing.
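
One generic Python workaround for importing a module shared between two recipe folders; this is only a hedged sketch with a hypothetical common/model.py layout and may not match SpeechBrain's recipe conventions:

```python
# Hypothetical layout: shared code lives in recipes/common/model.py and each
# recipe's run.py (at recipes/<Dataset>/<task>/run.py) adds that folder to
# sys.path before importing it.
import sys
from pathlib import Path

COMMON_DIR = Path(__file__).resolve().parents[2] / "common"  # .../recipes/common
sys.path.append(str(COMMON_DIR))

from model import DSTModel  # hypothetical shared module
```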

@asumagic left a review comment:

I tried to review as much as I could, but the review remained less in-depth than I would have liked, since I am not yet very familiar with this task. I have successfully run the dataset download steps and the data preparation, and ran some steps of training, so this part works OK.
I also made a number of changes, as you have probably noticed; let me know if they seem sane to you. The tests now pass.

There are some things I couldn't test and run (inference), as I don't think you have provided pre-trained models. If you are able to upload them, we could host them on the Dropbox and have them linked.

I would like other reviewers to also double-check the training script + metrics for correctness.

As for the duplication across training scripts (and hparams), I agree it isn't pretty, but I think this is just a flaw we have right now, where very similar models result in duplication across recipes. IMO this can be dealt with later if we figure out how to deduplicate dataset-agnostic things better.

Resolved (outdated) review threads on:
  • speechbrain/decoders/seq2seq.py
  • recipes/MultiWOZ/dialogue_state_tracking/README.md
  • speechbrain/utils/evaluate_dialogue_state_tracking.py (6 threads)
@LucasDruart commented:

Sorry for the delay, and thank you for your review and changes, which make a lot of sense! Regarding the other points:

  • I have not provided checkpoints but can provide them if needed. However, that requires re-running the whole training to reach the reported performance.
  • I corrected the few naming issues and the missing documentation; I hope it's okay now.
  • I moved the Joint-Goal Accuracy tracker (used during training to keep a running JGA without storing per-turn values) to metric_stats.py; a rough sketch of the idea is given below.
  • I left the common part of the Dialogue State Tracking evaluation in utils so that it can be imported from the recipes.
  • I moved the dataset-specific evaluations to their recipe's meta folder.

Let me know what else I can do to go forward with this PR.
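
For reference, a minimal sketch of such a running Joint-Goal Accuracy tracker; the class name and interface are assumptions for illustration, not the actual metric_stats.py implementation:

```python
class RunningJGA:
    """Running Joint-Goal Accuracy: a turn is correct only if the full
    predicted dialogue state matches the reference exactly. Only two
    counters are kept, so no per-turn values are stored."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def append(self, predicted_state: dict, reference_state: dict):
        # States are flat slot-value maps, e.g. {"hotel-area": "north"}.
        self.correct += int(predicted_state == reference_state)
        self.total += 1

    def summarize(self) -> float:
        return self.correct / self.total if self.total else 0.0


# One fully correct state out of two turns -> JGA = 0.5
tracker = RunningJGA()
tracker.append({"hotel-area": "north"}, {"hotel-area": "north"})
tracker.append({"train-day": "monday"}, {"train-day": "tuesday"})
print(tracker.summarize())  # 0.5
```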

LucasDruart and others added 23 commits April 16, 2024 15:48
Still need a further commit to download the test_manifest from dropbox once
I can reupload it.
The dataset download script was removed and merged into the README, as we always avoid having shell scripts in the repo (might be worth simplifying the steps down the line and avoiding the destructive renames, etc.).
The extract_audio script was renamed to explicitly refer to the DSTC11 version.