Adding spoken dialogue state tracking recipes for Spoken MultiWoz and SpokenWoz datasets #2424
base: develop
Conversation
EDIT: Nevermind, it's not the same thing, thanks Adel.
Hey @asumagic,
I think the linter measures cyclomatic complexity rather than the plain number of lines of code, so fixing the formatting would probably not help. In this case, it could make sense to extract part of the logic into a separate function.

As for the other pre-commit style check failures, you can fix them locally by running pre-commit.

As for the other failing tests, the issue seems to be that recipe tests should be added. Recipe tests use a small subset of the data, launch a dummy training for a fixed number of epochs, and then assert that some metric threshold is met.
Hi @asumagic,
Now, some tests fail because:
BTW, I am still in the process of reviewing, but it does take some time.
I added the missing docstrings and the missing tests. |
Tried to review as much as I could, but it remained less in-depth than I would have liked since I am not yet very familiar with this task. I have successfully run the dataset download steps and data preparation, and run some steps of training, so this part works OK.
I also made a number of changes you've probably noticed, let me know if they seem sane to you. The tests now pass.
There are some things I couldn't test and run (inference) since I don't think you have provided pre-trained models. If you are able to upload them, we could host them on Dropbox and link them.
I would like other reviewers to also double-check the training script + metrics for correctness.
As for the duplication across training scripts (and hparams), I agree it isn't pretty, but I think this is just a flaw we have right now where very similar models just result in duplication across recipes. IMO this can be dealt with later if we figure out how to deduplicate dataset-agnostic things better.
Sorry for the delay and thank you for your review and changes which make a lot of sense! Regarding the other points:
Let me know what else I can do to move this PR forward.
Force-pushed from d0eb938 to 433af3b
Still need a further commit to download the test_manifest from Dropbox once I can reupload it. The dataset download script was removed and merged into the README, as we always avoid having shell scripts in the repo (it might be worth simplifying the steps down the line and avoiding the destructive renames, etc.). The extract_audio script was renamed to explicitly refer to the DSTC11 version.
…valuation script in each recipe
…art/speechbrain into dialogue_state_tracking
…art/speechbrain into dialogue_state_tracking
What does this PR do?
This PR adds spoken dialogue state tracking recipes for two datasets: Spoken MultiWoz and SpokenWoz. I have tried my best to conform to SpeechBrain's conventions; please let me know if there are any I missed. I do not know whether or how recipes should be tested.
Before submitting
PR review
Reviewer checklist