
[BERT/PyTorch] How can we use create_datasets_from_start.sh for BERT pretraining #1359

Open
Druva24 opened this issue Oct 9, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@Druva24
Druva24 commented Oct 9, 2023

Related to Model/Framework(s)

BERT/PyTorch

In the README.md of https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT it is mentioned that running create_datasets_from_start.sh will generate the pretraining dataset for BERT. However, whenever I try to run the given shell script, it results in an error: `download_wikipedia: command not found`. I believe this is happening because lddl has moved to a different repo: https://github.com/NVIDIA/LDDL. If that is the reason, what steps do I need to take in order to generate a pretraining dataset? Also, do we need sudo privileges to run lddl on a Slurm cluster? We are using Slurm and I don't have sudo privileges; if lddl requires them, is there an alternative?

@Druva24 Druva24 added the bug Something isn't working label Oct 9, 2023
@sanjeebtiwary
The script likely assumes that certain dependencies, including the `download_wikipedia` command, are already available in your environment. That command comes from lddl, which has since moved to the separate NVIDIA/LDDL repository you linked, so the script's own setup steps probably no longer install it.
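A quick way to check, and a possible fix that needs no sudo, is a user-level pip install. This is a sketch only: the git URL and the `download_wikipedia` entry-point name are assumptions based on the NVIDIA/LDDL repository linked above, not verified against the lddl docs.

```shell
# Check whether the lddl CLI that create_datasets_from_start.sh calls is on PATH;
# if not, suggest a user-level install (no sudo required, so it should also work
# on a Slurm login node). The git URL below is an assumption based on the repo
# linked in this issue.
if command -v download_wikipedia >/dev/null 2>&1; then
  echo "download_wikipedia: found"
else
  echo "download_wikipedia: missing"
  echo "try: pip install --user git+https://github.com/NVIDIA/LDDL.git"
fi
```

`pip install --user` puts console scripts under `~/.local/bin`, so that directory may also need to be added to `PATH` before the script can find the command.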
