
File 'coco_flickr30k_googlecc_gqa_sbu_oi.lineidx' is Not Found #185

Open
lostnighter opened this issue Feb 14, 2022 · 2 comments

Comments

@lostnighter

Hi! This file is needed for pretraining on the large corpus, but it cannot be found. Could you share it?

Thanks!

@jontooy

jontooy commented Feb 16, 2022

Hi lostnighter,

I had the same problem when using OSCAR to fine-tune for image captioning on a custom dataset. I used this function to generate the '.lineidx' file.

I guess that in your case you have a 'coco_flickr30k_googlecc_gqa_sbu_oi.tsv' file. If so, try the function above with these parameters:

```python
filein, idxout = 'coco_flickr30k_googlecc_gqa_sbu_oi.tsv', 'coco_flickr30k_googlecc_gqa_sbu_oi.lineidx'
```

Let me know if it works!
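For readers who cannot see the linked function, here is a minimal sketch of what such a generator typically does. It assumes the convention used by Oscar-style TSV readers, where line i of the '.lineidx' file holds the byte offset at which row i of the '.tsv' file starts. The name `generate_lineidx` is hypothetical and is not necessarily the function jontooy used.

```python
import os


def generate_lineidx(tsv_path: str, lineidx_path: str) -> None:
    """Write the byte offset of every row in tsv_path to lineidx_path, one per line.

    Assumption: the .lineidx format is "one integer byte offset per TSV row".
    """
    tmp_path = lineidx_path + ".tmp"
    # Binary mode so offsets are counted in bytes, which is what seek() expects.
    with open(tsv_path, "rb") as tsv_in, open(tmp_path, "w") as idx_out:
        offset = 0
        for line in tsv_in:
            idx_out.write(f"{offset}\n")
            offset += len(line)  # len() of a bytes line includes its trailing b"\n"
    # Rename only after the whole index is written, so an interrupted run
    # never leaves a truncated .lineidx behind.
    os.replace(tmp_path, lineidx_path)
```

Called as `generate_lineidx('coco_flickr30k_googlecc_gqa_sbu_oi.tsv', 'coco_flickr30k_googlecc_gqa_sbu_oi.lineidx')`, it scans the TSV once and writes one offset per row. An index generated this way only matches the exact TSV it was built from; regenerate it if the TSV changes.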

@lostnighter

Hi jontooy,
I downloaded this file with azcopy as follows:

```shell
path/to/azcopy copy https://biglmdiag.blob.core.windows.net/vinvl/pretrain_corpus/coco_flickr30k_googlecc_gqa_sbu_oi.lineidx ./ --recursive
```

This URL is not given; I just tried it out.
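Because this URL was found by trial, it may be worth confirming that the downloaded index actually matches the local '.tsv'. The sketch below again assumes the byte-offset convention (one integer per row). It loads the index, seeks directly to a row, and runs a basic consistency check. All function names here are hypothetical.

```python
from typing import List


def load_lineidx(lineidx_path: str) -> List[int]:
    """Read one integer byte offset per non-empty line."""
    with open(lineidx_path, "r") as f:
        return [int(line) for line in f if line.strip()]


def read_row(tsv_path: str, offsets: List[int], i: int) -> List[str]:
    """Return the tab-separated fields of row i by seeking straight to its offset."""
    with open(tsv_path, "rb") as f:
        f.seek(offsets[i])
        line = f.readline().decode("utf-8")
    return line.rstrip("\r\n").split("\t")


def check_lineidx(tsv_path: str, lineidx_path: str) -> bool:
    """Sanity check: one offset per TSV row, strictly increasing from 0,
    and every non-zero offset must directly follow a newline byte."""
    offsets = load_lineidx(lineidx_path)
    if not offsets or offsets[0] != 0 or offsets != sorted(set(offsets)):
        return False
    with open(tsv_path, "rb") as f:
        n_rows = sum(1 for _ in f)
        if len(offsets) != n_rows:
            return False
        for off in offsets[1:]:
            f.seek(off - 1)
            if f.read(1) != b"\n":
                return False
    return True
```

The check is cheap compared with a failed pretraining run: a mismatched index would make every `seek` land mid-row and return garbled fields.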
