Datasets? #2

Open
Khalife opened this issue Apr 12, 2017 · 3 comments

Comments

@Khalife

Khalife commented Apr 12, 2017

Good afternoon,

And thanks again for making the source code of your project available.
Do you know when the corresponding datasets will be available?

Thanks in advance,

Sammy Khalife

@octavian-ganea
Contributor

octavian-ganea commented Apr 12, 2017

Hi. The indexes are a few tens of GB, so we can send them on request to anyone interested. Please contact us (our e-mail addresses are in the paper) so we can arrange the transfer.

@Khalife
Author

Khalife commented Apr 13, 2017

Hi, thanks for your answer.
I just sent an e-mail about this.

I still have the following questions:
- Have you annotated every dataset in your list (CoNLL-AIDA, AQUAINT, MSNBC, ACE04)?
- Can I find already-annotated versions of some of them on the web?
- If I'd like to train on a single sufficiently annotated dataset (not all of the ones you use), which source file should I modify accordingly?

@octavian-ganea
Contributor

octavian-ganea commented May 26, 2017

The AQUAINT, MSNBC and ACE04 datasets can be obtained from http://webdocs.cs.ualberta.ca/~denilson/data/deos14_ualberta_experiments.tgz . The rest of the processing is handled during evaluation by the method in eval/datasets/AQUAINT_MSNBC_ACE04.scala .

The AIDA datasets are not public; one needs to obtain a license for them. With this license, a text file with entity annotations is generated, and this file can be used with PBOH as shown in eval/datasets/AIDA.scala . The annotations alone, without the full documents, can be obtained here: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/
