Reconstruction statements mentioned in the paper #30532

Open
liumc14 opened this issue Apr 29, 2024 · 6 comments

Comments

@liumc14

liumc14 commented Apr 29, 2024

Hello, the dataset used here is SQuAD, but the three domain-specific datasets created in the paper do not seem to be reflected in the code. Also, in the three domain datasets released with the paper, the reconstruction statements appear to be identical in the source and target files. Why is this? @shamanez

self.src_file = Path(data_dir).joinpath(type_path + ".source")

@amyeroberts
Collaborator

Hi, thanks for raising an issue!

This question is better suited to our forums. We try to reserve GitHub issues for feature requests and bug reports.

@shamanez
Contributor

@liumc14, you are correct. I open-sourced this code before my paper. Also, to keep the architecture clean, I didn't add the reconstruction statement.

But it is pretty straightforward:

  1. Mix the QA and reconstruction data, tagging each example with an identifier.
  2. Then, during the forward computation, use only the retrieved documents as the input to the generator whenever the training example is a reconstruction example.
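
A minimal sketch of step 1, assuming the QA data is a list of (question, answer) pairs and the reconstruction data is a list of statements. The `Example` class, the `is_recon` flag, and `mix_examples` are hypothetical names used for illustration; they are not part of the released code.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    source: str     # question (QA) or statement (reconstruction)
    target: str     # answer (QA) or the same statement (reconstruction)
    is_recon: bool  # hypothetical identifier: True for reconstruction examples


def mix_examples(
    qa_pairs: List[Tuple[str, str]],
    recon_statements: List[str],
    seed: int = 0,
) -> List[Example]:
    """Merge QA and reconstruction data into one shuffled training list."""
    examples = [Example(q, a, False) for q, a in qa_pairs]
    # For reconstruction, source and target are the same statement.
    examples += [Example(s, s, True) for s in recon_statements]
    random.Random(seed).shuffle(examples)
    return examples
```

The flag travels with each example, so the training loop can later decide per example how to build the generator input.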

@liumc14
Author

liumc14 commented Apr 29, 2024

@shamanez But after downloading the three domain-specific datasets from the link provided in the paper (https://drive.google.com/drive/folders/1up3yKcJFArBQ6e0F_6n_mfW1VPHxA20A), I found that the reconstruction statements in the training set's .source file are identical to those in the .target file. For example:

American Civil Liberties Union, ACLU of Arizona, National Immigration Law Center slam law.
American Civil Liberties Union, ACLU of Arizona, National Immigration Law Center slam law.

In this case, can the reconstruction statement still be used for training?

@shamanez
Contributor

Yes, the statement should be reconstructed. But the input to the generator should be the retrieved docs related to the statement, not the statement itself.
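
A rough sketch of how the generator input could differ between the two example types, assuming the documents have already been retrieved with the statement (or question) as the query. The function name, the `is_recon` flag, and the `" // "` separator are assumptions for illustration, not the actual implementation.

```python
from typing import List


def build_generator_inputs(
    query: str,
    retrieved_docs: List[str],
    is_recon: bool,
    sep: str = " // ",  # assumed separator between document and query
) -> List[str]:
    """Return one generator input string per retrieved document."""
    if is_recon:
        # Reconstruction: the generator sees only the retrieved documents
        # and must produce the statement, which serves as the target.
        return list(retrieved_docs)
    # QA: each document is concatenated with the question.
    return [doc + sep + query for doc in retrieved_docs]
```

With this split, a reconstruction example whose .source and .target are identical is still a useful training signal, because the statement is only the retrieval query and the target, never the generator input.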

@liumc14
Author

liumc14 commented Apr 30, 2024

@shamanez So training on reconstruction statements actually involves taking the statement as input, retrieving the related documents, and having the generator reproduce the statement from those documents? Thank you for your advice.

@shamanez
Contributor

shamanez commented Apr 30, 2024 via email
