Several possible bugs #12

Open
lijierui opened this issue Nov 12, 2021 · 1 comment

Comments

@lijierui

lijierui commented Nov 12, 2021

I've been using this codebase to handle some new datasets, and it has helped me a lot, but I found a few places where there might be bugs or unclear descriptions.

  1. The length of the target is not cut to max_output_len for the pretrained models. If it exceeds max_len, then in

MWPToolkit/mwptoolkit/model/PreTrain/robertagen.py (line 173) or bertgen.py (line 173),
decoder_inputs = self.pos_embedder(self.out_embedder(target))
the sequence length will exceed the pos_embedder's maximum length.
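A minimal sketch of the truncation that seems to be missing. The tensor layout (batch, seq_len) and the name max_output_len are assumptions here, not necessarily the toolkit's actual conventions:

```python
import torch

def truncate_target(target: torch.Tensor, max_output_len: int) -> torch.Tensor:
    """Clip the decoder target to the positional embedder's capacity.

    `target` is assumed to be a (batch, seq_len) tensor of token ids;
    `max_output_len` is the longest sequence the position embedding
    table was built for (both names are assumptions, not the toolkit's).
    """
    if target.size(1) > max_output_len:
        # Slice the time dimension so pos_embedder never sees an
        # out-of-range position index.
        target = target[:, :max_output_len]
    return target
```

Applying this to `target` before `self.out_embedder(target)` would keep the position indices in range.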

  2. For GTS, the code is not generalized to datasets with constants other than 1 and 3.14, which causes a tensor size mismatch:

(mwptoolkit/model/Seq2Tree/gts.py, ~line 904)
if mask_flag: num_score2[i][:2] = -1e10  # for the first iterations, do not generate 1 and 3.14
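One way to generalize this would be to mask the first len(constant_list) slots instead of the hard-coded 2. This is only a hypothetical sketch; num_constants and the helper name are my own, not the toolkit's:

```python
import torch

def mask_constant_scores(num_score: torch.Tensor, num_constants: int) -> torch.Tensor:
    """Forbid generating any dataset constant in the masked iterations.

    `num_score` is assumed to be (batch, num_candidates) with the
    dataset's constants occupying the first `num_constants` columns;
    e.g. num_constants = len(generate_nums) rather than assuming
    the list is exactly [1, 3.14].
    """
    masked = num_score.clone()
    masked[:, :num_constants] = -1e10  # suppress all constant slots, not just 2
    return masked
```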

  3. There may be bugs in "from_prefix_to_infix" and "from_infix_to_prefix" in the preprocessing tools:
    If you map this equation to prefix and then back:
    1500/(((100+12)-(100-12))/100)
    it yields the following, where the grouping of 100+12 and 100-12 is lost:
    1500/(100+12-100-12)/100
    For * and /, parentheses are dropped as well:
    1/(1-(1/(2*2))) is mapped to 1/(1-1/2*2)
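For reference, a precedence-aware prefix-to-infix conversion that keeps the necessary parentheses could look like the sketch below. It is standalone and not the toolkit's actual from_prefix_to_infix; the key point is that a right child must be wrapped at equal precedence for the non-associative operators - and /:

```python
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def prefix_to_infix(tokens):
    """Convert a prefix token list to an infix string with minimal
    necessary parentheses.

    Each stack entry is (expr_string, precedence_of_its_top_operator);
    plain operands get precedence 9 so they are never wrapped.
    """
    stack = []
    for tok in reversed(tokens):
        if tok in PRECEDENCE:
            (l, lp), (r, rp) = stack.pop(), stack.pop()
            p = PRECEDENCE[tok]
            # Left child needs parens if it binds looser than tok.
            if lp < p:
                l = f'({l})'
            # Right child also needs parens at *equal* precedence for
            # '-' and '/', since a-(b-c) != a-b-c and a/(b/c) != a/b/c.
            if rp < p or (rp == p and tok in ('-', '/')):
                r = f'({r})'
            stack.append((f'{l}{tok}{r}', p))
        else:
            stack.append((tok, 9))
    return stack[-1][0]
```

On the first example above, the prefix form / 1500 / - + 100 12 - 100 12 100 round-trips to 1500/((100+12-(100-12))/100), which evaluates the same as the original equation.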

  4. Another small problem: every time a batch is fed, the data is re-preprocessed. This adds a lot of redundant computation when running many epochs.
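A simple way to avoid the repeated work would be to cache each batch's preprocessed form after the first epoch. Sketch only; CachingLoader and preprocess are hypothetical names, not the toolkit's API, and the cache assumes the batch split is fixed across epochs:

```python
class CachingLoader:
    """Memoize per-batch preprocessing so it runs once, not once per epoch."""

    def __init__(self, batches, preprocess):
        self.batches = batches        # raw batches, assumed stable across epochs
        self.preprocess = preprocess  # the expensive per-batch transformation
        self._cache = {}

    def get(self, idx):
        # Compute on first access, reuse on every later epoch.
        if idx not in self._cache:
            self._cache[idx] = self.preprocess(self.batches[idx])
        return self._cache[idx]
```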

Thanks again for this tool!

@LYH-YF (Owner)

LYH-YF commented Nov 12, 2021

We appreciate your suggestions for the toolkit!!!
