Confusion about answer pre-processing #40

guoyang9 · 2018-12-15T08:15:02Z

Three things about the answer pre-processing confuse me, and some of them may significantly affect the final performance:

1. When filtering answers, only the `multiple_choice_answer` answers are pre-processed, as shown in this line. Most of the other answers stay raw, and the answer-occurrence computation does not process them either.
2. The answer files passed to this function contain the raw answers instead of the pre-processed ones. As a result, looking up an answer's index in the processed answer set can fail.
3. Do we really need `process_digit_article`? It makes some answers odd; for example, 'left one' becomes 'left 1'. Because the answers in both the training and validation sets are processed the same way, the effect on validation performance is minor. For the test set, however, the code would need further changes to apply the same processing in the test phase.
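To make points 1 and 2 concrete, here is a minimal sketch of what consistent handling could look like: one normalization function is applied both when building the answer vocabulary and when looking up the index of every annotator answer. Everything in it is an assumption for illustration. `normalize_answer`, `build_answer_vocab` and `encode_answers` are hypothetical helpers, the normalization is a simplified stand-in for the repository's punctuation and `process_digit_article` handling (not its actual code), and the annotations are assumed to be VQA-style dicts with `multiple_choice_answer` and an `answers` list whose items carry an `answer` key.

```python
import re
from collections import Counter

# Simplified, hypothetical normalization loosely modeled on VQA-style
# punctuation / digit / article handling. Not the repository's implementation.
_NUMBER_WORDS = {"none": "0", "zero": "0", "one": "1", "two": "2", "three": "3",
                 "four": "4", "five": "5", "six": "6", "seven": "7",
                 "eight": "8", "nine": "9", "ten": "10"}
_ARTICLES = {"a", "an", "the"}


def normalize_answer(answer, digits_and_articles=True):
    """Lower-case, strip punctuation, optionally map number words and drop articles."""
    answer = answer.lower().replace("\n", " ").strip()
    answer = re.sub(r"[^\w\s]", "", answer)  # crude punctuation removal
    words = []
    for word in answer.split():
        if digits_and_articles:
            if word in _ARTICLES:
                continue  # drop 'a', 'an', 'the'
            word = _NUMBER_WORDS.get(word, word)  # 'one' -> '1'
        words.append(word)
    return " ".join(words)


def build_answer_vocab(annotations, top_k, digits_and_articles=True):
    """Count normalized multiple_choice_answer values and keep the top_k."""
    counts = Counter(
        normalize_answer(ann["multiple_choice_answer"], digits_and_articles)
        for ann in annotations
    )
    return {ans: idx for idx, (ans, _) in enumerate(counts.most_common(top_k))}


def encode_answers(ann, answer_to_index, digits_and_articles=True):
    """Map every annotator answer to a vocabulary index, normalizing it the
    same way as the vocabulary; answers outside the vocabulary are skipped."""
    indices = []
    for item in ann["answers"]:
        key = normalize_answer(item["answer"], digits_and_articles)
        if key in answer_to_index:
            indices.append(answer_to_index[key])
    return indices


if __name__ == "__main__":
    anns = [
        {"multiple_choice_answer": "Two",
         "answers": [{"answer": "two"}, {"answer": "2"}, {"answer": "Two."}]},
        {"multiple_choice_answer": "left one",
         "answers": [{"answer": "left one"}, {"answer": "the left one"}]},
    ]
    for flag in (True, False):
        vocab = build_answer_vocab(anns, top_k=10, digits_and_articles=flag)
        print(flag, vocab, [encode_answers(a, vocab, flag) for a in anns])
```

With the digit/article step on, 'two', '2' and 'Two.' all land on the same index, and 'the left one' matches 'left 1'. With it off, the answers read more naturally ('left one'), but '2' and 'the left one' fall outside the vocabulary. That is the trade-off raised in point 3. Since the same function is called everywhere, reusing it in the test phase would keep the three splits consistent.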
