AssertionError: Too few (0) documents with category indicative terms found for category 1; try to add more unlabeled documents to the training corpus (recommend) or reduce `--match_threshold` (not recommend) #22

ForeverNightmare · 2023-05-19T02:52:15Z

Hi,
I'm training my model with your framework and got this error:

Number of documents with category indicative terms found for each category is: {0: 9014, 1: 0, 2: 0, 3: 551, 4: 1478, 5: 20642, 6: 0, 7: 7429, 8: 8676, 9: 4814, 10: 1368, 11: 23, 12: 418}
Traceback (most recent call last):
File "src/train.py", line 66, in <module>
main()
File "src/train.py", line 57, in main
trainer.mcp(top_pred_num=args.top_pred_num, match_threshold=args.match_threshold, epochs=args.mcp_epochs)
File "/home/xuanw/HL/LOTClass-master/src/trainer.py", line 451, in mcp
self.prepare_mcp(top_pred_num, match_threshold)
File "/home/xuanw/HL/LOTClass-master/src/trainer.py", line 392, in prepare_mcp
assert category_doc_num[i] > 10, f"Too few ({category_doc_num[i]}) documents with category indicative terms found for category {i}; "
AssertionError: Too few (0) documents with category indicative terms found for category 1; try to add more unlabeled documents to the training corpus (recommend) or reduce --match_threshold (not recommend)

But when I run the sh file again directly (with the dataset dir in the sh file replaced with mine), it runs successfully without any error. Will the result I get be correct? Does the previous error message affect this result and make it wrong?


yumeng5 · 2023-05-19T19:55:02Z

Hi,

The error is pretty much explained by the printouts -- for several categories (1, 2, 6) there are 0 documents with category indicative terms (as indicated by the dictionary printed out). So you probably need to add more documents likely to pertain to these categories to the corpus; otherwise, there is no way of training the classifier to detect these categories (and of course, the resulting classifier won't be accurate).
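As a quick check of the printout, here is a minimal sketch that lists which categories fail the condition shown in the traceback (`category_doc_num[i] > 10`). The counts are copied from the log above; the helper name `categories_below_minimum` is hypothetical and not part of the repository.

```python
# Counts copied from the log output earlier in this thread.
category_doc_num = {
    0: 9014, 1: 0, 2: 0, 3: 551, 4: 1478, 5: 20642, 6: 0,
    7: 7429, 8: 8676, 9: 4814, 10: 1368, 11: 23, 12: 418,
}

# The assertion in the traceback requires category_doc_num[i] > 10.
MIN_DOCS = 10


def categories_below_minimum(counts, min_docs=MIN_DOCS):
    """Return the category ids whose count fails `count > min_docs` (hypothetical helper)."""
    return sorted(cat for cat, n in counts.items() if n <= min_docs)


failing = categories_below_minimum(category_doc_num)
print(failing)  # [1, 2, 6]
```

Category 11 has only 23 matching documents, which passes the check but is close to the limit.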

Thanks,
Yu

ForeverNightmare · 2023-05-20T02:17:20Z

Hi @yumeng5 ,

Thanks for your reply! My training dataset contains about 230,000 documents, and each of my 12 labels has many instances in it, so I'm confused about how "Too few (0) documents with category indicative terms found for category 1" can happen. For example, label 6 has 2839 instances in the dataset, but the number of documents with category indicative terms found for it is 0. Label 10 has 808 instances, yet 1368 documents with category indicative terms are found for it, which is more than 808. Label 5 has 6482 instances, but 20642 is shown. Based on your understanding of the method, would you mind speculating on what caused this result?

yumeng5 · 2023-05-21T19:44:19Z

The number of documents found with category indicative terms is derived based on the category vocabulary constructed in the first step and is not directly related to the actual number of instances in that category -- does the category vocabulary make sense for those categories without enough matching documents (e.g., label 1, 2, 6)?

I'd suggest trying different label names (more common and distinctive terms tend to work better) and checking the category vocabulary accordingly.
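To illustrate why the matched-document count can be far above or below the true number of instances, here is a toy sketch that counts documents by vocabulary overlap alone. It is a deliberate simplification: the repository decides matches using `top_pred_num` and `match_threshold`, which this sketch does not model. The corpus, the vocabularies, and the function name `count_matching_docs` are all hypothetical.

```python
def count_matching_docs(docs, category_vocab):
    """Count, per category, the documents containing at least one vocabulary term."""
    counts = {cat: 0 for cat in category_vocab}
    for text in docs:
        tokens = set(text.lower().split())
        for cat, vocab in category_vocab.items():
            if tokens & vocab:
                counts[cat] += 1
    return counts


# Hypothetical toy corpus of (true label, text); the labels are never used for counting.
corpus = [
    ("sports", "the team won the match last night"),
    ("sports", "a tense final quarter decided the title"),
    ("politics", "the senate will match the proposed budget"),
    ("politics", "voters team up ahead of the election"),
]

# Hypothetical category vocabularies: generic terms for "sports", rare ones for "politics".
category_vocab = {
    "sports": {"team", "match", "game"},
    "politics": {"parliament", "legislation"},
}

counts = count_matching_docs([text for _, text in corpus], category_vocab)
print(counts)  # {'sports': 3, 'politics': 0}
```

Here "sports" has 2 true instances but 3 matches, because its vocabulary terms also appear in other documents, while "politics" has 2 true instances and 0 matches, because none of its vocabulary terms occur in the corpus.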

Thanks,
Yu

ForeverNightmare · 2023-06-04T02:07:12Z

@yumeng5 Thanks for your suggestions! I have now started training on a new dataset and ran into a new issue. I set the parameters like this:
MCP_EPOCH=20
SELF_TRAIN_EPOCH=10

But the output shows that the self-training epochs are only executed 2 times:
100%|██████████| 226/226 [01:41<00:00, 2.22it/s]lr: 9.929e-07
Average training loss: 0.10797090083360672
Test acc: 0.7305699586868286
lr: 8.905e-07
Average training loss: 0.11300306767225266
Test acc: 0.7253885865211487
Saving final model to datasets/movies/final_model.pt

What might cause this? I didn't set the early stop parameter in the .sh file, so it should be false.
