
Cannot reproduce training result: ICDAR2015 #90

Closed
JooYoungJang opened this issue Feb 14, 2022 · 4 comments

JooYoungJang commented Feb 14, 2022

Hi! First of all, thank you for the great work!

I trained from the pretrained synthetic weights and fine-tuned on ICDAR2015 to reproduce the reported evaluation result.

However, after training exactly as the paper describes, I got the following result:

  • precision: 0.5158306652436855, recall: 0.6981222917669716, hmean: 0.5932896890343698, AP: 0

The evaluation protocol is the one from https://rrc.cvc.uab.es/?ch=4
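For reference, the ICDAR protocol's hmean is simply the harmonic mean of precision and recall, so the three numbers above can be cross-checked against each other:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, as in the ICDAR evaluation."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The reported precision/recall reproduce the reported hmean (≈ 0.5933).
print(hmean(0.5158306652436855, 0.6981222917669716))
```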

For training, I did the following:

  1. Generated synthetic data for weakly-supervised learning. (Character-wise pseudo labels were derived from model_syn_r101_pretrain.pth.)
     • I used score_threshold=0.1 and iou_threshold=0.8, the same as the CRAFT paper, to filter false predictions.
  2. Converted the gt text to COCO-JSON format.
  3. Trained with the hyperparameters below:
     • batch_size: 8
     • base learning rate: 0.005 (divided by 10 after 10K iterations)
     • steps: 20K
     • optim: SGD with weight decay 0.0001, momentum 0.9
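The filtering in step 1 amounts to a score cutoff followed by greedy IoU suppression. A minimal sketch of that logic, assuming `boxes` are axis-aligned `(x1, y1, x2, y2)` character boxes with per-box `scores` (the function names here are hypothetical, not TextFuseNet's actual API):

```python
SCORE_THRESHOLD = 0.1  # as in the CRAFT paper
IOU_THRESHOLD = 0.8

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def filter_pseudo_labels(boxes, scores):
    """Drop low-confidence boxes, then greedily suppress near-duplicates."""
    candidates = sorted(
        ((b, s) for b, s in zip(boxes, scores) if s >= SCORE_THRESHOLD),
        key=lambda bs: bs[1], reverse=True)
    kept = []
    for box, _ in candidates:
        if all(iou(box, kept_box) < IOU_THRESHOLD for kept_box in kept):
            kept.append(box)
    return kept
```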

I also attach the config file I used (indentation restored; the scrape also appears to have eaten the underscores in `_BASE_`):

_BASE_: "./Base-RCNN-FPN.yaml"
MODEL:
  MASK_ON: True
  TEXTFUSENET_MUTIL_PATH_FUSE_ON: True
  EXP_NAME: icdar2015_101_FPN_lr0.005_cls64_vsPaper
  WEIGHTS: "/workspace/TextFuseNet_original/weights/model_final.pth"
  PIXEL_STD: [57.375, 57.120, 58.395]
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 101
  ROI_HEADS:
    NMS_THRESH_TEST: 0.35
  TEXTFUSENET_SEG_HEAD:
    FPN_FEATURES_FUSED_LEVEL: 2
    POOLER_SCALES: (0.0625,)
DATASETS:
  TRAIN: ("icdar2015_train",)
  TEST: ("icdar2015_val",)
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.005
  STEPS: (10000,)
  MAX_ITER: 20000
  CHECKPOINT_PERIOD: 1000
INPUT:
  MIN_SIZE_TRAIN: (800, 1000, 1200)
  MAX_SIZE_TRAIN: 1500
  MIN_SIZE_TEST: 1000
  MAX_SIZE_TEST: 3000
TEST:
  GT: "/workspace/script_test_ch4_t1_e1-1577983151/gt.zip"
OUTPUT_DIR: "/workspace/TextFuseNet_original/out_dir_r101/icdar2015_paper/"
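The SOLVER section encodes a step schedule: base LR 0.005 for the first 10K iterations, then divided by 10 for the remaining 10K. A minimal sketch of that schedule, assuming detectron2's default decay factor of 0.1 and ignoring warmup:

```python
# Step LR schedule implied by the SOLVER section above
# (detectron2-style multi-step decay; warmup is ignored here).
BASE_LR = 0.005
STEPS = (10000,)
GAMMA = 0.1  # decay factor (detectron2 default)

def lr_at(iteration: int) -> float:
    """Learning rate at a given iteration under the step schedule."""
    decays = sum(1 for step in STEPS if iteration >= step)
    return BASE_LR * GAMMA ** decays

print(lr_at(5000))   # 0.005
print(lr_at(15000))  # base LR divided by 10
```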

I am not sure where I went wrong.
Can anyone give me some advice?

Thanks in advance.

LIYHUI commented Aug 6, 2022

Hi, have you reproduced the results?

LIYHUI commented Aug 6, 2022

@JooYoungJang I can't reproduce it either.

ying09 (Owner) commented Aug 8, 2022

@JooYoungJang For the training config files, please refer to https://github.com/ying09/TextFuseNet/tree/master/configs/ocr
All of our config files have been uploaded.
Your reproduced hmean of 0.5932896890343698 means something is definitely wrong; even the original Mask R-CNN performs much better than that.

@ying09 ying09 closed this as completed Aug 8, 2022
LIYHUI commented Aug 8, 2022

@ying09 Hi, thanks for your reply. My reproduced hmean is 0.893 vs. 0.922 (original). On the one hand, I think this may be due to differences in the gt generation code; on the other hand, I think there may be something wrong with the pretrained model, as mentioned in #100.
