training method #58

Open
Re-dot-art opened this issue Mar 18, 2024 · 12 comments

@Re-dot-art

Hello, the results I get from running the code directly differ significantly from the results reported in your paper. I suspect the issue is with the training method, so I would like to confirm: before applying the semi-supervised method, do we need to train the Faster R-CNN network on the 1% or 5% labeled samples first, and then continue training from those pre-trained weights with the semi-supervised learning method? Looking forward to your reply!

@Adamdad
Owner

Adamdad commented Mar 18, 2024

@Re-dot-art There is no need for separate training. ConsistentTeacher is trained end-to-end: the labeled and unlabeled data are fed to the model at the same time, and the teacher is maintained as a moving average of the student. Regarding the performance issue, can you specify which config you are using, your batch size, and the number of GPUs?
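As a rough illustration of the teacher-as-moving-average idea, here is an editorial sketch in plain PyTorch; it is not the repository's actual EMA hook, and `teacher`, `student`, and the momentum value are placeholders:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.999) -> None:
    """Keep the teacher as an exponential moving average (EMA) of the student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # new_teacher = momentum * teacher + (1 - momentum) * student
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
```

Called once per iteration after the student update, a routine like this keeps the teacher's pseudo-labels stable while the student learns from both the labeled and unlabeled batches.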

@Re-dot-art
Author

[two images: the experiment configs]
I ran the above two experimental settings on two V100 GPUs, with samples_per_gpu=5.

@Re-dot-art
Author

[image]

@Adamdad
Owner

Adamdad commented Mar 18, 2024

As mentioned in the README, all experiments in the paper use 8 GPUs × 5 samples per GPU for training. A smaller batch size gives worse results, as expected, but your results seem too low, even worse than the baseline. Did you edit anything?
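For a rough sense of the gap, here is an illustrative effective-batch-size comparison (editorial sketch; the linear learning-rate scaling rule is a common heuristic and only an assumption here, not necessarily what the repository prescribes):

```python
# Effective batch size: GPUs x samples_per_gpu (labeled + unlabeled images per GPU).
paper_batch = 8 * 5   # paper setting: 8 GPUs x 5 samples per GPU = 40
small_batch = 2 * 5   # the run above:  2 GPUs x 5 samples per GPU = 10

# Common linear-scaling heuristic for the learning rate (assumption, for illustration only).
paper_lr = 0.01
scaled_lr = paper_lr * small_batch / paper_batch
print(paper_batch, small_batch, scaled_lr)  # 40 10 0.0025
```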

@Re-dot-art
Author

I did not make any modifications to the code; I only added some comments.

@Adamdad
Owner

Adamdad commented Mar 18, 2024

Could you please share your configuration settings, the scripts you are using for execution, and the way you processed the dataset? These results are even lower than the baseline that trains on the labeled data only (no use of unlabeled data at all), so I suspect something is wrong on your side. I'm happy to assist, but I'll need more detailed information to provide effective support.

@Re-dot-art
Author

Okay, thank you. The config file for the experiment is as follows:
config.zip

The dataset was processed according to the steps in the README; the output is shown in the screenshots below:
[two images: dataset processing output]
Thank you.

@Adamdad
Owner

Adamdad commented Mar 18, 2024

Do you use wandb to record the training process? If so, could you also share it?

@Re-dot-art
Author

Untitled Report _ consistent-teacher – Weights & Biases.pdf

@Re-dot-art
Author

The blue curve is the completed **10p experiment; the red curve is the **1p experiment, which is still training.

@Adamdad
Owner

Adamdad commented Mar 18, 2024

I would suggest following this config to (1) increase the batch size, (2) increase the number of labeled samples within a batch, and (3) lower your learning rate: https://github.com/Adamdad/ConsistentTeacher/blob/main/configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p_2x8.py

For 2 GPUs, the original config only has 2 labeled samples in total per batch. As such, 0.01 is too large a learning rate for the network to converge. So for 2-GPU training we use 4 labeled : 4 unlabeled samples per GPU, i.e. 8 labeled samples in total per batch, with a learning rate of 0.005. This ensures that, at the very least, the model can converge on the labeled dataset in the first place.
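As a sketch of the kind of overrides the linked 2x8 config makes (an editorial illustration in mmdetection-style config syntax; the base-file name and keys such as `sample_ratio` are assumptions and may differ from the actual file):

```python
# Illustrative only -- check the linked config file for the authoritative values.
_base_ = "consistent_teacher_r50_fpn_coco_180k_10p.py"  # hypothetical base config name

data = dict(
    samples_per_gpu=8,  # 4 labeled + 4 unlabeled images per GPU, on 2 GPUs
    sampler=dict(train=dict(sample_ratio=[1, 1])),  # labeled:unlabeled = 4:4 (key name assumed)
)

# Halve the default 0.01 to match the smaller effective batch (8 labeled images in total).
optimizer = dict(lr=0.005)
```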

@Re-dot-art
Author

Okay, thank you very much! I'll give it a try.
