About code for pretraining #4

volgachen opened this issue Jun 4, 2022 · 8 comments

@volgachen

Excuse me, do you have any plans to release the code or instructions for pre-training?

@encounter1997
Owner

encounter1997 commented Jun 5, 2022

Sorry, we do not plan to release the code for pre-training, but it can easily be implemented by replacing the model construction function in the DeiT code with ours.

Hope this helps, and feel free to ask if you run into any difficulties implementing the pre-training code.
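
In case it is useful to others, here is a minimal sketch of that swap, assuming DeiT's usual timm create_model path. The build_model import, the query_shape argument, and the embed_dim attribute are placeholders for whatever this repo's model construction function actually exposes:

```python
# Rough sketch (not the authors' code) of pointing DeiT's training script at
# this repo's model via timm's model registry. build_model() is a placeholder
# for the repo's actual model construction function.
import torch.nn as nn
from timm.models.registry import register_model

from models import build_model  # placeholder import: the repo's model builder


class FPDETRForClassification(nn.Module):
    """Wrap the FP-DETR encoder so DeiT can train it as an image classifier."""

    def __init__(self, num_classes=1000):
        super().__init__()
        # Configure for pre-training, e.g. query_shape=(1, 1) as discussed below.
        self.backbone = build_model(query_shape=(1, 1))
        self.head = nn.Linear(self.backbone.embed_dim, num_classes)

    def forward(self, x):
        tokens = self.backbone(x)   # assumed output: (B, N, C) token sequence
        cls_token = tokens[:, 0]    # class token used for the classification loss
        return self.head(cls_token)


@register_model
def fp_detr_base_in1k(pretrained=False, **kwargs):
    # DeiT's main.py builds models with timm's create_model(args.model, ...),
    # so after registration it can be selected with --model fp_detr_base_in1k.
    return FPDETRForClassification(num_classes=kwargs.get('num_classes', 1000))
```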

@volgachen
Author

Thank you for your response.
I guess I should change query_shape to (1, 1). Are there any other configs I should pay attention to?

@encounter1997
Owner

Taking fp-detr-base-in1k.py as an example, several parts of the config should be modified, as follows (see the sketch after the list):

  1. Only the model definition is needed.
  2. The self-attn and the corresponding norm in encoder2 should be removed, and the operation order should be updated.
  3. return_intermediate should be updated since deep supervision is not used during pre-training. The code in the DeiT project may also need to be changed slightly, to obtain the class token from the output sequence for loss computation.
  4. num_classes should be 1000 for ImageNet classification.
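
For readers following along, here is a rough sketch of these deltas, assuming an mmcv-style config like the detection configs in this repo. The exact keys and nesting should be taken from the real fp-detr-base-in1k.py; the operation_order shown is only illustrative:

```python
# Illustrative config fragment only -- not the actual fp-detr-base-in1k.py.
model = dict(
    num_classes=1000,            # 4. ImageNet-1k classification
    transformer=dict(
        query_shape=(1, 1),      # single query prompt for pre-training
        encoder2=dict(
            return_intermediate=False,   # 3. no deep supervision in pre-training
            transformerlayers=dict(
                # 2. prompt_self_attn and its norm removed; only the remaining
                #    attention and FFN stay in the operation order.
                attn_cfgs=[dict(type='MultiScaleDeformableAttention')],
                operation_order=('cross_attn', 'norm', 'ffn', 'norm'),
            ),
        ),
    ),
)
# 1. Everything outside the model definition (detection heads, datasets,
#    detection-specific schedules, etc.) is dropped; DeiT's own training
#    pipeline handles the data and optimization during pre-training.
```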

@volgachen
Author

volgachen commented Jun 6, 2022

  2. The self-attn and the corresponding norm in encoder2 should be removed, and the operation order should be updated.

I suppose the self-attn you mentioned in point 2 is actually prompt_self_attn?

@encounter1997
Owner

Yes, that's right.

@volgachen
Author

Thank you! It seems to work correctly now.

@volgachen
Author

I noticed that sampling_offsets uses a reduced learning rate in the training configuration for detection.
How do you handle sampling_offsets in the pre-training process?

@encounter1997
Owner

We did not carefully tune the learning rate for sampling_offsets and reference_points during pre-training; we simply set their learning rate to the same value as the other parameters in the transformer encoder. Tuning it might lead to better pre-training results, but we did not try.
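
For contrast, here is a small sketch of the two optimizer set-ups. The parameter names follow Deformable DETR conventions, and the learning-rate values are placeholders:

```python
# Sketch only: how the detection fine-tuning and pre-training optimizers differ
# in their treatment of sampling_offsets / reference_points.
import torch


def detection_param_groups(model, base_lr=2e-4, lr_mult=0.1):
    """Detection: sampling_offsets / reference_points get a reduced learning rate."""
    slow, regular = [], []
    for name, p in model.named_parameters():
        if 'sampling_offsets' in name or 'reference_points' in name:
            slow.append(p)
        else:
            regular.append(p)
    return [
        {'params': regular, 'lr': base_lr},
        {'params': slow, 'lr': base_lr * lr_mult},
    ]


def pretraining_param_groups(model, base_lr=5e-4):
    """Pre-training (as described above): every parameter uses the same learning rate."""
    return [{'params': model.parameters(), 'lr': base_lr}]


# Example usage, once `model` exists:
# optimizer = torch.optim.AdamW(pretraining_param_groups(model), weight_decay=0.05)
```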
