About fine-tuning hierarchical BERT model #31
-
Hi! Love your repo; I'm doing research in legal text classification and I'm using your repo a lot. I was wondering what technique exactly you used to fine-tune your hierarchical BERT model. I was able to reproduce your "normal" BERT results by only unfreezing the last layer of the BERT model, which I think is the standard way to fine-tune a BERT model. But with the hierarchical BERT model there are a few more added layers: a new embedding layer and two new transformer layers. How are those trained? Is the learning rate the same for all the layers? Are all of those new layers unfrozen? Thank you very much!
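For context, here is a minimal sketch of what "only unfreezing the last layer" can look like with the HuggingFace `transformers` API. The checkpoint name and `num_labels` are placeholders, not values taken from the repo:

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical example: checkpoint and label count are placeholders.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=10
)

# Freeze every parameter, then re-enable gradients only for the last
# encoder layer, the pooler, and the classification head.
for param in model.parameters():
    param.requires_grad = False

for module in (model.bert.encoder.layer[-1], model.bert.pooler, model.classifier):
    for param in module.parameters():
        param.requires_grad = True
```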
Replies: 1 comment
-
Hi @Vitor-Almeida,
It's interesting that you could reproduce the BERT results by unfreezing only a single layer.
With respect to your question, we used a fixed learning rate of 3e-5 across all models. No special scheduling (warmup, decay) or anything else was used. While we acknowledge that tuning the learning rate could possibly lead to better results, this process would be extremely resource-consuming, and we lacked the resources to tune learning rates (or other hyperparameters) across 6 models and 7 tasks with multiple seeds...
The same applies to the hierarchical models: we use a fixed learning rate across all model layers.
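In other words, all parameters (the pre-trained encoder plus the newly added embedding and transformer layers) sit in a single optimizer group with the same fixed learning rate and no scheduler. A minimal sketch of that setup, assuming a PyTorch module `hier_bert` with a HuggingFace-style forward that returns a loss, and a `train_loader` defined elsewhere (both names are hypothetical):

```python
import torch

# All parameters in one group: the BERT backbone and the new
# hierarchical layers share the same fixed learning rate of 3e-5.
# `hier_bert` and `train_loader` are assumed to be defined elsewhere.
optimizer = torch.optim.AdamW(hier_bert.parameters(), lr=3e-5)

# No scheduler is attached, so the learning rate never changes.
for batch in train_loader:
    optimizer.zero_grad()
    loss = hier_bert(**batch).loss  # assumes HF-style output with a .loss field
    loss.backward()
    optimizer.step()
```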