Exception: process 0 terminated with signal SIGKILL #1262

Open
Huangzhw0221 opened this issue Sep 8, 2022 · 1 comment

Comments

@Huangzhw0221

Hi,

While trying to train the m4c_captioner model, I am getting the following error:
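The run was launched with the following command (the same options appear in the mmf_cli.run Namespace line further down in the log):

```
mmf_run datasets=textcaps \
    model=m4c_captioner \
    config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml \
    env.save_dir=./save/m4c_captioner/defaults \
    run_type=train_val
```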

/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option datasets to textcaps
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option model to m4c_captioner
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option config to projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option env.save_dir to ./save/m4c_captioner/defaults
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option run_type to train_val
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_LOG_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_REPORT_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_TENSORBOARD_LOGDIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_WANDB_LOGDIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:25 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:14337
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:25 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:14337
2022-09-08T19:36:25 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 3
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-09-08T19:36:26 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 1
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 2
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 0
2022-09-08T19:36:30 | mmf: Logging to: ./save/m4c_captioner/defaults/train.log
2022-09-08T19:36:30 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['datasets=textcaps', 'model=m4c_captioner', 'config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml', 'env.save_dir=./save/m4c_captioner/defaults', 'run_type=train_val'])
2022-09-08T19:36:30 | mmf_cli.run: Torch version: 1.6.0
2022-09-08T19:36:30 | mmf.utils.general: CUDA Device 0 is: NVIDIA GeForce GTX 1080 Ti
2022-09-08T19:36:30 | mmf_cli.run: Using seed 30613796
2022-09-08T19:36:30 | mmf.trainers.mmf_trainer: Loading datasets
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.10.1",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}

2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.trainers.mmf_trainer: Loading model
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'cls.predictions.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'cls.predictions.decoder.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.5.output.dense.bias', 
'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 
'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.query.weight']

  • This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
    2022-09-08T19:36:57 | mmf.trainers.mmf_trainer: Loading optimizer
    2022-09-08T19:36:57 | mmf.trainers.mmf_trainer: Loading metrics
    WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
    builtin_warn(*args, **kwargs)

WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
builtin_warn(*args, **kwargs)

WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: scheduler attributes has no params defined, defaulting to {}.
builtin_warn(*args, **kwargs)

WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: scheduler attributes has no params defined, defaulting to {}.
builtin_warn(*args, **kwargs)

loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.encoder.layer.8.attention.self.query.bias', 'cls.predictions.decoder.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.3.attention.self.key.bias', 
'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.pooler.dense.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.pooler.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.output.dense.bias', 'cls.predictions.bias', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 
'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.8.output.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.key.weight']

  • This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
    Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.pooler.dense.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 
'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.3.output.LayerNorm.bias', 'cls.predictions.decoder.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'cls.predictions.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.value.bias', 
'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'cls.seq_relationship.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.weight']
  • This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
    Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'cls.predictions.decoder.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 
'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.pooler.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'cls.seq_relationship.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.pooler.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.query.weight', 
'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.weight']
  • This IS expected if you are initializing TextBert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    All the weights of TextBert were initialized from the model checkpoint at bert-base-uncased.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TextBert for predictions without further training.
    2022-09-08T19:37:01 | mmf.trainers.mmf_trainer: ===== Model =====
    2022-09-08T19:37:01 | mmf.trainers.mmf_trainer: DistributedDataParallel(
    (module): M4CCaptioner(
    (text_bert): TextBert(
    (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
    (layer): ModuleList(
    (0): BertLayer(
    (attention): BertAttention(
    (self): BertSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (intermediate): BertIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (1): BertLayer(
    (attention): BertAttention(
    (self): BertSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (intermediate): BertIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (2): BertLayer(
    (attention): BertAttention(
    (self): BertSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (intermediate): BertIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    )
    )
    )
    (text_bert_out_linear): Identity()
    (obj_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7(
    (lc): Linear(in_features=2048, out_features=2048, bias=True)
    )
    (linear_obj_feat_to_mmt_in): Linear(in_features=2048, out_features=768, bias=True)
    (linear_obj_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True)
    (obj_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (obj_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (obj_drop): Dropout(p=0.1, inplace=False)
    (ocr_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7(
    (lc): Linear(in_features=2048, out_features=2048, bias=True)
    )
    (linear_ocr_feat_to_mmt_in): Linear(in_features=3002, out_features=768, bias=True)
    (linear_ocr_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True)
    (ocr_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (ocr_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (ocr_drop): Dropout(p=0.1, inplace=False)
    (mmt): MMT(
    (prev_pred_embeddings): PrevPredEmbeddings(
    (position_embeddings): Embedding(100, 768)
    (token_type_embeddings): Embedding(5, 768)
    (ans_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (ocr_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (emb_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (emb_dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
    (layer): ModuleList(
    (0): BertLayer(
    (attention): BertAttention(
    (self): BertSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (intermediate): BertIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (1): BertLayer(
    (attention): BertAttention(
    (self): BertSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (intermediate): BertIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (2): BertLayer(
    (attention): BertAttention(
    (self): BertSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (intermediate): BertIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (3): BertLayer(
    (attention): BertAttention(
    (self): BertSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    (intermediate): BertIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
    )
    )
    )
    )
    )
    (ocr_ptr_net): OcrPtrNet(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    )
    (classifier): ClassifierLayer(
    (module): Linear(in_features=768, out_features=6736, bias=True)
    )
    (losses): Losses(
    (losses): ModuleList(
    (0): MMFLoss(
    (loss_criterion): M4CDecodingBCEWithMaskLoss()
    )
    )
    )
    )
    )
    2022-09-08T19:37:01 | mmf.utils.general: Total Parameters: 92185168. Trained Parameters: 92185168
    2022-09-08T19:37:01 | mmf.trainers.core.training_loop: Starting training...
    2022-09-08T19:37:01 | mmf.datasets.processors.processors: Loading fasttext model now from /home/root1/.cache/torch/mmf/wiki.en.bin

Traceback (most recent call last):
File "/home/root1/anaconda3/envs/mmf/bin/mmf_run", line 8, in
sys.exit(run())
File "/media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf_cli/run.py", line 129, in run
nprocs=config.distributed.world_size,
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 108, in join
(error_index, name)
Exception: process 0 terminated with signal SIGKILL

I followed some tips: I increased the memory to 64 GB, set num_workers to 0, and reduced the batch_size to 64, but it still doesn't work. Kindly help me resolve this issue.
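For the batch size and worker count, I passed command-line overrides roughly like this (assuming the standard MMF training.* config keys, with the rest of the command unchanged):

```
mmf_run datasets=textcaps model=m4c_captioner \
    config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml \
    env.save_dir=./save/m4c_captioner/defaults run_type=train_val \
    training.batch_size=64 training.num_workers=0
```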

@Zhang-Henry

I have the same problem. Have you solved it?
