BshoterJ/LaterBERTs-get-the-upper-hand
BERT-family model variants, architecture search, pruning, and distillation

Optimization and Design

Pre-trained Models

A minimal sketch of the shared masked-language-modeling objective follows the list below.

  • Deep Contextualized Word Representations (NAACL 2018) [paper] - ELMo
  • Universal Language Model Fine-tuning for Text Classification (ACL 2018) [paper] - ULMFit
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019) [paper][code][official PyTorch code] - BERT
  • Improving Language Understanding by Generative Pre-Training (CoRR 2018) [paper] - GPT
  • Language Models are Unsupervised Multitask Learners (CoRR 2019) [paper][code] - GPT-2
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019) [paper][code] - MASS
  • Unified Language Model Pre-training for Natural Language Understanding and Generation (CoRR 2019) [paper][code] - UNILM
  • Multi-Task Deep Neural Networks for Natural Language Understanding (ACL 2019) [paper][code] - MT-DNN
  • ERNIE: Enhanced Language Representation with Informative Entities (ACL 2019) [paper][code] - ERNIE (THU)
  • ERNIE: Enhanced Representation through Knowledge Integration (CoRR 2019) [paper] - ERNIE (Baidu)
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (CoRR 2019) [paper] - ERNIE 2.0 (Baidu)
  • Pre-Training with Whole Word Masking for Chinese BERT (CoRR 2019) [paper] - Chinese-BERT-wwm
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans (CoRR 2019) [paper] - SpanBERT
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding (CoRR 2019) [paper][code] - XLNet
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach (CoRR 2019) [paper] - RoBERTa
  • NEZHA: Neural Contextualized Representation for Chinese Language Understanding (CoRR 2019) [paper][code] - NEZHA
  • K-BERT: Enabling Language Representation with Knowledge Graph (AAAI 2020) [paper][code] - K-BERT
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (CoRR 2019) [paper][code] - T5
  • ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations (CoRR 2019) [paper][code] - ZEN
  • The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service (CoRR 2019) [paper][code] - BAAI-JDAI-BERT
  • Knowledge Enhanced Contextual Word Representations (EMNLP 2019) [paper] - KnowBert
  • UER: An Open-Source Toolkit for Pre-training Models (EMNLP 2019) [paper][code] - UER
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR 2020) [paper] - ELECTRA
  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR 2020) [paper] - StructBERT
  • FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR 2020) [paper][code] - FreeLB
  • HUBERT Untangles BERT to Improve Transfer across NLP Tasks (CoRR 2019) [paper] - HUBERT
  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages (CoRR 2020) [paper] - CodeBERT
  • ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (CoRR 2020) [paper] - ProphetNet
  • ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation (CoRR 2020) [paper][code] - ERNIE-GEN
  • Efficient Training of BERT by Progressively Stacking (ICML 2019) [paper][code] - StackingBERT
  • PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination (CoRR 2020) [paper][code]
  • UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training (CoRR 2020) [paper][code] - UNILMv2
  • MPNet: Masked and Permuted Pre-training for Language Understanding (CoRR 2020) [paper][code] - MPNet
  • Language Models are Few-Shot Learners (CoRR 2020) [paper][code] - GPT-3
  • SPECTER: Document-level Representation Learning using Citation-informed Transformers (ACL 2020) [paper] - SPECTER
  • PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning (CoRR 2020) [paper][code] - PLATO-2
  • DeBERTa: Decoding-enhanced BERT with Disentangled Attention (CoRR 2020) [paper][code] - DeBERTa
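Most of the models above build on the masked-language-modeling (MLM) objective introduced by BERT: corrupt a fraction of the input tokens and train the encoder to recover them. The PyTorch sketch below is a minimal, self-contained illustration of that objective only; the toy vocabulary, mask handling, and the tiny encoder are illustrative assumptions, not the configuration of any listed paper.

```python
# A minimal sketch of the masked-language-modeling (MLM) objective shared by
# BERT-style models above. Vocabulary size, mask rate handling, and the tiny
# encoder are illustrative assumptions, not any paper's actual configuration.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumed toy vocabulary
MASK_ID = 999       # assumed id of the [MASK] token
PAD_ID = 0          # assumed id of the padding token
MASK_RATE = 0.15    # BERT masks roughly 15% of tokens

def mask_tokens(input_ids: torch.Tensor):
    """Replace ~15% of non-pad tokens with [MASK]; labels are -100 elsewhere."""
    labels = input_ids.clone()
    candidates = (input_ids != PAD_ID)
    mask = (torch.rand_like(input_ids, dtype=torch.float) < MASK_RATE) & candidates
    labels[~mask] = -100                      # ignored by cross-entropy
    corrupted = input_ids.clone()
    corrupted[mask] = MASK_ID                 # real BERT also uses random/keep variants
    return corrupted, labels

class TinyMLM(nn.Module):
    """A deliberately small Transformer encoder with an MLM prediction head."""
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, ids):
        return self.lm_head(self.encoder(self.embed(ids)))

if __name__ == "__main__":
    model = TinyMLM()
    ids = torch.randint(1, VOCAB_SIZE, (8, 16))           # fake batch of token ids
    corrupted, labels = mask_tokens(ids)
    logits = model(corrupted)                              # (batch, seq, vocab)
    loss = nn.functional.cross_entropy(
        logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100)
    print("MLM loss:", loss.item())
```

In the original BERT recipe only 80% of the selected tokens are replaced with [MASK] (10% get a random token, 10% are left unchanged), and MLM is paired with further objectives such as next-sentence prediction or the whole-word, span, and permutation variants listed above.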

Multimodal

A sketch of the single-stream text plus image-region encoding pattern follows the list below.

  • VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV 2019) [paper]
  • Learning Video Representations using Contrastive Bidirectional Transformer (CoRR 2019) [paper] - CBT
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS 2019) [paper][code]
  • VisualBERT: A Simple and Performant Baseline for Vision and Language (CoRR 2019) [paper][code]
  • Fusion of Detected Objects in Text for Visual Question Answering (EMNLP 2019) [paper][code](https://github.com/google-research/language/tree/master/language/question_answering/b2t2) - B2T2
  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training (AAAI 2020) [paper]
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP 2019) [paper][code]
  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations (CoRR 2019) [paper][code]
  • UNITER: Learning UNiversal Image-TExt Representations (CoRR 2019) [paper]
  • FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR 2020) [paper] - FashionBERT
  • VD-BERT: A Unified Vision and Dialog Transformer with BERT (CoRR 2020) [paper] - VD-BERT

Model Compression

A minimal knowledge-distillation loss sketch follows the list below.

  • Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin. (CoRR 2019) [paper]
  • Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System. Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. (CoRR 2019) [paper] - MKDM
  • Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. (CoRR 2019) [paper]
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. (CoRR 2019) [paper]
  • Small and Practical BERT Models for Sequence Labeling. Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer. (EMNLP 2019) [paper]
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. (AAAI 2020) [paper]
  • Patient Knowledge Distillation for BERT Model Compression. Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. (EMNLP 2019) [paper] - BERT-PKD
  • Extreme Language Model Compression with Optimal Subwords and Shared Projections. Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou. (ICLR 2019) [paper]
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. [paper][code]
  • TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. (EMNLP 2020 Findings) [paper][code]
  • Q8BERT: Quantized 8Bit BERT. Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat. (NeurIPS 2019 Workshop) [paper]
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. (ICLR 2020) [paper][code]
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. Mitchell A. Gordon, Kevin Duh, Nicholas Andrews. (ICLR 2020) [paper][PyTorch code]
  • Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. (ICLR 2020) [paper] - LayerDrop
  • Multilingual Alignment of Contextual Word Representations (ICLR 2020) [paper]
  • AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search. Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou. (IJCAI 2020) [paper] - AdaBERT
  • BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou. (CoRR 2020) [paper][pt code][tf code][keras code]
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. (CoRR 2020) [paper][code]
  • FastBERT: a Self-distilling BERT with Adaptive Inference Time. Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, Qi Ju. (ACL 2020) [paper][code]
  • MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou. (ACL 2020) [paper][code]
  • Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation. Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang. (CoRR 2020) [paper] - BiLSTM-SRA & LTD-BERT
  • Poor Man's BERT: Smaller and Faster Transformer Models. Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov. (CoRR 2020) [paper]
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth. Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu. (CoRR 2020) [paper]
  • SqueezeBERT: What can computer vision teach NLP about efficient neural networks?. Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. (CoRR 2020) [paper]
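Most of the distillation papers above share the same basic recipe: train a small student to match the teacher's softened output distribution in addition to the hard labels. The sketch below illustrates that combined loss; the temperature, mixing weight, and toy models are assumptions for illustration, not the settings of any specific paper.

```python
# A minimal sketch of the soft-target + hard-label distillation loss used, in
# some form, by many of the compression papers above (BERT-PKD, DistilBERT,
# TinyBERT, ...). Temperature, alpha, and the toy classifiers are assumed
# illustrative values, not any paper's actual configuration.
import torch
import torch.nn.functional as F
from torch import nn

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """alpha * soft-target KL (at temperature T) + (1 - alpha) * hard cross-entropy."""
    soft_targets = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_preds = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between softened distributions; T^2 rescales the gradients
    kd = F.kl_div(soft_preds, soft_targets, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

if __name__ == "__main__":
    num_classes, hidden = 3, 32
    teacher = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, num_classes))
    student = nn.Sequential(nn.Linear(hidden, num_classes))   # much smaller model
    x = torch.randn(16, hidden)
    labels = torch.randint(0, num_classes, (16,))
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, labels)
    loss.backward()
    print("distillation loss:", loss.item())
```

Papers such as BERT-PKD, TinyBERT, and MobileBERT additionally match intermediate hidden states or attention maps between teacher and student; the sketch covers only the output-level loss.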

Model Search
