Language models have revolutionized the field of natural language processing (NLP) and have become essential tools for a wide range of applications. These models are trained on large amounts of text data and can perform a variety of tasks such as language translation, text generation, question answering, and sentiment analysis.

We took the time to list some of the most popular language models and their capabilities so you won’t have to.

| Language Model | Description | Training Data (Format and Quantity) | Capabilities |
| --- | --- | --- | --- |
| BERT | Bidirectional Encoder Representations from Transformers: a transformer-based model whose attention mechanism captures the context of a word from both directions in a sentence. | Unstructured text (books, articles, websites); estimated around 3.3 billion words. | Fine-tuned for NLP tasks such as question answering, sentiment analysis, and named entity recognition. |
| GPT-2 | Generative Pre-trained Transformer 2: a transformer-based model that uses an attention mechanism to generate human-like text. | Unstructured text (books, articles, websites); estimated around 40 GB of text. | Fine-tuned for NLP tasks such as language translation, question answering, and text generation. |
| RoBERTa | Robustly Optimized BERT Pretraining: an optimized version of BERT trained on a larger dataset and for longer. | Unstructured text (books, articles, websites); estimated around 160 GB of text. | Fine-tuned for NLP tasks such as question answering, sentiment analysis, and named entity recognition. |
| T5 | Text-to-Text Transfer Transformer: a transformer-based model that casts every task as text-to-text generation. | Pre-trained on a diverse range of text-to-text tasks (question answering, translation, summarization, and more); estimated around 11 GB of text. | Fine-tuned for NLP tasks such as language translation, summarization, and text generation. |
| ULMFiT | Universal Language Model Fine-tuning: a pre-trained language model that can be adapted to a wide range of NLP tasks using only a small amount of task-specific data. | Unstructured text (books, articles, websites); estimated around 600 GB of text. | Fine-tuned for NLP tasks such as text classification and language modeling. |
| XLNet | A transformer-based model trained with a permutation language-modeling objective: it maximizes the likelihood of the input over permutations of the factorization order rather than only the original left-to-right order. | Unstructured text (books, articles, websites); estimated around 32 TB of text. | Fine-tuned for NLP tasks such as text classification and language modeling. |
| ALBERT | A Lite BERT: uses parameter-sharing techniques to reduce the number of parameters and speed up training. | Unstructured text (books, articles, websites); estimated around 2.5 billion words. | Fine-tuned for NLP tasks such as question answering, sentiment analysis, and named entity recognition. |
| ELMo | Embeddings from Language Models: deep contextualized word representations that model both the complex characteristics of word use and how those uses vary across contexts, built from multiple pre-trained layers. | Unstructured text (books, articles, websites); estimated around 5.5 billion words. | Fine-tuned for NLP tasks such as text classification, question answering, and named entity recognition. |
| OpenAI GPT-3 | Generative Pre-trained Transformer 3: a transformer-based model that uses an attention mechanism to generate human-like text. | Unstructured text (books, articles, websites); estimated around 570 GB of text. | Fine-tuned for NLP tasks such as language translation, question answering, text generation, coding, and domain-specific question answering in fields such as medicine and law. |
| CTRL | Conditional Transformer Language model: a transformer-based model that conditions text generation on a given context or prompt. | Diverse unstructured text (books, articles, websites); estimated around 45 GB of text. | Fine-tuned for NLP tasks such as language generation, text summarization, and dialogue generation. |
| Megatron | A transformer-based model designed to scale to billions of parameters while training on a massive amount of text data. | Unstructured text (books, articles, websites); estimated around 45 TB of text. | Fine-tuned for NLP tasks such as language translation, summarization, and named entity recognition. |
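Several of the models listed above can be tried out in a few lines of code. The sketch below is a minimal illustration, assuming the Hugging Face `transformers` library is installed and the publicly hosted checkpoints `bert-base-uncased`, `gpt2`, and `t5-small` can be downloaded; it exercises BERT on a fill-in-the-blank task, GPT-2 on text generation, and T5 on English-to-German translation.

```python
# Minimal sketch: trying a few of the listed models via Hugging Face pipelines.
# Assumes `pip install transformers` and network access to download the checkpoints.
from transformers import pipeline

# BERT-style masked language modelling: predict the hidden word in context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Language models are trained on large amounts of [MASK] data.")[0]["token_str"])

# GPT-2 text generation from a short prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("Language models have revolutionized", max_new_tokens=30)[0]["generated_text"])

# T5 text-to-text transfer, here used for English-to-German translation.
translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("Language models are useful.")[0]["translation_text"])
```

The same pipeline interface also exposes tasks such as question answering and sentiment analysis, which is often the quickest way to check whether a given model fits a use case before investing in task-specific fine-tuning.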


Tanin Ehrami

Tanin is a seasoned strategic consultant with over two decades of experience in development, analysis, architecture, management, financial services, and regulatory compliance risk. In 2020, he founded PSYBER to consult with professional services firms, governing bodies, brands, private equity investors, and risk and compliance professionals on issues related to cognitive security, AI ethics, digital transformations, agile governance, enterprise architecture, risk, and compliance. Tanin is an expert in his field and is available to provide consulting services on a range of topics.