Generative Pre-trained Transformers#

In recent years, natural language processing (NLP) has been transformed by the rise of decoder-based transformers, which diverge from the traditional encoder-decoder framework of the pioneering Attention Is All You Need paper. Auto-regressive decoding and masked (causal) self-attention are the key mechanisms that enable these models to generate text effectively.
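
To make the masking idea concrete, here is a minimal sketch of single-head causal self-attention. It assumes PyTorch, and the function name and toy shapes are illustrative only, not part of GPT-2's actual code: each position is prevented from attending to positions after it, which is what lets the model generate text one token at a time.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: tensors of shape (seq_len, d_head). Illustrative sketch only.
    """
    # Raw attention scores between every query and every key.
    scores = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
    # Causal mask: position i may only attend to positions j <= i,
    # which is what makes decoding auto-regressive.
    seq_len = q.size(-2)
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # each row sums to 1 over visible positions
    return weights @ v

# Toy usage: 4 tokens with a head dimension of 8.
x = torch.randn(4, 8)
out = causal_self_attention(x, x, x)
print(out.shape)  # torch.Size([4, 8])
```

Setting the masked scores to negative infinity drives their softmax weights to exactly zero, so future tokens contribute nothing to a position's attention output.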

To this end, we will discuss the architecture of decoder-based transformers through the family of Generative Pre-trained Transformers (GPT). We will mainly review the GPT-2 paper, cover a few core concepts, and then implement a simplified version of the GPT-2 model from scratch.

Citations#