/GPT

Let's Build The GPT Tokenizer

- Andrej Karpathy tl;dr: “In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI. In the process, we will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.”
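
The lecture centers on byte-pair encoding (BPE). As a rough illustration of the core idea, and not Karpathy's actual code, a tokenizer can be trained by repeatedly merging the most frequent adjacent pair of tokens:

```python
# Minimal byte-pair-encoding sketch (illustrative, not the lecture's exact code).
from collections import Counter

def get_pair_counts(ids):
    """Count occurrences of each adjacent token pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merges on top of raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))          # start from byte tokens 0..255
    merges = {}                               # (id, id) -> new token id
    for new_id in range(256, 256 + num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]    # most frequent adjacent pair
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges

print(train_bpe("low lower lowest lowly", 10))
```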

featured in #491


ChatGPT Plugins: Build Your Own In Python!

- James Briggs tl;dr: OpenAI has launched plugins for ChatGPT, and anyone can build one. James demonstrates how, using the chatgpt-retrieval-plugin template.
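
A plugin is essentially a web API plus an ai-plugin.json manifest that ChatGPT reads. The sketch below is a hypothetical minimal backend in FastAPI, not the template's actual code:

```python
# Hypothetical minimal plugin backend (illustrative only; the real
# chatgpt-retrieval-plugin template is considerably more involved).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    query: str

@app.post("/query")
def query(req: QueryRequest):
    # The real retrieval plugin searches a vector database here;
    # this sketch just echoes a placeholder result.
    return {"results": [{"text": f"Documents matching: {req.query}"}]}

@app.get("/.well-known/ai-plugin.json")
def plugin_manifest():
    # Trimmed-down manifest ChatGPT fetches to learn what the plugin does;
    # the real one also declares auth, logo, contact details, etc.
    return {
        "schema_version": "v1",
        "name_for_model": "document_search",  # hypothetical plugin name
        "description_for_model": "Search the user's documents for relevant passages.",
        "api": {"type": "openapi", "url": "https://example.com/openapi.json"},
    }
```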

featured in #402


What Is ChatGPT Doing … and Why Does It Work?

- Stephen Wolfram tl;dr: "My purpose here is to give a rough outline of what’s going on inside ChatGPT—and then to explore why it is that it can do so well in producing what we might consider to be meaningful text. I should say at the outset that I’m going to focus on the big picture of what’s going on—and while I’ll mention some engineering details, I won’t get deeply into them."
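
Much of the article comes down to one repeated step: turn scores over possible next tokens into probabilities and sample one, with a "temperature" controlling randomness. A minimal sketch of that step, using made-up logits:

```python
# Temperature-controlled next-token sampling (illustrative numbers only).
import numpy as np

def sample_next(logits, temperature=0.8, rng=np.random.default_rng(0)):
    """Turn raw scores into probabilities and sample one token id."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())     # softmax, numerically stable
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.2, -1.0]                # hypothetical scores for 4 tokens
print(sample_next(logits))
```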

featured in #390


GPT In 60 Lines Of NumPy

- Jay Mody tl;dr: "In this post, we'll implement a GPT from scratch in just 60 lines of numpy. We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text.”
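
For a flavor of what fits in those 60 lines, here is a sketch of causal self-attention in plain NumPy; it follows the standard formulation rather than Jay's exact code:

```python
# Causal self-attention in plain NumPy (a sketch, not the post's exact code).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(q, k, v):
    """q, k, v: [seq_len, head_dim]; each position attends only to the past."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # [T, T] similarity scores
    mask = np.triu(np.ones((T, T)), k=1).astype(bool)  # positions in the future
    scores = np.where(mask, -1e10, scores)             # mask out the future
    return softmax(scores) @ v                         # weighted sum of values

x = np.random.randn(5, 8)
print(causal_self_attention(x, x, x).shape)  # (5, 8)
```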

featured in #389


GPT Is Only Half Of The AI Language Revolution

- Jason Phillips tl;dr: In this post, Slite Engineer Jason Phillips examines AI breakthroughs like GPT, exploring their potential for categorizing, filtering, and processing data. He suggests real-world applications rely more on processing than content generation.

featured in #387


Let's Build GPT: From Scratch, In Code, Spelled Out

- Andrej Karpathy tl;dr: "We build a GPT, following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT."

featured in #382


How GPT3 Works - Visualizations And Animations

- Jay Alammar tl;dr: "The dataset of 300 billion tokens of text is used to generate training examples for the model. For example, these are three training examples generated from the one sentence at the top. You can see how you can slide a window across all the text and make lots of examples."
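
The sliding-window idea in the quote can be sketched directly: every position in the text becomes a (context, next-token) training example.

```python
# Turning raw text into (context, next-token) training examples by sliding a
# window (illustrative sketch of the idea described above, with word "tokens").
def make_examples(tokens, window=4):
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - window):i]   # up to `window` previous tokens
        target = tokens[i]                       # the model learns to predict this
        examples.append((context, target))
    return examples

tokens = "a robot must obey the orders given it".split()
for context, target in make_examples(tokens)[:3]:
    print(context, "->", target)
```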

featured in #378


The GPT-3 Architecture, On A Napkin

- Daniel Dugas tl;dr: "There are so many brilliant posts on GPT-3, demonstrating what it can do, pondering its consequences, visualizing how it works. With all these out there, it still took a crawl through several papers and blogs before I was confident that I had grasped the architecture. So the goal for this page is humble, but simple: help others build an as detailed as possible understanding of the GPT-3 architecture."
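
One napkin calculation that helps: plugging in the hyperparameters published in the GPT-3 paper (96 layers, d_model of 12288, a 50257-token vocabulary, 2048-token context) roughly recovers the 175B parameter count. A sketch of that arithmetic, ignoring biases and layer norms:

```python
# Back-of-the-napkin GPT-3 parameter count from published hyperparameters
# (approximate; ignores biases and layer-norm parameters).
d_model, n_layers, vocab, context = 12288, 96, 50257, 2048

embeddings = vocab * d_model + context * d_model   # token + position embeddings
attention  = 4 * d_model * d_model                 # Q, K, V and output projections
mlp        = 2 * d_model * (4 * d_model)           # two feed-forward matrices
per_layer  = attention + mlp                       # ~12 * d_model^2
total      = embeddings + n_layers * per_layer

print(f"{total / 1e9:.1f}B parameters")            # roughly 175 billion
```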

featured in #375


Building A Virtual Machine Inside ChatGPT

- Jonas Degrave tl;dr: The author shows how to "build a virtual machine, inside an assistant chatbot, on the alt-internet, from a virtual machine, within ChatGPT's imagination."
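
The trick is purely a prompt: ChatGPT is told to behave like a Linux terminal and reply only with command output. The sketch below recreates that style of instruction through the OpenAI Python client; the original post used the ChatGPT web UI, and the model name and prompt wording here are assumptions:

```python
# Recreating the "ChatGPT as a Linux terminal" trick via the OpenAI Python
# client. Model name and prompt wording are assumptions, not the post's text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system = (
    "Act as a Linux terminal. I will type commands and you reply only with "
    "what the terminal would show, inside a single code block. No explanations."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "ls ~"},
    ],
)
print(resp.choices[0].message.content)
```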

featured in #372


Using GPT-3 To Explain How Code Works

- Simon Willison tl;dr: "One of my favourite uses for the GPT-3 AI language model is generating explanations of how code works. It’s shockingly effective at this: its training set clearly includes a vast amount of source code." Simon shows a few recent examples.
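
Simon's pattern is simple: paste in the code, then ask for an explanation. A sketch of that pattern against the OpenAI completions endpoint, where the model name and prompt phrasing are assumptions rather than Simon's exact choices:

```python
# Asking a GPT-3-style completions model to explain a piece of code.
# Model name and prompt phrasing are assumptions, not Simon's exact ones.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

code = """
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
"""

resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",       # a GPT-3-style completions model
    prompt=code + "\nExplain what this code does:\n",
    max_tokens=200,
)
print(resp.choices[0].text)
```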

featured in #333