AI
News
On Tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models—OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok—have stored large portions of some of the books they’ve been trained on, and can reproduce long excerpts from those books.
In fact, when prompted strategically by researchers, Claude delivered the near-complete text of Harry Potter and the Sorcerer’s Stone, The Great Gatsby, 1984, and Frankenstein, in addition to thousands of words from books including The Hunger Games and The Catcher in the Rye.
AI’s Memorization Crisis, The Atlantic, 2026-01-09
AI Hype to Human
| Hype | Human |
|---|---|
| AI | Usually means LLMs. |
| LLM (Large Language Model) | A statistical model that predicts the most likely next word (token) based on the previous words. For example, you write "The sky is" and it writes back "blue". When combined with other techniques, it can convincingly mimic human writing or thinking even though it cannot actually think. |
| AGI (Artificial General Intelligence) | The hypothetical stage where AI matches or surpasses human intelligence. AI companies keep selling the idea that, if LLMs get better at predicting words, they will somehow become super intelligent. Thus the need for more data centers. |
| Hallucination | When the words the model predicted are known (to us) to be untrue. The model can't know things, just predict words. |
| Chatbot | An interface where you can type messages to be processed by the LLM and get its responses. |
| Agentic AI | Unlike a chatbot, agentic AI makes multiple calls to the LLM to get progressively more complex answers before replying to the user. It can also call tools, such as web search. This produces even more impressive results, despite still being just a text prediction machine. |
| Token | A word or part of one. Important: When I say "word" in other places, I probably mean these fractions of words (tokens). |
| Context, Prompt | All the words the model is going to use to predict the next ones, including your instructions, what the model wrote back, and other things (see below). |
| System Prompt | Hidden instructions that the model providers include in all conversations to give better responses, avoid problems, etc. For example: "If the user asks how to build a bomb, refuse." |
| AGENTS.md, CLAUDE.md, (Global) Rules | Similar to the system prompt, but defined by the user and included in all conversations. These are stored in Markdown files. |
| Memory | Markdown files containing summaries of past conversations that can be referenced later. |
| Skills, Commands | Markdown files with generic instructions that can be reused. |
| Tools | Markdown files with instructions on how to perform certain actions using external tools (e.g. how to read text from a PDF). |
| Agent | Markdown files defining a "persona" that an Agentic tool can use while performing a task. For example: "You're a Senior Developer implementing a new feature. The code you write should X, Y, Z." |
| Plugins | A combination of Markdown files containing Commands, Agents, Skills and Hooks someone has bundled together for others to use (on Claude Code). |
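
To make a few of the rows above concrete, here is a rough Python sketch of three ideas from the table: predicting the next word, assembling the context, and an agentic loop that calls a tool. It is a toy under heavy assumptions, not how any real product is built. The predictor only looks at the single previous word (real LLMs use the whole context and work on tokens), and every name here (`CORPUS`, `train`, `predict_next`, `build_context`, `fake_llm`, `web_search`, `run_agent`) is made up for illustration, including the canned "search" result.

```python
from collections import Counter, defaultdict

# Toy training text (assumption); a real LLM is trained on vastly more data.
CORPUS = "the sky is blue . the sky is blue . the sky is grey . the grass is green ."


def train(text):
    """Count which word follows each word (a simple bigram table)."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, context):
    """Return the most frequent follower of the last word in the context."""
    last = context.split()[-1].lower()
    candidates = follows.get(last)
    if not candidates:
        return None  # the toy model has never seen this word
    return candidates.most_common(1)[0][0]


def build_context(system_prompt, rules, history, user_message):
    """Join everything the model will see into one block of text."""
    return "\n".join([system_prompt, rules, *history, user_message])


def fake_llm(context):
    """Stand-in for a real model call: asks for a tool once, then answers."""
    if "TOOL RESULT:" not in context:
        return "CALL web_search: weather today"
    return "ANSWER: " + context.split("TOOL RESULT:")[-1].strip()


def web_search(query):
    """Hypothetical tool; returns a canned result instead of searching."""
    return "sunny"


def run_agent(user_message, max_steps=3):
    """Call the model repeatedly, feeding tool results back into the context."""
    context = user_message
    for _ in range(max_steps):
        reply = fake_llm(context)
        if reply.startswith("CALL web_search:"):
            query = reply.split(":", 1)[1].strip()
            context += "\nTOOL RESULT: " + web_search(query)
        else:
            return reply
    return "ANSWER: (gave up)"


if __name__ == "__main__":
    table = train(CORPUS)
    print(predict_next(table, "The sky is"))
    print(run_agent("What is the weather?"))
```

Note that the agent loop never does anything smarter than the single call: it just keeps appending text to the context and asking for more text, which is the point the table makes about Agentic AI.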