AI glossary: 40 terms that matter in 2026
Curated glossary of 40 AI terms used in corporate and technical conversations in 2026 — short definition, usage example, common gotchas.
Pocket glossary for technical and corporate conversations in 2026. Short definition, usage example, and when applicable, the gotcha of who’s selling it wrong.
Models and foundations
LLM (Large Language Model) — model predicting next token from context. Claude, GPT, Gemini, Llama are LLMs. Gotcha: “model” doesn’t mean “product.” Claude is the model; Claude.ai is the product.
SLM (Small Language Model) — smaller, focal, cheaper, faster LLM. Runs locally or on modest hardware. Useful for specific tasks where the big LLM is overkill.
Frontier model — the lab’s most capable model at the moment. Claude Opus, GPT-5, Gemini Ultra. Used in frontier tasks, expensive per token.
Foundation model — the base model before fine-tuning. Every Claude starts as a foundation model trained by Anthropic.
Multimodal — processes more than one modality (text + image + audio + video). Standard in 2026 top models.
Context window — how many tokens fit in one interaction. 200K (Claude Sonnet), 1M (Opus 1M, Gemini Pro). Bigger ≠ better for all tasks.
Prompt and generation
Prompt — the instruction. “Summarize in 3 bullets” is a prompt.
System prompt — persistent instructions defining persona, rules, tone. Different from user prompt (the current question).
Prompt engineering — discipline of writing prompts that extract reliable output. Not magic — it’s technical writing.
Context engineering — design of what enters the context window: RAG, memory, examples, tools. Where the real quality gains live in 2026.
Hallucination — confident invention of fact. Not a bug. It’s how the model works.
Temperature — 0-1 parameter controlling creativity. 0 = deterministic, 1 = creative. Production agents typically use 0.0-0.3.
Top-p / Top-k — sampling parameters. Almost nobody tunes them. Temperature suffices.
Agents and orchestration
Agent — LLM + tools + loop. Executes, not just responds.
Multi-agent — system with multiple specialized agents orchestrated. Useful when the task has distinct phases (researcher + writer + reviewer).
Tool use — model calls to external functions. Web search, code execution, API calls.
Tool calling — the specific protocol. Standard in modern models.
MCP (Model Context Protocol) — Anthropic standard for connecting tools to models. In 2026, the de facto standard across vendors.
ReAct — reasoning pattern (Reason + Act). Agent thinks, acts, observes, repeats.
Autonomy level — how much the agent decides solo vs asks for confirmation. Defined by Agent Trust Stack.
Training and adaptation
Pre-training — expensive initial phase where the model reads the internet. Client companies don’t do this.
Fine-tuning — re-training on specific data. Expensive. In 2026, RAG solves 80% before you need it.
RLHF (Reinforcement Learning from Human Feedback) — humans rank answers, model learns preference. Why Claude is polite.
RLAIF — RLHF but AI does the ranking. Cheaper, scales better.
Distillation — big model “teaches” small model. How SLMs are born.
LoRA / QLoRA — lightweight fine-tuning. Doesn’t change the whole model, just one layer.
Retrieval and memory
RAG (Retrieval-Augmented Generation) — LLM + your knowledge base. Model searches before responding.
Embedding — numeric vector representing meaning. Foundation of semantic search.
Vector database — DB that indexes embeddings. Pinecone, Cloudflare Vectorize, pgvector.
Semantic search — search by meaning, not keyword.
Memory (in agent) — storage across sessions. Can be ephemeral (1 session), short-term (24h), long-term (always).
Safety and governance
Prompt injection — attacker manipulates prompt to change behavior. Top risk vector in 2026. See Prompt Infection Taxonomy.
Jailbreak — variant focused on making the model violate its own rules.
Harness — code wrapping the prompt in production. See Harness Stack.
Guardrails — checks before/after model output. Schema validation, content filter.
Durable pause — agent pauses on irreversible action waiting for human. Layer 7 of Harness Stack.
Confidence gating — agent declares confidence before acting. Layer 8 of Harness Stack.
Failure corpus — versioned repository of observed failures. Feeds the continuous improvement system.
Performance and cost
Token — billing unit. ~0.75 word English, ~0.5 word Portuguese.
Latency — time between prompt and first response. Critical in conversational UX.
Throughput — tokens per second. Critical in batch processing.
Streaming — receive response token by token while generating. Modern UX standard.
Cache — reuse context between calls. Brutally reduces cost when applied right.
Eval — test if the model/agent does what it should. Different from “does it work?” — it’s statistical, measures drift.
Why this unlocks
With vocabulary, you can read proposals, evaluate vendors, brief an AI engineer clearly, and identify when someone is selling Lego as Ferrari. The glossary evolves — in 6 months, half these terms will be more nuanced, and new ones will appear.
Next: read When AI isn’t the answer — the honest companion to this guide.