July 15, 2026 at 6:00 PM · Public House Bangkok

The Sovereign
Workspace

Your files. Your model. Your machine. Nothing leaves the room.

90–100 min 20 participants Windows · Intel CPU · 16GB RAM Works offline Public House Bangkok, 249 Sukhumvit 31
The master question
What did the model actually read?
Ask this of every AI system you will ever use — ChatGPT, Copilot, Claude, your own local model.
It reveals more about why AI fails than any benchmark ever will.

Pre-workshop setup

Do this at home · 3 days before
Install LM Studio
Desktop GUI for running local models. No terminal required.
lmstudio.ai
At home
Download Phi-4 Mini (2.3 GB)
In LM Studio → Search icon → search below → download Q4_K_M file
microsoft/phi-4-mini-instruct-GGUF File to select: phi-4-mini-instruct-Q4_K_M.gguf
2.3 GB
Test the model
LM Studio → Chat tab → select model → type this → expect a response in 15–20 sec
Say "ready" if you are working.
Verify
Prepare 3–5 business documents
Strategy notes, project briefs, meeting transcripts, client notes. PDF, Word, or text. Put them in one folder you can find quickly.
Bring
Optional: Coding demo setup
Install Ollama + VS Code + Ollama Code extension for the free local Copilot demo
# 1. Install Ollama from ollama.com # 2. Open Command Prompt and run: ollama pull qwen2.5-coder:1.5b # 3. Install VS Code from code.visualstudio.com # 4. VS Code → Ctrl+Shift+X → search "Ollama Code" (publisher: cgaspard) → install
Optional

What the model actually sees

The most dangerous assumption you can bring to any AI system:

"When I upload a document, the model reads it."

It does not. There is always a bottleneck between your document and the model.

Your document
501 pages · ~200k tokens
context window limit · chunk selection · truncation · retrieval quality
Selection bottleneck
← most AI failures happen here
only what fits passes through
Context window
8%–16% of a 500-page doc
only this reaches the model
Phi-4 Mini — in your RAM
no internet · no cloud
answer constrained by what was in the window
Answer
not by what was in the document
Context window as working memory — Daily Dose of Data Science
The context window is working memory. The core job: maximize signal under strict capacity constraints. — Daily Dose of Data Science, LLMOps Part 8
Context types taxonomy — Daily Dose of Data Science
A practical taxonomy of context types: instruction, query, knowledge (RAG), memory, tool, user-specific, environmental. — Daily Dose of Data Science

The two types of "free" AI

Most people assume "free AI" means one thing. It does not. These are opposites.

Type 1 · Someone else runs it
Hosted — You Call It
  • Google AI Studio, Groq, Mistral, OpenRouter
  • Real frontier models with rate limits
  • You give up your prompts — free tiers often train on what you send
  • Convenient. Not private.
Type 2 · You download and run it
Self-hosted — What We Build Today
  • Fully private. Nothing leaves your machine.
  • You pay in electricity and RAM, not data or dollars
  • Model is a file on your disk — it never changes
  • Deterministic. Versioned. Reproducible.
The hidden cost of free hosted tiers
Free hosted API = assume your prompts are training data
Your code, business logic, and client data may end up in someone's next training run
Self-hosted = 100% private by definition
If it contains client data or credentials — self-host or pay for a privacy tier

AI building blocks — a human analogy

AI Systems: A Human Analogy — LLM Brain, RAG Brain+Books, MCP Connector, AI Agent Brain+Hands
LLM = Brain · RAG = Brain + Books · MCP = Standard Connector · AI Agent = Brain + Hands. These are connected building blocks that can work together in one AI system.

The 11 universal lessons of context

Apply to every LLM ever built

These lessons do not change when you switch from Phi-4 Mini to ChatGPT, Copilot, or Claude. They are not lessons about local models. They are lessons about how language models process context.

01
The window is not the document The model sees a slice. Not the whole. Never the whole.
02
Gaps fill with training data When context is thin, the model invents plausible content. Confidently.
03
Synthesis requires co-presence Connecting two sections requires both to be in the same window at the same time.
04
Chunk boundaries break reasoning Information split across chunks cannot be reliably recombined by the model.
05
Truncation is invisible The model does not say "I only read 10% of your document." It just answers.
06
Position matters Content at the start and end of the window is recalled better than content in the middle.
08
You can engineer the context Pasting text directly into the prompt is more reliable than attaching a file.
09
Explicit grounding beats implicit retrieval Telling the model exactly what section to use produces better answers than letting it choose.
10
Multi-document context degrades rapidly Each document you add shrinks the window available for each one.
11
Document identity is not automatic The model does not reliably know which document it is reading from. You must tell it explicitly.
Chunking strategies for RAG — Daily Dose of Data Science
Chunking is a core design decision. For summarization workloads, coverage replaces relevance as the optimization goal. — Daily Dose of Data Science, LLMOps Part 8
Retrieval stack: vector search, precision vs recall — Daily Dose of Data Science
The standard retrieval stack: vector search as candidate generator, tuning precision vs. recall, contextual retrieval. — Daily Dose of Data Science

Workshop prompts

Copy · paste · run
SECRET DOCS demo — facilitator runs these
Factual retrieval Cyberspace Layers — AJP-3.20
Attach: AJP-3.20 only · Tests: Lessons 1, 6
[System Context] You are a precise military doctrine analyst. Answer using only the attached AJP-3.20 document. [Task] What are the three layers of cyberspace as defined in AJP-3.20? For each layer, provide the exact name and one sentence describing what entities exist at that layer according to the document. [Constraints] Use only the attached document. For each layer, quote one exact phrase from the relevant paragraph. Do not use outside knowledge about cybersecurity or SECRET doctrine. [Output Format] Three numbered items. Each: Layer name | Description | Quoted phrase.
Cross-section synthesis Cyber and Traditional Domains — AJP-3.20
Attach: AJP-3.20 only · Tests: Lessons 3, 4
[System Context] You are a military doctrine analyst. [Task] Using AJP-3.20, explain the relationship between cyberspace operations and the traditional domains of land, sea, air, and space. [Constraints] Cite specific paragraphs or sections. Do not use outside knowledge. [Output Format] Two paragraphs. First: how the document defines the relationship. Second: what the document says about command and coordination across domains.
Large doc stress test Planning Phases — OPS (501 pages)
Attach: COPD only · Tests: Lessons 5, 7
[System Context] You are a planning analyst reviewing SECRET operational doctrine. [Task] According to the COPD, what are the six phases of the Joint Operations Planning Process? For each phase, give the primary output produced. [Constraints] Use only the attached document. Cite the section or page number for each phase. [Output Format] Six numbered items. Each: Phase name | Primary output | Section reference.
Context injection Deliberate grounding — paste text directly
No file attachment — paste content directly · Tests: Lessons 8, 9
[System Context] You are a planning analyst. The following is an extract from the COPD, Section 3.2, describing the Operational Design process: [Paste 3–4 paragraphs of actual COPD text about Operational Design here] [Task] Based only on the extract above, list the key inputs required for Operational Design according to the COPD. Then identify one gap: something the extract implies is needed but does not explicitly define. [Constraints] Use only the extract above, not the full document. Be explicit about what is stated versus what is implied. [Output Format] Part 1: Numbered list of explicit inputs. Part 2: One paragraph identifying the implied gap.
Your own documents — three prompts to try
Prompt A — Factual retrieval
[System Context] You are a precise analyst reviewing the attached document. [Task] What are the three most important decisions described in this document? For each decision, identify who made it and what outcome was expected. [Constraints] Use only the attached document. Quote one phrase per decision. Do not use outside knowledge. [Output Format] Three numbered items. Each: Decision | Who | Expected outcome | Quote.
Prompt B — Gap detection
[System Context] You are a critical reviewer. [Task] After reading the attached document, identify one assumption the document makes but never explicitly justifies. Then explain what risk that unjustified assumption creates for the project or decision described. [Constraints] Stay within the document. Name the section where the assumption appears. [Output Format] Part 1: The assumption (one sentence). Part 2: Where it appears (section or page reference). Part 3: The risk it creates (two sentences).
Prompt C — Action extraction
[System Context] You are a project analyst. [Task] Extract all action items from the attached document. For each action, identify the responsible party if named and the deadline if stated. Mark missing information explicitly as "not stated." [Output Format] Numbered list. Each: Action | Owner | Deadline.

The four-part prompt anatomy

[System Context] → who the model is and what it is working with [Task] → one clear verb: extract, identify, compare, summarize [Constraints] → what to use, what to avoid, what to name explicitly [Output Format] → exact structure — no ambiguity

Common errors: task has two verbs (pick one) · no output format (you lose control) · no constraints (model fills gaps) · generic system context (give it a specific role)

The workshop stack

ToolRoleWhy this choice
LM StudioLocal model GUIDesktop app. No Docker. No terminal. Built-in document attach. CPU inference out of the box.
Phi-4 Mini Q4_K_MPrimary model3.8B parameters. 2.3GB on disk. Best CPU-only model in 2026. 10–12 tok/s on 13th gen i7.
Qwen2.5 3B Q4_K_MBackup modelIf Phi-4 Mini has issues on any machine. Similar speed profile.
Ollama + qwen2.5-coder:1.5bCoding demo onlyVS Code extension block. Optional. Requires pre-install.

Hardware reality

ModelSpeedVerdict
Phi-4 Mini Q4_K_M (3.8B)8–12 tok/s✓ Usable
Qwen2.5 3B Q4_K_M8–10 tok/s✓ Backup
Qwen2.5 7B2–4 tok/s✗ Too slow for live

A paragraph response takes 20–30 seconds. Use the wait time to read the output carefully. That is not wasted time. That is evaluation time.

Where to go from here

Today
Phi-4 Mini on CPU
Document analysis, private prompts, offline work
Next
GPU machine → Mistral 7B or Llama 3.1 8B
General tasks, longer context, 10x faster inference
Advanced
Graph RAG
Cross-document reasoning at scale — connects information across many files
Pro
Fine-tuning
Adapt a model to your domain, your language, your documents

Free AI resources