July 15, 2026 at 6:00 PM · Public House Bangkok

The Sovereign
Workspace

Your files. Your model. Your machine. Nothing leaves the room.

90–100 min 20 participants Windows · Intel CPU · 16GB RAM Works offline Public House Bangkok, 249 Sukhumvit 31

Setup What the model sees Free AI explained 11 Lessons Workshop prompts Stack Free resources

The master question

What did the model actually read?

Ask this of every AI system you will ever use — ChatGPT, Copilot, Claude, your own local model.
It reveals more about why AI fails than any benchmark ever will.

Pre-workshop setup

Do this at home · 3 days before

Install LM Studio

Desktop GUI for running local models. No terminal required.

lmstudio.ai

At home

Download Phi-4 Mini (2.3 GB)

In LM Studio → Search icon → search below → download Q4_K_M file

microsoft/phi-4-mini-instruct-GGUF
File to select: phi-4-mini-instruct-Q4_K_M.gguf

2.3 GB

Test the model

LM Studio → Chat tab → select model → type this → expect a response in 15–20 sec

Say "ready" if you are working.

Verify

Prepare 3–5 business documents

Strategy notes, project briefs, meeting transcripts, client notes. PDF, Word, or text. Put them in one folder you can find quickly.

Bring

Optional: Coding demo setup

Install Ollama + VS Code + Ollama Code extension for the free local Copilot demo

# 1. Install Ollama from ollama.com
# 2. Open Command Prompt and run:
ollama pull qwen2.5-coder:1.5b
# 3. Install VS Code from code.visualstudio.com
# 4. VS Code → Ctrl+Shift+X → search "Ollama Code" (publisher: cgaspard) → install

Optional

What the model actually sees

The most dangerous assumption you can bring to any AI system:

"When I upload a document, the model reads it."

It does not. There is always a bottleneck between your document and the model.

Your document

501 pages · ~200k tokens

context window limit · chunk selection · truncation · retrieval quality

Selection bottleneck

← most AI failures happen here

only what fits passes through

Context window

8%–16% of a 500-page doc

only this reaches the model

Phi-4 Mini — in your RAM

no internet · no cloud

answer constrained by what was in the window

Answer

not by what was in the document

Context window as working memory — Daily Dose of Data Science

The context window is working memory. The core job: maximize signal under strict capacity constraints. — Daily Dose of Data Science, LLMOps Part 8

Context types taxonomy — Daily Dose of Data Science

A practical taxonomy of context types: instruction, query, knowledge (RAG), memory, tool, user-specific, environmental. — Daily Dose of Data Science

The two types of "free" AI

Most people assume "free AI" means one thing. It does not. These are opposites.

Type 1 · Someone else runs it

Hosted — You Call It

Google AI Studio, Groq, Mistral, OpenRouter
Real frontier models with rate limits
You give up your prompts — free tiers often train on what you send
Convenient. Not private.

Type 2 · You download and run it

Self-hosted — What We Build Today

Fully private. Nothing leaves your machine.
You pay in electricity and RAM, not data or dollars
Model is a file on your disk — it never changes
Deterministic. Versioned. Reproducible.

The hidden cost of free hosted tiers

Free hosted API = assume your prompts are training data

Your code, business logic, and client data may end up in someone's next training run

Self-hosted = 100% private by definition

If it contains client data or credentials — self-host or pay for a privacy tier

AI building blocks — a human analogy

AI Systems: A Human Analogy — LLM Brain, RAG Brain+Books, MCP Connector, AI Agent Brain+Hands

LLM = Brain · RAG = Brain + Books · MCP = Standard Connector · AI Agent = Brain + Hands. These are connected building blocks that can work together in one AI system.

The 11 universal lessons of context

Apply to every LLM ever built

These lessons do not change when you switch from Phi-4 Mini to ChatGPT, Copilot, or Claude. They are not lessons about local models. They are lessons about how language models process context.

The window is not the document The model sees a slice. Not the whole. Never the whole.

Gaps fill with training data When context is thin, the model invents plausible content. Confidently.

Synthesis requires co-presence Connecting two sections requires both to be in the same window at the same time.

Chunk boundaries break reasoning Information split across chunks cannot be reliably recombined by the model.

Truncation is invisible The model does not say "I only read 10% of your document." It just answers.

Position matters Content at the start and end of the window is recalled better than content in the middle.

Confidence is not accuracy The model answers with the same confidence whether it saw the answer or invented it. This is the most dangerous lesson.

You can engineer the context Pasting text directly into the prompt is more reliable than attaching a file.

Explicit grounding beats implicit retrieval Telling the model exactly what section to use produces better answers than letting it choose.

Multi-document context degrades rapidly Each document you add shrinks the window available for each one.

Document identity is not automatic The model does not reliably know which document it is reading from. You must tell it explicitly.

Chunking strategies for RAG — Daily Dose of Data Science

Chunking is a core design decision. For summarization workloads, coverage replaces relevance as the optimization goal. — Daily Dose of Data Science, LLMOps Part 8

Retrieval stack: vector search, precision vs recall — Daily Dose of Data Science

The standard retrieval stack: vector search as candidate generator, tuning precision vs. recall, contextual retrieval. — Daily Dose of Data Science

Workshop prompts

Copy · paste · run

SECRET DOCS demo — facilitator runs these

Factual retrieval Cyberspace Layers — AJP-3.20

Attach: AJP-3.20 only · Tests: Lessons 1, 6

[System Context]
You are a precise military doctrine analyst.
Answer using only the attached AJP-3.20 document.

[Task]
What are the three layers of cyberspace as defined in AJP-3.20?
For each layer, provide the exact name and one sentence describing
what entities exist at that layer according to the document.

[Constraints]
Use only the attached document.
For each layer, quote one exact phrase from the relevant paragraph.
Do not use outside knowledge about cybersecurity or SECRET doctrine.

[Output Format]
Three numbered items. Each: Layer name | Description | Quoted phrase.

Cross-section synthesis Cyber and Traditional Domains — AJP-3.20

Attach: AJP-3.20 only · Tests: Lessons 3, 4

[System Context]
You are a military doctrine analyst.

[Task]
Using AJP-3.20, explain the relationship between cyberspace operations
and the traditional domains of land, sea, air, and space.

[Constraints]
Cite specific paragraphs or sections.
Do not use outside knowledge.

[Output Format]
Two paragraphs. First: how the document defines the relationship.
Second: what the document says about command and coordination across domains.

Large doc stress test Planning Phases — OPS (501 pages)

Attach: COPD only · Tests: Lessons 5, 7

[System Context]
You are a planning analyst reviewing SECRET operational doctrine.

[Task]
According to the COPD, what are the six phases of the Joint Operations
Planning Process? For each phase, give the primary output produced.

[Constraints]
Use only the attached document.
Cite the section or page number for each phase.

[Output Format]
Six numbered items. Each: Phase name | Primary output | Section reference.

Context injection Deliberate grounding — paste text directly

No file attachment — paste content directly · Tests: Lessons 8, 9

[System Context]
You are a planning analyst. The following is an extract from the COPD,
Section 3.2, describing the Operational Design process:

[Paste 3–4 paragraphs of actual COPD text about Operational Design here]

[Task]
Based only on the extract above, list the key inputs required for
Operational Design according to the COPD. Then identify one gap:
something the extract implies is needed but does not explicitly define.

[Constraints]
Use only the extract above, not the full document.
Be explicit about what is stated versus what is implied.

[Output Format]
Part 1: Numbered list of explicit inputs.
Part 2: One paragraph identifying the implied gap.

Your own documents — three prompts to try

Prompt A — Factual retrieval

[System Context]
You are a precise analyst reviewing the attached document.

[Task]
What are the three most important decisions described in this document?
For each decision, identify who made it and what outcome was expected.

[Constraints]
Use only the attached document. Quote one phrase per decision.
Do not use outside knowledge.

[Output Format]
Three numbered items. Each: Decision | Who | Expected outcome | Quote.

Prompt B — Gap detection

[System Context]
You are a critical reviewer.

[Task]
After reading the attached document, identify one assumption the document
makes but never explicitly justifies. Then explain what risk that
unjustified assumption creates for the project or decision described.

[Constraints]
Stay within the document. Name the section where the assumption appears.

[Output Format]
Part 1: The assumption (one sentence).
Part 2: Where it appears (section or page reference).
Part 3: The risk it creates (two sentences).

Prompt C — Action extraction

[System Context]
You are a project analyst.

[Task]
Extract all action items from the attached document.
For each action, identify the responsible party if named and the deadline
if stated. Mark missing information explicitly as "not stated."

[Output Format]
Numbered list. Each: Action | Owner | Deadline.

The four-part prompt anatomy

[System Context]  → who the model is and what it is working with
[Task]            → one clear verb: extract, identify, compare, summarize
[Constraints]    → what to use, what to avoid, what to name explicitly
[Output Format]  → exact structure — no ambiguity

Common errors: task has two verbs (pick one) · no output format (you lose control) · no constraints (model fills gaps) · generic system context (give it a specific role)

The workshop stack

Tool	Role	Why this choice
LM Studio	Local model GUI	Desktop app. No Docker. No terminal. Built-in document attach. CPU inference out of the box.
Phi-4 Mini Q4_K_M	Primary model	3.8B parameters. 2.3GB on disk. Best CPU-only model in 2026. 10–12 tok/s on 13th gen i7.
Qwen2.5 3B Q4_K_M	Backup model	If Phi-4 Mini has issues on any machine. Similar speed profile.
Ollama + qwen2.5-coder:1.5b	Coding demo only	VS Code extension block. Optional. Requires pre-install.

Hardware reality

Model	Speed	Verdict
Phi-4 Mini Q4_K_M (3.8B)	8–12 tok/s	✓ Usable
Qwen2.5 3B Q4_K_M	8–10 tok/s	✓ Backup
Qwen2.5 7B	2–4 tok/s	✗ Too slow for live

A paragraph response takes 20–30 seconds. Use the wait time to read the output carefully. That is not wasted time. That is evaluation time.

Where to go from here

Today

Phi-4 Mini on CPU

Document analysis, private prompts, offline work

GPU machine → Mistral 7B or Llama 3.1 8B

General tasks, longer context, 10x faster inference

Advanced

Graph RAG

Cross-document reasoning at scale — connects information across many files

Pro

Fine-tuning

Adapt a model to your domain, your language, your documents

Free AI resources

AgentRouter

One-time free tier credit

Credit

The SovereignWorkspace