AI Tools
Practical AI tools, frameworks, APIs, and developer workflows
RAG (Retrieval-Augmented Generation): How LLMs Access External Knowledge
Retrieval-Augmented Generation (RAG) is a technique where an LLM's response is augmented with relevant information retrieved from an external knowledge base. The typical pipeline: user query → convert to embedding → search a vector database for similar chunks → inject retrieved chunks into the LLM prompt as context → LLM generates an answer grounded in the retrieved information. RAG addresses the problem of LLM knowledge cutoffs and hallucination by giving models access to current, domain-specific data.
Caveman Skill and the Brevity Research: 65-75% Token Reduction That Improves LLM Accuracy
Caveman is a Claude Code skill that instructs the model to drop articles, filler, hedging, and pleasantries — cutting 65-75% of output tokens with no loss of technical accuracy. The underlying research (Hakim, March 2026) tested 31 models on 1,485 problems and found that on 7.7% of benchmark problems, larger models underperform smaller ones by 28.4 percentage points due to 'spontaneous scale-dependent verbosity' — verbose reasoning paths that degrade accuracy. Forcing brevity reverses this.
The LLM Wiki Pattern: Personal Knowledge Bases Without RAG
Andrej Karpathy popularized an approach to personal knowledge bases: dump raw material into a folder, have an LLM read and organize it into interlinked markdown wiki pages with an auto-maintained index. No vector database, no embeddings, no RAG pipeline — just markdown files and links. Simpler and cheaper than RAG for personal-scale knowledge (hundreds of pages), with deeper relationship understanding through explicit backlinks.
LLM API Basics: System Prompts vs User Prompts
LLM APIs separate system prompts (developer-set behavior/constraints, highest authority) from user prompts (end-user messages). System always takes precedence over user in conflicts.
Claude Code: Clear Context vs Keep Context for Complex Tasks
For complex Claude Code tasks, "clear context and auto-accept" is preferred over keeping history — the written plan file preserves context while freeing the context window for execution.
Building AI-for-AI Knowledge Layers with MCP
FixCache pioneered the pattern of AI-for-AI shared knowledge bases via MCP, evolving into Philosopher's Stone as a broader knowledge commons.