To get AI to think consistently, I had to teach it how to think like a person.
I didn't set out to build a brain. I built what worked. Then I looked at the neuroscience and realized the architecture I'd converged on — fast classification, selective retrieval, modular processing, feedback consolidation — is the same architecture the human brain uses. Not because I copied it. Because the problem demands it.
Every number on this page is verifiable. Every citation is peer-reviewed. How It Works shows the outcomes. This page shows the science.
Built like a brain — by convergence, not by design.
Both the human brain and grāmatr solve the same problem: how to process variable-complexity inputs efficiently under resource constraints. The solutions converged independently.
Fast classification = Amygdala
The amygdala classifies incoming stimuli in milliseconds before conscious reasoning engages. grāmatr's trained classifiers triage every request — effort, intent, context tier — before expensive models are invoked.
LeDoux, 2000 · Pessoa & Adolphs, 2010
Routing = Prefrontal cortex
The PFC doesn't do the thinking — it decides what kind of thinking is needed and routes accordingly. grāmatr's pipeline orchestrates which context, skills, and directives the AI receives.
Miller & Cohen, 2001
Selective retrieval = Hippocampus
The hippocampus doesn't recall everything — it selectively retrieves what's relevant via pattern completion. grāmatr's semantic vector search mirrors this: retrieving only the context needed, not everything stored.
Norman & O'Reilly, 2003
Progressive learning = Predictive coding
The brain predicts what's coming and only processes the surprise. As grāmatr's classifiers improve, familiar patterns need less processing — the same efficiency principle described by Friston's free-energy framework.
Friston, 2010 · Clark, 2013
Feedback loop = Sleep consolidation
During sleep, the brain replays daily experiences to consolidate specific memories into generalized knowledge. grāmatr's feedback loop does the same — specific interactions become generalized classification intelligence through retraining.
McClelland et al., 1995 · Diekelmann & Born, 2010
Multi-head classifiers = Modular brain
The brain runs specialized modules in parallel — face recognition, language, spatial reasoning — coordinated by hub networks. grāmatr runs parallel classifiers for effort, intent, and context tier, integrating outputs into a unified intelligence packet.
Kanwisher, 2010 · Sporns & Betzel, 2016
These aren't metaphors. They are convergent solutions to the same computational problem. The brain evolved these architectures over 500 million years. grāmatr implements them in software.
How the system gets smaller and faster.
Most AI memory tools work like filing cabinets — store everything, retrieve when asked. grāmatr works like a student. It studies your interactions, builds increasingly accurate models of your intent and preferences, and progressively optimizes its own classifiers to serve you faster with less data.
On day one, every request goes through a general-purpose language model with a full context payload. As interaction data accumulates, grāmatr's classification pipeline improves at the platform level — training LoRA adapters (Low-Rank Adaptation; Hu et al., 2021) that improve routing and classification for all users. These adapters don't replace the base model — they sit on top of it as lightweight, parameter-efficient layers that encode patterns without full model fine-tuning. Personal adapter training — a premium feature — adds a further layer that encodes individual workflow patterns for even faster, more accurate routing.
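To make the adapter idea concrete, here is a minimal pure-Python sketch of the core LoRA computation: a frozen base weight matrix W is augmented at inference with a scaled low-rank product B @ A. Matrix sizes, the `alpha`/`r` values, and the helper names are illustrative; a real implementation uses a tensor library and trains A and B by gradient descent.

```python
# Minimal sketch of the LoRA idea (Hu et al., 2021): instead of fine-tuning a
# full weight matrix W, train a low-rank delta B @ A and add it at inference.
# Pure-Python matrices for illustration only.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, A, B, alpha=1, r=1):
    """Return W + (alpha / r) * (B @ A), leaving the frozen base W untouched."""
    delta = matmul(B, A)          # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Frozen 2x2 base weights; a rank-1 adapter trained separately.
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[0.5], [0.25]]      # d_out x r
A = [[1.0, 2.0]]         # r x d_in
W_adapted = apply_lora(W, A, B, alpha=1, r=1)
```

The point of the decomposition is that A and B together hold far fewer parameters than W, which is what makes per-platform (and per-user) adapters cheap to train and store.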
Classifier progression timeline
Full context payload: ~40,000 tokens. Classification by the primary LLM. Latency: 3-5 seconds per request. Accuracy: baseline (the model is learning you).
Local classifier begins handling effort-level and intent routing. Context payload drops as the system internalizes repeated patterns. Latency: 1-3 seconds.
Platform-level LoRA adapters handle classification with high confidence. Context payload: ~1,200 tokens — a surgical intelligence packet. The system doesn't need the encyclopedia anymore because it learned the curriculum. Personal adapters, available as a premium feature, add per-user optimization on top.
Each interaction generates training signal. Classification confidence increases. Smaller models handle more decisions. Larger models are reserved for genuinely complex requests. The cost per interaction decreases while accuracy increases.
The flywheel math: every classified request produces a feedback tuple — the original prompt, the classification decision, and the outcome quality signal. That tuple becomes training data. Better training data produces more accurate classifiers. More accurate classifiers produce better routing. The cycle compounds.
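The feedback tuple described above can be sketched as a small data structure plus a filtering step that turns logged interactions into the next training set. The field names, quality threshold, and example labels are hypothetical.

```python
# Hypothetical sketch of the flywheel's feedback tuple: each routed request
# yields (prompt, classification decision, outcome quality signal), and the
# accumulated tuples become training data for the next classifier revision.
from dataclasses import dataclass

@dataclass
class FeedbackTuple:
    prompt: str
    classification: str      # routing decision, e.g. "low_effort" / "high_effort"
    outcome_quality: float   # quality signal in [0, 1]

def build_training_set(tuples, quality_floor=0.5):
    """Split logged tuples into confirmations (reinforce the routing) and
    corrections (teach the classifier where it misrouted)."""
    confirmations = [t for t in tuples if t.outcome_quality >= quality_floor]
    corrections = [t for t in tuples if t.outcome_quality < quality_floor]
    return confirmations, corrections

log = [
    FeedbackTuple("rename this variable", "low_effort", 0.9),
    FeedbackTuple("design the auth flow", "low_effort", 0.2),  # misrouted
]
good, bad = build_training_set(log)
```

Corrections are the more valuable half of the split: each one is a labeled example of a routing mistake, which is exactly what the retraining pass needs.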
grāmatr's classification pipeline has processed 4,189 routed requests with 1,761 learning corrections from active production use — a single-user dataset that drives the progressive learning cycle. That feedback loop is what collapsed a 40,000-token system prompt to 1,200 tokens.
40,000 → 1,200.
Before the routing engine, Brian's CLAUDE.md file — the system prompt that told Claude how to behave — had grown to over 40,000 tokens (verifiable in the git history). It contained every rule, every preference, every coding convention, every behavioral directive. Every single request carried all of that context, whether relevant or not.
After the routing engine: 1,200 tokens. A 97% reduction. And the system performs better.
Brute-force approach — ship everything, hope the model finds what's relevant.
Surgical briefing — only what the current request actually needs.
What's in the 1,200-token intelligence packet
The reason performance improved: large language models perform worse with irrelevant context. The 40,000-token prompt was full of useful information that was irrelevant to any given request. The 1,200-token packet contains only relevant information — because the classifier already determined what's relevant.
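The packet-assembly step can be sketched as a filter: the classifier's labels select which context entries ship with a request, under a token budget, instead of shipping everything. Entry names, tags, and token counts below are invented for illustration.

```python
# Illustrative sketch of intelligence-packet assembly: only context entries
# matching the classifier's labels are included, within a token budget.
# The store contents and tag vocabulary are hypothetical.

CONTEXT_STORE = {
    "python_style_rules":  {"tags": {"coding", "python"}, "tokens": 400},
    "git_commit_format":   {"tags": {"git"},              "tokens": 150},
    "deploy_runbook":      {"tags": {"ops"},              "tokens": 800},
}

def build_packet(classifier_tags, budget=1200):
    """Include only entries whose tags overlap the classifier's labels,
    stopping before the token budget is exceeded."""
    packet, used = [], 0
    for name, entry in CONTEXT_STORE.items():
        if entry["tags"] & classifier_tags and used + entry["tokens"] <= budget:
            packet.append(name)
            used += entry["tokens"]
    return packet, used

packet, tokens = build_packet({"coding", "python"})
```

The contrast with the brute-force approach is the conditional: irrelevant entries never enter the prompt, so the model never has to ignore them.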
The numbers (publicly visible on GitHub).
The routing engine went live on March 21, 2026. The difference in output is measurable in any git client.
These are not estimates. Run the commands yourself:
# Commits per day (grāmatr repo)
git log --oneline --since="2026-03-21" --until="2026-03-29" | wc -l
# Version deployments
git log --oneline --all | grep -i "bump" | wc -l
Knowledge graph architecture.
grāmatr's knowledge graph is not a flat key-value store. It's a structured semantic memory with typed entities, weighted observations, and tiered retrieval.
Entity types include: user_profile, project_context, learning_signal, skill_definition, agent_definition, decision_record, code_pattern, preference, steering_rule, reflection, handoff_state, classification_eval, feedback_signal, prd, hard_problem, and session_context.
Memory tiers
Hot memory
Active project state, current session context, recent decisions. Retrieved on every request. This is what makes the 1,200-token intelligence packet possible.
Warm memory
Learned preferences, coding conventions, behavioral patterns, skill definitions. Retrieved when the classifier determines relevance.
Cold memory
Historical decisions, archived project context, completed PRDs. Retrieved only on explicit search or when semantic similarity exceeds a confidence threshold.
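The three-tier policy above can be summarized as a small dispatch function. Tier names come from the text; the exact rules and the 0.8 threshold are an illustrative simplification, not the production logic.

```python
# Hedged sketch of the hot / warm / cold retrieval policy described above.

def retrieval_tier(entity, query_similarity, explicit_search=False,
                   similarity_threshold=0.8):
    """Decide whether an entity is fetched for the current request."""
    if entity["tier"] == "hot":
        return True                   # always shipped with every request
    if entity["tier"] == "warm":
        return entity["relevant"]     # the classifier decided relevance
    # cold: only on explicit search or high semantic similarity
    return explicit_search or query_similarity >= similarity_threshold

session = {"tier": "hot", "relevant": True}    # current session context
old_prd = {"tier": "cold", "relevant": False}  # archived PRD
```

The hot tier is what keeps the per-request packet small: it is the only set fetched unconditionally, so its size bounds the baseline payload.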
Vector search + semantic retrieval
Entity retrieval uses pgvector with 4096-dimensional embeddings for semantic similarity search. When the routing engine needs context, it computes vector similarity against the query embedding — not keyword matching. This is why grāmatr can surface a coding convention you established three months ago when you encounter a similar pattern today.
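Conceptually, the retrieval step ranks stored entities by cosine similarity between embeddings. In production this runs inside PostgreSQL via pgvector's distance operators over 4096-dimensional vectors; the sketch below uses tiny made-up 3-dimensional embeddings in pure Python to show the ranking logic only.

```python
# Conceptual sketch of semantic retrieval: rank entities by cosine similarity
# to the query embedding, rather than by keyword match. Embeddings and entity
# names are invented; real vectors are 4096-dimensional and live in pgvector.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, entities, k=1):
    """Return the k entities most semantically similar to the query."""
    ranked = sorted(entities,
                    key=lambda e: cosine_similarity(query, e["embedding"]),
                    reverse=True)
    return ranked[:k]

store = [
    {"name": "naming_convention", "embedding": [0.9, 0.1, 0.0]},
    {"name": "deploy_checklist",  "embedding": [0.0, 0.2, 0.9]},
]
hits = top_k([1.0, 0.0, 0.0], store, k=1)
```

Because similarity is computed in embedding space, a months-old convention surfaces when today's query is *about* the same thing, even if it shares no keywords with it.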
Per-user encryption. Row-level security.
Per-user encryption
Every piece of data in grāmatr's knowledge graph is encrypted and isolated per user. The architecture enforces this at the database level, not the application level — meaning a bug in application code cannot expose one user's data to another.
User interaction data is stored in both a vector semantic database (pgvector on PostgreSQL) and a structured object database. Both enforce user-scoped access. Even grāmatr staff cannot query a user's knowledge graph.
Row-level security
PostgreSQL row-level security (RLS) policies enforce user isolation at every query. Every table carries a user_id column. RLS policies automatically filter every query to the authenticated user's scope. There is no application-layer "WHERE user_id = ?" that a developer could forget — the database enforces it.
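The pattern reads like the following PostgreSQL fragment. The table name, policy name, and session variable are hypothetical stand-ins, not grāmatr's actual schema; what matters is that the filter lives in the policy, so every query is scoped automatically.

```sql
-- Hypothetical illustration of the RLS pattern described above:
-- every table carries a user_id column, and a policy scopes all
-- queries to the authenticated user. No application-layer WHERE needed.
ALTER TABLE knowledge_entities ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_isolation ON knowledge_entities
    USING (user_id = current_setting('app.current_user_id')::uuid);
```

Once the policy is in place, a `SELECT * FROM knowledge_entities` issued by any session returns only that session's rows; forgetting a filter in application code cannot widen the result set.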
Your interactions train only your intelligence. Encrypted, isolated, invisible to everyone.
Team admins explicitly control which patterns and skills are shared. Everything else stays private.
Enterprise admins govern what gets incorporated into organizational intelligence. Full authorization required at every level.
No data flows between tiers automatically. Every cross-tier share requires explicit admin authorization.
2,471 tests, independent agents.
Quality in grāmatr is enforced through separation of concerns — the agents that write code are not the agents that test it.
The separation is visible in the git history. Test commits are distinct from feature commits — different agents, different review passes, different validation logic. The independent test engineer agent validates pipeline steps from ingestion through classification through feedback capture.
Open source acknowledgments.
grāmatr's routing engine was influenced by patterns from two open-source projects by Daniel Miessler:
- Fabric — MIT License, Copyright (c) 2025. Pattern-based AI workflow orchestration.
- PAI (Personal AI Infrastructure) — MIT License, Copyright (c) 2025. Personal AI routing architecture.
Brian discovered these projects via the Network Chuck YouTube channel in February 2026. The routing patterns in PAI provided the conceptual foundation for grāmatr's decision router — the inflection point that turned a knowledge graph into a context engineering pipeline.
References
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685. arxiv.org
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized Language Models. arXiv:2305.14314. arxiv.org
Miessler, D. (2025). Fabric: An open-source framework for augmenting humans using AI. MIT License. github.com
Miessler, D. (2025). PAI (Personal AI Infrastructure). MIT License. github.com
Anthropic. (2025). Effective Context Engineering for AI Agents. anthropic.com
Neuroscience
LeDoux, J. E. (2000). Emotion Circuits in the Brain. Annual Review of Neuroscience, 23, 155-184. DOI
Miller, E. K., & Cohen, J. D. (2001). An Integrative Theory of Prefrontal Cortex Function. Annual Review of Neuroscience, 24, 167-202. DOI
Norman, K. A., & O'Reilly, R. C. (2003). Modeling Hippocampal and Neocortical Contributions to Recognition Memory. Psychological Review, 110(4), 611-646. DOI
Friston, K. (2010). The Free-Energy Principle: A Unified Brain Theory? Nature Reviews Neuroscience, 11(2), 127-138. DOI
Clark, A. (2013). Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science. Behavioral and Brain Sciences, 36(3), 181-204. DOI
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why There Are Complementary Learning Systems. Psychological Review, 102(3), 419-457. DOI
Diekelmann, S., & Born, J. (2010). The Memory Function of Sleep. Nature Reviews Neuroscience, 11(2), 114-126. DOI
Kanwisher, N. (2010). Functional Specificity in the Human Brain. PNAS, 107(25), 11163-11170. DOI
Sporns, O., & Betzel, R. F. (2016). Modular Brain Networks. Annual Review of Psychology, 67, 613-640. DOI
Kahneman, D. (2003). A Perspective on Judgment and Choice. American Psychologist, 58(9), 697-720. DOI
Pessoa, L., & Adolphs, R. (2010). Emotion Processing and the Amygdala. Nature Reviews Neuroscience, 11(11), 773-783. DOI
Other
Stack Overflow. (2025). 2025 Developer Survey — AI Section. survey.stackoverflow.co
Twiss, J. (2026, January 8). AI Coding Degrades: Silent Failures Emerge. IEEE Spectrum. spectrum.ieee.org
Ready to see it in action?
The architecture is in production. The numbers are real.
Request Early Access
Read the founding story or start with How It Works.