Andrej Karpathy Uses Obsidian and Codex for His AI Research. Here's His Setup.

Andrej Karpathy reproduced GPT-2 from scratch in nanoGPT. For his LLM research wiki, he uses Obsidian and Codex. Here's what that setup looks like and why it transfers to your own workflow.

Andrej Karpathy is not the kind of person who adopts a tool because it looked good in a tweet. A founding member of OpenAI and former Director of AI at Tesla, he built nanoGPT, micrograd, and a series of educational materials that have become standard references for learning how transformers work at a mechanistic level. When MindStudio documented that his LLM research wiki runs on Obsidian and Codex, developers paid attention ([MindStudio Blog](https://mindstudio.ai/blog), 2025). Karpathy's Obsidian and Codex setup is worth understanding not because of who built it but because of what it does.

[INTERNAL-LINK: obsidian codex cli developer wiki -> how to set up Codex as a vault automation agent]

Key Takeaways

What the Karpathy setup teaches developers about knowledge management.

  • Karpathy's research wiki uses Obsidian as the knowledge base and Codex CLI to organize it.
  • The setup is documented by MindStudio and is replicable by any developer who reads papers.
  • Codex indexes new paper notes, generates cross-links, and maintains the knowledge graph.
  • The wiki stays organized without manual maintenance because Codex handles the structure.
  • Any developer maintaining technical documentation can use the same pattern.

[IMAGE: A research notes folder in Obsidian showing linked paper notes with backlinks visible in the sidebar - search terms: obsidian research notes linked knowledge graph academic papers]

Who Is Andrej Karpathy and Why Does His Setup Matter?

Karpathy's credentials are relevant because his tool choices reflect engineering discipline rather than trend-following. He was Director of AI at Tesla during Autopilot's development and a member of OpenAI's founding team. He created nanoGPT, which has over 30,000 GitHub stars and is cited in transformer architecture courses at MIT, Stanford, and CMU (GitHub - karpathy/nanoGPT, 2024).

His educational work has an unusual quality: it is rigorously correct and practically accessible. When he explains something, the explanation reflects how the system actually works, not a simplified metaphor. This extends to his tooling. Karpathy does not use overly complex setups. He uses what works.

The fact that he built his LLM research wiki on Obsidian and Codex, rather than on a custom database, a Notion setup, or a vector search system, is informative. It suggests this combination of tools is good enough for someone who could build anything else.

[PERSONAL EXPERIENCE] In our experience reading Karpathy's published materials and talks, his workflow philosophy leans toward tools that are auditable and local. He wants to see the actual files. He wants grep to work. Obsidian's local-first architecture and Codex's command-line nature fit that preference exactly.

The Structure of the Research Wiki

The MindStudio documentation of the Karpathy setup describes a vault structure optimized for paper notes and research tracking (MindStudio Blog, 2025). Each paper gets its own note, which records the paper's title, authors, publication venue, and date, along with a summary, key contributions, limitations, and cross-links to related papers already in the vault.
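To make that note structure concrete, here is a minimal sketch of a paper-note template. The field names (`title`, `authors`, `venue`, `date`, `topic`), the section headings, and the `PaperNote` and `render_paper_note` names are illustrative assumptions, not the schema MindStudio documents.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PaperNote:
    """One note per paper; field names are illustrative, not a documented schema."""
    title: str
    authors: list[str]
    venue: str
    date: str   # publication date as an ISO string, e.g. "2017-06-12"
    topic: str  # topic cluster, usable later for a master index
    summary: str
    contributions: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)  # names of notes to wikilink


def _bullets(items: list[str]) -> str:
    """Render a markdown bullet list, with a placeholder when the list is empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none yet)"


def render_paper_note(note: PaperNote) -> str:
    """Render the note as markdown with YAML frontmatter (JSON-quoted strings are valid YAML)."""
    return "\n".join([
        "---",
        f"title: {json.dumps(note.title)}",
        f"authors: {json.dumps(note.authors)}",
        f"venue: {json.dumps(note.venue)}",
        f"date: {note.date}",
        f"topic: {json.dumps(note.topic)}",
        "---",
        "",
        f"# {note.title}",
        "",
        "## Summary",
        note.summary,
        "",
        "## Key contributions",
        _bullets(note.contributions),
        "",
        "## Limitations",
        _bullets(note.limitations),
        "",
        "## Related",
        _bullets([f"[[{name}]]" for name in note.related]),
        "",
    ])
```

Keeping metadata in frontmatter and prose in fixed sections gives an automation agent predictable places to read from and write to.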

Codex CLI runs as an index and link agent. When a new paper note is added, Codex scans the vault for notes that reference similar concepts, methods, or authors. It adds cross-links in both directions: the new note links to related existing notes, and the existing notes get an "also see" section updated to reference the new note.

This bidirectional linking is the core value of the knowledge graph. Notes that sit in isolation are not a knowledge base. They are a file collection. Cross-links create the connective structure that makes the vault searchable by concept rather than just by filename.
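Codex performs this step by following a natural-language task definition, so there is no fixed code to show. As an illustration only, here is a deterministic Python stand-in for the file edits that bidirectional linking produces. It assumes the related notes are already known, that each note is a `<name>.md` file at the vault root, and that the cross-reference heading is `## Also see`; `add_also_see` and `link_bidirectionally` are hypothetical helpers.

```python
from pathlib import Path

ALSO_SEE = "## Also see"  # assumed heading for the cross-reference section


def add_also_see(path: Path, target: str) -> bool:
    """Add '- [[target]]' to the note's 'Also see' section; return True if the file changed."""
    text = path.read_text(encoding="utf-8")
    link = f"- [[{target}]]"
    if link in text:
        return False  # already linked: keeps repeated runs idempotent
    start = text.find(ALSO_SEE)
    if start == -1:
        # No section yet: create it at the end of the note.
        text = text.rstrip("\n") + f"\n\n{ALSO_SEE}\n{link}\n"
    else:
        # Append to the existing section, just before the next '## ' heading (or at EOF).
        nxt = text.find("\n## ", start + len(ALSO_SEE))
        if nxt == -1:
            text = text.rstrip("\n") + f"\n{link}\n"
        else:
            text = text[:nxt].rstrip("\n") + f"\n{link}\n\n" + text[nxt + 1:]
    path.write_text(text, encoding="utf-8")
    return True


def link_bidirectionally(vault: Path, new_note: str, related: list[str]) -> int:
    """Link the new note to each related note and back; return the number of links added."""
    added = 0
    for name in related:
        added += add_also_see(vault / f"{new_note}.md", name)  # new -> existing
        added += add_also_see(vault / f"{name}.md", new_note)  # existing -> new
    return added
```

The idempotence check matters for a scheduled job: running the same pass twice should leave the vault unchanged.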

[CHART: Flow diagram showing: New paper note added -> Codex scans vault for related notes -> Bidirectional cross-links added -> Index file updated -> Git commit logged - Source: MindStudio Blog documentation of Karpathy setup 2025]

What Codex Actually Does in This Setup

Codex runs three operations in the Karpathy pattern. The first is concept extraction: reading a new note and identifying key technical terms, method names, and author names. The second is similarity matching: comparing those extracted concepts against existing notes. The third is link injection: adding cross-reference sections to both the new note and the matching existing notes.
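Codex does these steps with a language model, but the data flow can be shown with a deliberately crude, deterministic substitute. In this hypothetical sketch, concept extraction is approximated by wikilink targets plus content words of four or more characters, and similarity matching by Jaccard overlap between those sets; the stopword list, `threshold`, and `top_k` are arbitrary assumptions. Link injection would then write the returned matches into both notes.

```python
import re
from pathlib import Path

STOPWORDS = {"this", "that", "with", "from", "which", "their", "these", "into", "have", "using"}
# Captures the target of [[target]], [[target|alias]], and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def extract_concepts(text: str) -> set[str]:
    """Crude stand-in for concept extraction: wikilink targets plus content words (4+ chars)."""
    links = {m.strip().lower() for m in WIKILINK.findall(text)}
    words = {w for w in re.findall(r"[a-z][a-z0-9-]{3,}", text.lower()) if w not in STOPWORDS}
    return links | words


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two concept sets: len(a & b) / len(a | b), or 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_related(vault: Path, new_note: str, threshold: float = 0.1, top_k: int = 5) -> list[tuple[str, float]]:
    """Similarity matching: rank existing notes by concept overlap with the new note."""
    target = extract_concepts((vault / f"{new_note}.md").read_text(encoding="utf-8"))
    scored = []
    for path in vault.glob("*.md"):
        if path.stem == new_note:
            continue  # don't match the note against itself
        score = jaccard(target, extract_concepts(path.read_text(encoding="utf-8")))
        if score >= threshold:
            scored.append((path.stem, round(score, 3)))
    # Highest overlap first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]
```

An LLM-driven agent can recognize that "self-attention" and "scaled dot-product attention" are related even without shared words; lexical overlap cannot, which is the main reason to hand this step to Codex rather than a script.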

For a researcher who reads several papers a week, this automation compounds. After six months, a vault of around 200 paper notes (roughly eight papers a week) has a dense cross-link structure: from a note on the attention mechanism, you can navigate to every paper in the vault that discusses attention. Building that structure by hand would take hours. Codex builds it incrementally, one paper at a time.

[UNIQUE INSIGHT] The Karpathy pattern inverts the usual approach to knowledge management tools. Most developers add cross-links manually when they remember to. The Codex automation makes cross-linking the default behavior rather than an optional extra step. Over time, that changes the quality curve: manually maintained vaults typically degrade as upkeep lapses, while the automated vault improves as it grows.

According to the MindStudio documentation, the setup also maintains a master index note: a single file, auto-generated by Codex from the note frontmatter, that lists every paper in the vault by topic cluster (MindStudio Blog, 2025). This index becomes the entry point for navigating the research library.
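A minimal sketch of index generation, assuming each note carries a `topic` field in flat `key: value` frontmatter and that the index lives in a note named `Index`. The `read_frontmatter` parser is intentionally naive (no nested YAML, no lists) and both helpers are hypothetical; a real setup might use a proper YAML parser or simply let Codex regenerate the file.

```python
from collections import defaultdict
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Naive parser for flat 'key: value' frontmatter between leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def build_index(vault: Path, index_name: str = "Index") -> str:
    """Group notes by their 'topic' field and (re)write a master index note."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for path in vault.glob("*.md"):
        if path.stem == index_name:
            continue  # never list the index inside itself
        meta = read_frontmatter(path.read_text(encoding="utf-8"))
        clusters[meta.get("topic", "Unsorted")].append(path.stem)
    out = ["# Index", ""]
    for topic in sorted(clusters, key=str.lower):
        out.append(f"## {topic}")
        out.extend(f"- [[{name}]]" for name in sorted(clusters[topic], key=str.lower))
        out.append("")
    text = "\n".join(out)
    (vault / f"{index_name}.md").write_text(text, encoding="utf-8")
    return text
```

Because the index is rebuilt from frontmatter on every run, it never drifts out of sync with the notes; fixing a note's `topic` field is enough to move it.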

[IMAGE: A master index Obsidian note showing paper titles organized under topic cluster headings with internal links - search terms: obsidian index note linked knowledge base research topics]

Why Any Developer Should Care

You do not need to be doing AI research to use this pattern. The cross-linking and indexing behavior that works for paper notes works for any technical knowledge base. Engineering decision records. Debugging postmortems. RFC notes. Architecture diagrams with explanatory context. Meeting notes from technical reviews.

Any domain where you accumulate technical notes over time benefits from automated cross-linking and indexing. The pattern is the same: a new note is added, Codex scans for related notes, links are added in both directions, the index is updated, and the change is committed to git.
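The sequence above can be sketched as a small orchestration function. The three step callables are hypothetical placeholders for whatever performs them (a Codex task or your own scripts); only the git commands (`status --porcelain`, `add -A`, `commit -m`) are standard. `run_pipeline`, `commit_vault_changes`, and the commit-message format are assumptions.

```python
import subprocess
from pathlib import Path
from typing import Callable


def git(vault: Path, *args: str) -> str:
    """Run a git command inside the vault and return its stdout (raises on failure)."""
    result = subprocess.run(["git", *args], cwd=vault, check=True, capture_output=True, text=True)
    return result.stdout


def commit_vault_changes(vault: Path, message: str) -> bool:
    """Commit everything the earlier steps changed; skip when the tree is clean."""
    if not git(vault, "status", "--porcelain").strip():
        return False  # nothing changed, so no empty commit on quiet days
    git(vault, "add", "-A")
    git(vault, "commit", "-m", message)
    return True


def run_pipeline(
    vault: Path,
    new_note: str,
    find_related: Callable[[Path, str], list[str]],          # hypothetical step
    link_both_ways: Callable[[Path, str, list[str]], None],  # hypothetical step
    update_index: Callable[[Path], None],                    # hypothetical step
) -> bool:
    """Run the steps in the order described above; return True if a commit was made."""
    related = find_related(vault, new_note)    # scan the vault for related notes
    link_both_ways(vault, new_note, related)   # add links in both directions
    update_index(vault)                        # regenerate the index note
    return commit_vault_changes(vault, f"vault: link {new_note} ({len(related)} related)")
```

Committing at the end gives every automated edit a reviewable diff, and `git revert` undoes a bad linking pass in one step.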

The operational overhead is near zero after initial setup. Install Codex CLI, write the task definition, schedule the cron job. The vault maintains itself from that point.
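A sketch of the scheduling piece, assuming Python is available where cron runs. It relies on Codex CLI's non-interactive `codex exec` subcommand; confirm that subcommand, and whichever sandbox or approval option your installed version needs before Codex may edit files, with `codex exec --help`. The task prompt, the `maintain.py` script name, and the 02:00 schedule are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical task definition; in practice it might live in a file inside the vault.
TASK_PROMPT = (
    "For every note added since the last commit: find related notes, add [[wikilinks]] "
    "in both directions under '## Also see', and regenerate Index.md from the 'topic' "
    "frontmatter field. Do not otherwise edit note bodies."
)


def codex_command(prompt: str = TASK_PROMPT) -> list[str]:
    """Non-interactive Codex CLI invocation (assumes the `codex exec` subcommand)."""
    return ["codex", "exec", prompt]


def crontab_line(vault: Path, script: Path, hour: int = 2, minute: int = 0) -> str:
    """Build a daily crontab entry that runs the maintenance script from inside the vault."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    line = f"{minute} {hour} * * * cd {shlex.quote(str(vault))} && python3 {shlex.quote(str(script))}"
    if "%" in line:
        raise ValueError("'%' is special in crontab entries; escape it or rename the path")
    return line


def run_maintenance(vault: Path) -> int:
    """What the scheduled script would call: run Codex against the vault, return its exit code."""
    return subprocess.run(codex_command(), cwd=vault, check=False).returncode
```

A hypothetical `maintain.py` would call `run_maintenance(Path.cwd())` and then commit the result; paste the output of `crontab_line(...)` into `crontab -e`.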

[INTERNAL-LINK: claude code memory obsidian -> connecting the organized vault to Claude Code sessions]

Adapting the Setup for Different Developer Contexts

Developers who read technical blog posts and documentation rather than papers can adapt the pattern by replacing "paper notes" with "concept notes." Each time you read something technical in depth (a spec, a deep-dive blog post, a language RFC), you create a note, and Codex links it to related existing notes.

Over a year of consistent use, the vault becomes a searchable, cross-linked record of everything you've understood well enough to document. That's a career-level knowledge asset. It beats a bookmarks folder. It beats a private blog. It beats trying to grep your git history for that one architecture decision you made 18 months ago.

Consistency requirement

The knowledge graph only works if you consistently add notes. Sporadic use produces a vault with isolated islands of linked content and large gaps. The Karpathy pattern works because he adds notes regularly. The automation handles organization, but it cannot create notes from papers you didn't read.

The Replicability Question

The MindStudio post describes the setup as buildable "in 5 minutes" (MindStudio Blog, 2025), which is marketing language. A realistic estimate for a developer who has not used Codex CLI before is two to three hours for initial setup, writing the task definition, and testing. Configuring cron adds another 30 minutes.

That is a one-time cost, and the ongoing cost of maintaining the vault afterward is close to zero. The math clearly favors the investment: a few hours once versus manual organization forever.

The greg-asher/codex-obsidian project provides a starting implementation that developers can adapt (greg-asher/codex-obsidian GitHub, 2025). Its task definitions cover four core maintenance operations and can be extended to include the cross-linking step from the Karpathy pattern.

FAQ

Did Karpathy officially document this setup himself?

The setup is documented by MindStudio based on Karpathy's public tooling discussions, not a first-person writeup from Karpathy himself. The documentation should be treated as a well-informed third-party interpretation of his workflow, not a direct quote. The technical pattern described is sound regardless of sourcing.

How does this differ from using a vector database for semantic search?

Vector databases give you semantic search across notes but typically require an embedding step plus a running service or library to query. The Karpathy Obsidian pattern gives you explicit cross-links that are visible in the files themselves and searchable with grep. Cross-links are inspectable, auditable, and portable; vector embeddings are opaque numbers you cannot read to see why two notes are related. The two approaches have different tradeoffs and are not mutually exclusive.
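To make the grep point concrete, here is a small Python equivalent of searching a vault for backlinks, assuming Obsidian-style `[[Note]]`, `[[Note|alias]]`, and `[[Note#heading]]` links. `backlinks` is a hypothetical helper; from a shell, `grep -rlF '[[Note' vault/` gives a rougher version of the same answer.

```python
import re
from pathlib import Path


def backlinks(vault: Path, target: str) -> list[str]:
    """Return the names of notes containing a wikilink to `target`.

    Matches [[target]], [[target|alias]], and [[target#heading]], but not
    [[target something else]]. No server, index, or embeddings involved.
    """
    pattern = re.compile(r"\[\[" + re.escape(target) + r"(?:[|#][^\]]*)?\]\]")
    hits = []
    for path in sorted(vault.rglob("*.md")):
        if pattern.search(path.read_text(encoding="utf-8")):
            hits.append(path.stem)
    return hits
```

The answer comes straight from the text of the files, which is what makes explicit links auditable: every result can be checked by opening the note.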

Can I use this pattern with Zettlr or Logseq instead of Obsidian?

The Codex CLI task definitions can target any directory of markdown files, and Zettlr and Logseq also store notes as local markdown files. Obsidian-specific features such as its backlinks sidebar won't apply, but the automated cross-link generation and indexing tasks work the same way against any directory of markdown files.

Does Codex read LaTeX papers or just markdown notes?

Codex CLI processes text files. LaTeX source, which arXiv often makes available, is plain text and can be read directly. For papers in PDF format, you need a separate extraction step to convert the content to markdown or text before Codex can process it; several developers use a PDF-to-markdown pipeline as a pre-step. The Karpathy pattern, as documented, assumes you write the note yourself after reading the paper rather than having Codex ingest the paper directly.

How many notes does a vault need before the cross-linking becomes valuable?

Cross-linking becomes noticeably useful around 50-100 notes on related topics. Below 50 notes, you likely remember the connections yourself. Above 100 notes, the connections you don't remember outweigh the ones you do. The Codex automation compounds in value as vault size grows.


Karpathy's Obsidian and Codex setup is interesting because of what it reveals about how serious technical thinkers manage knowledge at scale. Not a custom database. Not a SaaS wiki. A local markdown vault with automated maintenance. The tools are boring; the discipline is what matters. Karpathy adds notes consistently, Codex keeps them organized, and the knowledge graph builds itself. Any developer with a technical domain they care about can replicate this.

[INTERNAL-LINK: obsidian codex cli developer wiki -> full Codex CLI automation setup guide]

Emcy

Founder, CodeCulture — Developer apparel built by a dev, for devs

For developers who actually read the papers