Claude CLI + Obsidian
8 Karpathy-Style AI Agent Use Cases – Deep Research Report
AI Agents
Obsidian
CLI-First
Trino + Snowflake
Data Intelligence
April 18, 2026
Executive Summary
This report covers 8 high-value use cases for building a personal AI agent stack using Claude CLI (Karpathy-style: minimal, CLI-first, tool-calling loop) wired to Obsidian as the knowledge base, with external data connectors (Trino, Snowflake, web search, git) feeding context in.
Each use case is analyzed across: implementation approach, agentic loop pattern, core prompts, Obsidian vault structure, required tools, security model, and complexity. All 8 are implementable today with existing tools.
Data Agent focus: Trino + Snowflake connectors are deeply covered in Use Case 2 (Data Lake/DB Agent) and Use Case 7 (BI Dashboard), with schema knowledge base patterns, efficient SQL generation, and least-privilege security.
What It Is
A local AI knowledge management stack pairing Obsidian (as a Zettelkasten vault) with a CLI agent that reads/writes notes, queries semantic memory via Dataview, and autonomously maintains bidirectional links. The agent treats the vault as its memory layer, creating, linking, and surfacing notes on behalf of the user.
Unlike cloud-based "Second Brain" tools, everything stays local. Obsidian vaults are plain Markdown files: no vendor lock-in, no subscription, no data leaving your machine.
Karpathy CLI Approach
The Karpathy pattern: vault as working directory – Claude reads/writes Markdown files directly. Tool access via ripgrep (search), glob, and Dataview HTTP queries. Three wiring options:
- Claude Code CLI – claude --dir ~/vault with project config pointing at vault root
- Custom Python agent – LLM API calls with filesystem tools
- MCP server – Obsidian Local REST API plugin enables structured Dataview queries from CLI
Agentic Loop Pattern
① User query → Perceive
② vault_search / dataview_query → Recall (ranked notes)
③ LLM synthesizes → Reason
④ vault_write / vault_link / vault_tag → Act (new note, link, update)
Tools needed: vault_read, vault_search, vault_write, vault_glob, dataview_query, vault_link, vault_tag
Core Prompts
- System prompt: "You are a personal knowledge management agent. Your memory is an Obsidian vault. All notes are .md with YAML frontmatter. Prefer creating/updating notes over giving long answers. Link with wikilinks. Tag in frontmatter."
- Summarize & save: "summarize my notes on X, create a permanent note"
- Research: "I read an article about Y, save key takeaways to Inbox"
- Query: "what do I know about Z, find and link related notes"
Obsidian Vault Structure
Inbox/
Unsorted capture, process later
Permanent/
Decontextualized evergreen notes
References/
Literature with bibliographic metadata
Projects/
Project-specific notes
Journal/
Daily notes (Daily Notes plugin)
memory/
Agent session state & context
Templates/
Templater templates
Obsidian Plugins
Dataview – frontmatter indexing & live queries
Templater – dynamic note templates
Local REST API – HTTP vault queries for CLI agents
Daily Notes – built-in session journal
Obsidian Git – vault backup
Key Challenges & Gotchas
- Context window overflow – large vaults exceed LLM context. Use two-phase retrieve-then-reason with Dataview pre-filtering.
- Wikilink case sensitivity – APFS (macOS) is case-insensitive; ext4 (Linux) is case-sensitive. Use all-lowercase kebab-case filenames.
- Dataview index lag – queries may be stale if Obsidian is closed. Trigger a refresh via the Local REST API after bulk writes.
- Daily note conflicts – agent and user writing simultaneously clobber each other. Use a separate memory/sessions/ folder for the agent.
- CORS on Local REST API – if the agent runs on a different machine via SSH, tunnel or configure API auth.
What It Is
A CLI agent (Karpathy-style Python REPL loop + Claude API) that uses Obsidian as a persistent schema/context knowledge base, connects natively to Trino (federated SQL across S3/ADLS/GCS/Hive) and Snowflake (enterprise DW), translates natural language questions into self-verified optimized SQL, and returns human-readable results with full provenance.
Trino provides federated SQL across data lake formats (Parquet, ORC, Avro, JSON) without moving data. Snowflake connector in Trino allows joining data lake + DW tables in a single query.
Karpathy CLI Architecture
Python script REPL loop wrapping trino CLI + snow CLI + Obsidian file reads:
- System prompt: Defines persona (DB query expert), available tools (trino_query, snow_query, obsidian_search, validate_sql), behavior rules
- Tool layer: Python functions wrapping CLI commands – trino --server https://host:8443 --catalog aws -f query.sql
- Obsidian layer: obsidian_search(query) calls grep/Dataview API to find relevant schema docs
- Two backends: Agent decides whether to use Trino or Snowflake based on the question topic
10 Core Tools
| Tool | What It Does |
| --- | --- |
| trino_describe | DESCRIBE table via Trino CLI – column names, types, partition keys |
| trino_list_tables | SHOW CATALOGS / SHOW TABLES FROM schema |
| trino_query | Execute SQL, return JSON/CSV results + stats |
| snow_describe | Describe Snowflake table metadata |
| snow_list_tables | List Snowflake tables/row counts |
| snow_query | Execute Snowflake SQL via snow CLI |
| obsidian_search | Search vault for table descriptions, business definitions |
| obsidian_read | Read specific schema note from vault |
| validate_sql | EXPLAIN dry-run before execution |
| format_results | Convert JSON/CSV to human-readable markdown |
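A sketch of how the tool layer wraps the CLI, modeled on the trino invocation shown earlier. The flag set is an assumption to adapt to your cluster (auth flags omitted):

```python
import os
import subprocess
import tempfile

def build_trino_cmd(server: str, catalog: str, sql_file: str) -> list[str]:
    # Mirrors the trino CLI invocation above; adjust flags for your cluster.
    return ["trino", "--server", server, "--catalog", catalog,
            "--output-format", "CSV_HEADER", "-f", sql_file]

def trino_query(sql: str, server: str, catalog: str) -> str:
    """Write the SQL to a temp file and shell out to the trino CLI.
    Returns raw CSV text; raises CalledProcessError on query failure."""
    with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as f:
        f.write(sql)
        path = f.name
    try:
        proc = subprocess.run(build_trino_cmd(server, catalog, path),
                              capture_output=True, text=True, check=True)
        return proc.stdout
    finally:
        os.unlink(path)
```

The snow_query wrapper is the same shape around the snow CLI.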
Agentic Loop
① User NL question → Decide: Trino or Snowflake?
② Search Obsidian for relevant schema docs
③ Generate SQL with explanation
④ Validate SQL via dry-run (EXPLAIN)
⑤ Execute query
⑥ Format and return results
⑦ Log query to Obsidian query history
Trino Connector – Key Patterns
- Predicate pushdown: Include partition key filters (e.g., WHERE date = '2024-01-01') to leverage Trino's connector-level filter optimization
- Sampling: Use TABLESAMPLE BERNOULLI (1) for quick exploration before full analysis
- File formats: ORC (default, best compression), Parquet (widely compatible), Avro, JSON, CSV
- Auth: Certificate-based or service account – avoid passwords. LDAP, OAuth2, Kerberos for enterprise
- Trino → Snowflake: Native connector queries Snowflake directly; allows JOINing Snowflake dims with S3 facts in one SQL statement
Snowflake Connector – Key Patterns
- Dedicated warehouse: Agent uses a dedicated warehouse (e.g., COMPUTE_WH) with auto-suspend (60s) to avoid idle costs
- Key pair auth: More secure than a password – store the private key passphrase in an env var or secrets manager
- Read-only role: GRANT USAGE + SELECT only – never INSERT/UPDATE/DELETE/SYSADMIN
- Micro-partitions: Queries benefit from predicates on clustered columns – align filters with clustering keys
- Result caching: Use identical query text for repeated exploration to hit Snowflake's query result cache
Efficient SQL Patterns
- Partition pushdown: Always include partition key predicates – without them, Trino scans all partitions
- Aggregation pushdown: Let Trino/Snowflake aggregate at the storage layer – never pull raw rows
- LIMIT rows early: Apply LIMIT 10000 immediately to reduce row volume before expensive operations
- JOIN ordering: Use the smaller table as the build side in hash joins; LEFT JOIN for dimension tables
- EXPLAIN first: Always run EXPLAIN before execution on large tables – reject if scan > 100 GB or rows > 1B
- APPROX_DISTINCT: Use for high-cardinality counts to reduce payload
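The EXPLAIN-first rule can be enforced mechanically before any query runs. A minimal sketch, assuming the plan text reports estimates as `rows: N` – Trino's plan format varies by version, so treat the regex as a placeholder to adapt:

```python
import re

def plan_exceeds_budget(plan_text: str, max_rows: float = 1e9) -> bool:
    """Scan EXPLAIN output for row-count estimates and flag over-budget
    plans. The 'rows: N' pattern is an assumption about plan text."""
    estimates = [float(m) for m in re.findall(r"rows:\s*([\d.eE+]+)", plan_text)]
    return any(r > max_rows for r in estimates)
```

The agent calls this inside validate_sql and refuses to execute (or asks the user) when it returns True.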
Obsidian Schema Knowledge Base
/schema/
Table schemas by catalog/schema, one file per table
/metrics/
Business metric definitions + calculation SQL
/query-history/
Daily log of queries run with NL + SQL
/business-glossary/
Term definitions and data dictionary
/data-quality/
Known issues, null semantics, update frequencies
Each schema note has YAML frontmatter: type: table-schema, catalog, schema, partitionKeys, clusterKeys, rowEstimate, lastUpdated. Dataview queries find tables related to any topic.
Query Self-Verification
- Run EXPLAIN on generated SQL before executing – verify table names, column references, row estimates, no Cartesian products
- Cross-reference column names with Obsidian schema docs – catch typos before execution
- Check that JOIN key types match (string vs int vs date) – type mismatch causes silently wrong results
- Validate WHERE predicates: no invalid date ranges, no comparisons on non-comparable types
- Spot-check results against Obsidian sample data for semantic correctness
Security & Auth
Credential Management
Trino: certificate or key pair auth in env vars. Snowflake: key pair + passphrase in env var. Obsidian vault: encrypted path or access-controlled sync.
Least-Privilege Roles
Trino: read-only on the catalogs needed. Snowflake: USAGE + SELECT only, no INSERT/UPDATE/DELETE. Trino-to-Snowflake connector: dedicated read-only Snowflake user.
Column Masking
Snowflake masking policies on PII columns. The agent sees masked values unless an unmasking role is granted (it shouldn't be).
Result Size Limits
Enforce a max row count returned (e.g., 10,000). Pre-aggregate at the DB layer. Use approximate counts (APPROX_COUNT_DISTINCT). Never pull raw fact table rows into LLM context.
Required Tools
trino-cli – Trino CLI JAR
snowsql – Snowflake CLI
snowflake-connector-python
Dataview + Templater – Obsidian plugins
Python wrapper (dbagent package)
anthropic Python SDK
Key Challenges
- Schema drift: Tables change but Obsidian docs go stale – the agent should detect inconsistencies on describe and flag them
- Vault size: Large vaults (10k+ notes) can't fit in context – use Dataview to dynamically filter per question
- SQL dialect differences: Trino follows ANSI SQL; Snowflake uses its own dialect – the agent must route and translate correctly
- Snowflake warehouse cold start: ~60s wake-up cost if suspended – warn about this
- Query cost: Snowflake charges per credit – always run EXPLAIN first and warn about expensive queries
What It Is
Claude conducts deep web research, synthesizes findings into atomic markdown notes, and automatically links them into a growing Obsidian knowledge graph. Every research session feeds a compounding graph – future queries leverage prior work, so research on related topics auto-surfaces old notes.
Core principle: every factual claim gets an inline [Source](URL) citation. Claude fetches source pages directly to verify, not just trust search snippets.
Karpathy CLI Approach
Terminal-first, no GUI automation of Obsidian – just direct file I/O on .md files. Claude operates as the agent brain; Obsidian is the viewer/output:
- claude --dir ~/vault – vault as working directory
- WebSearch + WebFetch built-in (no MCP needed) – Claude issues searches, reads top results
- Optional: Brave Search MCP for higher-quality results with a free API key
Agentic Loop
OBSERVE WebSearch + WebFetch + vault_read
THINK LLM synthesizes, identifies gaps, decides next search
ACT Write new note, Edit existing, Bash git/Dataview
Loop terminates when: question answered, Claude signals diminishing returns, or user interrupts. Subagents can run parallel research branches.
Core Prompts
- Research system: "Always cite sources with [Source](URL). Create one note per concept (atomic). Always link to related notes with wikilinks. If a note exists, extend it – don't duplicate."
- Deep research: Issue search → read 2-3 source pages → write atomic note → link to related notes → add # Sources block
- Note synthesis: Review last 5 inbox notes, identify top 3 concepts, create a synthesis note connecting all items
- Link discovery: Run Dataview for orphaned notes, search the web for how the orphaned concept relates to other topics
- Session resume: Read session-log.md, summarize the last thread, suggest next searches
Obsidian Vault Structure
inbox/
Raw research output, not yet organized
topics/
Curated atomic notes, one per concept
papers/
Paper summaries and annotations
people/
Notes on individuals
projects/
Multi-topic research projects
daily/
Daily research logs (YYYY-MM-DD.md)
session-log.md
Current session context at root
index.md
Hub note with Dataview queries
Session Persistence
A session-log.md at vault root records: current research question, what was searched, what was found, what remains open, suggested next steps. Updated at end of each session. On startup, Claude reads it first.
Git commits provide full history β git log can reconstruct past research threads. The vault becomes a fully versioned, queryable knowledge base.
Source Tracking
- Inline citation: Every factual claim ends with [Source](https://...)
- Frontmatter: source: https://... and sources: [...] in YAML
- Source block: Notes end with a # Sources block listing all URLs
- Provenance: Note's created date + git commit history = full provenance trail
Key Plugins
Dataview – query the vault as a database
Templater – dynamic note templates
Recent Files – surface recently touched notes
Natural Language Dates – type "tomorrow", "next week" in dates
Brave Search MCP (optional) – higher quality search
Memory MCP (optional) – knowledge-graph persistent memory
⚠️ Hallucination risk: Claude may hallucinate facts even with citations. Always require an inline [Source](URL) for every claim. Claude should fetch the source page directly for verification, not trust search snippets.
Key Challenges
- Duplicate notes: Instruct Claude to always search the vault before creating a new note
- Wikilink breakage: Rename handling via Obsidian's internal link index; use stable lowercase filenames
- Dataview slowness: On 10k+ note vaults, use a LIMIT clause and cache queries
- Agent drift: Set a max session length; reference session-log.md explicitly to stay on track
- Research quality degradation: Summarize and git-commit every 10 searches to prevent drift
What It Is
An AI-powered codebase exploration and documentation system. Claude reads local codebases, traces execution paths, maps dependencies, explains architecture, and auto-generates interconnected documentation within an Obsidian vault β turning a codebase into a searchable, navigable knowledge graph with ADRs, inline explanations, and architectural overviews.
Claude Code's Explore subagent (read-only, Haiku-powered) is optimized for codebase traversal at scale – no Write/Edit tools, purely exploratory.
Karpathy CLI Approach
Vault as working directory – Claude Code operates on the local repo → generated docs written to an Obsidian vault at docs/vault/:
- Claude Code Explore subagent: Read-only traversal, designed for large codebases
- Custom SKILL.md: Drives the exploration loop with structured prompts
- Vault output: docs/vault/{project-name}/ – can be a separate git repo or a subdirectory
- Git history: git log/blame/show enrich understanding of decisions and evolution
Agentic Loop – Tools
| Tool | Purpose |
| --- | --- |
| Read | Read file contents – primary data source, offset/limit for large files |
| exec (grep/rg) | Search patterns: function names, imports, comments |
| exec (ctags) | Symbol resolution – find where functions/types are defined |
| exec (git log/blame) | Git history analysis – who changed what, when, why |
| exec (tree-sitter) | AST parsing – structural extraction (optional but powerful) |
| Write/Edit | Write documentation to the Obsidian vault |
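Once a TAGS file exists (ctags -R generates one), symbol resolution is a few lines of parsing. A simplified sketch – real tags lines carry an ex-command third field and extension fields, which we ignore here:

```python
def load_ctags(tags_text: str) -> dict[str, str]:
    """Build a symbol -> defining-file index from a ctags TAGS file."""
    index: dict[str, str] = {}
    for line in tags_text.splitlines():
        if line.startswith("!"):  # skip !_TAG_ metadata headers
            continue
        parts = line.split("\t")
        if len(parts) >= 2:
            # Keep the first definition seen for each symbol.
            index.setdefault(parts[0], parts[1])
    return index
```

The agent loads this once per session and answers "where is X defined?" without any file reads.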
Core Prompts
- Architecture explanation: "Identify layers, key abstractions, data flow, entry/exit points. Cite file paths and line numbers."
- Dependency tracing: "Trace dependency chain – what it depends on, what depends on it. Identify circular dependencies."
- Code review: "Design patterns, bugs, performance, conventions, error handling. Rate severity × confidence."
- ADR generation: "Generate Architecture Decision Record. Base on git history analysis and code structure."
- Git enrichment: "Who were primary authors? Major evolution points? Long-standing TODOs/FIXMEs?"
Obsidian Vault Structure
index.md
Project overview, key metrics, links
architecture.md
System-wide architectural overview
modules/
One .md per major module
adrs/
Architecture Decision Records (ADR-001.md, …)
explorations/
Time-boxed exploration reports
code-graph/
Dependency graphs, call chains
search-indexes/
Dataview-generated indexes
Large Codebase Strategies
- Hierarchical summarization: Explore module-by-module, summarize each, then synthesize at higher levels
- Targeted exploration: Only read files relevant to the query – never the full repo at once
- ctags TAGS file: Pre-generated, checked into the repo – O(1) symbol-to-file lookup
- Tree-sitter CLI: Optional but dramatically improves structural understanding
- Exploration budget: Max 50 files read, max 5 subdirectories traversed per query
Key Plugins
Claude Code (core)
Obsidian (documentation UI)
ripgrep – fast pattern matching
git – version control
tree-sitter CLI (optional)
Dataview + Templater
⚠️ AI-generated docs can be wrong: Cross-reference Claude's conclusions with git blame and the actual code. Use tree-sitter for structural ground truth. Mark generated docs with "⚠️ AI-generated – verify assertions".
Key Challenges
- Context overflow: Never read the entire repo – always scope by directory or import graph
- Stale docs: Regenerate periodically; include a last-analyzed date; Dataview flags stale notes (>30 days)
- Wikilink breakage: Always write a .md file before linking to it; use lowercase kebab-case filenames
- Monorepo variants: One vault per package, with a root vault linking all sub-vaults
What It Is
Claude ingests meeting content (raw notes, Zoom/Teams/Otter transcripts, or audio via Whisper), extracts decisions, action items with owners and deadlines, and writes them into Obsidian as a linked task graph. Every meeting becomes a living knowledge graph node β tasks surface automatically, owners are explicit, deadlines tracked via Dataview.
Addresses the "dead notes" problem: decisions get forgotten, action items get lost. This system turns every meeting into an actionable, trackable knowledge graph – with zero manual bookkeeping after setup.
Karpathy CLI Approach
Pipeline: transcript file → shell script strips formatting/timestamps → Claude extracts structured JSON → markdown notes written into the Obsidian vault:
- Raw text: claude --print < meeting-notes.txt
- Transcript formats: Zoom VTT, Teams TXT, Otter TXT/XLSX, Google Meet VTT – strip with sed/awk before processing
- Audio: Whisper.cpp (local, CPU-friendly) → transcript.txt → Claude → Obsidian
Structured Output Schema
Claude outputs ONLY valid JSON in this exact schema:
{
"meeting_title": "string",
"meeting_date": "YYYY-MM-DD",
"attendees": ["name1", "name2"],
"summary": "2-3 sentence executive summary",
"decisions": [{ "id": "DEC-1", "decision": "...", "owner": "..." }],
"action_items": [{
"id": "ACT-1",
"description": "starts with verb",
"owner": "full name",
"deadline": "YYYY-MM-DD or null",
"priority": "high|medium|low",
"status": "open"
}]
}
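Given Claude's JSON in the schema above, writing the task notes is mechanical. A sketch – frontmatter keys follow the schema; the wikilink target is assumed to be the meeting note's filename:

```python
def render_task_note(item: dict, meeting_file: str) -> str:
    """Render one action_items entry as an Obsidian task note with
    YAML frontmatter linking back to the originating meeting."""
    deadline = item.get("deadline") or "null"
    return (
        "---\n"
        f"id: {item['id']}\n"
        f"owner: {item['owner']}\n"
        f"deadline: {deadline}\n"
        f"priority: {item['priority']}\n"
        f"status: {item['status']}\n"
        f"meeting_source: '[[{meeting_file}]]'\n"
        "tags: [action-item]\n"
        "---\n\n"
        f"{item['description']}\n"
    )
```

The driver loops over action_items, writing each to tasks/ACT-N.md.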
Obsidian Vault Structure
meetings/
YYYY-MM-DD-title.md – meeting note template
tasks/
ACT-N.md – individual task notes
people/
Person notes with meeting backlinks
projects/
Project notes linked to related meetings
Every task note has frontmatter: id, owner, deadline, priority, status, meeting_source and links back to the originating meeting.
Cross-Linking Graph
- Meeting ↔ Task: Task file path in meeting frontmatter; task links back via meeting_source
- Meeting ↔ Person: Attendee list in frontmatter; each person note has meetings: [...]
- Meeting ↔ Project: Topics/projects in frontmatter; project note has meetings: [...]
- Task ↔ Project: project: [[Project Name]] in task frontmatter
Dataview Queries for Task Tracking
Open actions:
TABLE owner, deadline, priority FROM #action-item
WHERE status = "open" AND deadline != null SORT deadline ASC
Overdue:
TABLE owner, deadline FROM #action-item
WHERE status = "open" AND deadline < date(today)
By person:
TABLE deadline, priority FROM #action-item
WHERE owner = "Alice Smith" SORT deadline ASC
Required Tools
Claude CLI
Dataview + Templater
Obsidian Tasks plugin
Calendar plugin
Whisper.cpp (optional, for audio)
⚠️ Implied vs explicit actions: People rarely say "ACTION: Alice will..." – they say "I can send that over" or "someone needs to follow up". Claude must infer from context. Implied actions should be flagged with lower confidence for user review.
Key Challenges
- Noisy transcripts: Auto-generated transcripts contain hallucinated words and speaker confusion. Always review action item extraction.
- Context window limits: For very long meetings (>2 hours), chunk by 30-min segments, then merge outputs
- Owner name disambiguation: "John", "John from engineering", "JS" may all be the same person – deduplicate
- Deadline inference: "next Tuesday" → meeting date + 7 days; complex verbal references need manual correction
- Action items without owners: Flag as orphaned tasks for the meeting organizer to assign
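Deadline phrases like "next Tuesday" can be resolved relative to the meeting date. A sketch of one convention – "next X" means the first X strictly after the meeting; the simpler +7-days heuristic noted above also works, and ambiguous references still need manual correction:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday"]

def next_weekday(meeting_date: date, name: str) -> date:
    """Resolve 'next <weekday>' to the first matching day strictly
    after the meeting date."""
    target = WEEKDAYS.index(name.lower())
    delta = (target - meeting_date.weekday() - 1) % 7 + 1
    return meeting_date + timedelta(days=delta)
```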
What It Is
A self-contained AI writing pipeline where Claude operates as a persistent agent over an Obsidian vault, reading accumulated notes and context to draft documents – emails, reports, blog posts, documentation, presentations – in a consistent authorial voice. The vault acts as both knowledge base and drafting workspace.
Most AI writing tools start cold every session. This pipeline gives Claude persistent memory – writing that improves over time, accumulates knowledge, and maintains voice consistency.
Karpathy CLI Approach
Claude CLI + Obsidian MCP server via Local REST API plugin:
- Read: Fetch relevant notes from the vault (by topic, folder, or search) – Claude analyzes a context packet (typically 5–15 notes)
- Draft: Generate content using a Templater template – write to vault/drafts/
- Review: Show the draft to the user or auto-evaluate against criteria
- Edit: Iterate on the draft in place
- Commit: Git commit the draft – promote to vault/writing/final/ when approved
Core Prompts
- Drafting from context: "Read notes on [TOPIC]. Draft a [DOCUMENT TYPE] synthesizing this context. Maintain [STYLE]. Output to vault/drafts/[date]-[slug].md."
- Style matching: "Analyze writing style of [voice samples]. Apply: [specific traits]. Match sentence length, paragraph structure, formality."
- Tone adaptation: "Audience is [AUDIENCE]. Adjust: more formal/informal, technical/plain-language, persuasive/informative."
- Outline expansion: "Expand each section of [outline-file.md] into full prose. Keep flowing, avoid bullet fragmentation."
Voice / Style Learning
Claude learns writing patterns by reading 3–5 representative documents from vault/style/ or vault/writing/final/:
- Average sentence length, paragraph structure, vocabulary level
- Passive vs active voice, bullet frequency, formality
- Store the summary in vault/notes/meta/authorial-voice.md – Claude reads it at every session start
- After edits, save feedback as a note: what changed, why, what to maintain – compounding improvement
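The measurable parts of a voice profile can be computed directly. A crude sketch of the stats that feed the authorial-voice note – the qualitative traits still come from the model's read of the samples:

```python
import re

def voice_profile(text: str) -> dict:
    """Crude style fingerprint: average sentence length and bullet density."""
    sentences = [s for s in re.split(r"[.!?]+\s", text.strip()) if s]
    words = text.split()
    bullets = sum(1 for ln in text.splitlines() if ln.lstrip().startswith("- "))
    return {
        "avg_sentence_words": round(len(words) / max(len(sentences), 1), 1),
        "bullet_lines": bullets,
    }
```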
Writing Types & Turnaround
| Type | Approach | Time |
| --- | --- | --- |
| Emails | Fast, context-loaded prompts, conversational | 30–60s |
| Reports | Longer context, outline-first, multiple iterations | 10–30min |
| Blog posts | Flexible, engaging opening, strong conclusions | 5–15min |
| Documentation | Highly structured, code examples, wikilinks | 10–20min |
| Presentations | Marp format, slide notes in YAML frontmatter | 5–15min |
Obsidian Vault Structure
vault/drafts/
Active work-in-progress, frequently git-committed
vault/writing/final/
Polished, approved documents
vault/templates/
Per-document-type Templater templates
vault/context/
Background knowledge notes
vault/style/
Voice corpus: 5–10 representative docs
Required Tools
Claude CLI
Templater (required)
Various Complements (auto-completion)
Obsidian Git (required)
Local REST API + Obsidian MCP
Key Challenges
- MCP connection fragility: The Local REST API plugin must be running in Obsidian – if Obsidian is closed, the connection fails
- Context window limits: Can't read the entire vault – requires discipline to load only relevant context notes
- Voice matching is approximate: Claude extracts patterns, but subtle personal voice needs periodic correction/feedback
- Export to DOCX/PDF: Requires Pandoc/Marp CLI – not built in
What It Is
Claude Code runs as a headless BI analyst loop – on demand or on a schedule, it executes SQL against Trino/Snowflake, performs trend analysis, anomaly detection, and forecasting, then writes rich Obsidian notes with tables, trend indicators, inline charts, and Dataview queries – creating a living BI dashboard that's browsable, searchable, and annotatable.
Replaces expensive BI tooling with a flexible, text-first, LLM-powered analyst that writes its own queries and generates its own reports. Obsidian becomes a personal/team knowledge base that also happens to be a live BI surface.
Agentic Loop – 9 Steps
① Receive directive (trend/anomaly/forecast/scheduled report)
② Identify data source(s) and table(s) from the schema registry
③ Write SQL (or generate with LLM + schema context)
④ Execute via trino-cli / snowsql; capture CSV/JSON
⑤ Analyze in context: z-score, moving average, seasonality
⑥ Render: tables, trend arrows (▲▼►), Mermaid or Chart.js
⑦ Write/update Obsidian note at vault/metrics/
⑧ If anomaly → tag #alert, write alert note, notify
⑨ Log source query in frontmatter for full provenance
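The note-writing and provenance-logging steps of the loop reduce to templated assembly. A sketch with the source SQL carried in frontmatter so every number is reproducible – the layout is illustrative, not a fixed schema:

```python
def metric_note(metric: str, sql: str, rows: list[tuple], trend: str) -> str:
    """Assemble a metrics note whose frontmatter carries the source
    query, making every value in the note reproducible."""
    # Indent the SQL for a YAML literal block.
    sql_block = "\n".join("  " + line for line in sql.splitlines())
    table = "\n".join(f"| {d} | {v} |" for d, v in rows)
    return (
        "---\n"
        f"metric: {metric}\n"
        f"source_sql: |\n{sql_block}\n"
        "---\n\n"
        f"# {metric} ({trend})\n\n"
        "| date | value |\n| --- | --- |\n"
        f"{table}\n"
    )
```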
Core Prompts
- Trend analysis: "Query last N days of metric_column grouped by group_by. Return Markdown table with trend arrows + 7-day moving average. Write Obsidian note with Mermaid line chart."
- Anomaly detection: "Compute z-score vs mean/stddev. Flag rows where |z| > 2.5. Write alert note at vault/alerts/ if anomalies found."
- Forecast: "Using last 90 days of metric data, perform 14-day linear trend forecast. Overlay on chart. Write note with confidence intervals."
- Data storytelling: "Latest 30-day results from vault/metrics/. Write narrative briefing: headline findings β drill-down β flag anything needing attention β next steps."
Chart Rendering – 6-Tier Strategy
| Method | Complexity | Best For |
| --- | --- | --- |
| Mermaid | Low | Simple lines, bars – no plugin needed |
| Grafana iframe | Low-Med | Interactive dashboards – embed via ![[chart.html]] |
| obsidian-charts plugin | Medium | Clean chart syntax, JSON/CSV data in note |
| Chart.js HTML embed | Medium | Rich charts, animations, tooltips |
| matplotlib PNG | Medium | Publication-quality static charts |
| Raw SVG | Low | LLM-generated, self-contained, animatable |
Anomaly Detection Methods
- Z-score: (value - mean) / stddev over a 30-day rolling window. Flag if |z| > 2.5.
- IQR: Flag values outside Q1 - 1.5×IQR and Q3 + 1.5×IQR. Better for skewed distributions.
- % change: Flag if today's value deviates >X% from the 7-day or 30-day moving average.
- Seasonality-adjusted: Compare against the same-day-of-week rolling average to avoid false alerts on weekly cycles.
- MA crossover: Short-window MA crossing the long-window MA is the classic trend-change signal.
Alert suppression: after firing, suppress repeats for 24h (same metric) to avoid alert fatigue.
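The z-score method and the cooldown rule are each a few lines. A sketch, assuming a flat list of daily values and hour-resolution timestamps for the suppression state:

```python
from statistics import mean, stdev

def zscore_anomalies(window: list[float], threshold: float = 2.5) -> list[int]:
    """Flag indices whose z-score against the window mean/stddev
    exceeds the threshold. Needs at least two points."""
    mu, sigma = mean(window), stdev(window)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(window) if abs((v - mu) / sigma) > threshold]

def should_alert(last_fired: dict, metric: str, now_h: float,
                 cooldown_h: float = 24) -> bool:
    """Per-metric cooldown: suppress repeat alerts within the window."""
    if now_h - last_fired.get(metric, float("-inf")) < cooldown_h:
        return False
    last_fired[metric] = now_h
    return True
```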
12 Common Business Metrics
MRR – Monthly Recurring Revenue
CAC – Customer Acquisition Cost
Infra Cost Ratio – infra cost / revenue
Error Rate – API errors / total requests
Queue Depth – messages pending
Cron Schedules
| Schedule | Task | Output |
| --- | --- | --- |
| 0 6 * * * | Morning brief: key metrics yesterday | vault/metrics/YYYY/MM/daily-YYYY-MM-DD.md |
| 0 7 * * 1 | Weekly rollup: 7-day week-over-week | vault/reports/weekly-YYYY-WXX.md |
| 0 6 1 * * | Monthly close: full month narrative | vault/reports/monthly-YYYY-MM.md |
| */15 * * * * | Ops: queue depth, error rate, P99 | vault/ops/YYYY-MM-DD-HHMM.md |
| 0 8 * * * | 14-day forecast refresh on revenue | vault/forecasts/forecast-YYYY-MM-DD.md |
Required Tools
Claude Code CLI
Trino CLI
SnowSQL
Dataview + Templater + Calendar
Grafana (optional, for chart iframes)
Python + pandas + matplotlib (optional)
cron / systemd timer
⚠️ LLM hallucinating metric values: Always include the raw SQL + result set in note frontmatter. Make reproducibility first-class. Never let Claude summarize without anchoring to actual values.
Key Challenges
- Vault growth: Implement a retention policy – archive notes >12 months old to a separate vault
- Trino timeouts: Set a CLI timeout (120s); prefer pre-aggregated tables; log and retry once
- Snowflake credits: Use the query result cache; prefer batch aggregation over ad-hoc queries per metric
- Alert fatigue: Seasonality-adjust thresholds; implement a cooldown; require consecutive breaches
- No live refresh: Dataview re-runs on note open; use an auto-refresh plugin or Templater journal creation
What It Is
Claude reads papers (PDF), articles (URLs), and books; generates structured summaries; extracts key concepts; creates flashcards and Socratic questions; and writes them into Obsidian organized by topic. An SRS plugin (SM-2 or FSRS algorithm) schedules flashcard reviews and tracks retention β a closed-loop learning system where AI both ingests knowledge and generates review material.
Transforms passive reading into active learning with zero extra effort. The knowledge graph means concepts link across sources β building a personal wiki of learned material.
Agentic Loop – 7 Steps
① Ingest: PDF (pdftotext), URL (web_fetch), arXiv, local text
② Summarize: abstract, key findings, methodology, implications
③ Extract concepts: 10–20 atomic facts, definitions, relationships
④ Generate flashcards: basic Q/A, cloze deletion, multi-modal
⑤ Socratic questions: reasoning over recall
⑥ Write to Obsidian: source note + concept notes + flashcard note
⑦ Schedule review: SM-2 or FSRS intervals, next-review dates
Flashcard Formats
Basic Q/A
Q: [question]
A: [answer]
Best for: definitions, facts
Cloze deletion
{{c1::hidden answer}} in context
Best for: formulas, fill-in-blank
Multi-modal
Image/diagram + explanation
Best for: anatomy, circuits, maps
Reversed cards
Auto-generate both directions
Forces deeper understanding
Socratic
Question + reasoning path + application
Requires reasoning, not recall
Spaced Repetition Algorithms
SM-2 (SuperMemo 2)
Classic algorithm. Ease factor starts at 2.5. Intervals: 1d → 6d → interval × EF. Quality ratings 0–5. Reliable, well-understood.
FSRS (Free Spaced Repetition Scheduler)
Modern, machine-learning-based. Models memory with stability and difficulty parameters fit to review history. Better retention predictions and lower forgetting rates than SM-2. Available in Anki 23.10+ and Obsidian SRS plugins.
Review cycles: limit new cards/day (20–30) to avoid overwhelm. Review all due cards before adding new ones.
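One common reading of the SM-2 update (quality 0–5, EF floor 1.3, failures reset the interval) as a pure function – per-plugin variants differ, so treat this as a reference sketch:

```python
def sm2_next(interval: int, ef: float, quality: int) -> tuple[int, float]:
    """One SM-2 review step: (next interval in days, new ease factor)."""
    # Ease factor update, clamped at the 1.3 floor.
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if quality < 3:          # failed recall: relearn from one day
        return 1, ef
    if interval <= 1:        # first successful review -> 6 days
        return 6, ef
    return round(interval * ef), ef
```

The scheduler writes the returned interval into the flashcard's next-review frontmatter field.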
Obsidian Vault Structure
sources/papers/
Paper summaries by subject
sources/articles/
Article summaries by subject
concepts/
Atomic concept notes, linked to sources
flashcards/
Batch flashcards by date
templates/
source, concept, flashcard templates
Frontmatter Schema
type: source|concept|flashcard
source-type: paper|article|book|video
title:
author:
date:
subject:
summary: (for sources)
concepts: [tag1, tag2]
read: false (sources)
review-count: 0
last-review:
next-review:
rating: 0 (SM-2 ease × 100)
interval: 1 (days)
source-url:
Dataview Queries
- Due today: TABLE WHERE next-review <= date(today) AND type = "flashcard"
- Overdue: TABLE WHERE next-review < date(today)
- Retention stats: TABLE avg(rating), count() GROUP BY subject
- By subject: TABLE key, value FROM "concepts" WHERE subject = "ml"
Key Plugins
Templater
Dataview
obsidian-spaced-repetition
Natural Language Dates
PDF Highlights
pdftotext / pdfminer (optional)
Key Challenges
- Flashcard quality: AI-generated cards can be superficial – always review before scheduling
- Long papers: Process in chunks (abstract+intro, methods, results, conclusion) – then merge
- SRS plugin quality: Community plugins vary – test multiple; check recent commit activity
- Review overwhelm: Start strict (10–15 new cards/day); quality over quantity
- Math notation: Use LaTeX in Obsidian (native support); cloze cards work well for formulas