Claude CLI + Obsidian
8 Karpathy-Style AI Agent Use Cases – Deep Research Report
AI Agents
Obsidian
CLI-First
Trino + Snowflake
Data Intelligence
April 18, 2026
Executive Summary
This report covers 8 high-value use cases for building a personal AI agent stack using Claude CLI (Karpathy-style: minimal, CLI-first, tool-calling loop) wired to Obsidian as the knowledge base, with external data connectors (Trino, Snowflake, web search, git) feeding context in.
Each use case is analyzed across: implementation approach, agentic loop pattern, core prompts, Obsidian vault structure, required tools, security model, and complexity. All 8 are implementable today with existing tools.
Data Agent focus: Trino + Snowflake connectors are deeply covered in Use Case 2 (Data Lake/DB Agent) and Use Case 7 (BI Dashboard), with schema knowledge base patterns, efficient SQL generation, and least-privilege security.
What It Is
A local AI knowledge management stack pairing Obsidian (as a Zettelkasten vault) with a CLI agent that reads/writes notes, queries semantic memory via Dataview, and autonomously maintains bidirectional links. The agent treats the vault as its memory layer, creating, linking, and surfacing notes on behalf of the user.
Unlike cloud-based "Second Brain" tools, everything stays local. Obsidian vaults are plain Markdown files: no vendor lock-in, no subscription, no data leaving your machine.
Karpathy CLI Approach
The Karpathy pattern: vault as working directory – Claude reads/writes Markdown files directly. Tool access via ripgrep (search), glob, and Dataview HTTP queries. Three wiring options:
- Claude Code CLI – claude --dir ~/vault with project config pointing at vault root
- Custom Python agent – LLM API calls with filesystem tools
- MCP server – Obsidian Local REST API plugin enables structured Dataview queries from CLI
Agentic Loop Pattern
① User query → Perceive
② vault_search / dataview_query → Recall (ranked notes)
③ LLM synthesizes → Reason
④ vault_write / vault_link / vault_tag → Act (new note, link, update)
Tools needed: vault_read, vault_search, vault_write, vault_glob, dataview_query, vault_link, vault_tag
Core Prompts
- System prompt: "You are a personal knowledge management agent. Your memory is an Obsidian vault. All notes are .md with YAML frontmatter. Prefer creating/updating notes over giving long answers. Link with wikilinks. Tag in frontmatter."
- Summarize & save: "summarize my notes on X, create a permanent note"
- Research: "I read an article about Y, save key takeaways to Inbox"
- Query: "what do I know about Z, find and link related notes"
Obsidian Vault Structure
Inbox/
Unsorted capture, process later
Permanent/
Decontextualized evergreen notes
References/
Literature with bibliographic metadata
Projects/
Project-specific notes
Journal/
Daily notes (Daily Notes plugin)
memory/
Agent session state & context
Templates/
Templater templates
Obsidian Plugins
Dataview – frontmatter indexing & live queries
Templater – dynamic note templates
Local REST API – HTTP vault queries for CLI agents
Daily Notes – built-in session journal
Obsidian Git – vault backup
Key Challenges & Gotchas
- Context window overflow – large vaults exceed LLM context. Use two-phase retrieve-then-reason with Dataview pre-filtering.
- Wikilink case sensitivity – APFS (macOS) is case-insensitive; ext4 (Linux) is case-sensitive. Use all-lowercase kebab-case filenames.
- Dataview index lag – queries may be stale if Obsidian is closed. Trigger a refresh via the Local REST API after bulk writes.
- Daily note conflicts – agent and user writing simultaneously clobber each other. Use a separate memory/sessions/ folder for the agent.
- CORS on Local REST API – if the agent runs on a different machine via SSH, tunnel or configure API auth.
What It Is
A CLI agent (Karpathy-style Python REPL loop + Claude API) that uses Obsidian as a persistent schema/context knowledge base, connects natively to Trino (federated SQL across S3/ADLS/GCS/Hive) and Snowflake (enterprise DW), translates natural language questions into self-verified optimized SQL, and returns human-readable results with full provenance.
Trino provides federated SQL across data lake formats (Parquet, ORC, Avro, JSON) without moving data. Snowflake connector in Trino allows joining data lake + DW tables in a single query.
Karpathy CLI Architecture
Python script REPL loop wrapping trino CLI + snow CLI + Obsidian file reads:
- System prompt: Defines persona (DB query expert), available tools (trino_query, snow_query, obsidian_search, validate_sql), behavior rules
- Tool layer: Python functions wrapping CLI commands – trino --server https://host:8443 --catalog aws -f query.sql
- Obsidian layer: obsidian_search(query) calls grep/Dataview API to find relevant schema docs
- Two backends: Agent decides whether to use Trino or Snowflake based on the question topic
10 Core Tools
| Tool | What It Does |
| --- | --- |
| trino_describe | DESCRIBE table via Trino CLI – column names, types, partition keys |
| trino_list_tables | SHOW CATALOGS / SHOW TABLES FROM schema |
| trino_query | Execute SQL, return JSON/CSV results + stats |
| snow_describe | Describe Snowflake table metadata |
| snow_list_tables | List Snowflake tables/row counts |
| snow_query | Execute Snowflake SQL via snow CLI |
| obsidian_search | Search vault for table descriptions, business definitions |
| obsidian_read | Read specific schema note from vault |
| validate_sql | EXPLAIN dry-run before execution |
| format_results | Convert JSON/CSV to human-readable markdown |
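A sketch of how the tool layer wraps the CLI, modeled on the trino invocation shown earlier. The flag set is an assumption to adapt to your cluster (auth flags omitted):

```python
import os
import subprocess
import tempfile

def build_trino_cmd(server: str, catalog: str, sql_file: str) -> list[str]:
    # Mirrors the trino CLI invocation above; adjust flags for your cluster.
    return ["trino", "--server", server, "--catalog", catalog,
            "--output-format", "CSV_HEADER", "-f", sql_file]

def trino_query(sql: str, server: str, catalog: str) -> str:
    """Write the SQL to a temp file and shell out to the trino CLI.
    Returns raw CSV text; raises CalledProcessError on query failure."""
    with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as f:
        f.write(sql)
        path = f.name
    try:
        proc = subprocess.run(build_trino_cmd(server, catalog, path),
                              capture_output=True, text=True, check=True)
        return proc.stdout
    finally:
        os.unlink(path)
```

The snow_query wrapper is the same shape around the snow CLI.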
Agentic Loop
① User NL question → Decide: Trino or Snowflake?
② Search Obsidian for relevant schema docs
③ Generate SQL with explanation
④ Validate SQL via dry-run (EXPLAIN)
⑤ Execute query
⑥ Format and return results
⑦ Log query to Obsidian query history
Trino Connector – Key Patterns
- Predicate pushdown: Include partition key filters (e.g., WHERE date = '2024-01-01') to leverage Trino's connector-level filter optimization
- Sampling: Use TABLESAMPLE BERNOULLI (1) for quick exploration before full analysis
- File formats: ORC (default, best compression), Parquet (widely compatible), Avro, JSON, CSV
- Auth: Certificate-based or service account – avoid passwords. LDAP, OAuth2, Kerberos for enterprise
- Trino → Snowflake: Native connector queries Snowflake directly; allows JOINing Snowflake dims with S3 facts in one SQL statement
Snowflake Connector – Key Patterns
- Dedicated warehouse: Agent uses a dedicated warehouse (e.g., COMPUTE_WH) with auto-suspend (60s) to avoid idle costs
- Key pair auth: More secure than a password – store the private key passphrase in an env var or secrets manager
- Read-only role: GRANT USAGE + SELECT only – never INSERT/UPDATE/DELETE/SYSADMIN
- Micro-partitions: Queries benefit from predicates on clustered columns – align filters with clustering keys
- Result caching: Use identical query text for repeated exploration to hit Snowflake's query result cache
Efficient SQL Patterns
- Partition pushdown: Always include partition key predicates – without them, Trino scans all partitions
- Aggregation pushdown: Let Trino/Snowflake aggregate at the storage layer – never pull raw rows
- LIMIT rows early: Apply LIMIT 10000 immediately to reduce row volume before expensive operations
- JOIN ordering: Use the smaller table as the build side in hash joins; LEFT JOIN for dimension tables
- EXPLAIN first: Always run EXPLAIN before execution on large tables – reject if scan > 100 GB or rows > 1B
- APPROX_DISTINCT: Use for high-cardinality counts to reduce payload
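The EXPLAIN-first rule can be enforced mechanically before any query runs. A minimal sketch, assuming the plan text reports estimates as `rows: N` – Trino's plan format varies by version, so treat the regex as a placeholder to adapt:

```python
import re

def plan_exceeds_budget(plan_text: str, max_rows: float = 1e9) -> bool:
    """Scan EXPLAIN output for row-count estimates and flag over-budget
    plans. The 'rows: N' pattern is an assumption about plan text."""
    estimates = [float(m) for m in re.findall(r"rows:\s*([\d.eE+]+)", plan_text)]
    return any(r > max_rows for r in estimates)
```

The agent calls this inside validate_sql and refuses to execute (or asks the user) when it returns True.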
Obsidian Schema Knowledge Base
/schema/
Table schemas by catalog/schema, one file per table
/metrics/
Business metric definitions + calculation SQL
/query-history/
Daily log of queries run with NL + SQL
/business-glossary/
Term definitions and data dictionary
/data-quality/
Known issues, null semantics, update frequencies
Each schema note has YAML frontmatter: type: table-schema, catalog, schema, partitionKeys, clusterKeys, rowEstimate, lastUpdated. Dataview queries find tables related to any topic.
Query Self-Verification
- Run EXPLAIN on generated SQL before executing – verify table names, column references, row estimates, no Cartesian products
- Cross-reference column names with Obsidian schema docs – catch typos before execution
- Check that JOIN key types match (string vs int vs date) – type mismatch causes silently wrong results
- Validate WHERE predicates: no invalid date ranges, no comparisons on non-comparable types
- Spot-check results against Obsidian sample data for semantic correctness
Security & Auth
Credential Management
Trino: certificate or key pair auth in env vars. Snowflake: key pair + passphrase in env var. Obsidian vault: encrypted path or access-controlled sync.
Least-Privilege Roles
Trino: read-only on the catalogs needed. Snowflake: USAGE + SELECT only, no INSERT/UPDATE/DELETE. Trino-to-Snowflake connector: dedicated read-only Snowflake user.
Column Masking
Snowflake masking policies on PII columns. The agent sees masked values unless an unmasking role is granted (it shouldn't be).
Result Size Limits
Enforce a max row count returned (e.g., 10,000). Pre-aggregate at the DB layer. Use approximate counts (APPROX_COUNT_DISTINCT). Never pull raw fact table rows into LLM context.
Required Tools
trino-cli – Trino CLI JAR
snowsql – Snowflake CLI
snowflake-connector-python
Dataview + Templater – Obsidian plugins
Python wrapper (dbagent package)
anthropic Python SDK
Key Challenges
- Schema drift: Tables change but Obsidian docs go stale – the agent should detect inconsistencies on describe and flag them
- Vault size: Large vaults (10k+ notes) can't fit in context – use Dataview to dynamically filter per question
- SQL dialect differences: Trino follows ANSI SQL; Snowflake uses its own dialect – the agent must route and translate correctly
- Snowflake warehouse cold start: ~60s wake-up cost if suspended – warn about this
- Query cost: Snowflake charges per credit – always run EXPLAIN first and warn about expensive queries
What It Is
Claude conducts deep web research, synthesizes findings into atomic markdown notes, and automatically links them into a growing Obsidian knowledge graph. Every research session feeds a compounding graph – future queries leverage prior work, so research on related topics auto-surfaces old notes.
Core principle: every factual claim gets an inline [Source](URL) citation. Claude fetches source pages directly to verify, not just trust search snippets.
Karpathy CLI Approach
Terminal-first, no GUI automation of Obsidian – just direct file I/O on .md files. Claude operates as the agent brain; Obsidian is the viewer/output:
- claude --dir ~/vault – vault as working directory
- WebSearch + WebFetch built-in (no MCP needed) – Claude issues searches, reads top results
- Optional: Brave Search MCP for higher-quality results with a free API key
Agentic Loop
OBSERVE WebSearch + WebFetch + vault_read
THINK LLM synthesizes, identifies gaps, decides next search
ACT Write new note, Edit existing, Bash git/Dataview
Loop terminates when: question answered, Claude signals diminishing returns, or user interrupts. Subagents can run parallel research branches.
Core Prompts
- Research system: "Always cite sources with [Source](URL). Create one note per concept (atomic). Always link to related notes with wikilinks. If a note exists, extend it – don't duplicate."
- Deep research: Issue search → read 2-3 source pages → write atomic note → link to related notes → add # Sources block
- Note synthesis: Review last 5 inbox notes, identify top 3 concepts, create a synthesis note connecting all items
- Link discovery: Run Dataview for orphaned notes, search the web for how the orphaned concept relates to other topics
- Session resume: Read session-log.md, summarize the last thread, suggest next searches
Obsidian Vault Structure
inbox/
Raw research output, not yet organized
topics/
Curated atomic notes, one per concept
papers/
Paper summaries and annotations
people/
Notes on individuals
projects/
Multi-topic research projects
daily/
Daily research logs (YYYY-MM-DD.md)
session-log.md
Current session context at root
index.md
Hub note with Dataview queries
Session Persistence
A session-log.md at vault root records: current research question, what was searched, what was found, what remains open, suggested next steps. Updated at end of each session. On startup, Claude reads it first.
Git commits provide full history β git log can reconstruct past research threads. The vault becomes a fully versioned, queryable knowledge base.
Source Tracking
- Inline citation: Every factual claim ends with [Source](https://...)
- Frontmatter: source: https://... and sources: [...] in YAML
- Source block: Notes end with a # Sources block listing all URLs
- Provenance: Note's created date + git commit history = full provenance trail
Key Plugins
Dataview – query the vault as a database
Templater – dynamic note templates
Recent Files – surface recently touched notes
Natural Language Dates – type "tomorrow", "next week" in dates
Brave Search MCP (optional) – higher quality search
Memory MCP (optional) – knowledge-graph persistent memory
⚠️ Hallucination risk: Claude may hallucinate facts even with citations. Always require an inline [Source](URL) for every claim. Claude should fetch the source page directly for verification, not trust search snippets.
Key Challenges
- Duplicate notes: Instruct Claude to always search the vault before creating a new note
- Wikilink breakage: Rename handling via Obsidian's internal link index; use stable lowercase filenames
- Dataview slowness: On 10k+ note vaults, use a LIMIT clause and cache queries
- Agent drift: Set a max session length; reference session-log.md explicitly to stay on track
- Research quality degradation: Summarize and git-commit every 10 searches to prevent drift
What It Is
An AI-powered codebase exploration and documentation system. Claude reads local codebases, traces execution paths, maps dependencies, explains architecture, and auto-generates interconnected documentation within an Obsidian vault β turning a codebase into a searchable, navigable knowledge graph with ADRs, inline explanations, and architectural overviews.
Claude Code's Explore subagent (read-only, Haiku-powered) is optimized for codebase traversal at scale – no Write/Edit tools, purely exploratory.
Karpathy CLI Approach
Vault as working directory – Claude Code operates on the local repo → generated docs written to an Obsidian vault at docs/vault/:
- Claude Code Explore subagent: Read-only traversal, designed for large codebases
- Custom SKILL.md: Drives the exploration loop with structured prompts
- Vault output: docs/vault/{project-name}/ – can be a separate git repo or a subdirectory
- Git history: git log/blame/show enrich understanding of decisions and evolution
Agentic Loop – Tools
| Tool | Purpose |
| --- | --- |
| Read | Read file contents – primary data source, offset/limit for large files |
| exec (grep/rg) | Search patterns: function names, imports, comments |
| exec (ctags) | Symbol resolution – find where functions/types are defined |
| exec (git log/blame) | Git history analysis – who changed what, when, why |
| exec (tree-sitter) | AST parsing – structural extraction (optional but powerful) |
| Write/Edit | Write documentation to the Obsidian vault |
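Once a TAGS file exists (ctags -R generates one), symbol resolution is a few lines of parsing. A simplified sketch – real tags lines carry an ex-command third field and extension fields, which we ignore here:

```python
def load_ctags(tags_text: str) -> dict[str, str]:
    """Build a symbol -> defining-file index from a ctags TAGS file."""
    index: dict[str, str] = {}
    for line in tags_text.splitlines():
        if line.startswith("!"):  # skip !_TAG_ metadata headers
            continue
        parts = line.split("\t")
        if len(parts) >= 2:
            # Keep the first definition seen for each symbol.
            index.setdefault(parts[0], parts[1])
    return index
```

The agent loads this once per session and answers "where is X defined?" without any file reads.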
Core Prompts
- Architecture explanation: "Identify layers, key abstractions, data flow, entry/exit points. Cite file paths and line numbers."
- Dependency tracing: "Trace dependency chain – what it depends on, what depends on it. Identify circular dependencies."
- Code review: "Design patterns, bugs, performance, conventions, error handling. Rate severity × confidence."
- ADR generation: "Generate Architecture Decision Record. Base on git history analysis and code structure."
- Git enrichment: "Who were primary authors? Major evolution points? Long-standing TODOs/FIXMEs?"
Obsidian Vault Structure
index.md
Project overview, key metrics, links
architecture.md
System-wide architectural overview
modules/
One .md per major module
adrs/
Architecture Decision Records (ADR-001.md, …)
explorations/
Time-boxed exploration reports
code-graph/
Dependency graphs, call chains
search-indexes/
Dataview-generated indexes
Large Codebase Strategies
- Hierarchical summarization: Explore module-by-module, summarize each, then synthesize at higher levels
- Targeted exploration: Only read files relevant to the query – never the full repo at once
- ctags TAGS file: Pre-generated, checked into the repo – O(1) symbol-to-file lookup
- Tree-sitter CLI: Optional but dramatically improves structural understanding
- Exploration budget: Max 50 files read, max 5 subdirectories traversed per query
Key Plugins
Claude Code (core)
Obsidian (documentation UI)
ripgrep – fast pattern matching
git – version control
tree-sitter CLI (optional)
Dataview + Templater
⚠️ AI-generated docs can be wrong: Cross-reference Claude's conclusions with git blame and the actual code. Use tree-sitter for structural ground truth. Mark generated docs with "⚠️ AI-generated – verify assertions".
Key Challenges
- Context overflow: Never read the entire repo – always scope by directory or import graph
- Stale docs: Regenerate periodically; include a last-analyzed date; Dataview flags stale notes (>30 days)
- Wikilink breakage: Always write a .md file before linking to it; use lowercase kebab-case filenames
- Monorepo variants: One vault per package, with a root vault linking all sub-vaults
What It Is
Claude ingests meeting content (raw notes, Zoom/Teams/Otter transcripts, or audio via Whisper), extracts decisions, action items with owners and deadlines, and writes them into Obsidian as a linked task graph. Every meeting becomes a living knowledge graph node β tasks surface automatically, owners are explicit, deadlines tracked via Dataview.
Addresses the "dead notes" problem: decisions get forgotten, action items get lost. This system turns every meeting into an actionable, trackable knowledge graph – with zero manual bookkeeping after setup.
Karpathy CLI Approach
Pipeline: transcript file → shell script strips formatting/timestamps → Claude extracts structured JSON → markdown notes written into the Obsidian vault:
- Raw text: claude --print < meeting-notes.txt
- Transcript formats: Zoom VTT, Teams TXT, Otter TXT/XLSX, Google Meet VTT – strip with sed/awk before processing
- Audio: Whisper.cpp (local, CPU-friendly) → transcript.txt → Claude → Obsidian
Structured Output Schema
Claude outputs ONLY valid JSON in this exact schema:
{
"meeting_title": "string",
"meeting_date": "YYYY-MM-DD",
"attendees": ["name1", "name2"],
"summary": "2-3 sentence executive summary",
"decisions": [{ "id": "DEC-1", "decision": "...", "owner": "..." }],
"action_items": [{
"id": "ACT-1",
"description": "starts with verb",
"owner": "full name",
"deadline": "YYYY-MM-DD or null",
"priority": "high|medium|low",
"status": "open"
}]
}
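Given Claude's JSON in the schema above, writing the task notes is mechanical. A sketch – frontmatter keys follow the schema; the wikilink target is assumed to be the meeting note's filename:

```python
def render_task_note(item: dict, meeting_file: str) -> str:
    """Render one action_items entry as an Obsidian task note with
    YAML frontmatter linking back to the originating meeting."""
    deadline = item.get("deadline") or "null"
    return (
        "---\n"
        f"id: {item['id']}\n"
        f"owner: {item['owner']}\n"
        f"deadline: {deadline}\n"
        f"priority: {item['priority']}\n"
        f"status: {item['status']}\n"
        f"meeting_source: '[[{meeting_file}]]'\n"
        "tags: [action-item]\n"
        "---\n\n"
        f"{item['description']}\n"
    )
```

The driver loops over action_items, writing each to tasks/ACT-N.md.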
Obsidian Vault Structure
meetings/
YYYY-MM-DD-title.md – meeting note template
tasks/
ACT-N.md – individual task notes
people/
Person notes with meeting backlinks
projects/
Project notes linked to related meetings
Every task note has frontmatter: id, owner, deadline, priority, status, meeting_source and links back to the originating meeting.
Cross-Linking Graph
- Meeting ↔ Task: Task file path in meeting frontmatter; task links back via meeting_source
- Meeting ↔ Person: Attendee list in frontmatter; each person note has meetings: [...]
- Meeting ↔ Project: Topics/projects in frontmatter; project note has meetings: [...]
- Task ↔ Project: project: [[Project Name]] in task frontmatter
Dataview Queries for Task Tracking
Open actions:
TABLE owner, deadline, priority FROM #action-item
WHERE status = "open" AND deadline != null SORT deadline ASC
Overdue:
TABLE owner, deadline FROM #action-item
WHERE status = "open" AND deadline < date(today)
By person:
TABLE deadline, priority FROM #action-item
WHERE owner = "Alice Smith" SORT deadline ASC
Required Tools
Claude CLI
Dataview + Templater
Obsidian Tasks plugin
Calendar plugin
Whisper.cpp (optional, for audio)
⚠️ Implied vs explicit actions: People rarely say "ACTION: Alice will..." – they say "I can send that over" or "someone needs to follow up". Claude must infer from context. Implied actions should be flagged with lower confidence for user review.
Key Challenges
- Noisy transcripts: Auto-generated transcripts contain hallucinated words and speaker confusion. Always review action item extraction.
- Context window limits: For very long meetings (>2 hours), chunk by 30-min segments, then merge outputs
- Owner name disambiguation: "John", "John from engineering", "JS" may all be the same person – deduplicate
- Deadline inference: "next Tuesday" → meeting date + 7 days; complex verbal references need manual correction
- Action items without owners: Flag as orphaned tasks for the meeting organizer to assign
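Deadline phrases like "next Tuesday" can be resolved relative to the meeting date. A sketch of one convention – "next X" means the first X strictly after the meeting; the simpler +7-days heuristic noted above also works, and ambiguous references still need manual correction:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday"]

def next_weekday(meeting_date: date, name: str) -> date:
    """Resolve 'next <weekday>' to the first matching day strictly
    after the meeting date."""
    target = WEEKDAYS.index(name.lower())
    delta = (target - meeting_date.weekday() - 1) % 7 + 1
    return meeting_date + timedelta(days=delta)
```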
What It Is
A self-contained AI writing pipeline where Claude operates as a persistent agent over an Obsidian vault, reading accumulated notes and context to draft documents – emails, reports, blog posts, documentation, presentations – in a consistent authorial voice. The vault acts as both knowledge base and drafting workspace.
Most AI writing tools start cold every session. This pipeline gives Claude persistent memory – writing that improves over time, accumulates knowledge, and maintains voice consistency.
Karpathy CLI Approach
Claude CLI + Obsidian MCP server via Local REST API plugin:
- Read: Fetch relevant notes from the vault (by topic, folder, or search) – Claude analyzes a context packet (typically 5–15 notes)
- Draft: Generate content using a Templater template – write to vault/drafts/
- Review: Show the draft to the user or auto-evaluate against criteria
- Edit: Iterate on the draft in place
- Commit: Git commit the draft – promote to vault/writing/final/ when approved
Core Prompts
- Drafting from context: "Read notes on [TOPIC]. Draft a [DOCUMENT TYPE] synthesizing this context. Maintain [STYLE]. Output to vault/drafts/[date]-[slug].md."
- Style matching: "Analyze writing style of [voice samples]. Apply: [specific traits]. Match sentence length, paragraph structure, formality."
- Tone adaptation: "Audience is [AUDIENCE]. Adjust: more formal/informal, technical/plain-language, persuasive/informative."
- Outline expansion: "Expand each section of [outline-file.md] into full prose. Keep flowing, avoid bullet fragmentation."
Voice / Style Learning
Claude learns writing patterns by reading 3–5 representative documents from vault/style/ or vault/writing/final/:
- Average sentence length, paragraph structure, vocabulary level
- Passive vs active voice, bullet frequency, formality
- Store the summary in vault/notes/meta/authorial-voice.md – Claude reads it at every session start
- After edits, save feedback as a note: what changed, why, what to maintain – compounding improvement
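The measurable parts of a voice profile can be computed directly. A crude sketch of the stats that feed the authorial-voice note – the qualitative traits still come from the model's read of the samples:

```python
import re

def voice_profile(text: str) -> dict:
    """Crude style fingerprint: average sentence length and bullet density."""
    sentences = [s for s in re.split(r"[.!?]+\s", text.strip()) if s]
    words = text.split()
    bullets = sum(1 for ln in text.splitlines() if ln.lstrip().startswith("- "))
    return {
        "avg_sentence_words": round(len(words) / max(len(sentences), 1), 1),
        "bullet_lines": bullets,
    }
```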
Writing Types & Turnaround
| Type | Approach | Time |
| --- | --- | --- |
| Emails | Fast, context-loaded prompts, conversational | 30–60s |
| Reports | Longer context, outline-first, multiple iterations | 10–30min |
| Blog posts | Flexible, engaging opening, strong conclusions | 5–15min |
| Documentation | Highly structured, code examples, wikilinks | 10–20min |
| Presentations | Marp format, slide notes in YAML frontmatter | 5–15min |
Obsidian Vault Structure
vault/drafts/
Active work-in-progress, frequently git-committed
vault/writing/final/
Polished, approved documents
vault/templates/
Per-document-type Templater templates
vault/context/
Background knowledge notes
vault/style/
Voice corpus: 5–10 representative docs
Required Tools
Claude CLI
Templater (required)
Various Complements (auto-completion)
Obsidian Git (required)
Local REST API + Obsidian MCP
Key Challenges
- MCP connection fragility: The Local REST API plugin must be running in Obsidian – if Obsidian is closed, the connection fails
- Context window limits: Can't read the entire vault – requires discipline to load only relevant context notes
- Voice matching is approximate: Claude extracts patterns, but subtle personal voice needs periodic correction/feedback
- Export to DOCX/PDF: Requires Pandoc/Marp CLI – not built in
What It Is
Claude Code runs as a headless BI analyst loop – on demand or on a schedule, it executes SQL against Trino/Snowflake, performs trend analysis, anomaly detection, and forecasting, then writes rich Obsidian notes with tables, trend indicators, inline charts, and Dataview queries – creating a living BI dashboard that's browsable, searchable, and annotatable.
Replaces expensive BI tooling with a flexible, text-first, LLM-powered analyst that writes its own queries and generates its own reports. Obsidian becomes a personal/team knowledge base that also happens to be a live BI surface.
Agentic Loop – 9 Steps
① Receive directive (trend/anomaly/forecast/scheduled report)
② Identify data source(s) and table(s) from the schema registry
③ Write SQL (or generate with LLM + schema context)
④ Execute via trino-cli / snowsql; capture CSV/JSON
⑤ Analyze in context: z-score, moving average, seasonality
⑥ Render: tables, trend arrows (▲▼►), Mermaid or Chart.js
⑦ Write/update Obsidian note at vault/metrics/
⑧ If anomaly → tag #alert, write alert note, notify
⑨ Log source query in frontmatter for full provenance
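The note-writing and provenance-logging steps of the loop reduce to templated assembly. A sketch with the source SQL carried in frontmatter so every number is reproducible – the layout is illustrative, not a fixed schema:

```python
def metric_note(metric: str, sql: str, rows: list[tuple], trend: str) -> str:
    """Assemble a metrics note whose frontmatter carries the source
    query, making every value in the note reproducible."""
    # Indent the SQL for a YAML literal block.
    sql_block = "\n".join("  " + line for line in sql.splitlines())
    table = "\n".join(f"| {d} | {v} |" for d, v in rows)
    return (
        "---\n"
        f"metric: {metric}\n"
        f"source_sql: |\n{sql_block}\n"
        "---\n\n"
        f"# {metric} ({trend})\n\n"
        "| date | value |\n| --- | --- |\n"
        f"{table}\n"
    )
```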
Core Prompts
- Trend analysis: "Query last N days of metric_column grouped by group_by. Return Markdown table with trend arrows + 7-day moving average. Write Obsidian note with Mermaid line chart."
- Anomaly detection: "Compute z-score vs mean/stddev. Flag rows where |z| > 2.5. Write alert note at vault/alerts/ if anomalies found."
- Forecast: "Using last 90 days of metric data, perform 14-day linear trend forecast. Overlay on chart. Write note with confidence intervals."
- Data storytelling: "Latest 30-day results from vault/metrics/. Write narrative briefing: headline findings β drill-down β flag anything needing attention β next steps."
Chart Rendering – 6-Tier Strategy
| Method | Complexity | Best For |
| --- | --- | --- |
| Mermaid | Low | Simple lines, bars – no plugin needed |
| Grafana iframe | Low-Med | Interactive dashboards – embed via ![[chart.html]] |
| obsidian-charts plugin | Medium | Clean chart syntax, JSON/CSV data in note |
| Chart.js HTML embed | Medium | Rich charts, animations, tooltips |
| matplotlib PNG | Medium | Publication-quality static charts |
| Raw SVG | Low | LLM-generated, self-contained, animatable |
Anomaly Detection Methods
- Z-score: (value - mean) / stddev over a 30-day rolling window. Flag if |z| > 2.5.
- IQR: Flag values outside Q1 - 1.5×IQR and Q3 + 1.5×IQR. Better for skewed distributions.
- % change: Flag if today's value deviates >X% from the 7-day or 30-day moving average.
- Seasonality-adjusted: Compare against the same-day-of-week rolling average to avoid false alerts on weekly cycles.
- MA crossover: Short-window MA crossing the long-window MA is the classic trend-change signal.
Alert suppression: after firing, suppress repeats for 24h (same metric) to avoid alert fatigue.
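The z-score method and the cooldown rule are each a few lines. A sketch, assuming a flat list of daily values and hour-resolution timestamps for the suppression state:

```python
from statistics import mean, stdev

def zscore_anomalies(window: list[float], threshold: float = 2.5) -> list[int]:
    """Flag indices whose z-score against the window mean/stddev
    exceeds the threshold. Needs at least two points."""
    mu, sigma = mean(window), stdev(window)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(window) if abs((v - mu) / sigma) > threshold]

def should_alert(last_fired: dict, metric: str, now_h: float,
                 cooldown_h: float = 24) -> bool:
    """Per-metric cooldown: suppress repeat alerts within the window."""
    if now_h - last_fired.get(metric, float("-inf")) < cooldown_h:
        return False
    last_fired[metric] = now_h
    return True
```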
12 Common Business Metrics
MRR – Monthly Recurring Revenue
CAC – Customer Acquisition Cost
Infra Cost Ratio – infra cost / revenue
Error Rate – API errors / total requests
Queue Depth – messages pending
Cron Schedules
| Schedule | Task | Output |
| --- | --- | --- |
| 0 6 * * * | Morning brief: key metrics yesterday | vault/metrics/YYYY/MM/daily-YYYY-MM-DD.md |
| 0 7 * * 1 | Weekly rollup: 7-day week-over-week | vault/reports/weekly-YYYY-WXX.md |
| 0 6 1 * * | Monthly close: full month narrative | vault/reports/monthly-YYYY-MM.md |
| */15 * * * * | Ops: queue depth, error rate, P99 | vault/ops/YYYY-MM-DD-HHMM.md |
| 0 8 * * * | 14-day forecast refresh on revenue | vault/forecasts/forecast-YYYY-MM-DD.md |
Required Tools
Claude Code CLI
Trino CLI
SnowSQL
Dataview + Templater + Calendar
Grafana (optional, for chart iframes)
Python + pandas + matplotlib (optional)
cron / systemd timer
⚠️ LLM hallucinating metric values: Always include the raw SQL + result set in note frontmatter. Make reproducibility first-class. Never let Claude summarize without anchoring to actual values.
Key Challenges
- Vault growth: Implement a retention policy – archive notes >12 months old to a separate vault
- Trino timeouts: Set a CLI timeout (120s); prefer pre-aggregated tables; log and retry once
- Snowflake credits: Use the query result cache; prefer batch aggregation over ad-hoc queries per metric
- Alert fatigue: Seasonality-adjust thresholds; implement a cooldown; require consecutive breaches
- No live refresh: Dataview re-runs on note open; use an auto-refresh plugin or Templater journal creation
What It Is
Claude reads papers (PDF), articles (URLs), and books; generates structured summaries; extracts key concepts; creates flashcards and Socratic questions; and writes them into Obsidian organized by topic. An SRS plugin (SM-2 or FSRS algorithm) schedules flashcard reviews and tracks retention β a closed-loop learning system where AI both ingests knowledge and generates review material.
Transforms passive reading into active learning with zero extra effort. The knowledge graph means concepts link across sources β building a personal wiki of learned material.
Agentic Loop – 7 Steps
① Ingest: PDF (pdftotext), URL (web_fetch), arXiv, local text
② Summarize: abstract, key findings, methodology, implications
③ Extract concepts: 10–20 atomic facts, definitions, relationships
④ Generate flashcards: basic Q/A, cloze deletion, multi-modal
⑤ Socratic questions: reasoning over recall
⑥ Write to Obsidian: source note + concept notes + flashcard note
⑦ Schedule review: SM-2 or FSRS intervals, next-review dates
Flashcard Formats
Basic Q/A
Q: [question]
A: [answer]
Best for: definitions, facts
Cloze deletion
{{c1::hidden answer}} in context
Best for: formulas, fill-in-blank
Multi-modal
Image/diagram + explanation
Best for: anatomy, circuits, maps
Reversed cards
Auto-generate both directions
Forces deeper understanding
Socratic
Question + reasoning path + application
Requires reasoning, not recall
Spaced Repetition Algorithms
SM-2 (SuperMemo 2)
Classic algorithm. Ease factor starts at 2.5. Intervals: 1d → 6d → interval × EF. Quality ratings 0–5. Reliable, well-understood.
FSRS (Free Spaced Repetition Scheduler)
Modern, machine-learning-based. Models memory with stability and difficulty parameters fit to review history. Better retention predictions and lower forgetting rates than SM-2. Available in Anki 23.10+ and Obsidian SRS plugins.
Review cycles: limit new cards/day (20–30) to avoid overwhelm. Review all due cards before adding new ones.
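One common reading of the SM-2 update (quality 0–5, EF floor 1.3, failures reset the interval) as a pure function – per-plugin variants differ, so treat this as a reference sketch:

```python
def sm2_next(interval: int, ef: float, quality: int) -> tuple[int, float]:
    """One SM-2 review step: (next interval in days, new ease factor)."""
    # Ease factor update, clamped at the 1.3 floor.
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if quality < 3:          # failed recall: relearn from one day
        return 1, ef
    if interval <= 1:        # first successful review -> 6 days
        return 6, ef
    return round(interval * ef), ef
```

The scheduler writes the returned interval into the flashcard's next-review frontmatter field.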
Obsidian Vault Structure
sources/papers/
Paper summaries by subject
sources/articles/
Article summaries by subject
concepts/
Atomic concept notes, linked to sources
flashcards/
Batch flashcards by date
templates/
source, concept, flashcard templates
Frontmatter Schema
type: source|concept|flashcard
source-type: paper|article|book|video
title:
author:
date:
subject:
summary: (for sources)
concepts: [tag1, tag2]
read: false (sources)
review-count: 0
last-review:
next-review:
rating: 0 (SM-2 ease × 100)
interval: 1 (days)
source-url:
Dataview Queries
- Due today: TABLE WHERE next-review <= date(today) AND type = "flashcard"
- Overdue: TABLE WHERE next-review < date(today)
- Retention stats: TABLE avg(rating), count() GROUP BY subject
- By subject: TABLE key, value FROM "concepts" WHERE subject = "ml"
Key Plugins
Templater
Dataview
obsidian-spaced-repetition
Natural Language Dates
PDF Highlights
pdftotext / pdfminer (optional)
Key Challenges
- Flashcard quality: AI-generated cards can be superficial – always review before scheduling
- Long papers: Process in chunks (abstract+intro, methods, results, conclusion) – then merge
- SRS plugin quality: Community plugins vary – test multiple; check recent commit activity
- Review overwhelm: Start strict (10–15 new cards/day); quality over quantity
- Math notation: Use LaTeX in Obsidian (native support); cloze cards work well for formulas