- What jCodeMunch actually does
- 1. Quick Start
- 2. Add jCodeMunch to your MCP client
- 3. Tell your agent to use jCodeMunch
- 4. Your first useful workflows
- 5. Core mental model
- 6. Tool reference
- 7. How search works
- 8. Token savings
- 9. Live token savings counter
- 10. Community savings meter
- 11. Local LLM tuning
- 12. Storage and indexing
- 13. Troubleshooting
- 14. Best practices
- 15. Prompting the agent
- 16. Final advice
Reference docs
jCodeMunch helps AI agents explore codebases without reading the whole damn file every time.
Most agents inspect repos the expensive way:
- open a large file
- skim hundreds or thousands of lines
- extract one useful function
- repeat somewhere else
- quietly set your token budget on fire
jCodeMunch replaces that with structured retrieval.
It indexes a repository once, extracts symbols with tree-sitter, stores metadata plus byte offsets into the original source, and lets your MCP-compatible agent retrieve only the code it actually needs. That is why token savings can be dramatic in retrieval-heavy workflows.
If you only remember one thing from this guide, make it this:
jCodeMunch is not magic because it is installed.
It is powerful because your agent uses it instead of brute-reading files.
```shell
pip install jcodemunch-mcp
```

Verify the install:
```shell
jcodemunch-mcp --help
```

For MCP client configuration, `uvx` is usually the better choice because it runs the package on demand and avoids PATH headaches.
Fastest setup:
```shell
claude mcp add jcodemunch uvx jcodemunch-mcp
```

Project-only install:
```shell
claude mcp add --scope project jcodemunch uvx jcodemunch-mcp
```

With optional environment variables:
```shell
claude mcp add jcodemunch uvx jcodemunch-mcp \
  -e GITHUB_TOKEN=ghp_... \
  -e ANTHROPIC_API_KEY=sk-ant-...
```

Restart Claude Code afterward.
| Scope | Path |
|---|---|
| User | ~/.claude.json |
| Project | .claude/settings.json |
```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```

Config file location:
| OS | Path |
|---|---|
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Linux | ~/.config/claude/claude_desktop_config.json |
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
Minimal config:
```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"]
    }
  }
}
```

With optional GitHub auth and AI summaries:
```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```
- `GITHUB_TOKEN`: enables private repos and higher GitHub API limits.
- `ANTHROPIC_API_KEY`: enables AI-generated summaries via Claude.
- `ANTHROPIC_MODEL`: overrides the default Anthropic model.
- `GOOGLE_API_KEY`: enables AI-generated summaries via Gemini if Anthropic is not configured.
- `GOOGLE_MODEL`: overrides the default Gemini model.
- `JCODEMUNCH_SUMMARIZER_PROVIDER`: forces the AI summarizer provider. Supported values: `anthropic`, `gemini`, `openai`, `minimax`, `glm`, `none`. If unset, `jcodemunch-mcp` auto-detects by API keys in this order: Anthropic, Gemini, OpenAI-compatible, MiniMax, GLM-5.
- `OPENAI_API_BASE`: enables OpenAI-compatible summaries against a local or custom endpoint when no higher-priority provider is configured or when `JCODEMUNCH_SUMMARIZER_PROVIDER=openai`.
- `allow_remote_summarizer`: controls whether OpenAI-compatible endpoints may point to non-localhost hosts. The default is `false`, which means endpoints such as `http://127.0.0.1:11434/v1` work, but remote hosts such as `https://api.minimax.io/v1` are blocked. When blocked, jCodeMunch does not send code to that provider and falls back to docstring- or signature-based summaries. Set `allow_remote_summarizer: true` in `config.jsonc` when you intentionally want to use a hosted OpenAI-compatible provider.
- `MINIMAX_API_KEY`: enables AI-generated summaries via MiniMax using the default model `minimax-m2.7`.
- `ZHIPUAI_API_KEY`: enables AI-generated summaries via GLM-5 using the default endpoint `https://api.z.ai/api/paas/v4/`.
- `JCODEMUNCH_PATH_MAP`: remaps stored path prefixes so an index built on one machine can be reused on another without re-indexing. Format: `orig1=new1,orig2=new2`, where `orig` is the prefix as stored in the index (the path used at index time) and `new` is the equivalent path on the current machine. Each pair is split on the last `=`, so `=` signs within path components are preserved. Pairs are comma-separated; path components containing commas are not supported. The first matching prefix wins, so list more-specific prefixes before broader ones when they overlap. Example (Linux index reused on Windows): `JCODEMUNCH_PATH_MAP=/home/user/Dev=C:\Users\user\Dev`
- `JCODEMUNCH_CONTEXT_PROVIDERS=0`: disables context-provider enrichment during indexing.
- `JCODEMUNCH_EMBED_MODEL`: activates local embedding via `sentence-transformers`. Set to a model name such as `all-MiniLM-L6-v2`. Install the optional dependency with `pip install jcodemunch-mcp[semantic]`.
- `OPENAI_EMBED_MODEL`: activates OpenAI embedding (requires `OPENAI_API_KEY` also set). Example: `text-embedding-3-small`.
- `GOOGLE_EMBED_MODEL`: activates Gemini embedding (requires `GOOGLE_API_KEY` also set). Example: `models/text-embedding-004`.
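The `JCODEMUNCH_PATH_MAP` rule is simple enough to sketch in a few lines of Python. This is an illustrative re-implementation of the documented behavior (split each pair on its last `=`, first matching prefix wins), not the actual jCodeMunch code:

```python
def parse_path_map(spec: str) -> list[tuple[str, str]]:
    """Parse 'orig1=new1,orig2=new2', splitting each pair on its LAST '='."""
    pairs = []
    for chunk in spec.split(","):
        orig, _, new = chunk.rpartition("=")
        pairs.append((orig, new))
    return pairs

def remap(path: str, pairs: list[tuple[str, str]]) -> str:
    """First matching prefix wins, so list specific prefixes before broad ones."""
    for orig, new in pairs:
        if path.startswith(orig):
            return new + path[len(orig):]
    return path  # no mapping applies; leave the path untouched

pairs = parse_path_map(r"/home/user/Dev=C:\Users\user\Dev")
print(remap("/home/user/Dev/app/main.py", pairs))
```

Splitting on the last `=` is what lets the Windows-style replacement path (`C:\Users\user\Dev`) sit on the right-hand side without ambiguity.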
Restart Claude Desktop after saving.
If you need to troubleshoot indexing or server startup, use a log file instead of stderr:
```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": [
        "jcodemunch-mcp",
        "--log-level", "DEBUG",
        "--log-file", "/tmp/jcodemunch.log"
      ]
    }
  }
}
```

Open Settings → Tools & MCP → New MCP Server, then add:
```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"]
    }
  }
}
```

Save and confirm the server starts successfully.
Add to .vscode/settings.json:
```json
{
  "mcp.servers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_..."
      }
    }
  }
}
```

- Open the Agent pane
- Click the `⋯` menu
- Choose MCP Servers → Manage MCP Servers
- Open View raw config
- Add:
```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```

Restart the MCP server afterward.
Installing the server makes the tools available.
It does not guarantee your agent will stop opening giant files like a confused tourist with a flashlight.
Note: For a comprehensive guide on enforcing these rules through agent hooks and prompt policies, see AGENT_HOOKS.md.
Give it an instruction like this:
```
Use jcodemunch-mcp for code lookup whenever available. Prefer symbol search, outlines, and targeted retrieval over reading full files.
```

That one sentence can be the difference between:
- “this is incredible” and
- “I installed it and saw no change”
```
index_repo: { "url": "fastapi/fastapi" }
get_repo_outline: { "repo": "fastapi/fastapi" }
get_file_tree: { "repo": "fastapi/fastapi", "path_prefix": "fastapi" }
get_file_outline: { "repo": "fastapi/fastapi", "file_path": "fastapi/main.py" }
```

Use this when:
- you are new to a repo
- you want the lay of the land before reading code
- you want to avoid blind file spelunking
```
index_folder: { "path": "/home/user/myproject" }
resolve_repo: { "path": "/home/user/myproject" }
get_repo_outline: { "repo": "myproject" }
search_symbols: { "repo": "myproject", "query": "main" }
```

Use this when:
- you want fast local indexing
- you are working on private code
- you want repeat retrieval without re-scanning the repo every time
When indexing local folders, jCodeMunch can detect ecosystem tools and enrich the index with domain-specific metadata. The current built-in example is dbt support, which can fold model descriptions, tags, and column metadata into summaries and search keywords.
```
search_symbols: { "repo": "owner/repo", "query": "authenticate", "kind": "function" }
get_symbol_source: { "repo": "owner/repo", "symbol_id": "src/auth.py::authenticate#function" }
```

This is one of the core jCodeMunch loops:
- search
- identify the symbol
- fetch only that symbol
That is where a lot of the token savings come from.
```
get_file_outline: { "repo": "owner/repo", "file_path": "src/auth.py" }
get_symbol_source: {
  "repo": "owner/repo",
  "symbol_ids": [
    "src/auth.py::AuthHandler.login#method",
    "src/auth.py::AuthHandler.logout#method"
  ]
}
```

Use `get_file_outline` first to see the API surface, then retrieve only the methods you care about.
```
search_text: {
  "repo": "owner/repo",
  "query": "TODO",
  "file_pattern": "*.py",
  "context_lines": 1
}
```

Use this for:
- string literals
- comments
- configuration values
- weird text fragments
- anything that is not likely to appear as a symbol name
```
get_file_content: {
  "repo": "owner/repo",
  "file_path": "src/main.py",
  "start_line": 20,
  "end_line": 40
}
```

This is useful when the thing you need is line-oriented rather than symbol-oriented.
```
get_symbol_source: {
  "repo": "owner/repo",
  "symbol_id": "src/main.py::process#function",
  "verify": true
}
```

Check `_meta.content_verified` in the response.
This tells you whether the retrieved source still matches the indexed version.
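The index stores content hashes (see the storage section), so a freshness check like this amounts to comparing a stored digest against the bytes currently on disk. The sketch below is illustrative only; the exact hash algorithm and field names jCodeMunch uses are assumptions:

```python
import hashlib

def content_verified(indexed_hash: str, current_source: bytes) -> bool:
    """True if the on-disk bytes still match what was indexed."""
    return hashlib.sha256(current_source).hexdigest() == indexed_hash

src = b"def process(x):\n    return x * 2\n"
indexed_hash = hashlib.sha256(src).hexdigest()  # recorded at index time

print(content_verified(indexed_hash, src))                  # unchanged source
print(content_verified(indexed_hash, src + b"# edited\n"))  # modified source
```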
```
invalidate_cache: { "repo": "owner/repo" }
index_repo: { "url": "owner/repo" }
```

Use this when:
- the index is stale
- the repo changed substantially
- you want a clean reset
For GitHub repos, newer builds also store the Git tree SHA so unchanged incremental re-index runs can return immediately instead of re-downloading the universe just to discover nothing changed.
Each symbol is indexed with structured metadata such as:
- signature
- kind
- qualified name
- one-line summary
- byte offsets into the original file
That lets jCodeMunch fetch the exact source later by byte offset rather than opening and re-parsing the entire file on every request.
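The byte-offset idea is easy to picture. This toy sketch records a symbol's byte range at index time, then seeks straight to it at retrieval time; it is illustrative only, not jCodeMunch's actual storage format:

```python
import os
import tempfile

source = b"def greet():\n    return 'hi'\n\ndef other():\n    pass\n"

# Write a fake cached source file to disk.
repo_dir = tempfile.mkdtemp()
path = os.path.join(repo_dir, "app.py")
with open(path, "wb") as f:
    f.write(source)

# At index time: record each symbol's (start_byte, end_byte) span.
index = {"app.py::greet#function": (0, 29)}

# At retrieval time: seek straight to the span instead of re-parsing the file.
start, end = index["app.py::greet#function"]
with open(path, "rb") as f:
    f.seek(start)
    snippet = f.read(end - start)

print(snippet.decode())  # just the greet() function, nothing else
```

Only the bytes of the requested symbol ever leave the cache, which is the whole trick behind the savings.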
Symbol IDs look like this:
```
{file_path}::{qualified_name}#{kind}
```

Examples:

```
src/main.py::UserService#class
src/main.py::UserService.login#method
src/utils.py::authenticate#function
config.py::MAX_RETRIES#constant
```
These IDs stay stable across re-indexing as long as path, qualified name, and kind stay the same.
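For illustration, here is how such an ID decomposes into its three parts; the `parse_symbol_id` helper is hypothetical, not part of the jCodeMunch API:

```python
def parse_symbol_id(symbol_id: str) -> dict:
    """Split '{file_path}::{qualified_name}#{kind}' into its parts."""
    path_and_name, _, kind = symbol_id.rpartition("#")
    file_path, _, qualified_name = path_and_name.partition("::")
    return {"file_path": file_path, "qualified_name": qualified_name, "kind": kind}

print(parse_symbol_id("src/main.py::UserService.login#method"))
# {'file_path': 'src/main.py', 'qualified_name': 'UserService.login', 'kind': 'method'}
```

Splitting on the last `#` keeps the scheme robust even if a qualified name were ever to contain a `#`.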
| Tool | What it does | Key parameters |
|---|---|---|
| `index_repo` | Index a GitHub repository | `url`, `incremental`, `use_ai_summaries`, `extra_ignore_patterns` |
| `index_folder` | Index a local folder | `path`, `incremental`, `use_ai_summaries`, `extra_ignore_patterns`, `follow_symlinks` |
| `index_file` | Re-index one file; faster than `index_folder` for surgical updates | `path`, `use_ai_summaries`, `context_providers` |
| `embed_repo` | Precompute and cache all symbol embeddings for semantic search in one pass (optional warm-up; embeddings are also computed lazily on first semantic query) | `repo`, `batch_size`, `force` |
| `list_repos` | List all indexed repositories | — |
| `resolve_repo` | Resolve a filesystem path to its repo ID; O(1) lookup, preferred over `list_repos` when you know the path | `path` |
| `invalidate_cache` | Delete cached index and force a full re-index | `repo` |
| Tool | What it does | Key parameters |
|---|---|---|
| `suggest_queries` | Surface useful entry-point files, keywords, and example queries for an unfamiliar repo | `repo` |
| `get_repo_outline` | High-level overview: directories, file counts, language breakdown, symbol counts | `repo` |
| `get_file_tree` | Browse file structure, optionally filtered by path prefix | `repo`, `path_prefix`, `include_summaries` |
| `get_file_outline` | All symbols in a file with full signatures and summaries; supports batch via `file_paths` | `repo`, `file_path`, `file_paths` |
| Tool | What it does | Key parameters |
|---|---|---|
| `get_symbol_source` | Retrieve symbol source: `symbol_id` (single, flat response) or `symbol_ids[]` (batch, `{symbols, errors}`); supports `verify` and `context_lines` | `repo`, `symbol_id`, `symbol_ids`, `verify`, `context_lines` |
| `get_context_bundle` | Symbol + its imports + optional callers in one bundle; supports multi-symbol, Markdown output, and token budgeting (`token_budget`, `budget_strategy`: `most_relevant`/`core_first`/`compact`, `include_budget_report`) | `repo`, `symbol_id`, `symbol_ids`, `include_callers`, `output_format`, `token_budget`, `budget_strategy`, `include_budget_report` |
| `get_ranked_context` | Query-driven, token-budgeted context assembler; returns the best-fit symbols for a task, ranked by relevance + centrality and greedily packed to fit the budget | `repo`, `query`, `token_budget`, `strategy`, `include_kinds`, `scope` |
| `get_file_content` | Read cached file content, optionally sliced to a line range | `repo`, `file_path`, `start_line`, `end_line` |
| Tool | What it does | Key parameters |
|---|---|---|
| `search_symbols` | Search the symbol index by name, signature, summary, or docstring; supports `kind`/`language`/`file_pattern` filters, fuzzy matching (`fuzzy`, `fuzzy_threshold`, `max_edit_distance`), centrality-aware ranking (`sort_by`: `relevance`/`centrality`/`combined`), and optional semantic/hybrid search (`semantic`, `semantic_weight`, `semantic_only`) | `repo`, `query`, `kind`, `language`, `file_pattern`, `max_results`, `token_budget`, `detail_level`, `fuzzy`, `sort_by`, `semantic` |
| `search_text` | Full-text search across indexed file contents; supports regex, context lines, and optional semantic search | `repo`, `query`, `is_regex`, `file_pattern`, `max_results`, `context_lines`, `semantic` |
| `search_columns` | Search column metadata across dbt / SQLMesh / database catalog models | `repo`, `query`, `model_pattern`, `max_results` |
| Tool | What it does | Key parameters |
|---|---|---|
| `find_importers` | Find all files that import a given file; supports batch via `file_paths`; each result includes a `has_importers` flag for spotting transitive dead-code chains | `repo`, `file_path`, `file_paths`, `max_results` |
| `find_references` | Find all files that import or reference a given identifier; supports batch via `identifiers` | `repo`, `identifier`, `identifiers`, `max_results` |
| `check_references` | Quick dead-code check: is an identifier referenced anywhere? Combines import + content search | `repo`, `identifier`, `identifiers`, `search_content`, `max_content_results` |
| `get_dependency_graph` | File-level dependency graph up to 3 hops; direction = `imports`, `importers`, or `both` | `repo`, `file`, `direction`, `depth` |
| `get_blast_radius` | Which files break if this symbol changes? Returns confirmed/potential impacted files, `overall_risk_score`, `direct_dependents_count`; set `include_depth_scores=true` for `impact_by_depth` grouped by BFS layer | `repo`, `symbol`, `depth`, `include_depth_scores` |
| `get_symbol_importance` | Rank symbols by architectural centrality using PageRank or in-degree on the import graph; surfaces the most load-bearing symbols in a repo | `repo`, `top_n`, `algorithm`, `scope` |
| `find_dead_code` | Find symbols and files unreachable from any entry point via the import graph; entry points auto-detected (main, init, CLI decorators, etc.) | `repo`, `granularity`, `min_confidence`, `include_tests`, `entry_point_patterns` |
| `get_changed_symbols` | Map a git diff to affected symbols; detects added/modified/removed/renamed symbols between two commits; optionally includes blast radius per changed symbol | `repo`, `since_sha`, `until_sha`, `include_blast_radius`, `max_blast_depth` |
| `get_class_hierarchy` | Full inheritance chain (ancestors + descendants) across Python, TS, Java, C#, and more | `repo`, `class_name` |
| `get_related_symbols` | Symbols related to a given symbol via co-location, shared importers, and name-token overlap | `repo`, `symbol_id`, `max_results` |
| `get_symbol_diff` | Diff symbol sets of two indexed repo snapshots; detects added, removed, and changed symbols | `repo_a`, `repo_b` |
| Tool | What it does | Key parameters |
|---|---|---|
| `get_session_stats` | Token savings, cost avoided, and per-tool breakdown for the current session | — |
New / unfamiliar repo?
→ suggest_queries → get_repo_outline → get_file_tree
Looking for a symbol by name?
→ search_symbols (add kind= / language= / file_pattern= to narrow)
Typo or partial name? (fuzzy)
→ search_symbols(fuzzy=true)
Concept search — "database connection" when the code says "db_pool"?
→ search_symbols(semantic=true) (requires embedding provider)
What are the most architecturally important symbols?
→ get_symbol_importance (PageRank on the import graph)
Get the best-fit context for a task without blowing the token budget?
→ get_ranked_context(query="...", token_budget=4000)
Looking for text, strings, or comments?
→ search_text (supports regex and context_lines)
Need to read a function or class?
→ get_file_outline → get_symbol_source
Need symbol + its imports in one shot?
→ get_context_bundle (add token_budget= to cap size)
What imports this file?
→ find_importers
Where is this identifier used?
→ find_references (or check_references for a quick yes/no)
What breaks if I change this symbol?
→ get_blast_radius(include_depth_scores=true) → find_importers
What symbols actually changed since the last commit?
→ get_changed_symbols (add include_blast_radius=true for downstream impact)
Is this code dead / unreachable?
→ find_dead_code (or check_references for a single identifier)
Class hierarchy?
→ get_class_hierarchy
File dependency graph?
→ get_dependency_graph
What changed between two repo snapshots?
→ get_symbol_diff
Database column search (dbt / SQLMesh)?
→ search_columns
search_symbols is not a naive grep dressed up in a fake mustache.
The search logic uses weighted scoring across things like:
- exact name match
- name substring match
- word overlap
- signature terms
- summary terms
- docstring and keyword matches
Filters like `kind`, `language`, and `file_pattern` narrow the field before scoring. Zero-score results are discarded.
Fuzzy matching — pass fuzzy=true to enable a trigram Jaccard + Levenshtein fallback that fires when BM25 confidence is low. Useful for typos or partial names (conn → connection_pool). Fuzzy results include match_type, fuzzy_similarity, and edit_distance fields. Zero behavioral change when fuzzy=false (default).
Centrality-aware ranking — pass sort_by="centrality" to rank results by PageRank on the import graph, or sort_by="combined" to blend BM25 and PageRank. Default stays "relevance" (pure BM25).
Semantic / hybrid search — pass semantic=true to enable embedding-based search alongside BM25. Requires a configured embedding provider (JCODEMUNCH_EMBED_MODEL, OPENAI_API_KEY + OPENAI_EMBED_MODEL, or GOOGLE_API_KEY + GOOGLE_EMBED_MODEL). semantic_weight controls the BM25/embedding blend (default 0.5). semantic_only=true skips BM25 entirely. Zero performance impact when semantic=false (default).
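To make the fuzzy fallback concrete, here is a toy trigram-Jaccard similarity. The shipped scoring also layers in Levenshtein distance and BM25 confidence, so treat this as a sketch of one ingredient only:

```python
def trigrams(s: str) -> set[str]:
    """Lowercased 3-character shingles of a string (the string itself if too short)."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}

def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets: |A ∩ B| / |A ∪ B|."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# A typo'd query still scores high against the real symbol name...
print(round(trigram_jaccard("conection_pool", "connection_pool"), 2))  # 0.79
# ...while an unrelated name scores near zero.
print(round(trigram_jaccard("parse_config", "connection_pool"), 2))
```

A single missing letter barely dents the trigram overlap, which is exactly why this style of matching tolerates typos and partial names.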
Practical takeaway:
- use a precise query when you know the symbol name
- add `kind` when you know whether you want a function, class, method, etc.
- use `file_pattern` or `language` when a repo is large or polyglot
- use `fuzzy=true` for typos, partials, or snake_case mismatches
- use `semantic=true` for concept-level queries when you don't know the exact symbol name
jCodeMunch can produce very large token savings because it changes the workflow from:
read everything to find something
to:
find something, then read only that
Typical task categories in the project’s own token-savings material show very large reductions for repo exploration, finding specific functions, and reading targeted implementations.
But keep the mental model honest:
- savings happen when the agent actually uses targeted retrieval
- savings are strongest in retrieval-heavy workflows
- installing the MCP is not the same as changing agent behavior
That is why onboarding and prompting matter.
If you use Claude Code, you can surface a running savings counter in the status line.
Example:
Claude Sonnet 4.6 | my-project | ░░░░░░░░░░ 0% | 1,280,837 tkns saved · $6.40 saved on Opus
The data comes from:
~/.code-index/_savings.json
It tracks cumulative token savings and can be used to estimate avoided cost at a given model rate.
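Turning a raw token count into an avoided-cost estimate is simple arithmetic. The `total_tokens_saved` field name below is an assumption; check the actual shape of `_savings.json` on your machine:

```python
import json

# Hypothetical shape of ~/.code-index/_savings.json -- verify against your file.
savings = json.loads('{"total_tokens_saved": 1280837}')

rate_per_million = 5.00  # example: dollars per million tokens at your model's rate
avoided = savings["total_tokens_saved"] / 1_000_000 * rate_per_million
print(f"${avoided:.2f} avoided")  # $6.40 avoided
```

Swap in your model's actual per-million-token price to match whatever the status line reports.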
By default, jCodeMunch can contribute an anonymous savings delta to a global counter.
Only two values are sent:
- token savings delta
- a random anonymous install ID
No code, repo names, file paths, or identifying project data are transmitted, according to the guide.
To disable it:
```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "JCODEMUNCH_SHARE_SAVINGS": "0"
      }
    }
  }
}
```

You can generate summaries with a local OpenAI-compatible server such as LM Studio by setting:
```json
"env": {
  "OPENAI_API_BASE": "http://127.0.0.1:1234/v1",
  "OPENAI_MODEL": "qwen/qwen3-8b",
  "OPENAI_API_KEY": "local-llm"
}
```

Useful tuning knobs:

- `OPENAI_CONCURRENCY`
- `OPENAI_BATCH_SIZE`
- `OPENAI_MAX_TOKENS`
For hosted OpenAI-compatible providers, use explicit provider selection instead:
```json
"env": {
  "JCODEMUNCH_SUMMARIZER_PROVIDER": "minimax",
  "MINIMAX_API_KEY": "..."
}
```

```json
"env": {
  "JCODEMUNCH_SUMMARIZER_PROVIDER": "glm",
  "ZHIPUAI_API_KEY": "..."
}
```

Treat all of this as optional power-user tuning, not required setup.
By default, indexes live under:
~/.code-index/
Typical layout:
```
~/.code-index/
├── owner-repo.json
└── owner-repo/
    └── src/main.py
```

The JSON index stores metadata, hashes, and symbol records. Raw files are stored separately for precise later retrieval.
Use `owner/repo` or a full GitHub URL. For private repos, set `GITHUB_TOKEN`.
The repo may not contain supported source files, or everything useful may have been excluded by skip patterns.
Set `GITHUB_TOKEN` to increase GitHub API limits.
Set one of `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `OPENAI_API_BASE`, `MINIMAX_API_KEY`, or `ZHIPUAI_API_KEY`. You can also force a specific provider with `JCODEMUNCH_SUMMARIZER_PROVIDER`. Without a configured provider, summaries fall back to docstrings or signatures.
Use `invalidate_cache` followed by a fresh `index_repo` or `index_folder`.
Use `uvx`, or configure the absolute path to `jcodemunch-mcp`.
Do not log to stderr during stdio MCP sessions. Use `--log-file` or `JCODEMUNCH_LOG_FILE` instead.
- Start with `suggest_queries` on any unfamiliar repo, then `get_repo_outline`.
- Use `get_file_outline` before pulling source: see the API surface before reading code.
- Use `search_symbols` before `get_file_content` whenever possible.
- Use `get_symbol_source` with `symbol_ids[]` or `get_context_bundle` for related items instead of repeated single-symbol calls.
- Use `search_text` for comments, strings, and non-symbol content.
- Use `verify: true` when freshness matters.
- Re-index when the codebase changes materially. Use `index_file` for single-file updates.
- Tell your agent to prefer jCodeMunch, or it may fall back to old brute-force habits.
Good:
- “Use jcodemunch to locate the authentication flow.”
- “Start with the repo outline, then find the class responsible for retries.”
- “Use symbol search instead of reading full files.”
- “Retrieve only the exact methods related to billing.”
- “Verify the symbol before quoting the implementation.”
Bad:
- “Read the whole repo and tell me what it does.”
- “Open every likely file.”
- “Search manually through source until you find it.”
You are trying to teach the model to navigate, not rummage.
jCodeMunch works best when you treat it like a precision instrument, not a lucky rabbit’s foot.
Index the repo. Ask for outlines. Search by symbol. Retrieve narrowly. Batch related symbols. Re-index when needed. And most importantly, make your agent use the tools on purpose.
That is where the speed comes from. That is where the accuracy comes from. And that is where the ugly token bill finally starts to shrink.