jCodeMunch User Guide

What jCodeMunch actually does

jCodeMunch helps AI agents explore codebases without reading the whole damn file every time.

Most agents inspect repos the expensive way:

  1. open a large file
  2. skim hundreds or thousands of lines
  3. extract one useful function
  4. repeat somewhere else
  5. quietly set your token budget on fire

jCodeMunch replaces that with structured retrieval.

It indexes a repository once, extracts symbols with tree-sitter, stores metadata plus byte offsets into the original source, and lets your MCP-compatible agent retrieve only the code it actually needs. That is why token savings can be dramatic in retrieval-heavy workflows.

If you only remember one thing from this guide, make it this:

jCodeMunch is not magic because it is installed.
It is powerful because your agent uses it instead of brute-reading files.


1. Quick Start

Install

pip install jcodemunch-mcp

Verify the install:

jcodemunch-mcp --help

Recommended: use uvx in MCP clients

For MCP client configuration, uvx is usually the better choice because it runs the package on demand and avoids PATH headaches.


2. Add jCodeMunch to your MCP client

Claude Code

Fastest setup:

claude mcp add jcodemunch uvx jcodemunch-mcp

Project-only install:

claude mcp add --scope project jcodemunch uvx jcodemunch-mcp

With optional environment variables:

claude mcp add jcodemunch uvx jcodemunch-mcp \
  -e GITHUB_TOKEN=ghp_... \
  -e ANTHROPIC_API_KEY=sk-ant-...

Restart Claude Code afterward.

Manual Claude Code config

| Scope | Path |
|---|---|
| User | ~/.claude.json |
| Project | .claude/settings.json |

{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Claude Desktop

Config file location:

| OS | Path |
|---|---|
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Linux | ~/.config/claude/claude_desktop_config.json |
| Windows | %APPDATA%\Claude\claude_desktop_config.json |

Minimal config:

{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"]
    }
  }
}

With optional GitHub auth and AI summaries:

{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Optional environment variables

  • GITHUB_TOKEN Enables private repos and higher GitHub API limits.

  • ANTHROPIC_API_KEY Enables AI-generated summaries via Claude.

  • ANTHROPIC_MODEL Overrides the default Anthropic model.

  • GOOGLE_API_KEY Enables AI-generated summaries via Gemini if Anthropic is not configured.

  • GOOGLE_MODEL Overrides the default Gemini model.

  • JCODEMUNCH_SUMMARIZER_PROVIDER Forces the AI summarizer provider. Supported values: anthropic, gemini, openai, minimax, glm, none. If unset, jcodemunch-mcp auto-detects by API keys in this order: Anthropic, Gemini, OpenAI-compatible, MiniMax, GLM-5.

  • OPENAI_API_BASE Enables OpenAI-compatible summaries against a local or custom endpoint when no higher-priority provider is configured or when JCODEMUNCH_SUMMARIZER_PROVIDER=openai.

  • allow_remote_summarizer Controls whether OpenAI-compatible endpoints may point to non-localhost hosts. The default is false, which means endpoints such as http://127.0.0.1:11434/v1 work, but remote hosts such as https://api.minimax.io/v1 are blocked. When blocked, jcodemunch does not send code to that provider and falls back to docstring or signature-based summaries. Set allow_remote_summarizer: true in config.jsonc when you intentionally want to use a hosted OpenAI-compatible provider.

  • MINIMAX_API_KEY Enables AI-generated summaries via MiniMax using the default model minimax-m2.7.

  • ZHIPUAI_API_KEY Enables AI-generated summaries via GLM-5 using the default endpoint https://api.z.ai/api/paas/v4/.

  • JCODEMUNCH_PATH_MAP Remaps stored path prefixes so an index built on one machine can be reused on another without re-indexing. Format: orig1=new1,orig2=new2 where orig is the prefix as stored in the index (the path used at index time) and new is the equivalent path on the current machine. Each pair is split on the last =, so = signs within path components are preserved. Pairs are comma-separated; path components containing commas are not supported. The first matching prefix wins — list more-specific prefixes before broader ones when they overlap.

    Example (Linux index reused on Windows):

    JCODEMUNCH_PATH_MAP=/home/user/Dev=C:\Users\user\Dev
    
  • JCODEMUNCH_CONTEXT_PROVIDERS=0 Disables context-provider enrichment during indexing.

  • JCODEMUNCH_EMBED_MODEL Activates local embedding via sentence-transformers. Set to a model name such as all-MiniLM-L6-v2. Install the optional dep with pip install jcodemunch-mcp[semantic].

  • OPENAI_EMBED_MODEL Activates OpenAI embedding (requires OPENAI_API_KEY also set). Example: text-embedding-3-small.

  • GOOGLE_EMBED_MODEL Activates Gemini embedding (requires GOOGLE_API_KEY also set). Example: models/text-embedding-004.
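The splitting rules for JCODEMUNCH_PATH_MAP described above (split each pair on the last =, first matching prefix wins) can be sketched in a few lines of Python. This is an illustrative sketch of the documented semantics, not jCodeMunch's actual implementation:

```python
def parse_path_map(spec):
    """Parse 'orig1=new1,orig2=new2' into an ordered list of (orig, new) pairs.

    Each pair is split on the LAST '=', so '=' inside path components survives.
    Path components containing commas are not supported.
    """
    pairs = []
    for chunk in spec.split(","):
        orig, _, new = chunk.rpartition("=")
        pairs.append((orig, new))
    return pairs

def remap(path, pairs):
    """Rewrite the first matching prefix. Order matters, so list
    more-specific prefixes before broader ones when they overlap."""
    for orig, new in pairs:
        if path.startswith(orig):
            return new + path[len(orig):]
    return path

pairs = parse_path_map(r"/home/user/Dev=C:\Users\user\Dev")
print(remap("/home/user/Dev/app/main.py", pairs))
# C:\Users\user\Dev/app/main.py
```

Note how first-match-wins makes ordering significant: if both `/home/user` and `/home/user/Dev` appear, the more specific prefix must come first.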

Restart Claude Desktop after saving.

Debug logging

If you need to troubleshoot indexing or server startup, use a log file instead of stderr:

{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": [
        "jcodemunch-mcp",
        "--log-level", "DEBUG",
        "--log-file", "/tmp/jcodemunch.log"
      ]
    }
  }
}

Cursor

Open Settings → Tools & MCP → New MCP Server, then add:

{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"]
    }
  }
}

Save and confirm the server starts successfully.


VS Code

Add to .vscode/settings.json:

{
  "mcp.servers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_..."
      }
    }
  }
}

Google Antigravity

  1. Open the Agent pane
  2. Click the menu
  3. Choose MCP Servers → Manage MCP Servers
  4. Open View raw config
  5. Add:
{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Restart the MCP server afterward.


3. The step people skip, then blame the software for

Tell your agent to use jCodeMunch

Installing the server makes the tools available.

It does not guarantee your agent will stop opening giant files like a confused tourist with a flashlight.

Note: For a comprehensive guide on enforcing these rules through agent hooks and prompt policies, see AGENT_HOOKS.md.

Give it an instruction like this:

Use jcodemunch-mcp for code lookup whenever available. Prefer symbol search, outlines, and targeted retrieval over reading full files.

That one sentence can be the difference between:

  • “this is incredible” and
  • “I installed it and saw no change”

4. Your first useful workflows

Explore a GitHub repository

index_repo: { "url": "fastapi/fastapi" }
get_repo_outline: { "repo": "fastapi/fastapi" }
get_file_tree: { "repo": "fastapi/fastapi", "path_prefix": "fastapi" }
get_file_outline: { "repo": "fastapi/fastapi", "file_path": "fastapi/main.py" }

Use this when:

  • you are new to a repo
  • you want the lay of the land before reading code
  • you want to avoid blind file spelunking

Explore a local project

index_folder: { "path": "/home/user/myproject" }
resolve_repo: { "path": "/home/user/myproject" }
get_repo_outline: { "repo": "myproject" }
search_symbols: { "repo": "myproject", "query": "main" }

Use this when:

  • you want fast local indexing
  • you are working on private code
  • you want repeat retrieval without re-scanning the repo every time

Local-folder enrichment

When indexing local folders, jCodeMunch can detect ecosystem tools and enrich the index with domain-specific metadata. The current built-in example is dbt support, which can fold model descriptions, tags, and column metadata into summaries and search keywords.


Find and read a function

search_symbols: { "repo": "owner/repo", "query": "authenticate", "kind": "function" }
get_symbol_source: { "repo": "owner/repo", "symbol_id": "src/auth.py::authenticate#function" }

This is one of the core jCodeMunch loops:

  1. search
  2. identify the symbol
  3. fetch only that symbol

That is where a lot of the token savings come from.


Understand a class without reading the entire file

get_file_outline: { "repo": "owner/repo", "file_path": "src/auth.py" }
get_symbol_source: {
  "repo": "owner/repo",
  "symbol_ids": [
    "src/auth.py::AuthHandler.login#method",
    "src/auth.py::AuthHandler.logout#method"
  ]
}

Use get_file_outline first to see the API surface, then retrieve only the methods you care about.


Search for text that is not a symbol

search_text: {
  "repo": "owner/repo",
  "query": "TODO",
  "file_pattern": "*.py",
  "context_lines": 1
}

Use this for:

  • string literals
  • comments
  • configuration values
  • weird text fragments
  • anything that is not likely to appear as a symbol name

Read only part of a file

get_file_content: {
  "repo": "owner/repo",
  "file_path": "src/main.py",
  "start_line": 20,
  "end_line": 40
}

This is useful when the thing you need is line-oriented rather than symbol-oriented.


Verify source has not drifted

get_symbol_source: {
  "repo": "owner/repo",
  "symbol_id": "src/main.py::process#function",
  "verify": true
}

Check _meta.content_verified in the response.

This tells you whether the retrieved source still matches the indexed version.
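One way to picture what that verification means (a sketch of the concept, not jCodeMunch's actual mechanism): compare a hash of the bytes currently on disk against a hash recorded at index time.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# At index time: record a hash of the symbol's byte span.
indexed_bytes = b"def process(x):\n    return x * 2\n"
indexed_hash = sha256_of(indexed_bytes)

# At retrieval time: hash the same span again and compare.
current_bytes = b"def process(x):\n    return x * 2\n"
content_verified = sha256_of(current_bytes) == indexed_hash
print(content_verified)  # True while the source is unchanged
```

If the file has been edited since indexing, the hashes diverge and the verification flag flips to false, which is your cue to re-index.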


Force a re-index

invalidate_cache: { "repo": "owner/repo" }
index_repo: { "url": "owner/repo" }

Use this when:

  • the index is stale
  • the repo changed substantially
  • you want a clean reset

For GitHub repos, newer builds also store the Git tree SHA so unchanged incremental re-index runs can return immediately instead of re-downloading the universe just to discover nothing changed.


5. Core mental model

What jCodeMunch stores

Each symbol is indexed with structured metadata such as:

  • signature
  • kind
  • qualified name
  • one-line summary
  • byte offsets into the original file

That lets jCodeMunch fetch the exact source later by byte offset rather than opening and re-parsing the entire file on every request.
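The byte-offset idea can be sketched like this. The `record` dict here is a hypothetical stand-in for a symbol record; the real index format may differ:

```python
import io

source = b"import os\n\ndef authenticate(user):\n    return user.ok\n"

# Hypothetical metadata a symbol record might carry for 'authenticate'.
record = {"start_byte": 11, "end_byte": len(source)}

f = io.BytesIO(source)          # stands in for the cached file on disk
f.seek(record["start_byte"])    # jump straight to the symbol
snippet = f.read(record["end_byte"] - record["start_byte"])
print(snippet.decode())
```

Because the read is a seek plus a bounded read, retrieval cost scales with the size of the symbol, not the size of the file.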

Stable symbol IDs

Symbol IDs look like this:

{file_path}::{qualified_name}#{kind}

Examples:

src/main.py::UserService#class
src/main.py::UserService.login#method
src/utils.py::authenticate#function
config.py::MAX_RETRIES#constant

These IDs stay stable across re-indexing as long as path, qualified name, and kind stay the same.
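The ID format above is simple enough to build and split by hand; a sketch (not the library's actual code):

```python
def make_symbol_id(file_path, qualified_name, kind):
    """Build a {file_path}::{qualified_name}#{kind} symbol ID."""
    return f"{file_path}::{qualified_name}#{kind}"

def parse_symbol_id(symbol_id):
    """Split a symbol ID back into its three parts.
    Split on the first '::' and the last '#', so dotted qualified
    names like 'UserService.login' survive intact."""
    path, _, rest = symbol_id.partition("::")
    name, _, kind = rest.rpartition("#")
    return path, name, kind

sid = make_symbol_id("src/main.py", "UserService.login", "method")
print(sid)                    # src/main.py::UserService.login#method
print(parse_symbol_id(sid))   # ('src/main.py', 'UserService.login', 'method')
```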


6. Tool reference

Indexing & Repository Management

| Tool | What it does | Key parameters |
|---|---|---|
| index_repo | Index a GitHub repository | url, incremental, use_ai_summaries, extra_ignore_patterns |
| index_folder | Index a local folder | path, incremental, use_ai_summaries, extra_ignore_patterns, follow_symlinks |
| index_file | Re-index one file; faster than index_folder for surgical updates | path, use_ai_summaries, context_providers |
| embed_repo | Precompute and cache all symbol embeddings for semantic search in one pass (optional warm-up; embeddings are also computed lazily on first semantic query) | repo, batch_size, force |
| list_repos | List all indexed repositories | |
| resolve_repo | Resolve a filesystem path to its repo ID; O(1) lookup, preferred over list_repos when you know the path | path |
| invalidate_cache | Delete cached index and force a full re-index | repo |

Discovery & Outlines

| Tool | What it does | Key parameters |
|---|---|---|
| suggest_queries | Surface useful entry-point files, keywords, and example queries for an unfamiliar repo | repo |
| get_repo_outline | High-level overview: directories, file counts, language breakdown, symbol counts | repo |
| get_file_tree | Browse file structure, optionally filtered by path prefix | repo, path_prefix, include_summaries |
| get_file_outline | All symbols in a file with full signatures and summaries; supports batch via file_paths | repo, file_path, file_paths |

Retrieval

| Tool | What it does | Key parameters |
|---|---|---|
| get_symbol_source | Retrieve symbol source: symbol_id (single, flat response) or symbol_ids[] (batch, {symbols, errors}); supports verify and context_lines | repo, symbol_id, symbol_ids, verify, context_lines |
| get_context_bundle | Symbol plus its imports plus optional callers in one bundle; supports multi-symbol, Markdown output, and token budgeting (token_budget, budget_strategy: most_relevant/core_first/compact, include_budget_report) | repo, symbol_id, symbol_ids, include_callers, output_format, token_budget, budget_strategy, include_budget_report |
| get_ranked_context | Query-driven, token-budgeted context assembler; returns the best-fit symbols for a task, ranked by relevance plus centrality and greedily packed to fit the budget | repo, query, token_budget, strategy, include_kinds, scope |
| get_file_content | Read cached file content, optionally sliced to a line range | repo, file_path, start_line, end_line |

Search

| Tool | What it does | Key parameters |
|---|---|---|
| search_symbols | Search the symbol index by name, signature, summary, or docstring; supports kind/language/file_pattern filters, fuzzy matching (fuzzy, fuzzy_threshold, max_edit_distance), centrality-aware ranking (sort_by: relevance/centrality/combined), and optional semantic/hybrid search (semantic, semantic_weight, semantic_only) | repo, query, kind, language, file_pattern, max_results, token_budget, detail_level, fuzzy, sort_by, semantic |
| search_text | Full-text search across indexed file contents; supports regex, context lines, and optional semantic search | repo, query, is_regex, file_pattern, max_results, context_lines, semantic |
| search_columns | Search column metadata across dbt / SQLMesh / database catalog models | repo, query, model_pattern, max_results |

Relationship & Impact Analysis

| Tool | What it does | Key parameters |
|---|---|---|
| find_importers | Find all files that import a given file; supports batch via file_paths; each result includes a has_importers flag for spotting transitive dead-code chains | repo, file_path, file_paths, max_results |
| find_references | Find all files that import or reference a given identifier; supports batch via identifiers | repo, identifier, identifiers, max_results |
| check_references | Quick dead-code check: is an identifier referenced anywhere? Combines import and content search | repo, identifier, identifiers, search_content, max_content_results |
| get_dependency_graph | File-level dependency graph up to 3 hops; direction = imports, importers, or both | repo, file, direction, depth |
| get_blast_radius | Which files break if this symbol changes? Returns confirmed/potential impacted files, overall_risk_score, direct_dependents_count; set include_depth_scores=true for impact_by_depth grouped by BFS layer | repo, symbol, depth, include_depth_scores |
| get_symbol_importance | Rank symbols by architectural centrality using PageRank or in-degree on the import graph; surfaces the most load-bearing symbols in a repo | repo, top_n, algorithm, scope |
| find_dead_code | Find symbols and files unreachable from any entry point via the import graph; entry points auto-detected (main, init, CLI decorators, etc.) | repo, granularity, min_confidence, include_tests, entry_point_patterns |
| get_changed_symbols | Map a git diff to affected symbols; detects added/modified/removed/renamed symbols between two commits; optionally includes blast radius per changed symbol | repo, since_sha, until_sha, include_blast_radius, max_blast_depth |
| get_class_hierarchy | Full inheritance chain (ancestors plus descendants) across Python, TS, Java, C#, and more | repo, class_name |
| get_related_symbols | Symbols related to a given symbol via co-location, shared importers, and name-token overlap | repo, symbol_id, max_results |
| get_symbol_diff | Diff symbol sets of two indexed repo snapshots; detects added, removed, and changed symbols | repo_a, repo_b |

Utilities

| Tool | What it does | Key parameters |
|---|---|---|
| get_session_stats | Token savings, cost avoided, and per-tool breakdown for the current session | |

Workflow patterns

New / unfamiliar repo?
  → suggest_queries → get_repo_outline → get_file_tree

Looking for a symbol by name?
  → search_symbols  (add kind= / language= / file_pattern= to narrow)

Typo or partial name? (fuzzy)
  → search_symbols(fuzzy=true)

Concept search — "database connection" when the code says "db_pool"?
  → search_symbols(semantic=true)  (requires embedding provider)

What are the most architecturally important symbols?
  → get_symbol_importance  (PageRank on the import graph)

Get the best-fit context for a task without blowing the token budget?
  → get_ranked_context(query="...", token_budget=4000)

Looking for text, strings, or comments?
  → search_text  (supports regex and context_lines)

Need to read a function or class?
  → get_file_outline → get_symbol_source

Need symbol + its imports in one shot?
  → get_context_bundle  (add token_budget= to cap size)

What imports this file?
  → find_importers

Where is this identifier used?
  → find_references  (or check_references for a quick yes/no)

What breaks if I change this symbol?
  → get_blast_radius(include_depth_scores=true) → find_importers

What symbols actually changed since the last commit?
  → get_changed_symbols  (add include_blast_radius=true for downstream impact)

Is this code dead / unreachable?
  → find_dead_code  (or check_references for a single identifier)

Class hierarchy?
  → get_class_hierarchy

File dependency graph?
  → get_dependency_graph

What changed between two repo snapshots?
  → get_symbol_diff

Database column search (dbt / SQLMesh)?
  → search_columns

7. How search works

search_symbols is not a naive grep dressed up in a fake mustache.

The search logic uses weighted scoring across things like:

  • exact name match
  • name substring match
  • word overlap
  • signature terms
  • summary terms
  • docstring and keyword matches

Filters like kind, language, and file_pattern narrow the field before scoring. Zero-score results are discarded.
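As a rough mental model of weighted scoring (the weights and factors here are invented for illustration; the real scorer differs):

```python
def score_symbol(query, sym):
    """Toy weighted scorer: exact name > name substring > word overlap > summary terms."""
    q = query.lower()
    name = sym["name"].lower()
    score = 0.0
    if q == name:
        score += 10.0                       # exact name match
    elif q in name:
        score += 5.0                        # name substring match
    q_words = set(q.split("_"))
    score += 2.0 * len(q_words & set(name.split("_")))   # word overlap
    if q in sym.get("summary", "").lower():
        score += 1.0                        # summary terms
    return score

syms = [
    {"name": "authenticate", "summary": "Check user credentials"},
    {"name": "auth_helper", "summary": "Helpers for authenticate"},
]
# Score, drop zero-score results, rank the rest.
ranked = sorted((s for s in syms if score_symbol("authenticate", s) > 0),
                key=lambda s: score_symbol("authenticate", s), reverse=True)
print([s["name"] for s in ranked])  # ['authenticate', 'auth_helper']
```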

Fuzzy matching — pass fuzzy=true to enable a trigram Jaccard + Levenshtein fallback that fires when BM25 confidence is low. Useful for typos or partial names (conn → connection_pool). Fuzzy results include match_type, fuzzy_similarity, and edit_distance fields. Zero behavioral change when fuzzy=false (default).
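To see why trigram Jaccard tolerates partial names, here is the trigram half of that idea in miniature (an illustrative sketch; the Levenshtein half and the actual thresholds are omitted):

```python
def trigrams(s):
    s = f"  {s} "                      # pad so short strings still yield trigrams
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    """Trigram Jaccard similarity: shared trigrams / total distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(round(jaccard("conn", "connection_pool"), 2))  # 0.24
print(round(jaccard("conn", "render_html"), 2))      # 0.0
```

A partial name shares a run of trigrams with its full form, so it scores well above an unrelated name even though it is not a substring match in BM25's eyes.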

Centrality-aware ranking — pass sort_by="centrality" to rank results by PageRank on the import graph, or sort_by="combined" to blend BM25 and PageRank. Default stays "relevance" (pure BM25).

Semantic / hybrid search — pass semantic=true to enable embedding-based search alongside BM25. Requires a configured embedding provider (JCODEMUNCH_EMBED_MODEL, OPENAI_API_KEY + OPENAI_EMBED_MODEL, or GOOGLE_API_KEY + GOOGLE_EMBED_MODEL). semantic_weight controls the BM25/embedding blend (default 0.5). semantic_only=true skips BM25 entirely. Zero performance impact when semantic=false (default).
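The semantic_weight blend above amounts to a weighted average of the two scores. A sketch of that arithmetic (illustrative; assumes both scores are already normalized to a comparable range):

```python
def hybrid_score(bm25, embed_sim, semantic_weight=0.5):
    """Blend a normalized BM25 score with an embedding similarity.
    semantic_weight=0 is pure BM25; semantic_weight=1 is pure semantic."""
    return (1 - semantic_weight) * bm25 + semantic_weight * embed_sim

# "db_pool" may score low lexically for the query "database connection",
# but high on embedding similarity:
print(round(hybrid_score(bm25=0.1, embed_sim=0.9), 2))                       # 0.5
print(round(hybrid_score(bm25=0.1, embed_sim=0.9, semantic_weight=1.0), 2))  # 0.9
```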

Practical takeaway:

  • use a precise query when you know the symbol name
  • add kind when you know whether you want a function, class, method, etc.
  • use file_pattern or language when a repo is large or polyglot
  • use fuzzy=true for typos, partials, or snake_case mismatches
  • use semantic=true for concept-level queries when you don't know the exact symbol name

8. Token savings: what it means and what it does not mean

jCodeMunch can produce very large token savings because it changes the workflow from:

read everything to find something

to:

find something, then read only that

Typical task categories in the project’s own token-savings material show very large reductions for repo exploration, finding specific functions, and reading targeted implementations.

But keep the mental model honest:

  • savings happen when the agent actually uses targeted retrieval
  • savings are strongest in retrieval-heavy workflows
  • installing the MCP is not the same as changing agent behavior

That is why onboarding and prompting matter.


9. Live token savings counter

If you use Claude Code, you can surface a running savings counter in the status line.

Example:

Claude Sonnet 4.6 | my-project | ░░░░░░░░░░ 0% | 1,280,837 tkns saved · $6.40 saved on Opus

The data comes from:

~/.code-index/_savings.json

It tracks cumulative token savings and can be used to estimate avoided cost at a given model rate.
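The cost estimate is just tokens saved times a per-token rate. A sketch of that arithmetic, with a hypothetical field name (tokens_saved) and an illustrative rate of $5 per million input tokens, neither of which is taken from the actual file format or any published price list:

```python
import io
import json

# Hypothetical shape of _savings.json; the real file's fields may differ.
savings_file = io.StringIO('{"tokens_saved": 1280837}')
tokens_saved = json.load(savings_file)["tokens_saved"]

# Avoided cost = tokens saved x model input rate (illustrative rate only).
rate_per_million = 5.00
print(f"${tokens_saved / 1_000_000 * rate_per_million:.2f} saved")
# $6.40 saved
```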


10. Community savings meter

By default, jCodeMunch can contribute an anonymous savings delta to a global counter.

Only two values are sent:

  • token savings delta
  • a random anonymous install ID

No code, repo names, file paths, or identifying project data are transmitted.

To disable it:

{
  "mcpServers": {
    "jcodemunch": {
      "command": "uvx",
      "args": ["jcodemunch-mcp"],
      "env": {
        "JCODEMUNCH_SHARE_SAVINGS": "0"
      }
    }
  }
}

11. Local LLM tuning for summaries

You can generate summaries with a local OpenAI-compatible server such as LM Studio by setting:

"env": {
  "OPENAI_API_BASE": "http://127.0.0.1:1234/v1",
  "OPENAI_MODEL": "qwen/qwen3-8b",
  "OPENAI_API_KEY": "local-llm"
}

Useful tuning knobs:

  • OPENAI_CONCURRENCY
  • OPENAI_BATCH_SIZE
  • OPENAI_MAX_TOKENS

For hosted OpenAI-compatible providers, use explicit provider selection instead:

"env": {
  "JCODEMUNCH_SUMMARIZER_PROVIDER": "minimax",
  "MINIMAX_API_KEY": "..."
}
"env": {
  "JCODEMUNCH_SUMMARIZER_PROVIDER": "glm",
  "ZHIPUAI_API_KEY": "..."
}

Treat this as optional power-user tuning, not required setup.


12. Storage and indexing

By default, indexes live under:

~/.code-index/

Typical layout:

~/.code-index/
├── owner-repo.json
└── owner-repo/
    └── src/main.py

The JSON index stores metadata, hashes, and symbol records. Raw files are stored separately for precise later retrieval.


13. Troubleshooting

“Repository not found”

Use owner/repo or a full GitHub URL. For private repos, set GITHUB_TOKEN.

“No source files found”

The repo may not contain supported source files, or everything useful may have been excluded by skip patterns.

Rate limiting

Set GITHUB_TOKEN to increase GitHub API limits.

AI summaries are missing

Set one of ANTHROPIC_API_KEY, GOOGLE_API_KEY, OPENAI_API_BASE, MINIMAX_API_KEY, or ZHIPUAI_API_KEY. You can also force a specific provider with JCODEMUNCH_SUMMARIZER_PROVIDER. Without a configured provider, summaries fall back to docstrings or signatures.

The index seems stale

Use invalidate_cache followed by a fresh index_repo or index_folder.

The client cannot find the executable

Use uvx, or configure the absolute path to jcodemunch-mcp.

Debug logs broke my MCP client

Do not log to stderr during stdio MCP sessions. Use --log-file or JCODEMUNCH_LOG_FILE instead.


14. Best practices

  1. Start with suggest_queries on any unfamiliar repo, then get_repo_outline.
  2. Use get_file_outline before pulling source — see API surface before reading code.
  3. Use search_symbols before get_file_content whenever possible.
  4. Use get_symbol_source with symbol_ids[] or get_context_bundle for related items instead of repeated single-symbol calls.
  5. Use search_text for comments, strings, and non-symbol content.
  6. Use verify: true when freshness matters.
  7. Re-index when the codebase changes materially. Use index_file for single-file updates.
  8. Tell your agent to prefer jCodeMunch, or it may fall back to old brute-force habits.

15. Best practices for prompting the agent

Good:

  • “Use jcodemunch to locate the authentication flow.”
  • “Start with the repo outline, then find the class responsible for retries.”
  • “Use symbol search instead of reading full files.”
  • “Retrieve only the exact methods related to billing.”
  • “Verify the symbol before quoting the implementation.”

Bad:

  • “Read the whole repo and tell me what it does.”
  • “Open every likely file.”
  • “Search manually through source until you find it.”

You are trying to teach the model to navigate, not rummage.


16. Final advice

jCodeMunch works best when you treat it like a precision instrument, not a lucky rabbit’s foot.

Index the repo. Ask for outlines. Search by symbol. Retrieve narrowly. Batch related symbols. Re-index when needed. And most importantly, make your agent use the tools on purpose.

That is where the speed comes from. That is where the accuracy comes from. And that is where the ugly token bill finally starts to shrink.