All notable changes to jcodemunch-mcp are documented here.
- Lazy tool imports — all 45 tool module imports in `server.py` are now deferred to the first `call_tool()` dispatch for each tool. Previously, importing `server.py` loaded every tool module (and their transitive dependencies: tree-sitter, httpx, pathspec, subprocess wrappers) regardless of which tools the session actually uses. Now only 7 tool modules load at startup (via the watcher's `index_folder` chain). Tools not called in a session are never imported. This reduces cold-start overhead for query-only sessions that never trigger indexing.
- `_build_tools_list()` helper — `list_tools()` now delegates to a named `_build_tools_list()` function, making the tool list construction easier to test and reason about independently of the MCP decorator.
- Test patch targets updated — tests that previously patched `jcodemunch_mcp.server.xxx` (where `xxx` is a tool function) now correctly patch `jcodemunch_mcp.tools.xxx_module.xxx_func`, which is where the name is looked up during dispatch. This follows Python's `unittest.mock.patch` best practice: patch where the name is looked up, not where it is defined.
- No API or output schema changes. Zero new tools, zero removed tools, zero field changes.
- `assessment` field on `get_hotspots` entries — each hotspot now includes `assessment: "low" | "medium" | "high"` based on `hotspot_score` thresholds (low ≤ 3, medium ≤ 10, high > 10). Allows an LLM to relay findings directly without interpreting the raw score.
- `architecture.layers` documented in README — the `.jcodemunch.jsonc` reference now includes the full `architecture` block schema with a worked example for a typical layered Python project (api → service → repo → db). Used by `get_layer_violations`.
- 2 new tests (1624 total, 7 skipped): `test_assessment_field_present`, `test_high_complexity_no_churn_is_low`.
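
The `hotspot_score` thresholds above map to labels as in the following sketch. The function name `assess_hotspot` is hypothetical; only the threshold values come from the entry.

```python
def assess_hotspot(hotspot_score: float) -> str:
    """Map a raw hotspot score to a label: low <= 3, medium <= 10, high > 10."""
    if hotspot_score <= 3:
        return "low"
    if hotspot_score <= 10:
        return "medium"
    return "high"


for score in (0.0, 3.0, 7.5, 10.0, 27.6):
    print(score, assess_hotspot(score))
```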
- Session-level LRU result cache — `get_blast_radius` and `find_references` (single-identifier mode) now cache their results for the duration of the MCP session. Repeated calls with the same arguments return instantly from the in-process cache with `_meta.cache_hit: true` instead of re-running the expensive BFS traversal and file-content scans. Cache is a 256-entry LRU (`OrderedDict`); least recently used entries are evicted first. Thread-safe via the existing `_State` lock.
- Automatic cache invalidation — the result cache is cleared after any `index_repo`, `index_folder`, `index_file`, or `invalidate_cache` call so stale results are never served after re-indexing.
- `get_session_stats` — `result_cache` field — the existing `get_session_stats` tool now includes a `result_cache` section: `{total_hits, total_misses, hit_rate, cached_entries}`. Useful for tuning and for verifying that the cache is working in real sessions.
- 18 new tests (1622 total, 7 skipped): `test_result_cache.py` covers get/put, hit/miss counters, by-tool breakdown, invalidation (all-repos and repo-specific), LRU eviction at maxsize, and the `result_cache` field in `get_session_stats`.
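
A minimal sketch of an `OrderedDict`-backed LRU cache with hit/miss counters, in the spirit of the entries above. The class name `ResultCache` and its method names are hypothetical, and the sketch uses its own lock rather than the server's `_State` lock.

```python
import threading
from collections import OrderedDict
from typing import Any, Hashable


class ResultCache:
    """Bounded LRU cache; a miss returns None (so None itself is not cacheable here)."""

    def __init__(self, maxsize: int = 256) -> None:
        self._maxsize = maxsize
        self._entries = OrderedDict()
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                self.hits += 1
                return self._entries[key]
            self.misses += 1
            return None

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._entries[key] = value
            self._entries.move_to_end(key)
            while len(self._entries) > self._maxsize:
                self._entries.popitem(last=False)  # evict least recently used

    def invalidate(self) -> None:
        """Drop everything, e.g. after a re-index."""
        with self._lock:
            self._entries.clear()

    def stats(self) -> dict:
        with self._lock:
            total = self.hits + self.misses
            return {
                "total_hits": self.hits,
                "total_misses": self.misses,
                "hit_rate": self.hits / total if total else 0.0,
                "cached_entries": len(self._entries),
            }
```

A cache key would typically combine the tool name with its arguments, for example `("get_blast_radius", repo, symbol_id)`; that key shape is an assumption, not something the changelog specifies.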
- `get_symbol_complexity(symbol_id)` — returns cyclomatic complexity, max nesting depth, parameter count, line count, and a human-readable `assessment` ("low" / "medium" / "high") for any indexed function or method. Data is read directly from the index (no re-parsing); requires INDEX_VERSION 7 (jcodemunch-mcp >= 1.16).
- `get_churn_rate(target, days=90)` — returns git commit count, unique authors, first-seen date, last-modified date, and `churn_per_week` for a file or symbol over a configurable look-back window. `assessment` field: "stable" (≤1/week), "active" (≤3/week), "volatile" (>3/week). Accepts a relative file path or a symbol ID. Requires a locally indexed repo.
- `get_hotspots(top_n=20, days=90, min_complexity=2)` — ranks functions and methods by `hotspot_score = cyclomatic × log(1 + commits_last_N_days)`. Surfaces code that is both complex and frequently changed — the highest bug-introduction risk in the repo. Identical methodology to Adam Tornhill's CodeScene hotspot analysis. Falls back gracefully when git is unavailable (complexity-only scoring).
- `get_repo_health(days=90)` — one-call triage snapshot: total files/symbols, dead-code %, average cyclomatic complexity, top-5 hotspots, dependency cycle count, and unstable module count. Produces a `summary` string suitable for immediate relay. Designed to be the first tool called in any new session. Thin aggregator — delegates to individual tools, no duplicated logic.
- Bug fix: complexity data now correctly persisted through `save_index` — the symbol serialization dict in `save_index` was missing `cyclomatic`, `max_nesting`, and `param_count` fields (they were computed by the parser but silently dropped before DB write). Fixed by including these fields in the serialized dict. All tools depending on complexity data (`get_extraction_candidates`, `get_symbol_complexity`, `get_hotspots`) now return accurate values after a fresh `index_folder`.
- 36 new tests (1604 total, 7 skipped): `test_symbol_complexity.py`, `test_churn_rate.py`, `test_hotspots.py`, `test_repo_health.py`.
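
The scoring formula and churn labels above can be written out as a short sketch. Two points are assumptions: the changelog does not state the logarithm base (natural log is used here), and `churn_per_week` is taken to be commits divided by the number of weeks in the window. Function names are hypothetical.

```python
import math


def hotspot_score(cyclomatic: int, commits_last_n_days: int) -> float:
    """cyclomatic x log(1 + commits); natural log is an assumption."""
    return cyclomatic * math.log(1 + commits_last_n_days)


def churn_per_week(commits: int, days: int = 90) -> float:
    """Assumed definition: commits divided by the window length in weeks."""
    return commits / (days / 7)


def assess_churn(rate: float) -> str:
    """stable <= 1/week, active <= 3/week, volatile > 3/week."""
    if rate <= 1:
        return "stable"
    if rate <= 3:
        return "active"
    return "volatile"


# A complex function with no recent commits scores 0.0, because log(1 + 0) = 0.
print(hotspot_score(12, 0))
print(assess_churn(churn_per_week(commits=30, days=90)))
```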
- `check_rename_safe(symbol_id, new_name)` — new tool that detects name collisions before renaming a symbol. Scans the symbol's defining file and every file that imports it, checking for an existing symbol already using the proposed new name. Returns `{safe, conflicts, checked_files}`. Use before any rename/refactor to avoid silent breakage.
- `get_dead_code_v2()` — enhanced dead-code detection with three independent evidence signals per function/method: (1) the symbol's file is not reachable from any entry point via the import graph, (2) no indexed symbol calls this symbol in the call graph, (3) the symbol name is not re-exported from any `__init__` or barrel file. Each result includes a `confidence` score (0.33 = 1 signal, 0.67 = 2 signals, 1.0 = all 3). More reliable than single-signal detection. Accepts `min_confidence` (default 0.5) and `include_tests` parameters.
- `get_extraction_candidates(file_path, min_complexity, min_callers)` — new tool that identifies functions worth extracting to a shared module. A candidate must have high cyclomatic complexity (doing a lot) AND be called from multiple other files (already implicitly shared). Results ranked by `score = cyclomatic × caller_file_count`.
- Complexity metrics stored at index time — `INDEX_VERSION` bumped from 6 to 7. Three new fields per symbol (functions and methods only): `cyclomatic` (McCabe complexity), `max_nesting` (bracket-nesting depth), `param_count`. Computed from symbol body text at index time via `parser/complexity.py`. Existing indexes are automatically migrated (columns added as NULL; re-index to populate). Consumed by `get_extraction_candidates`.
- 37 new tests (1568 total, 7 skipped): `test_complexity.py`, `test_check_rename_safe.py`, `test_dead_code_v2.py`, `test_extraction_candidates.py`.
- `INDEX_VERSION` is now 7 (was 6). Re-index required to populate complexity fields; existing indexes load and operate correctly with complexity = 0.
- `config --upgrade` — new CLI flag that adds missing keys from the current version's template into an existing `config.jsonc`, preserving all user-set values. Useful after upgrading jcodemunch-mcp to a newer version that introduces new config keys. Updates the `"version"` field automatically and reports which keys were injected. Addresses the gap implied by the `"version"` field / "additive migrations" comment in `config.jsonc`. Requested by nikolai-vysotskyi in issue #191.
- `summarize_repo(repo, force)` — new MCP tool that re-runs AI summarization on all symbols in an existing index. Useful when `index_folder` completed without AI summaries (deferred background thread was interrupted, AI was disabled at index time, or the provider wasn't configured). With `force=true`, clears all existing summaries and re-runs the full 3-tier pipeline (docstring → AI → signature fallback). Returns `{success, symbol_count, updated, skipped, duration_seconds}`. Reported by nikolai-vysotskyi in issue #190.
- AI summarization progress logging — `summarize_batch` (both `BaseSummarizer` and `OpenAIBatchSummarizer`) now logs progress at INFO level every ~10% of batches: `"AI summarization: N/M symbols (P%)"`. Start and completion are also logged. Previously there was zero feedback during 10–30 minute summarization runs on large codebases.
- `summarization_deferred` field in `index_folder` response — when the watcher-driven fast path fires a background summarization thread, the response now includes `"summarization_deferred": true` and a note suggesting `summarize_repo` as a synchronous fallback.
- Deferred summarization thread logging promoted to INFO — thread start (`"Deferred AI summarization started for owner/repo (N symbols)"`) and completion (`"Deferred AI summarization saved N symbols for owner/repo"`) are now logged at INFO instead of DEBUG, making them visible in default logging configurations.
- Empty-array false positive in singular/batch mode detection — `get_symbol_source`, `find_references`, `check_references`, `find_importers`, and `get_file_outline` each support a singular param (e.g. `symbol_id`) and a batch param (e.g. `symbol_ids`). Some MCP clients (observed with OpenCode + GPT codex) pass the batch param as an empty array `[]` even when invoking singular mode. Since `[] is not None` is `True`, the mutual-exclusivity guard fired and returned `"Provide symbol_id or symbol_ids, not both."` / `"Internal error processing find_references"`. Fixed by normalizing empty lists to `None` before the guard check in all five tools. Reported by razorree in issue #189.
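
The normalization fix above amounts to the following sketch. The helper name `resolve_mode` and the "neither provided" error message are hypothetical; only the "not both" message is quoted from the entry.

```python
from typing import Optional, Sequence


def resolve_mode(
    symbol_id: Optional[str] = None,
    symbol_ids: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return the list of IDs to process, treating an empty batch as absent."""
    # Some clients send symbol_ids=[] alongside symbol_id; treat [] like None
    # so the mutual-exclusivity guard below does not fire spuriously.
    if symbol_ids is not None and len(symbol_ids) == 0:
        symbol_ids = None

    if symbol_id is not None and symbol_ids is not None:
        raise ValueError("Provide symbol_id or symbol_ids, not both.")
    if symbol_id is None and symbol_ids is None:
        # Hypothetical message for the "neither provided" case.
        raise ValueError("Provide symbol_id or symbol_ids.")

    return [symbol_id] if symbol_id is not None else list(symbol_ids)


print(resolve_mode(symbol_id="pkg.mod::func", symbol_ids=[]))
```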
- `get_dependency_cycles()` — new tool detecting circular import chains in the repository. Uses Kosaraju's algorithm (iterative, no recursion limit) on the file-level import graph. Returns each strongly-connected component (set of files mutually reachable via imports) as a cycle. Useful for finding architectural problems and test-isolation blockers.
- `get_coupling_metrics(module_path)` — new tool returning afferent coupling (Ca, how many files import this module), efferent coupling (Ce, how many files this module imports), instability score I = Ce/(Ca+Ce), and a human-readable `assessment` ("stable" | "neutral" | "unstable" | "isolated"). Identifies fragile modules and guides refactoring priorities.
- `get_layer_violations(rules?)` — new tool validating inter-module imports against declared architectural layer boundaries. Reports every import that crosses a forbidden boundary. Rules can be passed directly or defined in `.jcodemunch.jsonc` under `architecture.layers`. Output includes `file`, `file_layer`, `import_target`, `target_layer`, `rule_violated` per violation.
- `architecture` config key — new `.jcodemunch.jsonc` / global config key (type: dict) for per-project layer definitions. Structure: `{"layers": [{"name": str, "paths": [str], "may_not_import": [str]}]}`. Consumed by `get_layer_violations` when no inline `rules` are provided.
- 36 new tests (1527 total, 9 skipped) in `tests/test_architecture_tools.py`.
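
As an illustration of the cycle detection described above, here is a minimal iterative Kosaraju sketch over a file-level import graph. The function name and the graph representation (importer mapped to a list of imported files) are assumptions; the project's own implementation may differ in details such as ordering and output shape.

```python
from collections import defaultdict


def find_import_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Return strongly connected components that form import cycles."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    visited: set[str] = set()
    finish_order: list[str] = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            advanced = False
            for nxt in neighbours:  # the iterator resumes where it left off
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    advanced = True
                    break
            if not advanced:
                finish_order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    reversed_graph: dict[str, list[str]] = defaultdict(list)
    for src, targets in graph.items():
        for dst in targets:
            reversed_graph[dst].append(src)

    assigned: set[str] = set()
    cycles: list[list[str]] = []
    for root in reversed(finish_order):
        if root in assigned:
            continue
        assigned.add(root)
        component: list[str] = []
        stack2 = [root]
        while stack2:
            node = stack2.pop()
            component.append(node)
            for prev in reversed_graph.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    stack2.append(prev)
        # A single file is only a cycle if it imports itself.
        if len(component) > 1 or root in graph.get(root, ()):
            cycles.append(sorted(component))
    return cycles


imports = {"a.py": ["b.py"], "b.py": ["c.py"], "c.py": ["a.py", "d.py"], "d.py": []}
print(find_import_cycles(imports))
```

Using explicit stacks instead of recursion is what keeps the traversal independent of Python's recursion limit on deep import chains.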
- `get_call_hierarchy(symbol_id, direction, depth)` — new tool returning incoming callers and outgoing callees for any indexed symbol, N levels deep (default 3). Uses AST-derived detection: callers = symbols in importing files whose bodies mention the name; callees = imported symbols mentioned in the symbol's source body. No LSP required. Results include `{id, name, kind, file, line, depth}` per entry and `source: "ast"` in `_meta`.
- `get_impact_preview(symbol_id)` — new tool answering "what breaks if I delete or rename this?". DFS over the call graph transitively, returns all affected symbols grouped by file (`affected_by_file`) with call-chain paths (`call_chains`) showing how each symbol is reached from the target.
- `_call_graph.py` — shared internal module with `find_direct_callers`, `find_direct_callees`, `bfs_callers`, `bfs_callees` used by all call-graph tools.
- `get_blast_radius` — new optional `call_depth` param (default 0, disabled). When `call_depth > 0`, adds `callers` list of symbols that actually call the target symbol (call-level analysis) alongside the existing import-level `confirmed` / `potential` lists. All existing fields unchanged; fully backwards-compatible.
- `find_references` — new optional `include_call_chain` param (default false, singular mode only). When true, each reference entry gains `calling_symbols`: symbols in that file whose source bodies mention the identifier. Batch mode ignores this flag.
- Per-project language config ignored during parsing — `parse_file()` was calling `is_language_enabled(language)` without forwarding the `repo` path, so it always consulted the global config and never the per-project `.jcodemunch.jsonc`. Projects that declared their own `"languages"` list got `symbol_count: 0` when the global config had `"languages": []` (the recommended default). Fixed by threading `repo` from every `parse_file` call site (`index_folder`, `index_file`, `get_changed_symbols`, and all three pipeline functions in `_indexing_pipeline`) down to the language-gate check. `index_repo` is unaffected (remote repos have no local project config). Reported and root-caused by AmaralVini in issue #187.
- `get_repo_outline` 2-level directory grouping for large repos — when a repository has more than 500 indexed files, `directories` now groups by two path components (e.g., `src/api/`, `src/models/`) instead of only the top-level directory. Results are capped at 40 entries (highest file-count dirs first). Small repos (≤ 500 files) retain the existing 1-level behavior. Agents navigating large monorepos get actionable directory hints rather than a single coarse bucket.
- Cross-repository dependency tracking — import graph tools (`find_importers`, `get_blast_radius`, `get_dependency_graph`, `get_changed_symbols`) now accept an opt-in `cross_repo: bool` parameter (default `false`). When enabled, the tools traverse repo boundaries using a package registry built from manifest files (`pyproject.toml`, `package.json`, `go.mod`, `Cargo.toml`, `*.csproj`). Cross-repo results are annotated with `"cross_repo": true` and `"source_repo"`. Zero behavior change when `cross_repo` is omitted.
- `get_cross_repo_map` tool — new tool that returns the full cross-repository dependency map at the package level, or filtered to a single repo. Shows `depends_on` and `depended_on_by` for each indexed repo, plus a flat `cross_repo_edges` list.
- `package_names` field on `CodeIndex` — package names are extracted from manifest files at index time (both `index_folder` and `index_repo`) and stored in the SQLite meta table. Old indexes load cleanly with `package_names = []`.
- `package_registry.py` — new module providing `extract_package_names()` (5 ecosystems: Python, JS/TS, Go, Rust, C#), `extract_root_package_from_specifier()` (language-aware root extraction), `build_package_registry()` (in-memory registry with mtime-based cache), and `resolve_cross_repo_file()`.
- `cross_repo_default` config key — boolean default for the `cross_repo` parameter across all import graph tools. Env var: `JCODEMUNCH_CROSS_REPO_DEFAULT`. Default: `false`.
- 53 new tests (1431 total, 9 skipped).
- QUICKSTART.md Step 3 — upgraded AGENT_HOOKS.md footnote to an `[!IMPORTANT]` callout naming the "pressure bypass" failure mode (agent sees CLAUDE.md rule, ignores it under load) and explaining why hooks are needed for hard enforcement.
- QUICKSTART.md Troubleshooting — added entry for "Claude uses jCodeMunch in simple tasks but falls back to Read/Grep in complex ones" pointing to AGENT_HOOKS.md.
- AGENT_HOOKS.md intro — sharpened to explicitly name the failure mode: the agent sees the rule and skips it anyway because native tools feel faster under pressure or in long sessions.
- Tri-state `use_ai_summaries` (PR #186 — contributed by MariusAdrian88) — Config key and `JCODEMUNCH_USE_AI_SUMMARIES` env var now accept three values: `"auto"` (new default; auto-detect provider from API keys, identical to previous `true`), `true` (use explicit `summarizer_provider` + `summarizer_model` from config), `false` (disable AI summarization entirely). Existing boolean `true`/`false` configs are fully backward-compatible.
- `summarizer_model` config key — Override the default model for any provider via config or `JCODEMUNCH_SUMMARIZER_MODEL` env var. Priority: config key > provider-specific env var (`ANTHROPIC_MODEL`, `GOOGLE_MODEL`, etc.) > hardcoded default. Applies to all providers.
- `summarizer_max_failures` config key — Circuit breaker threshold (default 3). After this many consecutive batch failures the summarizer stops calling the API and falls back to signature summaries for all remaining symbols. A successful batch resets the counter. Set 0 to disable. Thread-safe (`threading.Lock`). Configurable via `JCODEMUNCH_SUMMARIZER_MAX_FAILURES`.
- OpenRouter provider — New provider via `OPENROUTER_API_KEY` using the OpenAI-compatible API at `openrouter.ai/api/v1`. Default model: `meta-llama/llama-3.3-70b-instruct:free` (zero cost). Auto-detect priority: last in chain (after GLM-5). Explicit selection: `summarizer_provider: "openrouter"` or `JCODEMUNCH_SUMMARIZER_PROVIDER=openrouter`. `jcodemunch-mcp config` now shows active OpenRouter section.
- `test_summarizer` diagnostic tool — Sends a probe request to the configured AI summarizer and reports status: `ok`, `disabled`, `no_provider`, `misconfigured`, `fallback`, `timeout`, or `error`. Disabled by default (remove from `disabled_tools` in config to enable). Optional `timeout_ms` parameter (default 15000).
- `strict_timeout_ms` config key — Configures the maximum milliseconds to block in `freshness_mode: strict` before proceeding with a stale index (previously hardcoded at 500ms). Default: 500.
- `embed_model` config key — Promotes `JCODEMUNCH_EMBED_MODEL` env var to a config file setting. Configures the sentence-transformers model for local semantic embeddings. Config key takes priority over env var.
- `summarizer_provider` config key — Promotes `JCODEMUNCH_SUMMARIZER_PROVIDER` env var to a config file setting. Takes priority over env var.
- 60+ new tests (1397 total, 7 skipped).
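
The `summarizer_max_failures` behaviour can be sketched as a small consecutive-failure counter. The class and method names are hypothetical; the threshold default, the reset on success, the `0` = disabled rule, and the use of `threading.Lock` follow the entry above.

```python
import threading


class SummarizerCircuitBreaker:
    """Opens after N consecutive failures; a success resets the count."""

    def __init__(self, max_failures: int = 3) -> None:
        self._max_failures = max_failures  # 0 disables the breaker
        self._consecutive_failures = 0
        self._lock = threading.Lock()

    def is_open(self) -> bool:
        """True when callers should skip the API and use signature summaries."""
        with self._lock:
            return (
                self._max_failures > 0
                and self._consecutive_failures >= self._max_failures
            )

    def record_success(self) -> None:
        with self._lock:
            self._consecutive_failures = 0

    def record_failure(self) -> None:
        with self._lock:
            self._consecutive_failures += 1


breaker = SummarizerCircuitBreaker(max_failures=3)
for _ in range(3):
    breaker.record_failure()
print(breaker.is_open())
```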
- `languages_adaptive` config key (PR #185 — contributed by MariusAdrian88) — New boolean config key that enables automatic language detection based on files actually found in the indexed folder, overriding the `languages` allowlist for that run. Useful when indexing polyglot repos without maintaining an explicit language list.
- `meta_fields` default changed to `[]` (PR #185) — Previously defaulted to `null` (all meta fields included); now defaults to `[]` (no `_meta` block) for token-efficient responses. Set to `null` in config to restore all meta fields.
- MiniMax and GLM-5 summarizer providers (PR #184 — contributed by SkaldeStefan) — `MINIMAX_API_KEY` auto-detects MiniMax M2.7 (`api.minimax.io/v1`) and `ZHIPUAI_API_KEY` auto-detects GLM-5 (`api.z.ai`), both via the existing OpenAI-compatible summarizer path. `JCODEMUNCH_SUMMARIZER_PROVIDER` env var added for explicit selection (`anthropic`, `gemini`, `openai`, `minimax`, `glm`, `none`). Auto-detect priority: Anthropic → Gemini → OpenAI-compatible → MiniMax → GLM-5. Remote providers (including MiniMax/GLM) still require `allow_remote_summarizer: true` in `config.jsonc`. `get_provider_name()` exported from `jcodemunch_mcp.summarizer`. `jcodemunch-mcp config` now shows active provider and new MiniMax/GLM sections. 10 new tests (1332 total).
- `test_get_provider_name_unknown_falls_back_to_auto` test isolation — the test did not clear higher-priority env vars before auto-detecting MiniMax, causing a false `anthropic` result in environments where `ANTHROPIC_API_KEY` is set.
- Gemini `CODE_RETRIEVAL_QUERY` KeyError on legacy SDK (follow-up to #181) — The legacy `google-generativeai` package does not include `CODE_RETRIEVAL_QUERY` in its `TaskType` proto enum (it was introduced in the newer `google-genai` SDK). Passing that string to `genai.embed_content` caused a `KeyError` during semantic search. A new `_normalise_gemini_task_type` helper probes the installed SDK's `TaskType` enum at runtime and falls back from `CODE_RETRIEVAL_QUERY` to `RETRIEVAL_QUERY` on legacy installs, producing equivalent retrieval quality. New SDK installs with `CODE_RETRIEVAL_QUERY` are unaffected. 5 new tests (1322 total).
- YAML and Ansible parser support (PR #183 — contributed by SkaldeStefan) — `.yaml` and `.yml` files are now indexed as first-class symbols. A path-heuristic layer (`_looks_like_ansible_path`) automatically promotes Ansible-structured files (playbooks, roles, group_vars, host_vars, tasks, handlers, defaults) to the `ansible` language so they receive Ansible-aware symbol extraction: plays as `class`, tasks as `function`, roles and handlers as `type`, and variables as `constant`. Generic YAML falls back to a structural walker that emits container keys as `type` and scalar keys as `constant`. Multi-document YAML (multiple `---` sections) is handled correctly. pyyaml is already a base dependency — no extra install step. 8 new tests (1317 total).
- Task-aware embedding for Gemini (closes #181) — When `GOOGLE_EMBED_MODEL` is configured, `embed_repo` now passes `task_type="RETRIEVAL_DOCUMENT"` to `genai.embed_content` for document indexing, and `search_symbols` passes `task_type="CODE_RETRIEVAL_QUERY"` when embedding the search query. Models that support task types (e.g. `text-embedding-004`, Gemini Embedding 2) produce measurably better code retrieval results; models that do not simply ignore the parameter. Other providers (sentence-transformers, OpenAI) are unaffected.
- `GEMINI_EMBED_TASK_AWARE` env var — Set to `0` / `false` / `no` / `off` to opt out of task-type routing (default: on). Useful if your Gemini model predates task-type support.
- `embed_task_type` stored in meta — The task type used when building the embedding index is now persisted. If you toggle `GEMINI_EMBED_TASK_AWARE`, `embed_repo` detects the mismatch and automatically forces a re-embed so query and document embeddings always come from the same task-type space.
- `task_type` field in `embed_repo` response — Present when a task type was applied; absent for providers that do not use one.
- 7 new tests (1309 total): `_gemini_task_aware` default/opt-out, Gemini document task type in `embed_repo`, `CODE_RETRIEVAL_QUERY` routing in `search_symbols`, opt-out disables task types, task-type change triggers re-embed, `EmbeddingStore` task type round-trip.
- Cross-process LRU cache invalidation — SQLite WAL mode does not always update the `.db` file's mtime on commit. The watcher (a separate process) was writing new index data that the MCP server's in-memory cache never detected, causing agents to see stale results. New `_db_mtime_ns()` helper checks `max(db_mtime, db-wal_mtime)` so WAL writes are detected without an explicit cache eviction call. `os.utime()` added after `save_index()` and `incremental_save()` as a belt-and-suspenders measure; `os.utime()` runs before `_cache_put()` so the cached mtime matches what cross-process readers compute.
- `get_file_tree` silently ignored `max_files` — the parameter was present in the MCP schema but was never passed through `call_tool` dispatch.
- Config template stale entries — `wait_for_fresh` (removed v1.12.0) was still listed in `disabled_tools` template; staleness `_meta` fields (`index_stale`, `reindex_in_progress`, `stale_since_ms`) were still listed in `meta_fields` template.
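
The WAL-aware freshness check described above can be approximated as follows. This is a sketch under the assumption that the WAL file sits next to the database with the standard `-wal` suffix; the function name mirrors the changelog's `_db_mtime_ns()` but the body is illustrative, not the project's code.

```python
from pathlib import Path


def db_mtime_ns(db_path: str) -> int:
    """Return the newer of the .db mtime and its -wal sidecar mtime (ns)."""
    db = Path(db_path)
    newest = db.stat().st_mtime_ns
    wal = Path(str(db) + "-wal")  # SQLite names the WAL file "<db>-wal"
    try:
        newest = max(newest, wal.stat().st_mtime_ns)
    except FileNotFoundError:
        pass  # no WAL file: the database mtime alone is authoritative
    return newest
```

A cache keyed on this value would treat a commit that only touched the WAL file as a change, which is the case the plain `.db` mtime misses.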
- `file_tree_max_files` config key — configures the `get_file_tree` result cap via `config.jsonc` or `JCODEMUNCH_FILE_TREE_MAX_FILES` env var (default 500). Per-call `max_files` param still overrides.
- `gitignore_warn_threshold` config key — configures the missing-`.gitignore` warning threshold in `index_folder` via `config.jsonc` or `JCODEMUNCH_GITIGNORE_WARN_THRESHOLD` env var (default 500). Set `0` to disable entirely.
- Config template overhaul — all keys now have inline documentation; tools and meta_fields lists sorted alphabetically; all missing keys added (`trusted_folders_whitelist_mode`, `exclude_secret_patterns`, `path_map`, watcher params, transport docs); `version` field added for future migration tooling. Note: the template now defaults to `"meta_fields": []` (no `_meta` in responses) rather than `null` (all fields) — better for token efficiency; users who want `_meta` should uncomment the desired fields.
- 5 new tests covering `_db_mtime_ns` (no-WAL, WAL-newer, WAL-older) and the full cross-process cache invalidation scenario (1302 total). Contributed by MariusAdrian88 (PR #180).
- `.razor` (Blazor component) file support — `.razor` extension now mapped to the `razor` language spec alongside `.cshtml`. `_parse_razor_symbols` extended to emit `@page` route directives and `@inject` dependency injection bindings as constant symbols, making Blazor component routes and injected services first-class navigable symbols. Includes `Counter.razor` test fixture and 8 new tests (1298 total). Contributed by drax1222 (PR #182).
- `get_file_tree` token overflow on large indexes (closes #178) — results are now capped at `max_files` (default 500). When truncated, the response includes `truncated: true`, `total_file_count`, and a `hint` suggesting `path_prefix` to scope the query. `max_files` is exposed as a tool parameter so callers can raise it explicitly if needed.
- `index_folder` silent over-inclusion (closes #178) — when no `.gitignore` is present in the repo root and ≥ 500 files are indexed, a warning is now included in the result advising the user to add a `.gitignore` and re-index.
- 10 new tests (1288 total).
- `check_freshness` and `wait_for_fresh` MCP tools — no client ever consumed these; removing them saves ~400 schema tokens per call. Server-side freshness management via `freshness_mode` config key (`relaxed` / `strict`) remains fully functional.
- Staleness `_meta` fields (`index_stale`, `reindex_in_progress`, `stale_since_ms`) — ~30-50 tokens of annotated noise per response. The watcher still manages freshness internally; strict mode blocks silently in `call_tool` before returning clean results.
- `powered_by` removed from `_meta` common fields.
- Watcher config layering — `_get_watcher_enabled()` previously bypassed `config_module.get()` and read `JCODEMUNCH_WATCH` env var directly, silently ignoring the `"watch"` key in `config.jsonc`. Precedence is now: CLI flag > config file (with env var as fallback only when key absent).
- Hash-cache miss reindex skip — when the watcher's in-memory hash cache missed, the fallback read the file from disk. By the time `watchfiles` delivers the event the file already has new content, making `old_hash == new_hash` and silently skipping the change. Fixed with a `"__cache_miss__"` sentinel that guarantees re-parse on any cache miss.
- Flaky Windows tests from SQLite WAL cache contamination — tests that modified the DB directly didn't invalidate the in-memory LRU cache; WAL mode on Windows doesn't always update file mtime on write, so the cache key matched stale data. Fixed via `tests/conftest.py` autouse fixtures for cache clear and config reset, plus targeted `_cache_evict()` calls after direct DB writes.
- `test_openai_summarizer_timeout_config` now correctly flows `allow_remote_summarizer` through `load_config()` instead of reading from `config.get()` directly.
- Config-driven watcher parameters — all watcher options are now configurable via `config.jsonc` (CLI flags remain as overrides). New keys:
  - `watch_debounce_ms` (int, default 2000) — was wired in config.py but not forwarded to watcher kwargs
  - `watch_paths` (list, default `[]` → CWD) — folders to watch
  - `watch_extra_ignore` (list, default `[]`) — additional gitignore-style patterns
  - `watch_follow_symlinks` (bool, default `false`)
  - `watch_idle_timeout` (int or null, default `null`) — auto-stop after N minutes idle
  - `watch_log` (str or null, default `null`) — log watcher output to file; `"auto"` = temp file
- 25 new tests (1285 total).
- Optional semantic / embedding search (Feature 8) — hybrid BM25 + vector search, opt-in only, zero mandatory new dependencies.
  - `search_symbols` gains three new params: `semantic` (bool, default `false`), `semantic_weight` (float 0–1, default 0.5), `semantic_only` (bool, default `false`). When `semantic=false` (default) there is zero performance impact and zero new imports.
  - New `embed_repo` tool — precomputes and caches all symbol embeddings in one pass (`batch_size`, `force` params). Optional warm-up; `search_symbols` lazily embeds missing symbols on first semantic query.
  - New `EmbeddingStore` — thin SQLite CRUD layer (`symbol_embeddings` table) in the existing per-repo `.db` file. Embeddings serialised as float32 BLOBs via stdlib `array` module. Persists across restarts; invalidatable per-symbol for incremental reindex.
  - Three embedding providers (priority order): local `sentence-transformers` (`JCODEMUNCH_EMBED_MODEL` env var), Gemini (`GOOGLE_API_KEY` + `GOOGLE_EMBED_MODEL`), OpenAI (`OPENAI_API_KEY` + `OPENAI_EMBED_MODEL`). `OPENAI_API_KEY` alone does not activate embeddings (prevents conflation with local-LLM summariser use).
  - Hybrid ranking: `combined = (1−w) × bm25_normalised + w × cosine_similarity`. BM25 normalised by max score over the candidate set. `semantic_weight=0.0` produces identical results to pure BM25.
  - Pure Python cosine similarity — `math.sqrt` + `sum()`, no numpy required.
  - `semantic=true` with no provider configured returns `{"error": "no_embedding_provider", "message": "..."}` (structured error, not a crash).
  - New optional dep: `pip install jcodemunch-mcp[semantic]` installs `sentence-transformers>=2.2.0`.
  - 22 new tests.
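
The hybrid ranking formula above, with a pure-Python cosine similarity, can be sketched as follows. The function names and the input shapes (dicts keyed by symbol ID) are hypothetical; the formula, the max-score normalisation, and the `math.sqrt` + `sum()` approach come from the entry.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity using only math.sqrt and sum(); 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_rank(
    bm25_scores: dict[str, float],
    query_vec: Sequence[float],
    symbol_vecs: dict[str, Sequence[float]],
    semantic_weight: float = 0.5,
) -> list[tuple[str, float]]:
    """combined = (1 - w) * bm25_normalised + w * cosine_similarity."""
    max_bm25 = max(bm25_scores.values(), default=0.0)
    combined = {}
    for symbol_id, score in bm25_scores.items():
        bm25_norm = score / max_bm25 if max_bm25 > 0 else 0.0
        cos = cosine_similarity(query_vec, symbol_vecs[symbol_id])
        combined[symbol_id] = (1 - semantic_weight) * bm25_norm + semantic_weight * cos
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


scores = {"parse_file": 4.0, "load_config": 2.0}
vectors = {"parse_file": [0.0, 1.0], "load_config": [1.0, 0.0]}
print(hybrid_rank(scores, [1.0, 0.0], vectors, semantic_weight=0.0))
```

With `semantic_weight=0.0` the cosine term drops out, so the ordering is the BM25 ordering, which matches the backward-compatibility claim in the entry.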
- Token-budgeted context assembly (Feature 5) — two new capabilities:
  - `get_context_bundle` gains `token_budget`, `budget_strategy`, and `include_budget_report` params. When `token_budget` is set, symbols are ranked and trimmed to fit. `budget_strategy` controls how: `most_relevant` (default) ranks by file import in-degree, `core_first` keeps the primary symbol first then ranks the rest by centrality, `compact` strips all source bodies and returns signatures only. `include_budget_report=true` adds a `budget_report` field showing `budget_tokens`, `used_tokens`, `included_symbols`, `excluded_symbols`, and `strategy`. Fully backward-compatible: all new params default to existing behavior.
  - New `get_ranked_context` tool — standalone token-budgeted context assembler. Takes a `query` + `token_budget` (default 4000) and returns the best-fit symbols with their full source, greedy-packed by combined score. `strategy` controls ranking: `combined` (BM25 + PageRank weighted sum, default), `bm25` (pure text relevance), `centrality` (PageRank only). Optional `include_kinds` and `scope` params restrict the candidate set. Response includes per-item `relevance_score`, `centrality_score`, `combined_score`, `tokens`, and `source`. Token counting uses `len(text) // 4` heuristic with optional `tiktoken` upgrade (no hard dep). No new dependencies. 19 new tests.
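
Greedy packing under a token budget, using the `len(text) // 4` heuristic mentioned above, might look like this sketch. The function names, the candidate tuple shape, and the choice to skip an oversized symbol and keep trying smaller ones are assumptions for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token."""
    return len(text) // 4


def pack_by_budget(
    candidates: list[tuple[str, float, str]],
    token_budget: int = 4000,
) -> tuple[list[str], int]:
    """Greedily include the highest-scoring symbols that still fit.

    Each candidate is (symbol_id, combined_score, source).
    Returns (included symbol IDs, tokens used).
    """
    included: list[str] = []
    used = 0
    for symbol_id, _score, source in sorted(
        candidates, key=lambda c: c[1], reverse=True
    ):
        cost = estimate_tokens(source)
        if used + cost <= token_budget:
            included.append(symbol_id)
            used += cost
        # Assumption: an oversized symbol is skipped rather than ending the loop.
    return included, used


symbols = [
    ("a", 0.9, "x" * 40),
    ("b", 0.8, "y" * 400),
    ("c", 0.5, "z" * 40),
]
print(pack_by_budget(symbols, token_budget=25))
```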
- `get_changed_symbols` tool — maps a git diff to affected symbols. Given two commits (`since_sha` / `until_sha`, defaulting to index-time SHA vs HEAD), returns `added_symbols`, `removed_symbols`, and `changed_symbols` (with `change_type`: "added", "removed", "modified", or "renamed"). `renamed` detection fires when body hash is identical but name differs. Set `include_blast_radius=true` to also return downstream importers (with `max_blast_depth` hop limit). Requires a locally indexed repo (`index_folder`); GitHub-indexed repos return a clear error. Requires `git` on PATH; graceful error if not available. Filters index-storage files (e.g. `.index/`) from the diff when the storage dir is inside the repo. No new dependencies. 12 new tests.
- `find_dead_code` tool — finds files and symbols unreachable from any entry point using the import graph. Entry points auto-detected by filename (`main.py`, `__main__.py`, `conftest.py`, `manage.py`, etc.), `__init__.py` package roots, and `if __name__ == "__main__"` guards (Python only). Returns `dead_files` and `dead_symbols` with confidence scores: `1.0` = zero importers, no framework decoration; `0.9` = zero importers in a test file; `0.7` = all importers are themselves dead (cascading). Parameters: `granularity` ("symbol" / "file"), `min_confidence` (default 0.8), `include_tests` (bool), `entry_point_patterns` (additional glob roots). No new dependencies. 13 new tests.
- Manifest watcher reliability — replaced `watchfiles.awatch()` in `_manifest_watcher` with a simple 0.5s polling loop. `watchfiles` was unreliable on Windows (especially in temp directories used by tests and agent hooks), causing the manifest watcher to silently miss create/remove events. Polling the manifest file's size every 500ms is sufficient for this append-only JSONL file and works reliably on all platforms.
- PageRank / centrality ranking — new `get_symbol_importance` tool returns the most architecturally important symbols in a repo, ranked by full PageRank or simple in-degree on the import graph. Parameters: `top_n` (default 20), `algorithm` ("pagerank" or "degree"), `scope` (subdirectory filter). Response includes `symbol_id`, `rank`, `score`, `in_degree`, `out_degree`, `kind`, `iterations_to_converge`. New `sort_by` parameter on `search_symbols` ("relevance" | "centrality" | "combined") — "centrality" filters by BM25 query match but ranks by PageRank; "combined" adds PageRank as weighted boost to BM25 score; "relevance" (default) is unchanged (backward compatible). `get_repo_outline` now includes `most_central_symbols` (top 10 symbols by PageRank score, one representative per file, alongside the existing `most_imported_files`). PageRank implementation: damping=0.85, convergence threshold=1e-6, max 100 iterations, dangling-node correction, cached in `_bm25_cache` per `CodeIndex` load. 23 new tests.
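
A minimal PageRank sketch using the parameters listed above (damping 0.85, threshold 1e-6, at most 100 iterations, dangling-node correction). The function name, the edge direction (importer to imported), and the L1 convergence test are assumptions; the project's implementation may differ.

```python
def pagerank(
    graph: dict[str, list[str]],
    damping: float = 0.85,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> tuple[dict[str, float], int]:
    """Return (rank per node, iterations run). Edges point importer -> imported."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    if n == 0:
        return {}, 0

    rank = {node: 1.0 / n for node in nodes}
    out_links = {node: graph.get(node, []) for node in nodes}

    for iteration in range(1, max_iter + 1):
        # Dangling nodes (no outgoing edges) spread their rank uniformly.
        dangling_mass = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / n + damping * dangling_mass / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Assumed convergence test: L1 distance between successive vectors.
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            return rank, iteration
    return rank, max_iter


imports = {"api.py": ["core.py"], "cli.py": ["core.py"], "core.py": []}
ranks, iterations = pagerank(imports)
print(max(ranks, key=ranks.get), iterations)
```

In this toy graph `core.py` is imported by both other files and ends up with the highest rank; the dangling-node term keeps the ranks summing to 1.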
- Fuzzy symbol search — `search_symbols` gains three new parameters: `fuzzy` (bool, default `false`), `fuzzy_threshold` (float, default `0.4`), and `max_edit_distance` (int, default `2`). When enabled, a trigram Jaccard + Levenshtein pass runs as fallback when BM25 confidence is low (top score < 0.1) or when explicitly requested. Fuzzy results carry `match_type="fuzzy"`, `fuzzy_similarity`, and `edit_distance` fields; BM25 results carry `match_type="exact"`. Zero behavioral change when `fuzzy=false` (default). No new dependencies — pure stdlib (`frozenset` trigrams + Wagner-Fischer edit distance). 21 new tests.
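
The two stdlib building blocks named in the fuzzy search entry can be sketched as follows. How the real tool combines the two signals is not stated in the changelog; accepting a candidate when either the Jaccard similarity or the edit distance passes is an assumption made here, as are the function names.

```python
def trigrams(text: str) -> frozenset[str]:
    """Character trigrams of the lowercased text (short strings yield one gram)."""
    t = text.lower()
    if len(t) < 3:
        return frozenset({t})
    return frozenset(t[i:i + 3] for i in range(len(t) - 2))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edit_distance(a: str, b: str) -> int:
    """Wagner-Fischer dynamic programme, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_match(query, names, fuzzy_threshold=0.4, max_edit_distance=2):
    """Assumed combination rule: keep a name if either signal passes."""
    q = trigrams(query)
    hits = []
    for name in names:
        similarity = jaccard(q, trigrams(name))
        distance = edit_distance(query.lower(), name.lower())
        if similarity >= fuzzy_threshold or distance <= max_edit_distance:
            hits.append({
                "name": name,
                "match_type": "fuzzy",
                "fuzzy_similarity": round(similarity, 3),
                "edit_distance": distance,
            })
    hits.sort(key=lambda h: (-h["fuzzy_similarity"], h["edit_distance"]))
    return hits
```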
- Blast radius depth scoring — `get_blast_radius` now always returns `direct_dependents_count` (depth-1 count) and `overall_risk_score` (0.0–1.0, weighted by hop distance using `1/depth^0.7`). New `include_depth_scores=true` parameter adds `impact_by_depth` (files grouped by BFS layer, each with a `risk_score`). Flat `confirmed` / `potential` lists are preserved unchanged (backward compatible). 14 new tests.
- Windows CI: trusted_folders tests — `_platform_path_str` was using `str(Path(...))`, which on Windows returns backslash paths (`C:\work`). When embedded raw into f-string JSON literals in tests, the backslash produced invalid `\escape` sequences, causing `config.jsonc` parse failures across all 4 Windows matrix legs (6 tests failing). Fixed by switching to `.as_posix()`, which returns forward-slash paths (`C:/work`) that are valid in both JSON and Windows pathlib.
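
The failure mode in the Windows CI fix can be reproduced on any platform with `PureWindowsPath`. This is a small demonstration, not the test code from the repository; the `parses` helper and the config shape are illustrative.

```python
import json
from pathlib import PureWindowsPath

path = PureWindowsPath(r"C:\work")

# str() keeps the backslash, so the raw f-string yields the JSON text "C:\work",
# where \w is not a valid JSON escape sequence.
bad = f'{{"trusted_folders": ["{str(path)}"]}}'

# as_posix() yields "C:/work", which needs no escaping inside a JSON string.
good = f'{{"trusted_folders": ["{path.as_posix()}"]}}'


def parses(text: str) -> bool:
    """Report whether `text` is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Building the literal with `json.dumps` would avoid the problem as well, since it escapes backslashes itself.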
- `trusted_folders` allowlist for `index_folder` (PR #175, credit: @tmeckel) — new `trusted_folders` config key (plus `trusted_folders_whitelist_mode`) restricts or blocks indexing by path. Whitelist mode (default) allows only explicitly named roots; blacklist mode blocks specific paths while trusting all others. Path-aware matching (not string-prefix). Project config supports `.`, `./subdir`, and bare relative paths. Escape-attempt paths are rejected. Empty list preserves existing behavior (backward compatible). Env var fallback via `JCODEMUNCH_TRUSTED_FOLDERS`.
- `check_freshness` tool — compares the git HEAD SHA recorded at index time against the current HEAD for locally indexed repos. Returns `fresh` (bool), `indexed_sha`, `current_sha`, and `commits_behind`. GitHub repos return `is_local: false` with an explanatory message. `get_repo_outline` staleness check upgraded to SHA-based comparison (accurate) with time-based fallback for GitHub/no-git repos; `is_stale` added to `_meta`. 8 new tests.
- Structured file-cap warnings — `index_folder` and `index_repo` now surface `files_discovered`, `files_indexed`, and `files_skipped_cap` fields plus a human-readable `warning` when the file cap is hit. Previously a silent "note".
- `_meta` hint on single-symbol responses — `search_symbols` and `get_symbol_source` single-symbol responses now include a `_meta` hint pointing to `get_context_bundle`.
- Benchmark docs — `METHODOLOGY.md` expanded with a "Common Misreadings" section; reproducible results table added to README.
- `tsconfig.json` / `jsconfig.json` parsed as JSONC — previously `json.loads()` silently failed on commented tsconfigs (TypeScript projects commonly use `//` comments in `tsconfig.json`), leaving `alias_map` empty and causing `find_importers` / `get_blast_radius` to return 0 alias-based results. Now parsed with the same JSONC stripper used for `config.jsonc`. Also adds a test for nested layouts with specific `@/lib/*` overrides. Closes #170. 5 new tests.
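
To show why a tsconfig needs a string-aware comment stripper rather than a plain regex, here is a minimal sketch. It is not the project's JSONC stripper: `strip_jsonc` is a hypothetical name, and trailing commas are deliberately left unhandled.

```python
import json


def strip_jsonc(text: str) -> str:
    """Remove // and /* */ comments while leaving string contents untouched."""
    out: list[str] = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip to the end of the line, keep the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


TSCONFIG = """{
  // path aliases
  "compilerOptions": {
    "paths": { "@/*": ["./src/*"] } /* block comment */
  }
}"""

alias_paths = json.loads(strip_jsonc(TSCONFIG))["compilerOptions"]["paths"]
```

The alias key `"@/*"` itself contains `/*`, so a stripper that ignores string boundaries would treat the rest of the file as a comment.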
- TypeScript/SvelteKit path alias resolution — `find_importers`, `get_blast_radius`, `get_dependency_graph`, and 5 other import-graph tools now resolve `@/*`, `$lib/*`, and other configured aliases by reading `compilerOptions.paths` from `tsconfig.json` / `jsconfig.json` at the project root. Also resolves TypeScript's ESM `.js` → `.ts` extension convention. `alias_map` is auto-loaded from `source_root` and cached at module level. Closes #169. 10 new tests.
- Debug logging for silent skip paths — all three skip paths (`skip_dir`, `skip_file`, `secret`) now emit debug-level log lines. `skip_dir` and `skip_file` counters added to the discovery summary. `exclude_secret_patterns` config option suppresses specific `SECRET_PATTERNS` entries (workaround for `*secret*` glob false-positives on full relative paths in Go monorepos). (PR #168, credit: @DrHayt) 6 new tests.
- `resolve_repo` hang on Windows — added `stdin=subprocess.DEVNULL` to the git subprocess call in `_git_toplevel()`. Without it, the git child process inherits the MCP stdio pipe and blocks indefinitely. Same pattern fixed in v1.1.7 for `index_folder`. Closes #166.
- `parse_git_worktrees` hang on Windows (watcher) — same missing `stdin=subprocess.DEVNULL` fix, preventative.
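
The `stdin=subprocess.DEVNULL` pattern from the two hang fixes looks roughly like this. The function below is an illustrative stand-in for `_git_toplevel()`, not its actual body; the timeout and the exact git arguments are assumptions.

```python
import subprocess
from typing import Optional


def git_toplevel(path: str) -> Optional[str]:
    """Return the repository root containing `path`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", path, "rev-parse", "--show-toplevel"],
            stdin=subprocess.DEVNULL,  # do not inherit the server's stdio pipe
            capture_output=True,
            text=True,
            timeout=10,  # assumed safety net, not part of the documented fix
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the call did not return in time.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

An MCP server speaking over stdio owns the process's stdin, so any child that inherits it can block waiting on input that never arrives.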
- `find_importers`: `has_importers` flag — each result now includes `has_importers: bool`. When `false`, the importer itself has no importers, revealing transitive dead code chains without requiring recursive calls. Implemented as one additional O(n) pass over the import graph; no re-indexing required. Closes #132. Identified via 50-iteration dead code A/B test (#130).
- `get_file_outline` tool description — now explicitly states "full signatures (including parameter names)" and adds "Use signatures to review naming at parameter granularity without reading the full file." Parameter names were always present in the `signature` field; the description now makes this discoverable. Closes #131.
- Dynamic `import()` detection in JS/TS/Vue — `find_importers` now detects Vue Router lazy routes and other code-splitting patterns using `import('specifier')` call syntax. Previously these files appeared to have zero importers and were misclassified as dead. Identified via 50-iteration dead code A/B test (#130, @Mharbulous); 4 Vue view files affected.
- Supply-chain integrity check — `verify_package_integrity()` added to `security.py` and called at startup. Uses `importlib.metadata.packages_distributions()` to identify the distribution that actually owns the running code. If it differs from the canonical `jcodemunch-mcp`, a `SECURITY WARNING` is printed to stderr. Catches the fork-republishing attack class described at https://news.ycombinator.com/item?id=47428217. Silent for source/editable installs.
- `authors` and `[project.urls]` in `pyproject.toml` — PyPI pages now display official provenance metadata (author, homepage, issue tracker).
- JS/TS const extraction — top-level `const` and `export const` declarations in JavaScript, TypeScript, and TSX are now indexed as `constant` symbols. Arrow functions and function expressions assigned to consts are correctly skipped (handled by existing function extraction). Accepts all identifier naming conventions for JS/TS.
- `index_file` tool (PR #126, credit: @thellMa) — re-index a single file instantly after editing. Locates the correct index by scanning `source_root` of all indexed repos (picks most specific match), validates security, computes hash + mtime, and exits early if the file is unchanged. Parses with tree-sitter, runs context providers, and calls `incremental_save()` for a surgical single-file update. Registered as a new MCP tool with `path`, `use_ai_summaries`, and `context_providers` parameters.
- mtime optimization (PR #126, credit: @thellMa) — `index_folder` and `index_repo` now check file modification time (`st_mtime_ns`) before reading or hashing. Files with unchanged mtimes are skipped entirely; hashes are computed lazily only for files whose mtime changed. Indexes store a `file_mtimes` dict; old indexes without mtime data fall back to hash-all for backward compatibility.
- `watch-claude` CLI subcommand — auto-discover and watch Claude Code worktrees via two complementary modes:
  - Hook-driven mode (recommended): install `WorktreeCreate` / `WorktreeRemove` hooks that call `jcodemunch-mcp hook-event create|remove`. Events are written to `~/.claude/jcodemunch-worktrees.jsonl` and `watch-claude` reacts instantly via filesystem watch.
  - `--repos` mode: `jcodemunch-mcp watch-claude --repos ~/project1 ~/project2` polls `git worktree list --porcelain` and filters for Claude-created worktrees (branches matching `claude/*` or `worktree-*`).
  - Both modes can run simultaneously. When a worktree is removed, the watcher stops and the index is invalidated.
- `hook-event` CLI subcommand — `jcodemunch-mcp hook-event create|remove` reads Claude Code's hook JSON from stdin and appends to the JSONL manifest. Designed to be called from Claude Code's `WorktreeCreate` / `WorktreeRemove` hooks.
- Shared indexing pipeline (PR #126, credit: @thellMa) — new `_indexing_pipeline.py` consolidates logic previously duplicated across `index_folder`, `index_repo`, and the new `index_file`: `file_languages_for_paths()`, `language_counts()`, `complete_file_summaries()`, `parse_and_prepare_incremental()`, and `parse_and_prepare_full()`. All three tools now call the shared pipeline functions.
- `main()` subcommand set expanded to include `hook-event` and `watch-claude`.
- Stale `context_metadata` on incremental save — `{}` from active providers was treated as falsy, silently preserving old metadata instead of clearing it. Changed to `is not None` check.
- `_resolve_description` discarding surrounding text — `"Prefix {{ doc('name') }} suffix"` now preserves both prefix and suffix instead of returning only the doc block content.
- dbt tags only extracted from `config.tags` — top-level `model.tags` (valid in dbt schema.yml) are now merged with `config.tags`, deduplicated.
- Redundant `posixpath.sep` check in `resolve_specifier` — removed duplicate of adjacent `"/" not in` check.
- Inaccurate docstring on `_detect_dbt_project` — said "max 2 levels deep" but only checks root + immediate children.
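
The stale `context_metadata` bug is the usual truthiness pitfall with empty containers. The two functions here are hypothetical reductions of the before and after behaviour, not code from the repository.

```python
from typing import Optional


def merged_metadata_buggy(old: dict, new: Optional[dict]) -> dict:
    # An empty dict is falsy, so "providers ran and found nothing"
    # is indistinguishable from "no providers ran".
    return new if new else old


def merged_metadata_fixed(old: dict, new: Optional[dict]) -> dict:
    # Only None means "nothing to report"; {} explicitly clears old metadata.
    return new if new is not None else old
```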
- Concurrent AI summarization — `BaseSummarizer.summarize_batch()` now uses `ThreadPoolExecutor` (default 4 workers) for Anthropic and Gemini providers. Configurable via `JCODEMUNCH_SUMMARIZER_CONCURRENCY` env var. Matches the pattern already used by `OpenAIBatchSummarizer`. ~4x faster on large projects.
- O(1) stem resolution — `resolve_specifier` stem-matching fallback now uses a cached dict lookup instead of O(n) linear scan. Significant perf improvement for dbt projects with thousands of files, called in tight loops across 7 tools.
- `collect_metadata` collision warning — logs a warning when two providers emit the same metadata key, instead of silently overwriting via `dict.update()`.
- `find_importers` / `find_references` tool descriptions — now note that `{{ source() }}` edges are extracted but not resolvable since sources are external.
- `search_columns` cleanup — moved `import fnmatch` to top-level; documented empty-query + `model_pattern` behavior (acts as "list all columns for matching models").
- Centrality ranking — `search_symbols` BM25 scores now include a log-scaled bonus for symbols in frequently-imported files, surfacing core utilities as tiebreakers when relevance scores are otherwise equal.
- `get_symbol_diff` — diff two indexed snapshots by `(name, kind)`. Reports added, removed, and changed symbols using `content_hash` for change detection. Index the same repo under two names to compare branches.
- `get_class_hierarchy` — traverse inheritance chains upward (ancestors via `extends` / `implements` / Python parentheses) and downward (subclasses/implementors) from any class. Handles external bases not in the index.
- `get_related_symbols` — find symbols related to a given one via three heuristics: same-file co-location (weight 3.0), shared importers (1.5), name-token overlap (0.5/token).
- Git blame context provider — `GitBlameProvider` auto-activates during `index_folder` when a `.git` directory is present. Runs a single `git log` at index time and attaches `last_author` + `last_modified` to every file via the existing context provider plugin system.
- `suggest_queries` — scan the index and get top keywords, most-imported files, kind/language distribution, and ready-to-run example queries. Ideal first call when exploring an unfamiliar repository.
- Markdown export — `get_context_bundle` now accepts `output_format="markdown"`, returning a paste-ready document with import blocks, docstrings, and fenced source code.
- `watch` CLI subcommand (PR #113, credit: @DrHayt) — `jcodemunch-mcp watch <path>...` monitors one or more directories for filesystem changes and triggers incremental re-indexing automatically. Uses `watchfiles` (Rust-based, async) for OS-native notifications with configurable debounce. Install with `pip install jcodemunch-mcp[watch]`.
- `watchfiles>=1.0.0` optional dependency under `[watch]` and `[all]` extras.
- `main()` refactored to use argparse subcommands (`serve`, `watch`). Full backwards compatibility preserved — bare `jcodemunch-mcp` and legacy flags like `--transport` continue to work unchanged.
- `get_context_bundle` multi-symbol bundles — new `symbol_ids` (list) parameter fetches multiple symbols in one call. Import statements are deduplicated when symbols share a file. New `include_callers=true` flag appends the list of files that directly import each symbol's defining file.
- Single `symbol_id` (string) remains fully backward-compatible.
- `get_blast_radius` tool — find every file affected by changing a symbol. Given a symbol name or ID, traverses the reverse import graph (up to 3 hops) and text-scans each importing file. Returns `confirmed` (imports the file + references the symbol name) and `potential` (imports the file only — wildcard/namespace imports). Handles ambiguous names by listing all candidate IDs.
- BM25 search — replaced hand-tuned substring scoring in `search_symbols` with proper BM25 + IDF. IDF is computed over all indexed symbols at query time (no re-indexing required). CamelCase/snake_case tokenization splits `getUserById` into `get`, `user`, `by`, `id` for natural language queries. Per-field repetition weights: name 3×, keywords 2×, signature 2×, summary 1×, docstring 1×. Exact name match retains a +50 bonus. `debug=true` now returns per-field BM25 score breakdowns.
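
The CamelCase/snake_case splitting mentioned in the BM25 entry can be approximated with a single regular expression. The pattern and the `tokenize` name are assumptions for illustration; the tokenizer in the codebase may treat acronyms and digits differently.

```python
import re

# Acronym runs (HTTP), capitalised or lowercase words (User, get), digit runs.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokenize(identifier: str) -> list[str]:
    """Split an identifier into lowercase word tokens."""
    return [m.group(0).lower() for m in _TOKEN.finditer(identifier)]
```

For example, `tokenize("getUserById")` yields `["get", "user", "by", "id"]`, so a natural language query such as "user by id" shares tokens with the symbol name.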
- `get_dependency_graph` tool — file-level import graph with BFS traversal up to 3 hops. `direction` parameter: `imports` (what this file depends on), `importers` (what depends on this file), or `both`. Returns nodes, edges, and per-node neighbor map. Built from existing index data — no re-indexing required.
- `get_session_stats` tool — process-lifetime token savings dashboard. Reports tokens saved and cost avoided (current session + all-time cumulative), per-tool breakdown, session duration, and call counts.
- Tiered loading (`detail_level` on `search_symbols`) — `compact` returns id/name/kind/file/line only (~15 tokens/result, ideal for discovery); `standard` is unchanged (default); `full` inlines source, docstring, and end_line.
- `byte_length` field added to all `search_symbols` result entries regardless of detail level.
- Token budget search (`token_budget=N` on `search_symbols`) — greedily packs results by byte length until the budget is exhausted. Overrides `max_results`. Reports `tokens_used` and `tokens_remaining` in `_meta`.
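
A greedy packer in the spirit of `token_budget` could look like the sketch below. Two details are assumptions, since the changelog does not specify them: tokens are estimated at 4 bytes each, and a result that does not fit is skipped so that smaller, lower-ranked results can still use the remaining budget.

```python
import math


def pack_by_budget(results: list[dict], token_budget: int, bytes_per_token: int = 4) -> dict:
    """Pack ranked results until the estimated token budget is used up."""
    packed = []
    used = 0
    for result in results:  # assumed to be sorted by relevance already
        cost = math.ceil(result["byte_length"] / bytes_per_token)
        if used + cost > token_budget:
            continue  # too large for what is left; try the next result
        packed.append(result)
        used += cost
    return {
        "results": packed,
        "_meta": {"tokens_used": used, "tokens_remaining": token_budget - used},
    }
```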
- Microsoft Dynamics 365 Business Central AL language support (PR #110, credit: @DrHayt) — `.al` files are now indexed. Extracts procedures, triggers, codeunits, tables, pages, reports, and XML ports.
- `tokens_saved` always reporting 0 in `get_file_outline` and `get_repo_outline`.
- Benchmark reproducibility — `benchmarks/METHODOLOGY.md` with full reproduction details.
- HTTP bearer token auth — `JCODEMUNCH_HTTP_TOKEN` env var secures HTTP transport endpoints.
- `JCODEMUNCH_REDACT_SOURCE_ROOT` env var redacts absolute local paths from responses.
- Schema validation on index load — rejects indexes missing required fields.
- SHA-256 checksum sidecars — index integrity verification on load.
- GitHub rate limit retry — exponential backoff in `fetch_repo_tree`.
- `TROUBLESHOOTING.md` with 11 common failure scenarios and solutions.
- CI matrix extended to Windows and Python 3.13.
- Token savings labeled as estimates; `estimate_method` field added to all `_meta` envelopes.
- `search_text` raw byte count now only includes files with actual matches.
- `VALID_KINDS` moved to a `frozenset` in `symbols.py`; server-side validation rejects unknown kinds.
- Cross-process file locking via `filelock` — prevents index corruption under concurrent access.
- LRU index cache with mtime invalidation — re-reads index JSON only when the file changes on disk.
- Metadata sidecars — `list_repos` reads lightweight sidecar files instead of loading full index JSON.
- Streaming file indexing — peak memory reduced from ~1 GB to ~500 KB during large repo indexing.
- Bounded heap search — `O(n log k)` instead of `O(n log n)` for bounded result sets.
- `BaseSummarizer` base class — deduplicates `_build_prompt` / `_parse_response` across AI summarizers.
- +13 new tests covering `search_columns`, `get_context_bundle`, and ReDoS hardening.
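
The bounded heap search item refers to selecting the top k results without sorting all n candidates. In Python the standard library already provides this through `heapq.nlargest`; the `top_k` wrapper and the `(symbol_id, score)` tuple shape are illustrative assumptions.

```python
import heapq


def top_k(scored: list[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (symbol_id, score) pairs, best first.

    heapq.nlargest keeps a heap of at most k items, giving O(n log k)
    instead of the O(n log n) cost of sorting every candidate.
    """
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```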
- ReDoS protection in `search_text` — pathological regex patterns are rejected before execution.
- Symlink-safe temp files — atomic index writes use `tempfile` rather than direct overwrite.
- SSRF prevention — API base URL validation rejects non-HTTP(S) schemes.
- Assembly language support (PR #105, credit: @astrobleem) — WLA-DX, NASM, GAS, and CA65 dialects. `.asm`, `.s`, `.wla` files indexed. Extracts labels, macros, sections, and directives as symbols.
- `"asm"` added to `search_symbols` language filter enum.
- Cross-process token savings loss — `token_tracker` now uses additive flush so savings accumulated in one process are not overwritten by a concurrent flush from another.
- XML `name` and `key` attribute extraction — elements with `name=` or `key=` attributes are now indexed as `constant` symbols (closes #102).
- Minimal CLI (`cli/cli.py`) — 47-line command-line interface over the shared `~/.code-index/` store covering all jMRI ops: `list`, `index`, `outline`, `search`, `get`, `text`, `file`, `invalidate`.
- `cli/README.md` — explains MCP as the preferred interface and documents CLI usage.
- README onboarding improved: added "Step 3: Tell Claude to actually use it" with copy-pasteable `CLAUDE.md` snippets.
- AutoHotkey hotkey indexing — all three hotkey syntax forms are now extracted as `kind: "constant"` symbols: bare triggers (`F1::`), modifier combos (`#n::`), and single-line actions (`#n::Run "notepad"`). Only indexed at top level (not inside class bodies).
- `#HotIf` directive indexing — both opening expressions (`#HotIf WinActive(...)`) and bare reset (`#HotIf`) are indexed, searchable by window name or expression string.
- Public benchmark corpus — `benchmarks/tasks.json` defines the 5-task × 3-repo canonical task set in a tool-agnostic format. Any code retrieval tool can be evaluated against the same queries and repos.
- `benchmarks/README.md` — full methodology documentation: baseline definition, jMunch workflow, how to reproduce, how to benchmark other tools.
- `benchmarks/results.md` — canonical tiktoken-measured results (95.0% avg reduction, 20.2x ratio, 15 task-runs). Replaces the obsolete v0.2.22 proxy-based benchmark files.
- Benchmark harness now loads tasks from `tasks.json` when present, falling back to hardcoded values.
- OpenAPI / Swagger support — `.openapi.yaml`, `.openapi.yml`, `.openapi.json`, `.swagger.yaml`, `.swagger.yml`, `.swagger.json` files are now indexed. Well-known basenames (`openapi.yaml`, `swagger.json`, etc.) are auto-detected regardless of directory. Extracts: API info block, paths as `function` symbols, schema definitions as `class` symbols, and reusable component schemas.
- `get_language_for_path` now checks well-known OpenAPI basenames before compound-extension matching.
- `"openapi"` added to `search_symbols` language filter enum.
- `get_context_bundle` tool — returns a self-contained context bundle for a symbol: its definition source, all direct imports, and optionally its callers/implementers. Replaces the common `get_symbol` + `find_importers` + `find_references` round-trip with a single call. Scoped to definition + imports in this release.
- C# properties, events, and destructors (PR #100) — `get { set {` property accessors, `event EventHandler Name`, and `~ClassName()` destructors are now extracted as symbols alongside existing C# method/class support.
- XML / XUL language support (PR #99) — `.xml` and `.xul` files are now indexed. Extracts: document root element as a `type` symbol, elements with `id` attributes as `constant` symbols, and `<script src="...">` references as `function` symbols. Preceding `<!-- -->` comments captured as docstrings.
- GitHub blob SHA incremental indexing — `index_repo` now stores per-file blob SHAs from the GitHub tree response and diffs them on re-index. Only files whose SHA changed are re-downloaded and re-parsed. Previously, every incremental run downloaded all file contents before discovering what changed.
- Tokenizer-true benchmark harness — `benchmarks/harness/run_benchmark.py` measures real tiktoken `cl100k_base` token counts for the jMunch retrieval workflow vs an "open every file" baseline on identical tasks. Produces per-task markdown tables and a grand summary.
- Search debug mode — `search_symbols` now accepts `debug=True` to return per-result field match breakdown (name score, signature score, docstring score, keyword score). Makes ranking decisions inspectable.
- `search_columns` tool — structured column metadata search across indexed models. Framework-agnostic: auto-discovers any provider that emits a `*_columns` key in `context_metadata` (dbt, SQLMesh, database catalogs, etc.). Returns model name, file path, column name, and description. Supports `model_pattern` glob filtering and source attribution when multiple providers contribute. 77% fewer tokens than grep for column discovery.
- dbt import graph — `find_importers` and `find_references` now work for dbt SQL models. Extracts `{{ ref('model') }}` and `{{ source('source', 'table') }}` calls as import edges, enabling model-level lineage and impact analysis out of the box.
- Stem-matching resolution — `resolve_specifier()` now resolves bare dbt model names (e.g., `dim_client`) to their `.sql` files via case-insensitive stem matching. No path prefix needed.
- `get_metadata()` on ContextProvider — new optional method for providers to persist structured metadata at index time.
- `collect_metadata()` pipeline function aggregates metadata from all active providers with error isolation.
- `context_metadata` on CodeIndex — new field for persisting provider metadata (e.g., column info) in the index JSON. Survives incremental re-indexes.
- Updated `CONTEXT_PROVIDERS.md` with column metadata convention (`*_columns` key pattern), `get_metadata()` API docs, architecture data flow, and provider ideas table.
- `search_columns` tool description updated to reflect framework-agnostic design.
- `_LANGUAGE_EXTRACTORS` now includes `"sql"` mapping to `_extract_sql_dbt_imports()`.
- Context provider framework (PR #89, credit: @paperlinguist) — extensible plugin system for enriching indexes with business metadata from ecosystem tools. Providers auto-detect their tool during `index_folder`, load metadata from project config files, and inject descriptions, tags, and properties into AI summaries, file summaries, and search keywords. Zero configuration required.
- dbt context provider — the first built-in provider. Auto-detects `dbt_project.yml`, parses `{% docs %}` blocks and `schema.yml` files, and enriches symbols with model descriptions, tags, and column metadata. Install with `pip install jcodemunch-mcp[dbt]`.
- `JCODEMUNCH_CONTEXT_PROVIDERS=0` env var and `context_providers=False` parameter to disable provider discovery entirely.
- `context_enrichment` key in `index_folder` response reports stats from all active providers.
- `CONTEXT_PROVIDERS.md` — architecture docs, dbt provider details, and community authoring guide for new providers.
- Eliminated redundant file downloads on incremental GitHub re-index (fixes #86) — `index_repo` now stores the GitHub tree SHA after every successful index and compares it on subsequent calls before downloading any files. If the tree SHA is unchanged, the tool returns immediately ("No changes detected") without a single file download. Previously, every incremental run fetched all file contents from GitHub before discovering nothing had changed, causing 25–30 minute re-index sessions. The fast-path adds only one API call (the tree fetch, which was already required) and exits in milliseconds when the repo hasn't changed.
- `list_repos` now exposes `git_head` — so AI agents can reason about index freshness without triggering any download. When `git_head` is absent or doesn't match the current tree SHA, the agent knows a re-index is warranted.
- Massive folder indexing speedup (PR #80, credit: @briepace) — directory pruning now happens at the `os.walk` level by mutating `dirnames[:]` before descent. Previously, skipped directories (node_modules, venv, .git, dist, etc.) were fully walked and their files discarded one by one. Now the walker never enters them at all. Real-world result: 12.5 min → 30 sec on a vite+react project.
  - Fixed `SKIP_FILES_REGEX` to use `.search()` instead of `.match()` so suffix patterns like `.min.js` and `.bundle.js` are correctly matched against the end of filenames.
  - Fixed regex escaping on `SKIP_FILES` entries (`re.escape`) and the xcodeproj/xcworkspace patterns in `SKIP_DIRECTORIES`.
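
Both parts of the folder indexing fix can be seen in a few lines. The skip sets below are small illustrative stand-ins for the project's `SKIP_DIRECTORIES` and `SKIP_FILES_REGEX`, and `discover_files` is a hypothetical name.

```python
import os
import re

SKIP_DIRECTORIES = {"node_modules", ".git", "venv", "dist"}  # illustrative subset
SKIP_FILES_REGEX = re.compile(r"(?:\.min\.js|\.bundle\.js)$")  # illustrative pattern


def discover_files(root: str) -> list[str]:
    """Return indexable files under `root` as sorted, slash-separated relative paths."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to the slice mutates the list os.walk holds, so the
        # pruned directories are never descended into.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRECTORIES]
        for name in filenames:
            # search() finds the suffix anywhere; match() would only test
            # the start of the filename and miss "app.min.js".
            if SKIP_FILES_REGEX.search(name):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

Rebinding the name (`dirnames = [...]`) would not prune anything, because `os.walk` keeps its own reference to the original list.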
- Performance: eliminated per-call disk I/O in token savings tracker — `record_savings()` previously did a disk read + write on every single tool call. Now uses an in-memory accumulator that flushes to disk every 10 calls and at process exit via `atexit`. Telemetry is also batched at flush time instead of spawning a new thread per call. Fixes noticeable latency on rapid tool use sequences (`get_file_outline`, `search_symbols`, etc.).
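
The accumulate-then-flush pattern described for `record_savings()` might be sketched like this. `SavingsTracker`, the JSON file layout, and the `tokens_saved` key are assumptions for illustration; the sketch also omits telemetry and does no cross-process locking.

```python
import atexit
import json
import threading


class SavingsTracker:
    """Accumulate token savings in memory and flush to disk in batches."""

    def __init__(self, path: str, flush_every: int = 10) -> None:
        self._path = path
        self._flush_every = flush_every
        self._pending = 0
        self._calls = 0
        self._lock = threading.Lock()
        atexit.register(self.flush)  # persist whatever is left at process exit

    def record_savings(self, tokens: int) -> None:
        with self._lock:
            self._pending += tokens
            self._calls += 1
            due = self._calls % self._flush_every == 0
        if due:
            self.flush()

    def flush(self) -> None:
        with self._lock:
            delta, self._pending = self._pending, 0
            if delta == 0:
                return
            # Add the delta to the stored total rather than overwriting it.
            total = self._read_total() + delta
            with open(self._path, "w", encoding="utf-8") as fh:
                json.dump({"tokens_saved": total}, fh)

    def _read_total(self) -> int:
        try:
            with open(self._path, encoding="utf-8") as fh:
                return int(json.load(fh).get("tokens_saved", 0))
        except (FileNotFoundError, json.JSONDecodeError):
            return 0
```

With the default of 10, nine calls touch no files at all and the tenth performs a single read and write.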
- SQL language support — `.sql` files are now indexed via `tree-sitter-sql` (derekstride grammar)
  - CREATE TABLE, VIEW, FUNCTION, INDEX, SCHEMA extracted as symbols
  - CTE names (`WITH name AS (...)`) extracted as function symbols
  - dbt Jinja preprocessing: `{{ }}`, `{% %}`, `{# #}` stripped before parsing
  - dbt directives extracted as symbols: `{% macro %}`, `{% test %}`, `{% snapshot %}`, `{% materialization %}`
  - Docstrings from preceding `--` comments and `{# #}` Jinja block comments
  - 27 new tests covering DDL, CTEs, Jinja preprocessing, and all dbt directive types
- Context provider framework — extensible plugin system for enriching indexes with business metadata from ecosystem tools. Providers auto-detect their tool during `index_folder`, load metadata from project config files, and inject descriptions, tags, and properties into AI summaries, file summaries, and search keywords. Zero configuration required.
- dbt context provider — the first built-in provider. Auto-detects `dbt_project.yml`, parses `{% docs %}` blocks and `schema.yml` files, and enriches symbols with model descriptions, tags, and column metadata.
- `context_enrichment` key in `index_folder` response reports stats from all active providers
- New optional dependency: `pip install jcodemunch-mcp[dbt]` for schema.yml parsing (pyyaml)
- `CONTEXT_PROVIDERS.md` documentation covering architecture, dbt provider details, and guide for writing new providers
- 58 new tests covering the context provider framework, dbt provider, and file summary integration
- `test_respects_env_file_limit` now uses `JCODEMUNCH_MAX_FOLDER_FILES` (the correct higher-priority env var) instead of the legacy `JCODEMUNCH_MAX_INDEX_FILES`
- `staleness_warning` field in `get_repo_outline` response when the index is 7+ days old — configurable via `JCODEMUNCH_STALENESS_DAYS` env var
- `duration_seconds` field in all `index_folder` and `index_repo` result dicts (full, incremental, and no-changes paths) — total wall-clock time rounded to 2 decimal places
- `JCODEMUNCH_USE_AI_SUMMARIES` env var now mentioned in `index_folder` and `index_repo` MCP tool descriptions for discoverability
- Integration test verifying `index_folder` is dispatched via `asyncio.to_thread` (guards against event-loop blocking regressions)
First stable release. The MCP tool interface, index schema (v3), and symbol data model are now considered stable.
Python, JavaScript, TypeScript, TSX, Go, Rust, Java, C, C++, C#, Ruby, PHP, Swift, Kotlin, Dart, Elixir, Gleam, Bash, Nix, Vue SFC, EJS, Verse (UEFN), Laravel Blade, HTML, and plain text.
- Tree-sitter AST parsing for structural, not lexical, symbol extraction
- Byte-offset content retrieval — `get_symbol` reads only the bytes for that symbol, never the whole file
- Incremental indexing — re-index only changed files on subsequent runs
- Atomic index saves (write-to-tmp, then rename)
- `.gitignore` awareness and configurable ignore patterns
- Security hardening: path traversal prevention, symlink escape detection, secret file filtering, binary file detection
- Token savings tracking with cumulative cost-avoided reporting
- AI-powered symbol summaries (optional, requires `anthropic` extra)
- `get_symbols` batch retrieval
- `context_lines` support on `get_symbol`
- `verify` flag for content hash drift detection
- `get_symbol` / `get_symbols`: O(1) symbol lookup via in-memory dict (was O(n))
- Eliminated redundant JSON index reads on every symbol retrieval
- `SKIP_PATTERNS` consolidated to a single source of truth in `security.py`
- `slugify()` removed from the public `parser` package export (was unused)
- Index schema v3 is incompatible with v1 indexes — existing indexes will be automatically re-built on first use