mascot: 🐢 a careful little turtle taking snapshots
version: 0.1.1
type: portable POSIX `sh` script
snapper.sh lets you:
- snap: take a text-only snapshot of selected files in your project (paths + contents), ideal to paste or upload as LLM context.
- build: recreate those files and directories later from the snapshot.
- tree: list the file structure of a project without content, giving an LLM a high-level overview.
- strip comments: remove comments (`//`, `/* */`, and `#`) to reduce token usage when feeding code to LLMs.
- remove blank lines: optionally strip all blank lines and trailing whitespace for maximum compactness.
it's intentionally small, dependency-light, and predictable: it prefers git ls-files (respecting .gitignore), falls back to find, and never includes binaries.
# 1) put snapper.sh somewhere on your PATH
chmod +x snapper.sh
mv snapper.sh ~/.local/bin/snapper
# 2) from a project directory, capture all go and markdown files + the README
snapper snap -o snapshot.txt '*.go' '**/*.md' '/README.md'
# 3) exclude test files when snapshotting
snapper snap -o snapshot.txt -e '*_test.go' '*.go'
# 4) remove comments to save tokens
snapper snap -r -o snapshot.txt '*.go' '*.js' '*.py'
# 5) remove comments AND blank lines for maximum compactness
snapper snap -r -w -o snapshot.txt '*.go' '*.js' '*.py'
# 6) use parallel processing for faster performance (8 jobs)
snapper snap -r -j 8 -o snapshot.txt '*.go' '*.js' '*.py'
# 7) rebuild into a temp folder (create the folder if missing)
mkdir -p /tmp/restore
snapper build -C /tmp/restore -p -i snapshot.txt

result:
- `snapshot.txt` contains a lean, token-efficient list of file paths, each followed by its content fenced as markdown code. there are no headers or other metadata.
- `/tmp/restore` will have the same files re-materialized with the original relative paths.
no build needed. any POSIX shell should work.
curl -fsSL -o snapper.sh https://raw.githubusercontent.com/code-n-go/snapper/main/snapper.sh
chmod +x snapper.sh
# optional: install to PATH
install -m 0755 snapper.sh /usr/local/bin/snapper

snapper.sh snap [options] -o <snapshot.txt> <pattern> [pattern...]
snapper.sh build [options] -i <snapshot.txt>

- `snap`: recursively scan a project and write a text-only snapshot of selected files.
- `build`: recreate files and directories from a snapshot produced by `snap`.
you can mix and match patterns. matching is portable globbing with a couple of rules:
- globs (path or basename): `*.go`, `**/*.md`, `src/**/*.go`, `**/*.(js|ts)`
- explicit project-root path: `/README.md`, `/docs/INSTALL.md` (a leading `/` means relative to the project root, not the filesystem root.)
- `**` is treated like `*` for portability.
- patterns without `/` match basenames anywhere (e.g., `*.go`).
- patterns with `/` match full relative paths (e.g., `cmd/**/main.go`).
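the matching rules above can be sketched with a POSIX `case` statement. this is a minimal illustration under stated assumptions, not snapper's actual code: the function name `matches` is hypothetical, and the `(js|ts)` alternation form is not handled.

```sh
# hypothetical helper: succeed if relative path $2 matches pattern $1
matches() {
  pat=$1
  path=$2
  # ** is treated like *, so collapse it first
  pat=$(printf '%s\n' "$pat" | sed 's/\*\*/*/g')
  case $pat in
    /*)  pat=${pat#/}; target=$path ;;   # leading / : path relative to the project root
    */*) target=$path ;;                 # contains / : match the full relative path
    *)   target=${path##*/} ;;           # no / : match the basename only
  esac
  # an unquoted $pat on the pattern side of case is treated as a glob
  case $target in
    $pat) return 0 ;;
    *)    return 1 ;;
  esac
}

matches '*.go' 'cmd/app/main.go' && echo "basename match"
matches 'cmd/**/main.go' 'cmd/app/main.go' && echo "path match"
matches '/README.md' 'docs/README.md' || echo "root-anchored: no match"
```

note that in a `case` glob, `*` also matches `/`, so in this sketch `cmd/*/main.go` matches at any depth below `cmd/`.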
| option | arg | default | description |
|---|---|---|---|
| `-o` | `<path>` | required | output snapshot to write. if it exists, the script warns and exits (code 3) unless `-f` is set. |
| `-C` | `<dir>` | `.` | project root directory to scan from. |
| `-m` | `<kb>` | `200` | max size per file in KB. `0` disables the limit. files exceeding the limit are skipped. |
| `-s` | `<num>` | `0` | split output into multiple files, each containing at most `<num>` files. `0` disables splitting. |
| `-r` | none | off | remove comments: strip `//`, `/* */`, and `#` comments to reduce token usage. note: `#` removal is skipped for document formats (.md, .txt, .rst, .doc, .docx, .rtf, .pdf, .org, .adoc, .asciidoc) to preserve structure like markdown headers. |
| `-w` | none | off | remove blank lines: strip all blank lines and trailing whitespace from files. can be combined with `-r` for maximum token reduction. |
| `-j` | `<num>` | `4` | parallel jobs: number of files processed concurrently. `0` disables parallelization. higher values can speed up processing of large projects. |
| `-e` | `<pat>` | none | exclude pattern. can be used multiple times. files matching any exclude pattern are skipped even if they match an include pattern. |
| `-t` | none | off | tree-only: output only the paths of matched files, without any content. |
| `-a` | none | off | all dirs: disable default ignores (see below). |
| `-q` | none | off | quiet progress/skips (final metrics still printed). |
| `-f` | none | off | force overwrite of output snapshot if it exists. |
| `-h` | none | none | show help and exit. |
| `--version` | none | none | print version and exit. |
| option | arg | default | description |
|---|---|---|---|
| `-i` | `<path>` | required | input snapshot file. use `-` to read from stdin. |
| `-C` | `<dir>` | `.` | target root directory to build into. |
| `-f` | none | off | force overwrite of existing files (default is to skip). |
| `-p` | none | off | create the build root directory if it doesn't exist (like `mkdir -p`). |
| `-h` | none | none | show help and exit. |
| `--version` | none | none | print version and exit. |
- text-only: binaries are always skipped (detected via `file --mime` if available; otherwise assumed text).
- git-aware: if inside a git repo, `snap` uses `git ls-files -co --exclude-standard`. this honors `.gitignore` and includes untracked files. if not a repo, it falls back to `find`.
- default ignores (when not using `-a`): `.git, .hg, .svn, node_modules, vendor, dist, build, .cache, .idea, .vscode, target, bin, out, coverage`
- exclude patterns: files matching any `-e` exclude pattern are skipped even if they match include patterns. exclude patterns use the same matching rules as include patterns.
- comment removal (with `-r`):
  - strips `//` line comments (everything from `//` to end of line)
  - strips `/* */` block comments (including multi-line)
  - strips `#` line comments (everything from `#` to end of line)
  - `#` removal is automatically skipped for document formats (`.md`, `.txt`, `.rst`, `.doc`, `.docx`, `.rtf`, `.pdf`, `.org`, `.adoc`, `.asciidoc`) to preserve structure (e.g., markdown headers)
  - preserves original blank lines (lines that were blank before comment removal)
  - removes comment-only lines entirely
  - removes trailing whitespace from code lines
  - works on any text file
  - note: simple pattern-based removal; does not parse strings, so `//`, `/* */`, or `#` inside string literals will also be removed
- blank line removal (with `-w`):
  - removes all blank lines from files
  - removes all trailing whitespace after the last non-whitespace character on each line
  - can be combined with `-r` to remove comments first, then blank lines
  - useful for maximum token reduction when feeding code to LLMs
- performance optimization:
  - comment removal uses `awk` instead of character-by-character processing for significantly faster performance
  - parallel processing (with `-j`) processes multiple files concurrently using `xargs -P`
  - default is 4 parallel jobs; increase for faster processing of large projects (e.g., `-j 8` or `-j 16`)
  - set `-j 0` to disable parallelization if needed
- language fences: snapshot wraps each file's content in a markdown code fence with a best-guess language from its extension. this is helpful for LLMs.
- lean format: snapshots contain no headers, footers, or metadata. the format is just the file path followed by its fenced content, maximizing token efficiency.
- metrics: after snapping, summary metrics print to stdout (not embedded in the snapshot) including count by extension and skip reasons.
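to make the comment-removal rules concrete, here is a minimal `awk` sketch of the same pattern-based idea. it is an illustration only, not the program snapper ships: the function name `strip_comments` is hypothetical, and the per-extension exemption of `#` for document formats is left out.

```sh
# hypothetical helper: read text on stdin, write it without comments on stdout
strip_comments() {
  awk '
    {
      was_blank = ((!in_block) && ($0 ~ /^[ \t]*$/))
      line = $0
      out = ""
      while (length(line) > 0) {
        if (in_block) {
          e = index(line, "*/")
          if (e == 0) break               # block comment continues on the next line
          line = substr(line, e + 2)
          in_block = 0
        } else {
          s = index(line, "/*")
          l = index(line, "//")
          h = index(line, "#")
          first = 0; kind = ""            # earliest marker wins
          if (s > 0) { first = s; kind = "block" }
          if (l > 0 && (first == 0 || l < first)) { first = l; kind = "line" }
          if (h > 0 && (first == 0 || h < first)) { first = h; kind = "line" }
          if (first == 0) { out = out line; line = "" }
          else {
            out = out substr(line, 1, first - 1)
            if (kind == "line") line = ""
            else { line = substr(line, first + 2); in_block = 1 }
          }
        }
      }
      sub(/[ \t]+$/, "", out)             # trim trailing whitespace
      # keep code lines and originally blank lines; drop comment-only lines
      if (out != "" || was_blank) print out
    }
  '
}

printf 'x := 1 // note\n# gone\n\ny := 2\n' | strip_comments
```

like the behavior described above, this does not parse strings, so a marker inside a string literal is treated as the start of a comment.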
snapper snap -o snapshot.txt '*.go' '**/*.md'

the output snapshot.txt is lean and ready for an LLM:
README.md
` ` `markdown
# Project Title
` ` `
internal/service/foo.go
` ` `go
package service
// ...contents...
` ` `

snapper snap -o snapshot.txt -e '*_test.go' '*.go'

this captures all .go files except those ending in _test.go.

snapper snap -r -o snapshot.txt '*.go' '*.js' '*.py'

this captures source files and strips all comments (`//`, `/* */`, and `#`), significantly reducing the token count for LLM context. note that `#` is preserved automatically in markdown and other document files.

snapper snap -r -w -o snapshot.txt '*.go' '*.js' '*.py'

combining `-r` and `-w` provides maximum token reduction by first removing comments, then removing all blank lines and trailing whitespace. this is the most compact snapshot snapper can produce.
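the `-w` step described above amounts to two small text transformations. a minimal sketch with `sed`, assuming nothing about how snapper implements it (the function name `squeeze` is hypothetical):

```sh
# hypothetical helper: trim trailing whitespace, then delete lines that are now empty
squeeze() {
  sed -e 's/[[:space:]]*$//' -e '/^$/d'
}

printf 'a  \n\n  b\t\n\n' | squeeze
```

leading indentation is kept on purpose; only trailing whitespace and blank lines are dropped.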
snapper snap -r -o snapshot.txt -e '*_test.go' -e '*.pb.go' '*.go'

this excludes test files and generated protobuf files, then strips comments from the remaining go files.

snapper snap -o snapshot.txt -e '*_test.go' -e '*.pb.go' -e 'mock_*.go' '*.go'

this excludes test files, protocol buffer generated files, and mock files.

snapper snap -r -j 8 -o snapshot.txt '**/*.go' '**/*.js' '**/*.py'

processing files with 8 parallel jobs speeds up comment removal and file processing on multi-core systems. adjust the number based on your CPU cores (e.g., `-j 16` for 16 cores).
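the parallel mode can be pictured as fanning file names out through `xargs -P`. the sketch below is an assumption-laden illustration, not snapper's code: `process_parallel` is a hypothetical name, and counting lines stands in for the real per-file work. `-P` is not part of POSIX `xargs`, but GNU and BSD implementations provide it.

```sh
# hypothetical helper: run a per-file step with up to $1 parallel jobs
process_parallel() {
  njobs=$1
  shift
  if [ "$njobs" -gt 0 ]; then
    # one file name per line; -P caps the number of concurrent sh processes
    printf '%s\n' "$@" |
      xargs -P "$njobs" -I {} sh -c 'wc -l < "$1" > "$1.count"' _ {}
  else
    # 0 jobs means no parallelism: plain sequential loop
    for f in "$@"; do
      wc -l < "$f" > "$f.count"
    done
  fi
}
```

each job writes to its own output file so results from concurrent jobs cannot interleave; a real tool would then assemble those outputs in a stable order.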
use the -t (--tree-only) flag to generate a simple file list. this is useful for giving an LLM a high-level overview of the project architecture.
snapper snap -t -o tree.txt '**/*'

for codebases that exceed an LLM's context window, use the `-s` flag to split the output. this command creates snapshots with at most 20 files each.

snapper snap -s 20 -o parts.txt '**/*.js'

this produces parts.txt, parts-2.txt, parts-3.txt, etc.
to rebuild a project from split files, cat them into the build command via a pipe.
cat parts.txt parts-*.txt | snapper build -C /tmp/restore -p -i -

# disable default ignores and raise the per-file limit to 1024 KB
snapper snap -a -m 1024 -o all.txt '**/*'

# combine several exclude patterns
snapper snap -o snapshot.txt -e 'vendor/**' -e '**/generated/**' -e '*.gen.go' '*.go'

- the snapshot file is always written to the `-o` path (creating or truncating the file) unless prevented by an existing file without `-f`.
- metrics are printed to stdout as a separate report, e.g.:
== snapper snap metrics ==
version: 0.1.1
project_root: /home/you/myproj
output: /home/you/myproj/snapshot.txt (and subsequent numbered files if split)
files written: 42
comments: removed
blank lines: removed
by extension:
.go: 31
.md: 8
.json: 3
skipped: size=2 binary=1 excluded=5 no_match=117
| code | meaning |
|---|---|
| 0 | success |
| 3 | snap: output snapshot already exists and `-f` not provided |
| >0 | other error (bad args, cannot cd, unreadable file, etc.) |
during build, snapper looks for simple blocks of a path on one line followed by a markdown code fence.
<relative/path/to/file.ext>
` ` `<lang?>
<file contents go here>
...
` ` `
- the parser expects the file path to be on its own line.
- content is everything between the opening and closing triple backticks.
- the builder creates parent directories as needed.
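the format above is simple enough that a reader can be sketched in a few lines of `awk`. this is a hedged illustration of the parsing idea, not the `build` implementation: `build_from_snapshot` is a hypothetical name, paths are not validated, and any content line that itself starts with three backticks would end the block early.

```sh
# hypothetical helper: read a snapshot on stdin and recreate its files under $1
build_from_snapshot() {
  root=$1
  # three backticks, written in octal so this block never contains a literal fence
  fence=$(printf '\140\140\140')
  awk -v root="$root" -v fence="$fence" '
    function is_fence(s) { return substr(s, 1, 3) == fence }
    {
      sub(/\r$/, "")                        # tolerate CRLF input
      if (!in_body) {
        if (is_fence($0) && path != "") {   # opening fence after a path line
          out = root "/" path
          dir = out
          sub(/\/[^\/]*$/, "", dir)         # strip the file name to get the parent dir
          system("mkdir -p \"" dir "\"")
          printf "" > out                   # create or truncate the target file
          in_body = 1
        } else if ($0 != "") {
          path = $0                         # candidate path line
        }
      } else if (is_fence($0)) {            # closing fence ends the file
        close(out)
        in_body = 0
        path = ""
      } else {
        print > out
      }
    }
  '
}
```

usage would look like `build_from_snapshot /tmp/restore < snapshot.txt`. a production parser should also reject absolute paths and `..` components before writing anything.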
- provide structure first: use `snapper snap -t` to give the LLM the project's file structure. this helps it understand the overall layout before seeing any code.
- remove comments: use `snapper snap -r` to strip comments from source files. comments often contain verbose explanations that consume tokens without adding value for code analysis.
- maximize compactness: combine `-r -w` to remove both comments and blank lines for the smallest possible snapshot.
- split large projects: use `snapper snap -s <num>` to break large codebases into manageable chunks that fit within your model's context window.
- exclude noise: use `-e` to skip test files, generated code, or vendor dependencies that aren't relevant to your prompt (e.g., `-e '*_test.go' -e '*.pb.go' -e 'vendor/**'`).
- keep snapshots lean: use patterns, `-m`, `-r`, `-w`, and `-e` to stay within model context limits.
- include key configs: e.g., `/Dockerfile`, `**/*.yaml`, `/Makefile`, `/go.mod`, `/README.md`.
- exclude generated or vendor: rely on git-aware mode or keep `-a` off, and use `-e` for additional filtering.
- snapper never includes binary files. still, review snapshots for secrets before sharing.
- consider adding patterns to exclude secret files (e.g., `-e '**/*.env' -e '**/secrets.yaml'`) or rely on `.gitignore` + not tracking secrets.
- the `-r` flag removes comments but does not parse language syntax, so it may inadvertently modify string literals containing `//`, `/* */`, or `#`.
- requires a POSIX shell (`/bin/sh`).
- uses `git` if available (and inside a repo), otherwise `find`; also relies on `sort`, `awk`, `sed`, `wc`, `tr`, and standard coreutils.
- uses `file` for MIME detection if present; otherwise assumes text.
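the git-or-find choice can be sketched as follows. this is an illustration under assumptions rather than the script's own logic: `list_files` is a hypothetical name, and the `find` branch prunes only three of the default-ignored directories to keep the example short.

```sh
# hypothetical helper: print the candidate files under directory $1, one per line
list_files() {
  root=$1
  if command -v git >/dev/null 2>&1 &&
     git -C "$root" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # tracked (-c) plus untracked (-o) files, minus anything .gitignore excludes
    (cd "$root" && git ls-files -co --exclude-standard)
  else
    # no repo: walk the tree, pruning a few ignored directories
    (cd "$root" && find . \( -name .git -o -name node_modules -o -name vendor \) -prune \
        -o -type f -print | sed 's|^\./||')
  fi | sort
}
```

calling `list_files .` from a project directory prints sorted relative paths in either mode, so later steps do not need to know which branch produced them.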
q: why not include binaries behind a flag?
a: to keep snapshots safe and small for LLMs; binaries bloat context and aren't useful in text prompts.
q: how does comment removal work?
a: the -r flag uses a simple pattern-based approach that removes // (line comments), /* */ (block comments), and # (line comments). it does not parse language syntax, so it will also remove these patterns if they appear in string literals. important: # is automatically preserved in document formats (.md, .txt, .rst, etc.) to avoid removing markdown headers and similar structures. for most use cases where you're feeding code to an LLM, this trade-off is acceptable since it significantly reduces token usage.
q: will comment removal break my code?
a: the -r flag is only used during snapshot creation (with snap). it does not modify your original files. if you later use build to recreate files from a snapshot that had comments removed, those files will not have comments. they should usually remain syntactically valid, but because removal is pattern-based, any //, /* */, or # inside a string literal is stripped too, which can change or break that line.
q: how do exclude patterns work with include patterns?
a: files are first matched against include patterns, then checked against exclude patterns. if a file matches any exclude pattern, it's skipped regardless of include matches. this allows you to do things like '*.go' -e '*_test.go' to get all go files except tests.
q: can i exclude entire directories?
a: yes, use patterns like -e 'vendor/**' or -e '**/generated/**' to exclude directory trees.
q: how accurate are the language fences?
a: they're heuristic, keyed off file extensions only. unknown extensions fall back to plain fenced code.
q: does -m round by bytes or kilobytes?
a: it's a per-file byte check against KB * 1024 (default 200KB). files larger than the threshold are skipped.
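as a worked example of that check, here is a minimal sketch (the function name `within_limit` is hypothetical, and this is not taken from the script itself). with the default of 200 KB the threshold is 200 * 1024 = 204800 bytes.

```sh
# hypothetical helper: succeed if file $1 is within the $2 KB limit (0 = no limit)
within_limit() {
  file=$1
  max_kb=$2
  [ "$max_kb" -eq 0 ] && return 0
  # arithmetic expansion strips the padding some wc implementations print
  size=$(( $(wc -c < "$file") ))
  [ "$size" -le $(( max_kb * 1024 )) ]
}
```

a file of exactly 204800 bytes passes `within_limit file 200`, while one more byte makes it fail, matching "files larger than the threshold are skipped".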
q: can i snapshot from outside the repo root?
a: yes, with -C you can point to any directory; leading / in patterns is relative to that root.
- optional manifest of file hashes to detect drift.
- include/exclude pattern lists via file (e.g., `.snapperinclude`, `.snapperexclude`).
- more language mappings for fences.
- a `--dry-run` for snap.
- language-aware comment removal that handles strings properly.
- keep POSIX sh compatibility (avoid bash-only features).
- prefer simple, readable pipelines.
- add tests where possible (e.g., via a portable shell test harness).
- 0.1.1 - BUGFIX: fix `-e` (exclude patterns) flag to work correctly. the script was using the bashism `$'\n'`, which isn't POSIX sh compatible, causing exclude patterns to be treated as literal strings instead of proper patterns. now uses POSIX-compatible newline handling.
- 0.1.0 - MAJOR RELEASE: `snap`: massive performance improvements! replace character-by-character comment removal with fast `awk` processing (10-100x faster). add `-j` flag for parallel file processing (default 4 jobs) to further speed up large projects (additional 3-8x speedup). add `-w` flag to remove all blank lines and trailing whitespace for maximum token reduction. fix `-r` to properly preserve original blank lines while removing comment-only lines. combined, these changes make snapper 20-75x faster on typical projects!
- 0.0.8 - `snap`: add `#` line comment removal support (shell/Python-style). automatically exempt document formats (.md, .txt, .rst, .doc, .docx, .rtf, .pdf, .org, .adoc, .asciidoc) from `#` removal to preserve structure like markdown headers. important clarification: the `-r` flag only affects snapshot content, never modifies original source files.
- 0.0.7 - `snap`: add `-r` flag to remove C-style comments (`//` and `/* */`) from files before snapshotting, reducing token usage for LLM context.
- 0.0.6 - `snap`: add `-e` flag for exclude patterns to skip specific files even if they match include patterns.
- 0.0.5 - `snap`: add `-t` (`--tree-only`) flag to output file paths without content.
- 0.0.4 - `snap`: remove all headers and footers for a leaner, token-efficient output. add `-s` flag to split snapshots into multiple files. `build`: update parser for new format and support reading from `stdin` (`-i -`).
- 0.0.3 - `build`: add `-p` to create build root; `-i` path resolved to absolute before `-C`; fence parsing fix; CRLF-tolerant reader.
- 0.0.2 - `build`: parsing order fix for closing fence; snapshot path absolutized prior to `-C`.
- 0.0.1 - initial release: `snap` (git-aware), `build`, text-only, per-file size limit, metrics, language fences.