This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
lc (licensechecker) is a CLI tool that recursively scans directories to identify software licenses in files. It uses SPDX license identification with multiple detection strategies: SPDX header parsing, filename matching, and content analysis (keyword matching, vector space, Levenshtein distance). Written in Go, currently at v2.0.0 alpha.
go build # Build binary
go test -cover -race ./... # Run all tests with coverage and race detection
go test -v -run TestName ./processor/ # Run a single test
gofmt -s -w ./.. # Format code
golangci-lint run --enable gofmt ./... # Lint
./check.sh # Full verification (fmt, test, lint, race, cross-compile)
./generate_database.sh # Build DB, copy JSON, run go generate, test

This builds assets/database/, produces database_keywords.json, then go generate (via scripts/include.go) embeds it as base64 in processor/constants.go.
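To make the embed step concrete, here is a minimal sketch of a base64 round trip for a JSON database. The licenseEntry type, its fields, and both function names are assumptions made for illustration. The real schema of database_keywords.json and the real code in scripts/include.go may look different.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// licenseEntry is a hypothetical record shape; the real fields in
// database_keywords.json may differ.
type licenseEntry struct {
	ID       string   `json:"id"`
	Keywords []string `json:"keywords"`
}

// encodeDatabase mimics the generate side: JSON bytes become a base64
// string that could be written into a Go constant.
func encodeDatabase(entries []licenseEntry) (string, error) {
	raw, err := json.Marshal(entries)
	if err != nil {
		return "", fmt.Errorf("marshal JSON: %w", err)
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decodeDatabase mimics the runtime side: the embedded string is decoded
// and parsed back into records.
func decodeDatabase(encoded string) ([]licenseEntry, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("decode base64: %w", err)
	}
	var entries []licenseEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, fmt.Errorf("parse JSON: %w", err)
	}
	return entries, nil
}

func main() {
	encoded, err := encodeDatabase([]licenseEntry{
		{ID: "MIT", Keywords: []string{"permission is hereby granted"}},
	})
	if err != nil {
		panic(err)
	}
	entries, err := decodeDatabase(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(entries), entries[0].ID)
}
```

Because the generated constant is derived from the JSON, regenerating it through the script keeps the two in sync, which is why constants.go should not be edited by hand.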
Entry point: main.go — Cobra CLI that creates processor.NewProcess(".") and calls StartProcess().
processor/ package (active v2 code):
- processor.go: Orchestrator that walks files via gocodewalker, reads content (max 100KB), and routes it through the detection pipeline
- detector_spdx.go: Parses SPDX-License-Identifier: headers from source files (100% confidence)
- detector_license.go: Detects licenses in dedicated license files (LICENSE, COPYING, etc.) using filename regex matching
- guesser.go: LicenceGuesser interface and framework; two instances: common licenses (fast path) and full database
- guesser_keyword.go: Keyword-based license detection using Aho-Corasick
- guesser_vectorspace.go: TF-IDF vector space similarity matching
- guesser_blended.go: Combines keyword + vector space scores
- constants.go: Auto-generated; contains base64-encoded license database (do not edit manually)
- structs.go: Core data types (FileResult, LicenseMatch, etc.)
- common.go: Shared utilities and compiled regex patterns for license filename detection
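As a rough illustration of the vector space idea, the sketch below scores text against a tiny corpus using cosine similarity over raw term counts. It is a simplification: there is no IDF weighting. All function names and the two-entry corpus are hypothetical and are not taken from guesser_vectorspace.go.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// termFrequencies lowercases text, splits on whitespace, and counts tokens.
func termFrequencies(text string) map[string]float64 {
	tf := make(map[string]float64)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		tf[tok]++
	}
	return tf
}

// cosineSimilarity returns the cosine of the angle between two sparse
// term vectors, or 0 when either vector is empty.
func cosineSimilarity(a, b map[string]float64) float64 {
	var dot, normA, normB float64
	for term, wa := range a {
		dot += wa * b[term] // missing terms read as 0
		normA += wa * wa
	}
	for _, wb := range b {
		normB += wb * wb
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// bestMatch returns the corpus ID whose text is most similar to content,
// along with its score. An empty ID means nothing overlapped.
func bestMatch(content string, corpus map[string]string) (string, float64) {
	query := termFrequencies(content)
	bestID, bestScore := "", 0.0
	for id, text := range corpus {
		if s := cosineSimilarity(query, termFrequencies(text)); s > bestScore {
			bestID, bestScore = id, s
		}
	}
	return bestID, bestScore
}

func main() {
	// Hypothetical, heavily truncated license texts.
	corpus := map[string]string{
		"MIT":        "permission is hereby granted free of charge to any person obtaining a copy",
		"Apache-2.0": "licensed under the apache license version 2.0 you may not use this file",
	}
	id, score := bestMatch("Permission is hereby granted free of charge", corpus)
	fmt.Printf("%s %.3f\n", id, score)
}
```

A blended guesser would presumably combine a score like this with the keyword result. How the two are weighted is not stated here, so the sketch leaves that out.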
parsers/ and pkg/ — Legacy v1 code, deprecated and scheduled for removal.
assets/database/ — Database builder that processes 425+ SPDX license definition files into database_keywords.json.
- Check if file is binary (null byte detection); skip unless the --binary flag is set
- For files matching license filename patterns (license, copying, mit, apache, etc.): run through LicenceGuesser (keyword → vector space → blended)
- For all other files: scan for SPDX-License-Identifier: headers
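The routing described in these steps can be sketched as follows. Both regular expressions, the route function, and its string return values are illustrative assumptions. The patterns compiled in common.go and the actual control flow in processor.go may differ. The SPDX pattern here captures only a single identifier, not compound expressions.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"regexp"
)

// Hypothetical patterns, not the ones used by the project.
var (
	licenseFilename = regexp.MustCompile(`(?i)^(licen[sc]e|copying|mit|apache)([.\-_].*)?$`)
	spdxHeader      = regexp.MustCompile(`SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)`)
)

// isBinary treats any content containing a null byte as binary.
func isBinary(content []byte) bool {
	return bytes.IndexByte(content, 0) != -1
}

// route reports which detection path a file would take:
// "skip", "guesser", "spdx:<id>", or "none".
func route(path string, content []byte, includeBinary bool) string {
	if isBinary(content) && !includeBinary {
		return "skip"
	}
	if licenseFilename.MatchString(filepath.Base(path)) {
		return "guesser" // would go keyword -> vector space -> blended
	}
	if m := spdxHeader.FindSubmatch(content); m != nil {
		return "spdx:" + string(m[1])
	}
	return "none"
}

func main() {
	fmt.Println(route("LICENSE", []byte("MIT License"), false))
	fmt.Println(route("main.go", []byte("// SPDX-License-Identifier: Apache-2.0\npackage main"), false))
	fmt.Println(route("logo.png", []byte{0x89, 'P', 0x00}, false))
}
```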