Commit 9102f33

feat(loader): implement persistent metadata cache (#6630)
* feat(loader): implement persistent metadata cache for template filtering optimization.

Introduce a new template metadata indexing system with persistent caching to dramatically improve template loading perf when filters are applied. The implementation adds a new index pkg that caches lightweight template metadata (ID, tags, authors, severity, etc.) and enables filtering templates before expensive YAML parsing occurs.

The index uses an in-memory LRU cache backed by the `otter` pkg for efficient memory management with adaptive sizing based on entry weight, defaulting to approx. 40MB for 50K templates. Metadata is persisted to disk using gob encoding at "~/.cache/nuclei/index.gob" with atomic writes to prevent corruption. The cache automatically invalidates stale entries using `ModTime` to detect file modifications, ensuring metadata freshness w/o manual intervention.

Filtering has been refactored from the previous `TagFilter` and `PathFilter` approach into a unified `index.Filter` type that handles all basic filtering ops including severity, authors, tags, template IDs with wildcard support, protocol types, and path-based inclusion and exclusion. The filter implements OR logic within each field type and AND logic across different field types, with exclusion filters taking precedence over inclusion filters and forced inclusion via `IncludeTemplates` and `IncludeTags` overriding exclusions.

The `loader` integration creates an index filter from store configuration via `buildIndexFilter` and manages the cache lifecycle through `loadTemplatesIndex` and `saveTemplatesIndex` methods. When `LoadTemplatesOnlyMetadata` or `LoadTemplatesWithTags` is called, the system first checks the metadata cache for each template path. If cached metadata exists and passes validation, the filter is applied directly against the metadata without parsing. Only templates matching the filter criteria proceed to full YAML parsing, resulting in significant performance gains.
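The filter semantics described above (OR within a field, AND across fields, exclusions checked first) can be illustrated with a minimal sketch. This is not the `index.Filter` implementation: the `Metadata` and `Filter` types, their field names, and the helper functions are hypothetical, forced inclusion is left out, and case-insensitive wildcard matching on IDs via `path.Match` is an assumption made for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Metadata is a hypothetical, trimmed-down stand-in for cached template metadata.
type Metadata struct {
	ID       string
	Tags     []string
	Authors  []string
	Severity string
}

// Filter is a hypothetical stand-in for a unified metadata filter.
type Filter struct {
	IDs, Tags, Authors, Severities []string // inclusion lists, one per field type
	ExcludeTags                    []string // exclusion list
}

// anyEqual reports whether any wanted value equals any value in have.
func anyEqual(wanted, have []string) bool {
	for _, w := range wanted {
		for _, h := range have {
			if w == h {
				return true
			}
		}
	}
	return false
}

// idMatches reports whether id matches any wildcard pattern, ignoring case (assumption).
func idMatches(patterns []string, id string) bool {
	for _, p := range patterns {
		if ok, _ := path.Match(strings.ToLower(p), strings.ToLower(id)); ok {
			return true
		}
	}
	return false
}

// Matches applies exclusions first, then ANDs the per-field OR checks.
// An empty inclusion list places no constraint on its field.
func (f Filter) Matches(m Metadata) bool {
	if anyEqual(f.ExcludeTags, m.Tags) {
		return false // exclusion takes precedence over any inclusion
	}
	return (len(f.IDs) == 0 || idMatches(f.IDs, m.ID)) &&
		(len(f.Tags) == 0 || anyEqual(f.Tags, m.Tags)) &&
		(len(f.Authors) == 0 || anyEqual(f.Authors, m.Authors)) &&
		(len(f.Severities) == 0 || anyEqual(f.Severities, []string{m.Severity}))
}

func main() {
	f := Filter{
		IDs:         []string{"cve-2021-*"},
		Severities:  []string{"high", "critical"},
		ExcludeTags: []string{"dos"},
	}
	fmt.Println(f.Matches(Metadata{ID: "CVE-2021-44228", Tags: []string{"rce"}, Severity: "critical"}))
	fmt.Println(f.Matches(Metadata{ID: "CVE-2021-0001", Tags: []string{"dos"}, Severity: "high"}))
}
```

Because the check only touches the small metadata struct, a template rejected here would never need its YAML parsed, which is the point of filtering against the index first.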
Advanced filtering via "-tc" flag (`IncludeConditions`) still requires template parsing as these are expression-based filters that cannot be evaluated from metadata alone. The `TagFilter` has been simplified to handle only `IncludeConditions` while all other filtering ops are delegated to the index-based filtering system.

Cache management is fully automatic with no user configuration required. The cache gracefully handles errors by logging warnings & falling back to normal op w/o caching. Cache files use schema versioning to invalidate incompatible cache formats across nuclei updates (specifically, `Index` and `Metadata` changes).

This optimization particularly benefits repeated scans with the same filters, CI/CD pipelines running nuclei regularly, development and testing workflows with frequent template loading, and any scenario with large template collections where filtering would exclude most templates.

* test(loader): adds `BenchmarkLoadTemplates{,OnlyMetadata}` benchs

Signed-off-by: Dwi Siswanto <git@dw1.io>

* ci: cache nuclei-templates index

Signed-off-by: Dwi Siswanto <git@dw1.io>

* chore(index): satisfy lints

Signed-off-by: Dwi Siswanto <git@dw1.io>

* fix(index): correct metadata filter logic for proper template matching.

The `filter.matchesIncludes()` was using OR logic across different filter types, causing incorrect template matching. Additionally, ID matching was case-sensitive, failing to match patterns like 'CVE-2021-*'.

The filter now correctly implements: (author1 OR author2) AND (tag1 OR tag2) AND (severity1 OR severity2), using OR within each filter type and AND across different types.

Signed-off-by: Dwi Siswanto <git@dw1.io>

* test(index): resolve test timing issue in CI environments.

Some tests were failing in CI due to filesystem timestamp resolution limitations.
On filesystems with 1s ModTime granularity (common in CI), modifying a file immediately after capturing its timestamp resulted in identical ModTime values, causing IsValid() to incorrectly return true.

Signed-off-by: Dwi Siswanto <git@dw1.io>

* ci: cache nuclei with composite action

Signed-off-by: Dwi Siswanto <git@dw1.io>

* fix(index): file locking issue on Windows during cache save/load.

Explicitly close file handles before performing rename/remove ops in `Save` and `Load` methods.

  * In `Save`, close temp file before rename.
  * In `Load`, close file before remove during error handling/version mismatch.

Signed-off-by: Dwi Siswanto <git@dw1.io>

* test(index): flaky index tests on Windows

Fix path separator mismatch in `TestCacheSize` and `TestCachePersistenceWithLargeDataset` by using `filepath.Join` consistently instead of hardcoded forward slashes.

Signed-off-by: Dwi Siswanto <git@dw1.io>

* test(cmd): init logger to prevent nil pointer deref

The integration tests were panicking with a nil pointer dereference in `pkg/catalog/loader` because the logger was not init'ed. When `store.saveMetadataIndexOnce` attempted to log the result of the metadata cache op, it dereferenced the nil logger, causing a crash.

Signed-off-by: Dwi Siswanto <git@dw1.io>

* fix(loader): resolve include/exclude paths for metadata cache filter.

The `indexFilter` was previously init'ed using raw relative paths from the config for `IncludeTemplates` and `ExcludeTemplates`. But the persistent metadata cache stores templates using their absolute paths. This mismatch caused the `matchesPath` check to fail, leading to templates being incorrectly excluded even when explicitly included via flags (e.g., "-include-templates loader/excluded-template.yaml").

This commit updates `buildIndexFilter` to resolve these paths to their absolute versions using `store.config.Catalog.GetTemplatesPath` before creating the filter, ensuring consistent path matching against the metadata cache.
Signed-off-by: Dwi Siswanto <git@dw1.io>

* feat(index): adds `NewMetadataFromTemplate` func

Signed-off-by: Dwi Siswanto <git@dw1.io>

* refactor(index): return metadata when `(*Index).cache` is nil

Signed-off-by: Dwi Siswanto <git@dw1.io>

* refactor(loader): restore pre-index behavior semantics

Signed-off-by: Dwi Siswanto <git@dw1.io>

---------

Signed-off-by: Dwi Siswanto <git@dw1.io>
1 parent 3dab87b commit 9102f33

13 files changed, +2328 -32 lines changed

.github/workflows/generate-pgo.yaml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ jobs:
 - uses: actions/checkout@v6
 - uses: projectdiscovery/actions/setup/git@v1
 - uses: projectdiscovery/actions/setup/go@v1
+- uses: projectdiscovery/actions/cache/nuclei@v1
 - name: Generate list
   run: for i in {1..${{ matrix.targets }}}; do echo "https://honey.scanme.sh/?_=${i}" >> "${LIST_FILE}"; done
 # NOTE(dwisiswant0): use `-no-mhe` flag to get better samples.

.github/workflows/perf-regression.yaml

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ jobs:
 - uses: actions/checkout@v6
 - uses: projectdiscovery/actions/setup/go@v1
 - uses: projectdiscovery/actions/cache/go-rod-browser@v1
+- uses: projectdiscovery/actions/cache/nuclei@v1
 - run: make build-test
 - run: ./bin/nuclei.test -test.run - -test.bench=. -test.benchmem ./cmd/nuclei/ | tee $BENCH_OUT
   env:

.github/workflows/tests.yaml

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,7 @@ jobs:
 - uses: actions/checkout@v6
 - uses: projectdiscovery/actions/setup/go@v1
 - uses: projectdiscovery/actions/cache/go-rod-browser@v1
+- uses: projectdiscovery/actions/cache/nuclei@v1
 - uses: projectdiscovery/actions/free-disk-space@v1
   with:
     llvm: 'false'
@@ -66,6 +67,7 @@ jobs:
 - uses: actions/checkout@v6
 - uses: projectdiscovery/actions/setup/go@v1
 - uses: projectdiscovery/actions/cache/go-rod-browser@v1
+- uses: projectdiscovery/actions/cache/nuclei@v1
 - name: "Simple"
   run: go run .
   working-directory: examples/simple/
@@ -88,6 +90,7 @@ jobs:
 steps:
   - uses: actions/checkout@v6
   - uses: projectdiscovery/actions/setup/go@v1
+  - uses: projectdiscovery/actions/cache/nuclei@v1
   - uses: projectdiscovery/actions/setup/python@v1
   - uses: projectdiscovery/actions/cache/go-rod-browser@v1
   - run: bash run.sh "${{ matrix.os }}"
@@ -108,6 +111,7 @@ jobs:
 steps:
   - uses: actions/checkout@v6
   - uses: projectdiscovery/actions/setup/go@v1
+  - uses: projectdiscovery/actions/cache/nuclei@v1
   - uses: projectdiscovery/actions/setup/python@v1
   - uses: projectdiscovery/actions/cache/go-rod-browser@v1
   - run: bash run.sh

cmd/integration-test/library.go

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@ import (
 	"github.com/logrusorgru/aurora"
 	"github.com/pkg/errors"
 	"github.com/projectdiscovery/goflags"
+	"github.com/projectdiscovery/gologger"
 	"github.com/projectdiscovery/nuclei/v3/pkg/catalog/config"
 	"github.com/projectdiscovery/nuclei/v3/pkg/catalog/disk"
 	"github.com/projectdiscovery/nuclei/v3/pkg/catalog/loader"
@@ -70,6 +71,7 @@ func executeNucleiAsLibrary(templatePath, templateURL string) ([]string, error)

 	defaultOpts := types.DefaultOptions()
 	defaultOpts.ExecutionId = "test"
+	defaultOpts.Logger = gologger.DefaultLogger

 	mockProgress := &testutils.MockProgressClient{}
 	reportingClient, err := reporting.New(&reporting.Options{ExecutionId: defaultOpts.ExecutionId}, "", false)

go.mod

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ require (
 	github.com/leslie-qiwa/flat v0.0.0-20230424180412-f9d1cf014baa
 	github.com/lib/pq v1.10.9
 	github.com/mattn/go-sqlite3 v1.14.28
+	github.com/maypok86/otter/v2 v2.2.1
 	github.com/mholt/archives v0.1.5
 	github.com/microsoft/go-mssqldb v1.9.2
 	github.com/ory/dockertest/v3 v3.12.0

go.sum

Lines changed: 2 additions & 0 deletions
@@ -707,6 +707,8 @@ github.com/mattn/go-sqlite3 v1.14.28 h1:ThEiQrnbtumT+QMknw63Befp/ce/nUPgBPMlRFEu
 github.com/mattn/go-sqlite3 v1.14.28/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
 github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0=
 github.com/matttproud/golang_protobuf_extensions v1.0.4/go.mod h1:BSXmuO+STAnVfrANrmjBb36TMTDstsz7MSK+HVaYKv4=
+github.com/maypok86/otter/v2 v2.2.1 h1:hnGssisMFkdisYcvQ8L019zpYQcdtPse+g0ps2i7cfI=
+github.com/maypok86/otter/v2 v2.2.1/go.mod h1:1NKY9bY+kB5jwCXBJfE59u+zAwOt6C7ni1FTlFFMqVs=
 github.com/mholt/acmez v1.2.0 h1:1hhLxSgY5FvH5HCnGUuwbKY2VQVo8IU7rxXKSnZ7F30=
 github.com/mholt/acmez v1.2.0/go.mod h1:VT9YwH1xgNX1kmYY89gY8xPJC84BFAisjo8Egigt4kE=
 github.com/mholt/archives v0.1.5 h1:Fh2hl1j7VEhc6DZs2DLMgiBNChUux154a1G+2esNvzQ=
