CLAUDE.md

Instructions for Claude Code working on this Terraform Provider for Proxmox VE.

Critical Rules

Never violate these rules: violations cause bugs, test failures, or provider misbehavior.

Never Do	Reason
Start work without a GitHub issue	All work must be tracked
Make assumptions without verification	Always verify with code/tests/mitmproxy
Skip acceptance tests	Tests reproduce and verify fixes
Commit without running linter	Always `make lint` first
Commit without explicit user request	User controls git operations
Add changes beyond what's requested	Only implement what's asked

Always Do	Reason
Verify GitHub issue exists first	No issue = flag deficiency, offer to help
Ask questions when uncertain	Never assume; clarify before proceeding
Create acceptance test BEFORE fixing	Proves issue exists, proves fix works
Verify API calls with mitmproxy	Tests passing ≠ correct API calls
Maintain session state for multi-step work	Enables context recovery across sessions
Run full checklist before completion	See Production Readiness Checklist

GitHub Issue Requirements

All work on fixes or features MUST have a corresponding GitHub issue.

Before Starting Work

Verify issue exists — Search for an existing issue
If no issue exists — Flag deficiency, do NOT proceed

When No Issue Exists

Flag this to the user:

"No GitHub issue found for this work. All fixes and features must be tracked with an issue before implementation begins."

Then offer to help create one:

Ask: "Would you like me to help draft a GitHub issue?"
Determine type: Bug or Feature/Enhancement
Draft content following the template structure
Provide draft for user to submit at: https://github.com/bpg/terraform-provider-proxmox/issues/new/choose
Wait for issue number before proceeding

Naming Conventions

Artifact	Format	Example
Branch	`{type}/{issue}-{desc}`	`fix/1234-clone-timeout`
Plans	`.dev/{issue}_PLAN.md`	`.dev/1234_PLAN.md`
PR body	`.dev/{issue}_PR_BODY.md`	`.dev/1234_PR_BODY.md`
Session state	`.dev/{issue}_SESSION_STATE.md`	`.dev/1234_SESSION_STATE.md`
Test names	Descriptive, NO issue numbers	`TestAccResourceVMClone`
VM names	Descriptive, NO issue numbers	`test-vm-clone`
Commits	Conventional, NO issue numbers	`fix(vm): handle clone timeout`
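
The branch and `.dev/` file formats in the table above reduce to two format strings. A minimal sketch, assuming numeric issue IDs; the helper names `branchName` and `devFile` are hypothetical and do not exist in the repository.

```go
package main

import "fmt"

// branchName builds a branch name following the {type}/{issue}-{desc} convention.
// Hypothetical helper, shown for illustration only.
func branchName(kind string, issue int, desc string) string {
	return fmt.Sprintf("%s/%d-%s", kind, issue, desc)
}

// devFile builds the path of a per-issue artifact under .dev/,
// where artifact is PLAN, PR_BODY, or SESSION_STATE.
func devFile(issue int, artifact string) string {
	return fmt.Sprintf(".dev/%d_%s.md", issue, artifact)
}

func main() {
	fmt.Println(branchName("fix", 1234, "clone-timeout")) // fix/1234-clone-timeout
	fmt.Println(devFile(1234, "SESSION_STATE"))           // .dev/1234_SESSION_STATE.md
}
```

Issue numbers appear only in branch names and `.dev/` paths; test names, VM names, and commit subjects stay free of them.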

Quick Reference

Essential Commands

make build              # Build provider binary
make lint               # Run Go linter (auto-fixes formatting and most issues)
make test               # Run unit tests
make docs               # Regenerate Framework resource/datasource docs (not SDK)
./testacc TestName      # Run specific acceptance test
npx --yes markdownlint-cli2 --fix "path/to/*.md"  # Lint markdown files

Linting Rules

Never manually format or lint code. Always use the appropriate linter tool.

File type	Linter command	When to run
Go `.go`	`make lint`	After editing any `.go` file
Markdown	`npx --yes markdownlint-cli2 --fix "file.md"`	After editing any `.md` file

Acceptance Test Script (`./testacc`)

./testacc TestAccResourceVM           # Run single test
./testacc "TestAccResource.*"         # Run tests matching pattern
./testacc --tier light                # Light tests only (~30s)
./testacc --tier medium               # Medium tests only (~3 min)
./testacc --tier heavy                # Heavy tests only (~15 min)
./testacc --tier light,medium         # Combine tiers
./testacc --tier all                  # All tiers with smart parallelism (~15 min)
./testacc --resource vm               # All VM-related tests
./testacc --resource sdn              # All SDN tests
./testacc --no-proxy TestName         # Run without mitmproxy
./testacc TestName -- -count 2        # Pass flags through to go test

Test Tiers

Tests are classified via //testacc:tier=X annotations in test files:

Tier	Description	Parallelism	Time
light	API-only, no VMs or containers	-p 8	~30s
medium	Simple VMs with unique IDs	-p 4	~3 min
heavy	Cloud images, shared state	-p 1	~15 min

Resource targeting via //testacc:resource=X annotations: vm, container, firewall, sdn, file, pool, acme, access, backup, ha, hardwaremapping, metrics, options, replication, apt, datastores, storage, network, misc

./testacc requires a testacc.env file with:

TF_ACC=1
PROXMOX_VE_API_TOKEN="root@pam!<token>=<value>"
PROXMOX_VE_ENDPOINT="https://<host>:8006/"
PROXMOX_VE_SSH_AGENT="true"
PROXMOX_VE_SSH_USERNAME="root"
# Optional: PROXMOX_VE_ACC_NODE_NAME, PROXMOX_VE_ACC_NODE_SSH_ADDRESS, etc.

Production Readiness Checklist

Run /bpg:ready to execute this checklist automatically.

make build — Must pass
make lint — Must show 0 issues
make test — All unit tests pass
./testacc TestAccYourFeature — Acceptance tests pass
/bpg:debug-api — Verify API calls with mitmproxy
make docs — Regenerate Framework docs if schema changed
/bpg:prepare-pr — Generate PR body from template

Commit Guidelines

See CONTRIBUTING.md. Key rules:

Format: {type}({scope}): {description}
Types: feat, fix, chore
Scopes: vm, lxc, provider, core, docs, ci
Lowercase, no period, under 72 chars, NO issue numbers
DCO sign-off required: use git commit -s (adds Signed-off-by line)
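
The subject-line rules above can be checked mechanically. The following is a sketch, not a tool that ships with the repository: `checkSubject` is a hypothetical name, the type and scope lists are assumed to be exhaustive, "lowercase" is read as applying to the whole subject, and an issue number is assumed to look like `#1234`. CONTRIBUTING.md remains the authority.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// subjectPattern encodes {type}({scope}): {description} with the types and scopes listed above.
var subjectPattern = regexp.MustCompile(`^(feat|fix|chore)\((vm|lxc|provider|core|docs|ci)\): (.+)$`)

// issueRef matches GitHub-style issue references such as #1234 (assumed form).
var issueRef = regexp.MustCompile(`#\d+`)

// checkSubject returns the list of rule violations for a commit subject line.
func checkSubject(subject string) []string {
	var problems []string
	if !subjectPattern.MatchString(subject) {
		problems = append(problems, "does not match {type}({scope}): {description}")
	}
	if subject != strings.ToLower(subject) {
		problems = append(problems, "must be lowercase")
	}
	if strings.HasSuffix(subject, ".") {
		problems = append(problems, "must not end with a period")
	}
	if utf8.RuneCountInString(subject) >= 72 {
		problems = append(problems, "must be under 72 characters")
	}
	if issueRef.MatchString(subject) {
		problems = append(problems, "must not reference issue numbers")
	}
	return problems
}

func main() {
	for _, s := range []string{
		"fix(vm): handle clone timeout",
		"Fix(vm): handle clone timeout (#1234).",
	} {
		fmt.Printf("%q -> %v\n", s, checkSubject(s))
	}
}
```

The first subject passes every check; the second violates the format, lowercase, trailing-period, and issue-number rules.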

Agent Development Practices

Parallel Agents

Use parallel agents for independent tasks to speed up work:

Good candidates for parallel execution:

Research tasks (explore different parts of codebase simultaneously)
Running independent test suites
Searching for patterns across different directories
Gathering context from multiple unrelated files

Not suitable for parallel execution:

Tasks with dependencies (B needs output of A)
File modifications (risk of conflicts)
Sequential workflows (test → fix → verify)

How to request: Ask for agents to run "in parallel" explicitly.

State Persistence

LLMs have no memory between sessions. Externalize state to files:

Session state file — The agent's memory across context resets
Update before ANY context switch — End of session, new task, long operation
Write "next action" for a stranger — Assume no prior context

Track Decisions, Not Just Actions

User decisions — Never re-ask; record in session state
Agent assumptions — Make explicit; mark verified/rejected
Reasoning — "Why" matters more than "what"

Hypothesis-Driven Debugging

Form hypothesis → test → record result
Prevents circular debugging across sessions
Use "Hypotheses Tested" table in session state

Minimize Re-exploration

Cache code patterns and file locations in session state
Record dead ends so they're not re-explored
Note key file:line references for quick restoration

Atomic Commits

Each commit = working, resumable state
If session dies mid-work, resume from last commit

Proof Over Trust

"Tests pass" ≠ correct behavior
Verify with mitmproxy when available, OR use behavioral assertions in tests (uptime checks, API status queries) to prove the behavior change
Include evidence in PR proof of work section

Context Window Management

For long-running tasks:

Checkpoint frequently — Update session state after every successful test run
Summarize completed work — Don't keep raw exploration in context; distill findings
Chunk large changes — Break into atomic commits to create resume points
Use /bpg:resume — Start new sessions by loading session state, not from memory

Error Recovery

When things go wrong:

Test failures — Record in session state, add to "Hypotheses Tested", don't mark complete
API errors — Capture in mitmproxy log, document in session state
Context loss — Always resume from session state file using /bpg:resume
Blocked work — Update session status to "Blocked", document blocker, move to next task

Session Handoff

When handing off work:

To another agent — Ensure "Quick Context Restore" is complete and current
To human — Create PR using /bpg:prepare-pr, reference session state location
From human — Use /bpg:resume, ask about any "Unverified" assumptions

Project Architecture

Prerequisites

Go 1.25+ required
golangci-lint 2.8.0 — installed automatically by make lint
Line length limit: 150 characters (enforced by linter)
Comment line wrap: ~120 characters (not 70–80; the linter allows 150, so narrow wrapping wastes vertical space)

Overview

Dual-provider: SDK v2 (proxmoxtf/) and Plugin Framework (fwprovider/)
New features: Framework only; SDK is feature-frozen

Directory Structure

├── proxmox/           # Shared API client
│   └── retry/         # Unified retry logic (TaskOperation, APICallOperation, PollOperation)
├── fwprovider/        # Framework provider ← NEW CODE HERE
│   ├── test/          # Shared test utilities and acceptance tests
│   ├── config/        # Provider configuration types (Resource, DataSource)
│   ├── attribute/     # Attribute helpers (ResourceID, CheckDelete, IsDefined)
│   ├── types/         # Custom attribute types (stringset, etc.)
│   └── validators/    # Custom validators
├── proxmoxtf/         # Legacy SDK provider (feature-frozen)
├── utils/             # Shared utilities (maps, sets, strings, IP)
├── .dev/              # Development tools, plans, and session files
├── example/           # Example Terraform configurations
├── templates/         # Doc templates for Framework resources/datasources
└── docs/              # Provider documentation (mixed: see Documentation section)

API Client

proxmox.Client
├── Node(name) → nodes.Client
├── Cluster() → cluster.Client
├── Access() → access.Client
├── Pool() → pools.Client
├── Storage() → storage.Client
├── Version() → version.Client
├── API() → api.Client (raw HTTP)
└── SSH() → ssh.Client

Development Workflow

Fixing Issues

Verify GitHub issue exists — Flag deficiency if not
Create branch: fix/{issue}-description
Create session state: .dev/{issue}_SESSION_STATE.md
Create acceptance test that reproduces the issue
Verify test fails with current code
Implement fix
Verify test passes
Run linter: make lint
Verify with mitmproxy
Complete checklist

Adding Features

Verify GitHub issue exists — Flag deficiency if not
Create branch: feat/{issue}-description
Create session state: .dev/{issue}_SESSION_STATE.md
Implement in Framework provider only (fwprovider/)
Add validation, acceptance tests, documentation
Complete checklist

Code Patterns

Framework (fwprovider/)

Each resource has 3 files: resource_*.go (CRUD), *_model.go (API mapping), resource_*_test.go (acceptance tests). Client access flows through config.Resource → cfg.Client.Domain().SubClient().

schema.StringAttribute{
    Required: true,
    Validators: []validator.String{
        stringvalidator.OneOf("a", "b"),
    },
}
resp.Diagnostics.AddError("Unable to Create Resource", err.Error())

Error diagnostic conventions: New code should use "Unable to [Action] [Resource]" format (see ADR-005). Include the resource name/ID in the summary (e.g., fmt.Sprintf("Unable to Read VM %q", name)) — domain clients do not reliably include it in err.Error(). No trailing period. Pass err.Error() as the detail string — never double-wrap. Legacy prefixes ("Could not", "Error") are acceptable in existing code.

Datasource Schema Attributes

In a datasource, attributes that are purely output (populated by the provider during Read) must be Computed: true only — never Optional. This applies to all attributes except lookup keys (which are Required).

Attribute role	Schema flags	Example
Lookup key	`Required: true`	`id`, `node_name`
Read-only output	`Computed: true`	`name`, `status`, `tags`, `cpu` block

Why not Optional on outputs? Optional on a datasource output lets users write values in config that are silently ignored — misleading UX and confusing docs (attributes appear under "Optional" instead of "Read-Only").

Nil API values in Computed fields: After Read, Computed attributes must have a known value — null means "unknown" which is only valid during planning. Convert nil API pointers to sensible defaults: "" for strings, false for bools, empty collections for sets/maps. Use types.StringValue("") instead of types.StringPointerValue(nil).
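
The nil-to-default step can be isolated in a small generic helper. This is a stdlib-only sketch; `valueOr` is a hypothetical name, and the repository may already provide an equivalent.

```go
package main

import "fmt"

// valueOr dereferences p, or returns def when the API omitted the field (nil pointer).
// Hypothetical helper, shown for illustration only.
func valueOr[T any](p *T, def T) T {
	if p == nil {
		return def
	}
	return *p
}

func main() {
	var name *string // stands in for an API response field that was not set
	enabled := true

	fmt.Printf("%q %v\n", valueOr(name, ""), valueOr(&enabled, false)) // "" true
}
```

In a datasource Read, the result would then be wrapped in the Framework type, for example `types.StringValue(valueOr(name, ""))`, so the attribute always ends up with a known value.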

Nested blocks in datasources (e.g., cpu, vga, rng): The datasource should have its own DataSourceSchema() with Computed: true on the block and all inner attributes. Do not reuse ResourceSchema() which has Optional: true, Computed: true for resource write semantics.

Comma-Separated API Values

When the Proxmox API uses comma-separated strings (e.g., vmid=100,101,102), always expose them as Terraform list or set attributes — never as raw comma-separated strings. Convert in toAPI() (join) and fromAPI() (split). See ADR-004 for details and code examples.
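
The join/split conversion can be sketched with the standard library alone. The function names `splitCSV` and `joinCSV` are hypothetical, and trimming whitespace around elements is an assumption. Real `toAPI()`/`fromAPI()` methods would convert to and from Terraform list or set values; ADR-004 holds the project's own examples.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCSV converts an API value such as "100,101,102" into a slice (the fromAPI direction).
// An empty string yields an empty slice rather than a slice holding one empty element.
func splitCSV(s string) []string {
	if s == "" {
		return []string{}
	}
	parts := strings.Split(s, ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return parts
}

// joinCSV converts a slice back into the API representation (the toAPI direction).
func joinCSV(items []string) string {
	return strings.Join(items, ",")
}

func main() {
	ids := splitCSV("100,101,102")
	fmt.Println(len(ids), joinCSV(ids)) // 3 100,101,102
}
```

The empty-string guard matters because `strings.Split("", ",")` returns a slice with one empty element, which would otherwise surface in Terraform state as a list containing `""`.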

Retry Patterns (proxmox/retry/)

Three operation types — choose based on the API call pattern:

// Async UPID tasks (create, clone, delete, start):
op := retry.NewTaskOperation("name", retry.WithRetryIf(retry.IsTransientAPIError))
op.DoTask(ctx, dispatchFn, waitFn)

// Synchronous blocking calls (PUT /config):
op := retry.NewAPICallOperation("name", retry.WithRetryIf(retry.ErrorContains("got timeout")))
op.Do(ctx, fn)

// Polling loops (wait for status, config unlock):
op := retry.NewPollOperation("name", retry.WithRetryIf(func(err error) bool { ... }))
op.DoPoll(ctx, fn)

Delete predicate trap: ErrResourceDoesNotExist can arrive via HTTP 500, so IsTransientAPIError alone will match it. Delete operations must combine predicates:

retry.WithRetryIf(func(err error) bool {
    return retry.IsTransientAPIError(err) && !errors.Is(err, api.ErrResourceDoesNotExist)
})

See ADR-005: Error Handling for full details.

SDK (proxmoxtf/) — Legacy Only

"key": {
    Type:     schema.TypeString,
    Required: true,
    ValidateDiagFunc: validation.ToDiagFunc(
        validation.StringInSlice([]string{"a", "b"}, false)),
}

When fixing validation issues, update BOTH providers where applicable.

Testing Notes

VMs with started = true need boot disk with cloud image; use stop_on_destroy = true
Naming: Descriptive names only, NO issue numbers
API verification: Use /bpg:debug-api for mitmproxy workflow
Behavioral assertions: When verifying side effects (reboots, state changes), use direct API checks in test check functions rather than relying only on Terraform state attributes. Example: use te.NodeClient().VM(vmID).GetVMStatus(ctx) to check uptime before/after to detect reboots. See resource_vm_hotplug_test.go and resource_vm_disks_test.go for patterns.
TDD acceptance tests: Tests MUST actually fail without the fix. If a test passes both with and without the fix, it doesn't prove anything — add behavioral assertions (uptime, status, API checks) that detect the actual behavior change.
Functional coverage: Tests must cover ALL major use cases for the resource — not just one happy path. Different input modes (e.g., all vs vmid vs pool), list attributes with multiple elements, compound fields, nested objects, and import round-trips must each have test scenarios. PRs with insufficient functional coverage will be rejected. See ADR-006.
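
The uptime-based reboot check mentioned in the behavioral assertions note reduces to comparing two samples. A minimal sketch, assuming uptime is reported in seconds; `rebooted` is a hypothetical helper, and in a real test the samples would come from the VM status API rather than literals.

```go
package main

import "fmt"

// rebooted reports whether a VM restarted between two uptime samples.
// Uptime only grows while a VM keeps running, so a smaller later value implies a restart.
func rebooted(uptimeBefore, uptimeAfter int64) bool {
	return uptimeAfter < uptimeBefore
}

func main() {
	fmt.Println(rebooted(120, 185)) // false: the VM kept running
	fmt.Println(rebooted(120, 7))   // true: the uptime counter was reset
}
```

This comparison misses a reboot when the second sample is taken late enough for the new uptime to exceed the first sample, so take the samples close to the operation under test.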

Documentation

Docs under docs/ are a mix of auto-generated and manually maintained files.

Provider	Docs generation	Edit where
Framework (`fwprovider/`)	Auto-generated by `make docs` from schema + optional `templates/` overrides	Edit `templates/resources/<name>.md.tmpl` (or `templates/data-sources/<name>.md.tmpl`). If no custom template exists, docs come from the schema `MarkdownDescription` fields in Go code.
SDK (`proxmoxtf/`)	Manually maintained	Edit `docs/` files directly

Key rules:

make docs regenerates only Framework resource/datasource docs and template-driven guides; SDK docs are untouched
Manual edits to docs/ files for Framework resources will be lost on make docs — always edit the template or schema description instead
Manual edits to docs/ files for SDK resources are safe — they are the source of truth
Custom templates in templates/ override default tfplugindocs generation for specific Framework resources

Guides use two patterns:

Pattern	Source of truth	Examples
A (template-driven)	`templates/guides/<name>.md.tmpl` with `{{ codefile }}` directives; examples in `examples/guides/<name>/`	`clone-vm`, `vm-lifecycle`
B (direct markdown)	`docs/guides/<name>.md` edited directly; inline HCL blocks	`multi-node`, `upgrade`, `migration-vm-clone`, `cloned-vm`

For Pattern A guides, edit the template — docs/guides/<name>.md is auto-generated by make docs and will be overwritten.

Session Management

For multi-step work, maintain session state using .dev/SESSION_STATE_TEMPLATE.md.

Location: .dev/{issue}_SESSION_STATE.md

Key sections to maintain:

Quick Context Restore — For fast agent bootstrap
User Decisions — Prevent re-asking
Assumptions Made — Track verification status
Context Gathered — Save re-reading files
Hypotheses Tested — For debugging sessions

Update triggers:

Before ending session
Before context-heavy operations
After completing a phase
When blocked or switching tasks

Communication Style

Do	Don't
Be concise and direct	Apologize
Use technical terminology	Summarize changes made
Explain reasoning	Make up information
Admit uncertainty	Show implementation unless asked

Skills

Skill	Purpose
`/bpg:start-issue`	Start work on a GitHub issue (branch + session state)
`/bpg:resume`	Resume work from a previous session
`/bpg:ready`	Run production readiness checklist
`/bpg:debug-api`	Debug API calls with mitmproxy
`/bpg:prepare-pr`	Prepare PR body from template with proof of work

See .dev/README.md for detailed workflow documentation and how skills connect together.

References

CONTRIBUTING.md — Contributing guide
docs/adr/ — Architecture Decision Records and reference examples
.dev/DEBUGGING.md — Debugging guide
.dev/SESSION_STATE_TEMPLATE.md — Session template
Proxmox API
Terraform Plugin Framework
