Add X Bookmarks workspace and core library (OAuth, X API, grouping tools)#22
Walkthrough

This PR introduces a complete X Bookmarks workspace within a monorepo structure, adding OAuth2 PKCE authentication, bookmark synchronization from the X API, and a folder grouping proposal/approval workflow. It includes an auth server, sync and proposal scripts, shared core utilities, configuration management, and documentation updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant AuthServer as Auth Server
    participant XAPI as X OAuth<br/>Endpoint
    participant Callback as Callback Handler
    participant Storage as Token Storage
    User->>AuthServer: GET /auth/login
    AuthServer->>AuthServer: Generate PKCE pair<br/>(verifier, challenge, state)
    AuthServer->>Storage: Save {state, verifier, createdAt}
    AuthServer-->>User: 302 Redirect to X /authorize<br/>(client_id, challenge, state, ...)
    User->>XAPI: GET /authorize
    User->>XAPI: User grants permission
    XAPI-->>Callback: 302 Redirect /callback<br/>?code=...&state=...
    Callback->>Storage: Load saved {state, verifier}
    Callback->>Callback: Validate state match
    Callback->>XAPI: POST /token<br/>(code, verifier, client_id, ...)
    XAPI-->>Callback: {access_token, refresh_token, expires_in}
    Callback->>Storage: Save tokens
    Callback->>XAPI: GET /users/me
    XAPI-->>Callback: {id, name, username}
    Callback-->>User: 200 Success<br/>(user details + token path)
```
```mermaid
sequenceDiagram
    participant User
    participant SyncScript as Sync Script
    participant XClient as X API Client
    participant FileSystem as File System
    participant Storage as Vault Storage
    User->>SyncScript: pnpm sync
    SyncScript->>Storage: Load tokens from data/tokens.json
    SyncScript->>SyncScript: Check token expiration<br/>Refresh if needed
    SyncScript->>XClient: getMe()
    XClient->>FileSystem: (validate auth)
    FileSystem-->>XClient: ✓
    XClient-->>SyncScript: {userId, name}
    SyncScript->>XClient: getBookmarkFolders(userId)
    XClient-->>SyncScript: [folders...]
    loop For each folder
        SyncScript->>XClient: getBookmarksByFolder(userId, folderId)
        XClient-->>SyncScript: [bookmarks...]
        loop For each bookmark
            SyncScript->>SyncScript: Compute save directory<br/>(vault + date bucket + mapping)
            SyncScript->>FileSystem: ensureDir()
            FileSystem-->>SyncScript: ✓
            SyncScript->>FileSystem: writeJsonFile(bookmark.md)
            FileSystem-->>SyncScript: ✓
        end
        SyncScript->>FileSystem: buildFolderIndexMarkdown()
        SyncScript->>FileSystem: writeJsonFile(_index.md)
    end
    SyncScript->>FileSystem: writeJsonFile(x_folder_stats.json)
    SyncScript-->>User: Summary: N posts synced
```
```mermaid
sequenceDiagram
    participant User
    participant ProposeScript as Propose Script
    participant Analysis as Analysis Engine
    participant FileSystem as File System
    User->>ProposeScript: pnpm propose:grouping
    ProposeScript->>FileSystem: Load analysis/x_folder_stats.json
    FileSystem-->>ProposeScript: {folders: {name: count, ...}}
    ProposeScript->>Analysis: detectFolderGroupings([folder_names])
    Analysis->>Analysis: Tokenize folders<br/>(remove dates, split on separators)
    Analysis->>Analysis: Group by prefix/suffix tokens
    Analysis->>Analysis: Validate: length≥2,<br/>not stopword, ≥3 children
    Analysis-->>ProposeScript: {proposals, skipped, timestamp}
    ProposeScript->>FileSystem: renderGroupingProposalMarkdown(analysis)
    FileSystem-->>ProposeScript: markdown_text
    ProposeScript->>FileSystem: writeJsonFile(analysis_YYYYMMDD.md)
    ProposeScript->>FileSystem: writeJsonFile(analysis_YYYYMMDD.json)
    ProposeScript-->>User: Generated proposal<br/>(no mapping yet)
    User->>User: Review proposal
    User->>User: pnpm approve:grouping
    ProposeScript->>FileSystem: Load latest analysis.json
    FileSystem-->>ProposeScript: FolderGroupingAnalysis
    ProposeScript->>FileSystem: writeFolderMapping(analysis)
    FileSystem-->>ProposeScript: x_folder_mapping.json
    ProposeScript-->>User: Mapping approved<br/>(ready for next sync)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks: 4 passed, 1 warning.
💡 Codex Review
This step only runs `tsc` against the root `tsconfig.json` (per `tsc --help`, plain `tsc` compiles the project in the current working directory), so the new `apps/*` projects are never validated in CI. In this commit, `apps/sync` already fails its own config (`pnpm exec tsc -p apps/sync/tsconfig.json` reports TS2379 errors), which means CI can pass while the workspace build/typecheck is broken. Please run recursive workspace type checks (e.g., `pnpm run typecheck`) in this workflow.
```ts
const refreshed = await refreshAccessToken(config, tokens.refresh_token);
await saveTokens(config.tokensPath, refreshed);
```
Preserve refresh token when rotating access tokens

When a token refresh succeeds, the code persists only the response payload; if the OAuth server omits `refresh_token` on refresh, this overwrites the stored credentials without any refresh token, and the next expiry will hard-fail with "refresh_token is missing" until the user re-runs the full auth flow. Keep the previous `tokens.refresh_token` when the refreshed payload does not include one before calling `saveTokens`.
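A minimal sketch of the suggested fallback. The `TokenSet` interface and `mergeRefreshedTokens` name below are illustrative, not the PR's actual API:

```typescript
// Illustrative token shape; the PR's real XTokenSet may differ.
interface TokenSet {
  access_token: string;
  refresh_token?: string;
  expires_in: number;
}

// Merge a refresh response with the previously stored tokens so that a
// provider which omits refresh_token on rotation does not wipe it out.
// RFC 6749 §6 allows the server to skip issuing a new refresh token.
function mergeRefreshedTokens(previous: TokenSet, refreshed: TokenSet): TokenSet {
  return {
    ...refreshed,
    refresh_token: refreshed.refresh_token ?? previous.refresh_token,
  };
}
```

Calling this before `saveTokens` keeps the stored credentials usable across refreshes either way.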
Actionable comments posted: 13
🧹 Nitpick comments (12)
packages/core/src/fs/file-utils.ts (1)
27-35: Cover reserved cross-platform filenames.

The sanitizer can still return names such as `CON`, `NUL`, or names ending in `.`, which fail on Windows-backed vaults. Since this is used for Obsidian output paths, it is worth hardening.

🧹 Proposed hardening

```diff
 export function sanitizeFileName(input: string): string {
   const normalized = input
     .normalize("NFKC")
     .replace(/[\\/:*?"<>|]/g, "-")
     .replace(/[\u0000-\u001f]/g, "")
     .replace(/\s+/g, " ")
     .trim();
-  return normalized.length > 0 ? normalized : "untitled";
+  const safe = normalized.replace(/[. ]+$/g, "");
+  const fallback = safe.length > 0 ? safe : "untitled";
+  return /^(con|prn|aux|nul|com[1-9]|lpt[1-9])(\..*)?$/i.test(fallback)
+    ? `_${fallback}`
+    : fallback;
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/core/src/fs/file-utils.ts` around lines 27 - 35, The sanitizeFileName function needs to also guard against Windows reserved device names and names ending in dots/spaces; update sanitizeFileName to (1) remove trailing dots and trailing spaces from the normalized result, (2) detect Windows reserved names (case-insensitive list: CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9) both as bare names and when they appear as the stem before an extension (e.g., "CON.txt"), and (3) if a reserved name or empty result remains, replace it with a safe fallback like "untitled" (or append/replace with an underscore) so Obsidian/Windows-backed vaults won’t fail; modify the logic inside sanitizeFileName to perform these checks after the existing normalization and trimming steps.

`apps/auth/package.json` (1)
1-4: Mark this app workspace as private.

Root `"private": true` does not protect child workspace packages from being published directly or through recursive publish commands.

🔧 Proposed fix

```diff
 {
   "name": "auth",
+  "private": true,
   "version": "1.0.0",
   "type": "module",
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/auth/package.json` around lines 1 - 4, The package.json for the "auth" workspace is missing the private flag; add "private": true to the root of this package.json (alongside "name": "auth", "version": "1.0.0", and "type": "module") so the auth workspace cannot be published directly or via recursive publish.

`apps/sync/package.json` (1)
1-4: Mark this app workspace as private.

Root `"private": true` does not apply to nested workspace packages, so this runnable app package can still be picked up by direct or recursive publish flows.

🔧 Proposed fix

```diff
 {
   "name": "sync",
+  "private": true,
   "version": "1.0.0",
   "type": "module",
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/sync/package.json` around lines 1 - 4, The package.json for the sync app is missing the "private": true flag so the runnable workspace can be published accidentally; open the sync package.json (the root object in the file shown) and add "private": true at the top-level of the manifest (alongside "name", "version", "type") to prevent npm/yarn from publishing this workspace package.

`apps/sync/src/approve-grouping.ts` (1)
11-12: Make approval independent of the caller's working directory.

Like `propose-grouping.ts`, this resolves correctly only when started from `apps/sync`. Use the entrypoint location so root-level scripts do not approve/read proposals from the wrong directory.

♻️ Proposed fix

```diff
 import fs from "node:fs/promises";
 import path from "node:path";
+import { fileURLToPath } from "node:url";
 import {
   loadEnvConfig,
   fileExists,
   @@
 } from "@obsidian-ai-pipeline/core";

-const repoRoot = path.resolve(process.cwd(), "../..");
+const repoRoot = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "../../..");
 const config = loadEnvConfig(repoRoot);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/sync/src/approve-grouping.ts` around lines 11 - 12, The current repoRoot calculation in approve-grouping.ts uses process.cwd() which makes approval depend on the caller's working directory; change it to compute the repo root relative to the file entrypoint (use __dirname of approve-grouping.ts) instead of process.cwd(), then pass that repoRoot into loadEnvConfig so approvals/read proposals resolve the same way as propose-grouping.ts regardless of where the script is invoked; update the repoRoot variable and keep loadEnvConfig(repoRoot) usage.

`apps/sync/src/propose-grouping.ts` (1)
12-13: Resolve the repo root from the module path instead of `cwd` for robustness.

This pattern is fragile: it only works when invoked via `pnpm --filter sync`, which sets `cwd` to `apps/sync`. Direct invocation (e.g., `tsx apps/sync/src/propose-grouping.ts` from the repo root) would resolve `repoRoot` outside the repository.

The same issue exists in `sync.ts` and `approve-grouping.ts`. Use `import.meta.url` with `fileURLToPath` to make the path resolution independent of the current working directory. This pattern is already used elsewhere in the codebase (e.g., `packages/core/src/config/env.ts`).

♻️ Proposed fix

```diff
 import path from "node:path";
 import fs from "node:fs/promises";
+import { fileURLToPath } from "node:url";
 import {
   detectFolderGroupings,
   loadEnvConfig,
   @@
 } from "@obsidian-ai-pipeline/core";

-const repoRoot = path.resolve(process.cwd(), "../..");
+const repoRoot = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "../../..");
 const config = loadEnvConfig(repoRoot);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/sync/src/propose-grouping.ts` around lines 12 - 13, The repoRoot resolution using path.resolve(process.cwd(), "../..") is fragile; change it to derive the script directory from import.meta.url via fileURLToPath and path.dirname, then pass that repo root into loadEnvConfig so the code in propose-grouping.ts (and the same pattern in sync.ts and approve-grouping.ts) no longer depends on the current working directory; update the variable named repoRoot and ensure loadEnvConfig(repoRoot) is called with the new repoRoot computed from import.meta.url/fileURLToPath.

`apps/auth/src/server.ts` (2)
79-86: Handle OAuth error responses on the callback.

X can redirect to `/auth/callback?error=access_denied&error_description=...` (e.g. the user cancels consent). Currently that path falls through the `if (!code || !state)` branch and returns a generic "Missing code/state", which is misleading. Consider surfacing `url.searchParams.get("error")` / `error_description` explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/auth/src/server.ts` around lines 79 - 86, In the /auth/callback handling block, check url.searchParams.get("error") and url.searchParams.get("error_description") before treating missing code/state as a generic error; if an OAuth error is present, return a 400 (via send(res, ...)) that includes the error and description (and optionally log it) so callers see the real cause instead of "Missing code/state"; update the logic around the existing url, code, state variables in the callback branch to prefer surfacing OAuth errors first.
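One way to sketch that check. The `classifyCallback` helper below is hypothetical and only illustrates reading the OAuth error parameters before the code/state check runs:

```typescript
// Hypothetical helper: classify an OAuth redirect URL so provider
// errors are surfaced explicitly instead of as "Missing code/state".
type CallbackResult =
  | { kind: "error"; error: string; description: string | null }
  | { kind: "missing" }
  | { kind: "ok"; code: string; state: string };

function classifyCallback(url: URL): CallbackResult {
  // RFC 6749 §4.1.2.1: error redirects carry ?error=...&error_description=...
  const error = url.searchParams.get("error");
  if (error) {
    return { kind: "error", error, description: url.searchParams.get("error_description") };
  }
  const code = url.searchParams.get("code");
  const state = url.searchParams.get("state");
  if (!code || !state) return { kind: "missing" };
  return { kind: "ok", code, state };
}
```

The callback handler can then map the `"error"` case to a 400 that names the real cause.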
19-20: `repoRoot` derived from `process.cwd()` is fragile.

`path.resolve(process.cwd(), "../..")` assumes the process is started from `apps/auth/`. Running `node dist/...` or the pnpm script from a different working directory will silently resolve the wrong repo root, causing `loadEnvConfig` to read a non-existent `.env`, and tokens/PKCE files to be written to the wrong location. The same pattern is repeated in `apps/sync/src/sync.ts` (line 28).

Prefer resolving from `import.meta.url` (via `fileURLToPath`), or accept `REPO_ROOT` from an env var with a sensible fallback.

🛠️ Illustrative fix

```diff
-import path from "node:path";
+import path from "node:path";
+import { fileURLToPath } from "node:url";
 ...
-const repoRoot = path.resolve(process.cwd(), "../..");
+const here = path.dirname(fileURLToPath(import.meta.url));
+const repoRoot = process.env.REPO_ROOT ?? path.resolve(here, "../../..");
```

Adjust the relative depth to match the compiled output layout (`apps/auth/dist/...`).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/auth/src/server.ts` around lines 19 - 20, The repoRoot calculation using path.resolve(process.cwd(), "../..") is fragile; update the initialization of repoRoot (the variable used before calling loadEnvConfig) to compute a stable project root by either reading a REPO_ROOT environment variable fallback or resolving from the current module location via fileURLToPath(import.meta.url) and path.resolve(..., "../../..") adjusted for compiled layout; apply the same change pattern where repoRoot is computed in the sync entry (the other similar variable), ensuring loadEnvConfig and any token/PKCE file writes use the corrected repoRoot.

`apps/sync/src/sync.ts` (1)
31-44: Token refresh buffer is tight for long syncs.

`bufferMs = 60 * 1000` only guards the first API call. A long pagination run (multiple folders × multiple pages) can easily cross the expiry boundary mid-sync, after which `client` still holds the original `tokens` object and subsequent requests will 401. Consider either (a) widening the buffer to e.g. 5 minutes and/or (b) having `XApiClient` refresh on 401 and persist the new tokens mid-flight.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/sync/src/sync.ts` around lines 31 - 44, ensureValidTokens currently uses bufferMs = 60 * 1000 which is too small for long syncs and leaves XApiClient using the old tokens mid-run; increase the pre-expiry buffer (e.g., set bufferMs to 5 * 60 * 1000) in ensureValidTokens (tokens, bufferMs) or make it a configurable constant, and additionally add token-refresh-on-401 logic inside XApiClient (catch 401 responses, call refreshAccessToken with tokens.refresh_token, call saveTokens to persist the refreshed XTokenSet, update the client’s in-memory tokens, and retry the failed request) so long-running pagination won't hit a mid-flight 401.
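The buffer logic in question can be illustrated with a small predicate (the name `needsRefresh` is made up for this sketch; the PR's `ensureValidTokens` may be shaped differently):

```typescript
// Decide whether the access token should be refreshed before starting
// (or continuing) a sync run. A wider buffer makes it less likely that
// the token expires mid-pagination.
function needsRefresh(expiresAtMs: number, nowMs: number, bufferMs = 5 * 60 * 1000): boolean {
  return nowMs >= expiresAtMs - bufferMs;
}
```

With a 60-second buffer, a token expiring in two minutes passes the check and can still die mid-run; with a 5-minute buffer it is refreshed up front.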
`packages/core/src/x-folder-grouping/grouping-detector.ts` (1)

102-117: Dedup/merge branch is effectively unreachable.

Keys are `${match_type}:${token}`. Within a single `detectForPosition` call, `tokenMaps` already enforces token uniqueness, and prefix vs. suffix proposals produce distinct `match_type`, so `deduped.get(key)` will never return an existing entry. The merge path (lines 112-116) is dead code. Either drop the "ambiguous collision" branch or change the key so that cross-position collisions on the same `token` actually merge (if that was the intent).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/core/src/x-folder-grouping/grouping-detector.ts` around lines 102 - 117, The "ambiguous collision" branch is unreachable because keys use `${proposal.match_type}:${proposal.token}` and tokenMaps already ensure uniqueness per match_type; decide intent: if you want to merge proposals across match types (prefix vs suffix), change the dedup key to use only the token (e.g., const key = proposal.token) so deduped.get(key) can return an existing entry and the merge block (combining children and sorting) is executed; otherwise remove the merge branch entirely to eliminate dead code (update the loop in grouping-detector.ts where key is computed and the dedup map is populated).
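If the intent is the cross-position merge, the key change can be sketched like this (the `Proposal` shape and `dedupeByToken` name are invented for illustration, not the PR's actual types):

```typescript
interface Proposal {
  token: string;
  match_type: "prefix" | "suffix";
  children: string[];
}

// Key by token alone so a prefix proposal and a suffix proposal for the
// same token actually collide and get merged, instead of living under
// distinct `${match_type}:${token}` keys.
function dedupeByToken(proposals: Proposal[]): Proposal[] {
  const deduped = new Map<string, Proposal>();
  for (const p of proposals) {
    const existing = deduped.get(p.token);
    if (existing) {
      // Merge children, dropping duplicates, and keep them sorted.
      existing.children = [...new Set([...existing.children, ...p.children])].sort();
    } else {
      deduped.set(p.token, { ...p, children: [...p.children] });
    }
  }
  return [...deduped.values()];
}
```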
`packages/core/src/config/env.ts` (2)

66-79: Prefer the `dotenv` package over this hand-rolled parser.

The current parser has a few quiet edge cases that will bite later:

- Line 76: `replace(/^['"]|['"]$/g, "")` strips leading and trailing quote characters independently, so mismatched quotes like `KEY="value'` become `value`, and a legitimate value like `it's` is left alone but `"it's"` becomes `it's` (fine) while `'quoted"` drops both. Not an explicit bug today, but surprising.
- No support for escaped characters, multi-line values, or `export KEY=...` prefixes commonly used in `.env` files.
- Inline `#` comments after a value are kept as part of the value.

Since `.env` is loaded in the OAuth/sync paths, consider depending on `dotenv` (with `override: false` to preserve the existing "don't overwrite `process.env`" semantics at line 75). It's ~2 lines of code and eliminates the surface area.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/core/src/config/env.ts` around lines 66 - 79, Replace the custom loadDotEnv implementation with the standard dotenv loader: remove or stop using the hand-rolled parsing in loadDotEnv and call dotenv.config({ path: envPath, override: false }) (importing dotenv at the top) so .env files are parsed correctly (handles quotes, escapes, multi-line values, export prefixes and inline comments) while preserving the current behavior of not overwriting existing process.env entries; keep the same function name loadDotEnv and signature so callers remain unchanged.
29-43: Wrap `new URL(xRedirectUri)` with a descriptive error.

If `X_REDIRECT_URI` is set but malformed (e.g. missing scheme), line 33 throws a generic `TypeError: Invalid URL` with no hint about which env var is wrong. Small quality-of-life win for anyone setting up the workspace for the first time.

♻️ Proposed fix

```diff
 const xRedirectUri = requireEnv("X_REDIRECT_URI");
-const parsedRedirect = new URL(xRedirectUri);
+let parsedRedirect: URL;
+try {
+  parsedRedirect = new URL(xRedirectUri);
+} catch {
+  throw new Error(`Invalid X_REDIRECT_URI: ${xRedirectUri}`);
+}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/core/src/config/env.ts` around lines 29 - 43, Wrap the URL parsing in loadEnvConfig so malformed X_REDIRECT_URI yields a descriptive error: when creating parsedRedirect = new URL(xRedirectUri) (using the xRedirectUri from requireEnv("X_REDIRECT_URI")), catch the thrown TypeError and rethrow a new Error that includes the env var name and the invalid value (e.g. "Invalid X_REDIRECT_URI: '<value>'"), so callers of loadEnvConfig get a clear, actionable message instead of the generic "Invalid URL" exception.

`packages/core/src/x-api/client.ts` (1)
18-36: Consider adding a request timeout and 429/5xx handling.

`fetch` has no default timeout, and the X API routinely returns `429` with `x-rate-limit-reset`. As written, a single stalled connection hangs the entire sync, and a transient 429 aborts the run with a raw error. Worth wiring in an `AbortSignal.timeout(...)` and minimal retry-with-backoff that honors `retry-after` / `x-rate-limit-reset` before this lands in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/x-api/client.ts` around lines 18 - 36, Update the private async request<T>(pathname: string, query?: URLSearchParams) method to add a request timeout using AbortSignal.timeout and to implement a minimal retry-with-backoff loop that specifically handles 429 and 5xx responses: create an AbortController for fetch (use AbortSignal.timeout for the per-request timeout), parse and respect Retry-After and x-rate-limit-reset headers to delay retries, apply exponential backoff with a cap for transient 5xx/429 responses, and only throw an error after retries are exhausted; ensure the Authorization header and existing JSON/error handling remain unchanged in request.
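The backoff part of that suggestion can be sketched as a pure helper that picks a delay from the response headers. The header names follow the convention quoted in the comment; the `retryDelayMs` function itself is illustrative, not the PR's code:

```typescript
// Pick how long to wait before retrying a failed request.
// Prefers Retry-After (whole seconds), then x-rate-limit-reset (unix
// epoch seconds), then capped exponential backoff.
function retryDelayMs(
  attempt: number,
  headers: Map<string, string>,
  nowMs: number,
  capMs = 60_000,
): number {
  const retryAfter = headers.get("retry-after");
  if (retryAfter && /^\d+$/.test(retryAfter)) {
    return Number(retryAfter) * 1000;
  }
  const reset = headers.get("x-rate-limit-reset");
  if (reset && /^\d+$/.test(reset)) {
    return Math.max(0, Number(reset) * 1000 - nowMs);
  }
  // Exponential backoff: 1 s, 2 s, 4 s, ... capped at capMs.
  return Math.min(capMs, 1000 * 2 ** attempt);
}
```

Combined with `AbortSignal.timeout(...)` on the `fetch` call, this keeps a stalled connection or a transient 429 from taking down the whole sync.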
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Line 35: The CI step uses "pnpm install --no-frozen-lockfile" which allows CI
to resolve a different dependency graph than the committed lockfile; change that
run command to use "--frozen-lockfile" (i.e., "pnpm install --frozen-lockfile")
so CI fails when the lockfile is out of date, then update and commit the
pnpm-lock.yaml in your branch to make the pipeline green.
In `@apps/auth/src/server.ts`:
- Around line 22-28: savePkceState/loadPkceState currently persist a single PKCE
JSON blob to config.pkceStatePath with no TTL or invalidation, allowing replay,
stale use, and races; modify savePkceState to include a createdAt timestamp and
a short TTL (e.g., 10 minutes) and to write an atomic per-login unique entry
(e.g., include a request-specific id or write to a temp-then-rename file) to
avoid clobbering concurrent requests, update loadPkceState (and the
/auth/callback exchange logic) to reject expired entries by comparing createdAt
+ TTL and to fs.unlink the state file immediately after a successful token
exchange or on state-mismatch/failure so the PKCE state cannot be replayed, and
ensure errors during unlink/writes are logged and handled.
- Around line 87-91: The current code in the OAuth callback uses loadPkceState()
and then sends a 400 response that echoes saved.state and the incoming state;
change this to return a generic error message (e.g., "State mismatch") via
send(res, 400, ...) without including saved.state or the incoming state, and
move any detailed values into a server-side log (use existing logger or
console.error) that logs saved.state and state for debugging; update the block
around loadPkceState, saved.state and the send call accordingly so no
server-side state is leaked to the client.
In `@apps/sync/src/sync.ts`:
- Around line 115-130: The bug is that folderCounts and folderResults use
folder.name as the key and later unconditionally set folderCounts.root and
folderResults entries for the "root" collection, causing collisions if a user
folder is named "root"; change the map keys to a namespaced or id-based key
(e.g., use folder.id or prefix with "__folder__" when assigning in the loop that
calls client.getBookmarksByFolder and when assigning the root collection from
allItems) and update any lookups that read folderCounts or folderResults
(including the code that generates _index.md) to resolve keys back to the human
folder name for display; ensure you pick one scheme (id-based keys or reserved
prefix like "__root__") and apply it consistently for both folderCounts and
folderResults and when creating the root entry (allItems).
In `@package.json`:
- Around line 21-22: The workspace "typecheck" script fails on clean checkouts
because packages import `@obsidian-ai-pipeline/core` whose package.json points
types/exports to dist that doesn't exist; update the top-level "typecheck"
script to ensure the core package is built (or its types emitted) before running
the recursive no-emit typecheck — e.g., run a build for
`@obsidian-ai-pipeline/core` (or a workspace-wide build that includes core) first
and then run the existing pnpm -r typecheck so TypeScript can resolve the core
package's types.
In `@packages/core/src/auth/token-storage.ts`:
- Around line 6-11: The saveTokens function currently writes directly to
tokensPath so the mode: 0o600 only applies on create; to guarantee secure
permissions on overwrite, write the JSON to a temp file in the same directory
(e.g. create a temp name using path.dirname(tokensPath) + a unique suffix),
write the file with mode: 0o600, flush/close it, then atomically rename
(fs.rename) the temp file to tokensPath; ensure mkdir(path.dirname(tokensPath),
{ recursive: true }) still runs first and use the same tokensPath / XTokenSet
symbols to locate where to implement the temp-write-then-rename pattern.
In `@packages/core/src/auth/x-oauth-client.ts`:
- Around line 71-77: refreshAccessToken currently returns the token set from
exchange(...) verbatim which can drop the refresh_token if the provider omits
it; change refreshAccessToken (in packages/core/src/auth/x-oauth-client.ts) to
call exchange(config, body), then if the returned XTokenSet has no
refresh_token, set refresh_token to the original refreshToken argument (i.e.,
return a new token set that spreads the exchange result and falls back to the
provided refreshToken) so sync.ts and ensureValidTokens keep the existing
refresh token per RFC 6749.
- Around line 41-45: The POST to `${config.xApiBaseUrl}/2/oauth2/token`
currently calls fetch without a timeout; wrap the request with an
AbortController, start a timeout (e.g. configurable default like 10s), pass
controller.signal into the fetch call that assigns to `response`, and clear the
timeout after fetch completes; ensure you attach the same pattern wherever token
requests are made in this module (e.g. the function that constructs `headers`,
`body` and assigns `const response = await fetch(...)`) so stalled token
endpoint calls are aborted and the flow can recover.
In `@packages/core/src/fs/file-utils.ts`:
- Around line 18-24: fileExists currently swallows all fs.access errors making
permission or path errors appear as "not found"; change fileExists to catch the
error, inspect err.code and return false only when err.code === 'ENOENT' (or
'ENOENT' and 'ENOTFOUND' if you want to be defensive), otherwise rethrow the
original error so callers like token-storage.ts receive permission/access errors
(e.g., check error.code in the catch and throw the error for EACCES/ENOTDIR/etc.
instead of returning false).
In `@packages/core/src/markdown/markdown-builder.ts`:
- Around line 3-5: The escapeYaml function only escapes double quotes and misses
backslashes, which can break double-quoted YAML scalars; update the escapeYaml
function to first escape backslashes (convert "\" to "\\") and then escape
double quotes (convert `"` to `\"`) so backslashes are preserved and don't start
unintended YAML escapes—locate the escapeYaml function in markdown-builder.ts
and apply the two-step replacement (escape backslashes before escaping quotes).
- Around line 56-64: The YAML frontmatter in buildFolderIndexMarkdown is
vulnerable because folderName, sourceRoot and generatedAt are interpolated
directly into quoted YAML fields without escaping; update
buildFolderIndexMarkdown to pass params.folderName, params.sourceRoot and
params.generatedAt through the existing escapeYaml utility (the same way
buildBookmarkMarkdown does), and ensure escapeYaml is available (either export
it from its module or import/reference it locally) before using it so any
embedded quotes or special characters are properly escaped in the frontmatter.
In `@packages/core/src/types/shared.ts`:
- Around line 43-50: Change the XBookmarkPage interface so the data property is
optional (data?: XPost[]) to match the X API v2 behavior when no bookmarks are
returned; update any code that assumes page.data is always present to handle
undefined (e.g., use page.data ?? [] where iteration occurs) so callers like
consumers of XBookmarkPage won't crash when the API omits data.
In `@packages/core/src/x-api/client.ts`:
- Around line 88-109: getPaginated lacks the safety checks present in
getBookmarkFolders and can loop indefinitely; update getPaginated (used by
getBookmarksAll and getBookmarksByFolder) to enforce a max page cap and detect
repeated pagination tokens: introduce a MAX_PAGES constant (same or similar
value used in getBookmarkFolders), track the previous token(s) and break if the
next_token is unchanged or if pageCount exceeds MAX_PAGES, and surface/log a
clear error or throw when the cap is hit so upstream sync can handle the failure
gracefully.
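Several of the prompts above describe small pure-function fixes. As one example, the two-step YAML escaping (backslashes before quotes) can be sketched as follows; `escapeYaml` here is a stand-in for the PR's helper, not its actual code:

```typescript
// Escape a string for embedding inside a double-quoted YAML scalar.
// Backslashes must be escaped first: doing quotes first would leave the
// quote-escape's own backslashes to be doubled by the second pass.
function escapeYaml(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}
```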
---
Nitpick comments:
In `@apps/auth/package.json`:
- Around line 1-4: The package.json for the "auth" workspace is missing the
private flag; add "private": true to the root of this package.json (alongside
"name": "auth", "version": "1.0.0", and "type": "module") so the auth workspace
cannot be published directly or via recursive publish.
In `@apps/auth/src/server.ts`:
- Around line 79-86: In the /auth/callback handling block, check
url.searchParams.get("error") and url.searchParams.get("error_description")
before treating missing code/state as a generic error; if an OAuth error is
present, return a 400 (via send(res, ...)) that includes the error and
description (and optionally log it) so callers see the real cause instead of
"Missing code/state"; update the logic around the existing url, code, state
variables in the callback branch to prefer surfacing OAuth errors first.
- Around line 19-20: The repoRoot calculation using path.resolve(process.cwd(),
"../..") is fragile; update the initialization of repoRoot (the variable used
before calling loadEnvConfig) to compute a stable project root by either reading
a REPO_ROOT environment variable fallback or resolving from the current module
location via fileURLToPath(import.meta.url) and path.resolve(..., "../../..")
adjusted for compiled layout; apply the same change pattern where repoRoot is
computed in the sync entry (the other similar variable), ensuring loadEnvConfig
and any token/PKCE file writes use the corrected repoRoot.
In `@apps/sync/package.json`:
- Around line 1-4: The package.json for the sync app is missing the "private":
true flag so the runnable workspace can be published accidentally; open the sync
package.json (the root object in the file shown) and add "private": true at the
top-level of the manifest (alongside "name", "version", "type") to prevent
npm/yarn from publishing this workspace package.
In `@apps/sync/src/approve-grouping.ts`:
- Around line 11-12: The current repoRoot calculation in approve-grouping.ts
uses process.cwd() which makes approval depend on the caller's working
directory; change it to compute the repo root relative to the file entrypoint
(use __dirname of approve-grouping.ts) instead of process.cwd(), then pass that
repoRoot into loadEnvConfig so approvals/read proposals resolve the same way as
propose-grouping.ts regardless of where the script is invoked; update the
repoRoot variable and keep loadEnvConfig(repoRoot) usage.
In `@apps/sync/src/propose-grouping.ts`:
- Around line 12-13: The repoRoot resolution using path.resolve(process.cwd(),
"../..") is fragile; change it to derive the script directory from
import.meta.url via fileURLToPath and path.dirname, then pass that repo root
into loadEnvConfig so the code in propose-grouping.ts (and the same pattern in
sync.ts and approve-grouping.ts) no longer depends on the current working
directory; update the variable named repoRoot and ensure loadEnvConfig(repoRoot)
is called with the new repoRoot computed from import.meta.url/fileURLToPath.
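The repoRoot fixes above all come down to deriving the root from the module's own location instead of the working directory. A minimal sketch of that pattern (the `REPO_ROOT` env override and the exact number of levels are assumptions, not the project's actual code):

```typescript
import path from "node:path";
import { fileURLToPath } from "node:url";

// Resolve the repo root from the module's own location rather than process.cwd(),
// so scripts behave the same no matter where they are invoked from.
// An optional REPO_ROOT env var acts as an explicit override (an assumption here).
export function resolveRepoRoot(moduleUrl: string, levelsUp: number): string {
  if (process.env.REPO_ROOT) {
    return path.resolve(process.env.REPO_ROOT);
  }
  const moduleDir = path.dirname(fileURLToPath(moduleUrl));
  return path.resolve(moduleDir, ...Array(levelsUp).fill(".."));
}

// e.g. in apps/sync/src/propose-grouping.ts (three levels below the repo root):
//   const repoRoot = resolveRepoRoot(import.meta.url, 3);
//   const config = loadEnvConfig(repoRoot);
```

The `levelsUp` parameter has to match the compiled layout (`src/` vs `dist/`), which is why the review suggests adjusting it per entrypoint.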
In `@apps/sync/src/sync.ts`:
- Around line 31-44: ensureValidTokens currently uses bufferMs = 60 * 1000 which
is too small for long syncs and leaves XApiClient using the old tokens mid-run;
increase the pre-expiry buffer (e.g., set bufferMs to 5 * 60 * 1000) in
ensureValidTokens (tokens, bufferMs) or make it a configurable constant, and
additionally add token-refresh-on-401 logic inside XApiClient (catch 401
responses, call refreshAccessToken with tokens.refresh_token, call saveTokens to
persist the refreshed XTokenSet, update the client’s in-memory tokens, and retry
the failed request) so long-running pagination won't hit a mid-flight 401.
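Both halves of that suggestion can be sketched as below; the `XTokenSet` shape and the callback signatures are assumptions standing in for the real token-storage and client modules:

```typescript
interface XTokenSet {
  access_token: string;
  refresh_token: string;
  expires_at: number; // epoch milliseconds
}

// Refresh well before expiry so a long pagination run cannot straddle the cutoff.
const TOKEN_REFRESH_BUFFER_MS = 5 * 60 * 1000;

async function ensureValidTokens(
  tokens: XTokenSet,
  refresh: (refreshToken: string) => Promise<XTokenSet>,
  bufferMs: number = TOKEN_REFRESH_BUFFER_MS
): Promise<XTokenSet> {
  if (Date.now() + bufferMs < tokens.expires_at) {
    return tokens; // still comfortably valid
  }
  return refresh(tokens.refresh_token); // caller persists the refreshed set
}

// Belt-and-braces: if a request still comes back 401 mid-run, refresh once and retry.
async function withAuthRetry<T>(
  run: (accessToken: string) => Promise<T>,
  tokens: XTokenSet,
  refresh: (refreshToken: string) => Promise<XTokenSet>,
  isUnauthorized: (error: unknown) => boolean
): Promise<T> {
  try {
    return await run(tokens.access_token);
  } catch (error) {
    if (!isUnauthorized(error)) throw error;
    const refreshed = await refresh(tokens.refresh_token);
    return run(refreshed.access_token);
  }
}
```

The single-retry cap keeps a revoked refresh token from looping forever; the real client would also persist the refreshed set via saveTokens inside the `refresh` callback.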
In `@packages/core/src/config/env.ts`:
- Around line 66-79: Replace the custom loadDotEnv implementation with the
standard dotenv loader: remove or stop using the hand-rolled parsing in
loadDotEnv and call dotenv.config({ path: envPath, override: false }) (importing
dotenv at the top) so .env files are parsed correctly (handles quotes, escapes,
multi-line values, export prefixes and inline comments) while preserving the
current behavior of not overwriting existing process.env entries; keep the same
function name loadDotEnv and signature so callers remain unchanged.
- Around line 29-43: Wrap the URL parsing in loadEnvConfig so malformed
X_REDIRECT_URI yields a descriptive error: when creating parsedRedirect = new
URL(xRedirectUri) (using the xRedirectUri from requireEnv("X_REDIRECT_URI")),
catch the thrown TypeError and rethrow a new Error that includes the env var
name and the invalid value (e.g. "Invalid X_REDIRECT_URI: '<value>'"), so
callers of loadEnvConfig get a clear, actionable message instead of the generic
"Invalid URL" exception.
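A minimal sketch of that wrapper (the helper name is illustrative, not the module's actual code):

```typescript
// Wrap URL parsing so a malformed env value fails with the variable name and the
// offending value, instead of the bare "Invalid URL" TypeError from the URL ctor.
function parseUrlEnv(name: string, value: string): URL {
  try {
    return new URL(value);
  } catch {
    throw new Error(`Invalid ${name}: '${value}'`);
  }
}

// Usage in loadEnvConfig (sketch):
//   const parsedRedirect = parseUrlEnv("X_REDIRECT_URI", requireEnv("X_REDIRECT_URI"));
```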
In `@packages/core/src/fs/file-utils.ts`:
- Around line 27-35: The sanitizeFileName function needs to also guard against
Windows reserved device names and names ending in dots/spaces; update
sanitizeFileName to (1) remove trailing dots and trailing spaces from the
normalized result, (2) detect Windows reserved names (case-insensitive list:
CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9) both as bare names and when they
appear as the stem before an extension (e.g., "CON.txt"), and (3) if a reserved
name or empty result remains, replace it with a safe fallback like "untitled"
(or append/replace with an underscore) so Obsidian/Windows-backed vaults won’t
fail; modify the logic inside sanitizeFileName to perform these checks after the
existing normalization and trimming steps.
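Those extra guards can be sketched as a post-processing step applied after the existing normalization; the `untitled` fallback and the underscore-prefix strategy are assumptions, not the library's current behavior:

```typescript
// Windows reserved device names, invalid both bare and as the stem before an extension.
const WINDOWS_RESERVED = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i;

function guardWindowsFileName(name: string): string {
  // (1) Windows silently strips trailing dots and spaces, so drop them up front.
  let result = name.replace(/[. ]+$/, "");
  // (2) Neutralize reserved device names ("CON", "con.txt", ...) by prefixing.
  const dot = result.indexOf(".");
  const stem = dot === -1 ? result : result.slice(0, dot);
  if (WINDOWS_RESERVED.test(stem)) {
    result = `_${result}`;
  }
  // (3) If nothing usable remains, fall back to a safe default.
  return result.length > 0 ? result : "untitled";
}
```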
In `@packages/core/src/x-api/client.ts`:
- Around line 18-36: Update the private async request<T>(pathname: string,
query?: URLSearchParams) method to add a request timeout using
AbortSignal.timeout and to implement a minimal retry-with-backoff loop that
specifically handles 429 and 5xx responses: create an AbortController for fetch
(use AbortSignal.timeout for the per-request timeout), parse and respect
Retry-After and x-rate-limit-reset headers to delay retries, apply exponential
backoff with a cap for transient 5xx/429 responses, and only throw an error
after retries are exhausted; ensure the Authorization header and existing
JSON/error handling remain unchanged in request.
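One way to sketch that change is to separate the retry policy from the transport, which keeps it testable; this assumes the real `request` would call `fetch` with `AbortSignal.timeout(...)` and feed the response status plus the parsed Retry-After value into the policy:

```typescript
interface AttemptOutcome<T> {
  status: number;
  retryAfterSec?: number; // parsed from Retry-After / x-rate-limit-reset headers
  body?: T;
}

// Delay before the next retry, or null when the status is not retryable.
function retryDelayMs(attempt: number, outcome: AttemptOutcome<unknown>, capMs = 30_000): number | null {
  const retryable = outcome.status === 429 || outcome.status >= 500;
  if (!retryable) return null;
  if (outcome.retryAfterSec !== undefined) {
    return Math.min(outcome.retryAfterSec * 1000, capMs); // honor server-directed delay
  }
  return Math.min(1000 * 2 ** attempt, capMs); // exponential backoff with a cap
}

async function requestWithRetry<T>(
  attempt: (signal: AbortSignal) => Promise<AttemptOutcome<T>>,
  maxAttempts = 4,
  timeoutMs = 15_000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  for (let i = 0; i < maxAttempts; i += 1) {
    const outcome = await attempt(AbortSignal.timeout(timeoutMs));
    if (outcome.status >= 200 && outcome.status < 300) return outcome.body as T;
    const delay = retryDelayMs(i, outcome);
    if (delay === null || i === maxAttempts - 1) {
      throw new Error(`X API request failed with status ${outcome.status}`);
    }
    await sleep(delay);
  }
  throw new Error("unreachable");
}
```

Non-retryable statuses (401, 404, ...) still fail immediately, so the existing JSON/error handling and the Authorization header stay where they are.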
In `@packages/core/src/x-folder-grouping/grouping-detector.ts`:
- Around line 102-117: The "ambiguous collision" branch is unreachable because
keys use `${proposal.match_type}:${proposal.token}` and tokenMaps already ensure
uniqueness per match_type; decide intent: if you want to merge proposals across
match types (prefix vs suffix), change the dedup key to use only the token
(e.g., const key = proposal.token) so deduped.get(key) can return an existing
entry and the merge block (combining children and sorting) is executed;
otherwise remove the merge branch entirely to eliminate dead code (update the
loop in grouping-detector.ts where key is computed and the dedup map is
populated).
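If the intent is to merge across match types, the change is just the dedup key; a sketch with an assumed proposal shape:

```typescript
interface FolderProposal {
  token: string;
  match_type: "prefix" | "suffix";
  children: string[];
}

// Key on the token alone so a prefix and a suffix proposal for the same token
// land on the same map entry and the merge branch becomes reachable.
function dedupeProposals(proposals: FolderProposal[]): FolderProposal[] {
  const deduped = new Map<string, FolderProposal>();
  for (const proposal of proposals) {
    const key = proposal.token;
    const existing = deduped.get(key);
    if (!existing) {
      deduped.set(key, { ...proposal, children: [...proposal.children] });
      continue;
    }
    // Merge branch: combine children, drop duplicates, keep them sorted.
    existing.children = [...new Set([...existing.children, ...proposal.children])].sort();
  }
  return [...deduped.values()];
}
```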
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 98421fd6-b856-43c2-9fb9-859fb191a3b5
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (36)
- .env.example
- .github/workflows/ci.yml
- .gitignore
- README.md
- analysis/.gitkeep
- apps/auth/package.json
- apps/auth/src/server.ts
- apps/auth/tsconfig.json
- apps/sync/package.json
- apps/sync/src/approve-grouping.ts
- apps/sync/src/propose-grouping.ts
- apps/sync/src/sync.ts
- apps/sync/tsconfig.json
- data/.gitkeep
- package.json
- packages/core/package.json
- packages/core/src/auth/pkce.ts
- packages/core/src/auth/token-storage.ts
- packages/core/src/auth/x-oauth-client.ts
- packages/core/src/config/env.ts
- packages/core/src/fs/file-utils.ts
- packages/core/src/index.ts
- packages/core/src/markdown/markdown-builder.ts
- packages/core/src/path/date-suffix-stripper.ts
- packages/core/src/path/x-bookmark-path-resolver.ts
- packages/core/src/types/shared.ts
- packages/core/src/x-api/client.ts
- packages/core/src/x-folder-grouping/grouping-detector.ts
- packages/core/src/x-folder-grouping/mapping-json-writer.ts
- packages/core/src/x-folder-grouping/proposal-markdown-renderer.ts
- packages/core/src/x-folder-grouping/stopwords.ts
- packages/core/src/x-folder-grouping/tokenizer.ts
- packages/core/tsconfig.json
- pnpm-workspace.yaml
- tsconfig.base.json
- x_folder_mapping.json
```diff
       - name: Install dependencies
-        run: pnpm install --frozen-lockfile
+        run: pnpm install --no-frozen-lockfile
```
Keep the root CI install frozen.
--no-frozen-lockfile lets CI silently resolve a different dependency graph when pnpm-lock.yaml is stale, so PRs can pass while a clean frozen install fails. Prefer committing the lockfile update and keeping CI on pnpm install --frozen-lockfile.
🔧 Proposed fix

```diff
-        run: pnpm install --no-frozen-lockfile
+        run: pnpm install --frozen-lockfile
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
        run: pnpm install --frozen-lockfile
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/ci.yml at line 35, The CI step uses "pnpm install
--no-frozen-lockfile" which allows CI to resolve a different dependency graph
than the committed lockfile; change that run command to use "--frozen-lockfile"
(i.e., "pnpm install --frozen-lockfile") so CI fails when the lockfile is out of
date, then update and commit the pnpm-lock.yaml in your branch to make the
pipeline green.
```typescript
async function savePkceState(pkce: PkceState): Promise<void> {
  await writeJsonFile(config.pkceStatePath, pkce);
}

async function loadPkceState(): Promise<PkceState> {
  return readJsonFile<PkceState>(config.pkceStatePath);
}
```
PKCE state has no TTL and is not invalidated after use.
savePkceState / loadPkceState share a single JSON file at `config.pkceStatePath`:

- After a successful exchange (line 98), the state file is never removed or rotated. Any replay of a valid callback URL (logs, browser history, referers) will re-enter the exchange branch and, if the authorization code is still accepted by X, produce new tokens that overwrite `tokens.json`.
- There is no `createdAt` expiration check. A stale state from hours/days ago will still be considered valid.
- Concurrent `/auth/login` requests clobber each other's state, so interleaved logins can fail or cross-authenticate.

Recommend: (1) stamp PKCE state with a short TTL (e.g. 10 minutes) and reject expired entries in /auth/callback; (2) fs.unlink the PKCE state file immediately after a successful token exchange (and on state-mismatch).
Also applies to: 87-98
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/auth/src/server.ts` around lines 22 - 28, savePkceState/loadPkceState
currently persist a single PKCE JSON blob to config.pkceStatePath with no TTL or
invalidation, allowing replay, stale use, and races; modify savePkceState to
include a createdAt timestamp and a short TTL (e.g., 10 minutes) and to write an
atomic per-login unique entry (e.g., include a request-specific id or write to a
temp-then-rename file) to avoid clobbering concurrent requests, update
loadPkceState (and the /auth/callback exchange logic) to reject expired entries
by comparing createdAt + TTL and to fs.unlink the state file immediately after a
successful token exchange or on state-mismatch/failure so the PKCE state cannot
be replayed, and ensure errors during unlink/writes are logged and handled.
```typescript
const saved = await loadPkceState();
if (saved.state !== state) {
  send(res, 400, `State mismatch. expected=${saved.state} actual=${state}`);
  return;
}
```
Don't echo saved/expected state in the error response.
``send(res, 400, `State mismatch. expected=${saved.state} actual=${state}`)`` leaks the server-side PKCE state value into the browser (and any intermediaries/logs). While the `state` parameter isn't a long-term secret, exposing it defeats its tamper-detection purpose for the current flow. Return a generic message and keep detailed values in server logs only.
🛠️ Suggested fix

```diff
   if (saved.state !== state) {
-    send(res, 400, `State mismatch. expected=${saved.state} actual=${state}`);
+    console.warn("PKCE state mismatch", { expected: saved.state, actual: state });
+    send(res, 400, "State mismatch");
     return;
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/auth/src/server.ts` around lines 87 - 91, The current code in the OAuth
callback uses loadPkceState() and then sends a 400 response that echoes
saved.state and the incoming state; change this to return a generic error
message (e.g., "State mismatch") via send(res, 400, ...) without including
saved.state or the incoming state, and move any detailed values into a
server-side log (use existing logger or console.error) that logs saved.state and
state for debugging; update the block around loadPkceState, saved.state and the
send call accordingly so no server-side state is leaked to the client.
```typescript
const folderCounts: Record<string, number> = {};
const folderResults: Array<{
  folderName: string;
  entries: Array<{
    fileName: string;
    postId: string;
    authorDisplay: string;
    url: string;
    directory: string;
  }>;
}> = [];

for (const folder of folders) {
  const pages = await client.getBookmarksByFolder(me.id, folder.id);
  const items = flattenPages(pages);
  folderCounts[folder.name] = items.length;
```
folderCounts / folderResults key collision if a user-created folder is named root.
Line 130 writes folderCounts[folder.name] = items.length, and line 163 unconditionally sets folderCounts.root = allItems.length, silently overwriting any real folder named root. Same issue for folderResults (both entries are present, but downstream _index.md generation at lines 173-214 would write two indexes to the same folder directory and the second wins). Consider using a reserved key (e.g. __root__) or namespacing folder-id based keys.
Also applies to: 162-164
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/sync/src/sync.ts` around lines 115 - 130, The bug is that folderCounts
and folderResults use folder.name as the key and later unconditionally set
folderCounts.root and folderResults entries for the "root" collection, causing
collisions if a user folder is named "root"; change the map keys to a namespaced
or id-based key (e.g., use folder.id or prefix with "__folder__" when assigning
in the loop that calls client.getBookmarksByFolder and when assigning the root
collection from allItems) and update any lookups that read folderCounts or
folderResults (including the code that generates _index.md) to resolve keys back
to the human folder name for display; ensure you pick one scheme (id-based keys
or reserved prefix like "__root__") and apply it consistently for both
folderCounts and folderResults and when creating the root entry (allItems).
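One way to sketch the id-based scheme with a reserved root sentinel (the names here are illustrative, not the sync script's actual identifiers):

```typescript
interface FolderRef {
  id: string;
  name: string;
}

// Reserved key for the unfoldered "root" collection; real folder ids get a prefix,
// so a user folder literally named "root" (or even "__root__") can never collide.
const ROOT_KEY = "__root__";

function folderKey(folder: FolderRef): string {
  return `folder:${folder.id}`;
}

function buildFolderCounts(
  folders: Array<FolderRef & { count: number }>,
  rootCount: number
): { counts: Record<string, number>; displayNames: Record<string, string> } {
  const counts: Record<string, number> = {};
  const displayNames: Record<string, string> = { [ROOT_KEY]: "root" };
  for (const folder of folders) {
    counts[folderKey(folder)] = folder.count;
    displayNames[folderKey(folder)] = folder.name; // resolve back for _index.md display
  }
  counts[ROOT_KEY] = rootCount;
  return { counts, displayNames };
}
```

The displayNames map keeps the human-readable folder name available for index generation while the keys themselves stay collision-free.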
```json
    "build": "pnpm -r build",
    "typecheck": "pnpm -r typecheck"
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Description: Inspect workspace manifests and tsconfigs for core source path mappings/project references.
# Expected: apps/* tsconfigs either map `@obsidian-ai-pipeline/core` to packages/core/src or the root typecheck builds core first.
set -euo pipefail
fd -e json '^(package|tsconfig.*)\.json$' \
  | sort \
  | xargs rg -n -C2 '"@obsidian-ai-pipeline/core"|"paths"|"references"|"main"|"types"|"exports"|"typecheck"'
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 2952

🏁 Script executed:

```bash
# Check the app tsconfig files for path mappings
for app in apps/sync apps/auth; do
  echo "=== $app/tsconfig.json ==="
  cat -n "$app/tsconfig.json"
done
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 538

🏁 Script executed:

```bash
cat -n tsconfig.base.json
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 573
Make typecheck work from a clean workspace.
The app packages import @obsidian-ai-pipeline/core, and their tsconfigs have no path mappings to core source. The core package manifest points types/exports to dist, which doesn't exist on clean checkouts. A recursive no-emit typecheck will fail unless dist is built first.
🔧 Proposed fix

```diff
-    "typecheck": "pnpm -r typecheck"
+    "typecheck": "pnpm --filter @obsidian-ai-pipeline/core build && pnpm -r typecheck"
```

📝 Committable suggestion
```json
    "build": "pnpm -r build",
    "typecheck": "pnpm --filter @obsidian-ai-pipeline/core build && pnpm -r typecheck"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@package.json` around lines 21 - 22, The workspace "typecheck" script fails on
clean checkouts because packages import `@obsidian-ai-pipeline/core` whose
package.json points types/exports to dist that doesn't exist; update the
top-level "typecheck" script to ensure the core package is built (or its types
emitted) before running the recursive no-emit typecheck — e.g., run a build for
`@obsidian-ai-pipeline/core` (or a workspace-wide build that includes core) first
and then run the existing pnpm -r typecheck so TypeScript can resolve the core
package's types.
```typescript
export async function fileExists(filePath: string): Promise<boolean> {
  try {
    await fs.access(filePath);
    return true;
  } catch {
    return false;
  }
```
🧩 Analysis chain

🏁 Script executed:

```bash
# Check if the file exists and read the function
fd -t f "file-utils.ts" | head -20
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 107

🏁 Script executed:

```bash
# Get current implementation of fileExists
if [ -f "packages/core/src/fs/file-utils.ts" ]; then
  cat -n "packages/core/src/fs/file-utils.ts" | head -40
fi
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 1312

🏁 Script executed:

```bash
# Search for usages of fileExists to understand impact
rg "fileExists" --type ts -B2 -A2 | head -100
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 3088

🏁 Script executed:

```bash
# Search for Node.js fs.access error handling patterns in the codebase to understand expectations
rg "fs.access|ENOENT|EACCES|EISDIR|ENOTDIR" --type ts | head -30
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 234

🌐 Web query:

Node.js fs.access error codes ENOENT EACCES EISDIR ENOTDIR

💡 Result:

In Node.js fs.access (and fs.accessSync), which tests a path's accessibility according to specified modes (F_OK for existence, R_OK/W_OK/X_OK for read/write/execute permissions), the following standard POSIX-derived system error codes are returned via err.code when checks fail:

- ENOENT: No such file or directory. The specified path does not exist. Commonly handled explicitly in examples: `if (err.code === 'ENOENT') { console.error('myfile does not exist'); }`
- EACCES: Permission denied. The process lacks the required permissions for the requested mode (e.g., attempting W_OK without write access).
- EISDIR: Illegal operation on a directory. Thrown when performing an operation inappropriate for a directory (though fs.access itself works on directories; this code appears more in other fs ops like readFile on dirs, but listed in query).
- ENOTDIR: Not a directory. A component of the path prefix is not a directory (e.g., path traversal issue where a file exists in a directory path).

These are system errors propagated from the underlying OS (via libuv), as documented in the Node.js errors page and fs docs. Official fs docs recommend handling via err.code rather than the deprecated fs.exists. Note: on Windows, ACLs may not be fully checked by access, leading to false positives.

Citations:
- 1: https://nodejs.org/api/errors.html
- 2: https://docs.deno.com/api/node/fs/~/access
- 3: https://runebook.dev/en/articles/node/fs/fsaccesspath-mode-callback
- 4: https://nodejs.org/api/fs.html
- 5: https://nodejs.org/download/rc/v5.11.0-rc.1/docs/api/fs.html
- 6: https://runebook.dev/en/articles/node/fs/fsaccesssyncpath-mode
Distinguish filesystem errors: only return false for missing files, not permission/access issues.
fileExists currently returns false for permission denied (EACCES), path component errors (ENOTDIR), and other failures, masking real configuration or permission problems. Callers like token-storage.ts will throw misleading "not found" errors when the actual issue is a permission error.
Recommended fix

```diff
 export async function fileExists(filePath: string): Promise<boolean> {
   try {
     await fs.access(filePath);
     return true;
-  } catch {
-    return false;
+  } catch (error) {
+    const code = (error as NodeJS.ErrnoException).code;
+    if (code === "ENOENT" || code === "ENOTDIR") {
+      return false;
+    }
+    throw error;
   }
 }
```

📝 Committable suggestion
```typescript
export async function fileExists(filePath: string): Promise<boolean> {
  try {
    await fs.access(filePath);
    return true;
  } catch (error) {
    const code = (error as NodeJS.ErrnoException).code;
    if (code === "ENOENT" || code === "ENOTDIR") {
      return false;
    }
    throw error;
  }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/fs/file-utils.ts` around lines 18 - 24, fileExists
currently swallows all fs.access errors, making permission or path errors appear
as "not found"; change fileExists to catch the error, inspect err.code and
return false only when err.code === 'ENOENT' (or 'ENOENT' and 'ENOTDIR' if you
want to be defensive), otherwise rethrow the original error so callers like
token-storage.ts receive permission/access errors (e.g., check error.code in the
catch and throw the error for EACCES and other unexpected codes instead of
returning false).
```typescript
function escapeYaml(input: string): string {
  return input.replace(/"/g, '\\"');
}
```
escapeYaml does not escape backslashes.
Only " is escaped. In a YAML double-quoted scalar, \ starts an escape sequence, so an author display name or folder name containing \ (e.g. Windows-style paths, or a literal \n in a display name) will produce malformed YAML. Escape \ before " to be safe:
🛠️ Suggested fix

```diff
 function escapeYaml(input: string): string {
-  return input.replace(/"/g, '\\"');
+  return input.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
 }
```

📝 Committable suggestion
```typescript
function escapeYaml(input: string): string {
  return input.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/markdown/markdown-builder.ts` around lines 3 - 5, The
escapeYaml function only escapes double quotes and misses backslashes, which can
break double-quoted YAML scalars; update the escapeYaml function to first escape
backslashes (convert "\" to "\\") and then escape double quotes (convert `"` to
`\"`) so backslashes are preserved and don't start unintended YAML
escapes—locate the escapeYaml function in markdown-builder.ts and apply the
two-step replacement (escape backslashes before escaping quotes).
```typescript
const lines: string[] = [];
lines.push("---");
lines.push(`title: "${params.folderName} Index"
source: "X"
generated_at: "${params.generatedAt}"`);
lines.push("---", "");
lines.push(`# ${params.folderName} Index`, "");
lines.push(`- Source Root: ${params.sourceRoot}`);
lines.push(`- Total Posts: ${params.posts.length}`, "");
```
buildFolderIndexMarkdown injects unescaped values into YAML frontmatter.
folderName, sourceRoot, and generatedAt are interpolated into double-quoted YAML fields without passing through escapeYaml (which buildBookmarkMarkdown does for equivalent fields). A folder name containing " will produce invalid YAML frontmatter in _index.md.
🛠️ Suggested fix

```diff
   const lines: string[] = [];
   lines.push("---");
-  lines.push(`title: "${params.folderName} Index"
-source: "X"
-generated_at: "${params.generatedAt}"`);
+  lines.push(`title: "${escapeYaml(`${params.folderName} Index`)}"
+source: "X"
+generated_at: "${escapeYaml(params.generatedAt)}"`);
   lines.push("---", "");
```

You'll also want to export (or locally reference) escapeYaml for this call.
📝 Committable suggestion
```typescript
const lines: string[] = [];
lines.push("---");
lines.push(`title: "${escapeYaml(`${params.folderName} Index`)}"
source: "X"
generated_at: "${escapeYaml(params.generatedAt)}"`);
lines.push("---", "");
lines.push(`# ${params.folderName} Index`, "");
lines.push(`- Source Root: ${params.sourceRoot}`);
lines.push(`- Total Posts: ${params.posts.length}`, "");
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/markdown/markdown-builder.ts` around lines 56 - 64, The
YAML frontmatter in buildFolderIndexMarkdown is vulnerable because folderName,
sourceRoot and generatedAt are interpolated directly into quoted YAML fields
without escaping; update buildFolderIndexMarkdown to pass params.folderName,
params.sourceRoot and params.generatedAt through the existing escapeYaml utility
(the same way buildBookmarkMarkdown does), and ensure escapeYaml is available
(either export it from its module or import/reference it locally) before using
it so any embedded quotes or special characters are properly escaped in the
frontmatter.
```typescript
export interface XBookmarkPage {
  data: XPost[];
  includes?: { users?: XAuthor[] };
  meta?: {
    result_count?: number;
    next_token?: string;
  };
}
```
🧩 Analysis chain

🌐 Web query:

X API v2 bookmarks endpoint response data field absent when no results

💡 Result:

Yes, when there are no bookmarks for the user, the X API v2 bookmarks endpoint (GET /2/users/{id}/bookmarks) returns a response where the "data" field is absent (not present in the JSON object). The OpenAPI schema for Get2UsersIdBookmarksResponse defines "data" as an array with minItems: 1, indicating it is only included when there are results. Official example responses always show "data" with content, but the schema implies omission for empty cases, consistent with other X API v2 list endpoints (e.g., user likes, followers) where "data" is omitted if no items match. Forum discussions report incomplete or missing data but do not contradict this behavior for zero results. Developers should check if "data" exists rather than its length.

Citations:
- 1: https://docs.x.com/x-api/users/get-bookmarks
- 2: https://generaltranslation.mintlify.app/x-api/users/get-bookmarks
- 3: https://docs.x.com/x-api/posts/bookmarks/quickstart/bookmarks-lookup

🏁 Script executed:

```bash
# Check the actual state of XBookmarkPage interface
cat -n packages/core/src/types/shared.ts | sed -n '40,55p'
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 469

🏁 Script executed:

```bash
# Find all usages of XBookmarkPage in the codebase
rg "XBookmarkPage" --type ts --type tsx -B 2 -A 2
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 100

🏁 Script executed:

```bash
# Check if there are similar interfaces for other list endpoints to understand the pattern
rg "export interface X.*Page" packages/core/src/types/ -A 5
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 425

🏁 Script executed:

```bash
# Search for usages of XBookmarkPage in TypeScript files
rg "XBookmarkPage" -t ts
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 771

🏁 Script executed:

```bash
# Look at similar list endpoint interfaces to understand the pattern
rg "export interface X.*Page|export interface X.*Response" packages/core/src/types/shared.ts -A 8
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 229

🏁 Script executed:

```bash
# Check how the code uses XBookmarkPage - look for .data access patterns
rg "\.data" packages/core/src/ -t ts -B 2 -A 1 | head -100
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 600

🏁 Script executed:

```bash
# View the getPaginated implementation to understand the full context
cat -n packages/core/src/x-api/client.ts | sed -n '1,200p' | grep -A 20 "getPaginated"
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 1170

🏁 Script executed:

```bash
# Check if there are other response interfaces with required data fields
rg "data:\s*\w+\[\]" packages/core/src/types/ -t ts
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 123

🏁 Script executed:

```bash
# Look for error response handling in the API client
rg "errors?" packages/core/src/x-api/client.ts -B 2 -A 2
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 55

🏁 Script executed:

```bash
# Check how the pages returned from getPaginated are used
rg "getBookmarksAll\|getBookmarksByFolder\|getPaginated" packages/core/src/ -t ts -A 5 | head -80
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 55

🏁 Script executed:

```bash
# Look at the complete getPaginated and how pages are consumed
cat -n packages/core/src/x-api/client.ts | sed -n '88,120p'
```

Repository: theosera/obsidian-ai-pipeline
Length of output: 896
Make the data field optional in XBookmarkPage.
The X API v2 bookmarks endpoint omits the data field entirely when no bookmarks are returned. Declaring it as required creates a type-runtime mismatch: downstream code that iterates over page.data will crash when handling empty result sets. The codebase already shows runtime awareness of this pattern elsewhere (e.g., payload.data ?? []), but the type definition doesn't reflect the actual API contract.
Suggested type fix

```diff
 export interface XBookmarkPage {
-  data: XPost[];
+  data?: XPost[];
   includes?: { users?: XAuthor[] };
   meta?: {
     result_count?: number;
     next_token?: string;
   };
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/types/shared.ts` around lines 43 - 50, Change the
XBookmarkPage interface so the data property is optional (data?: XPost[]) to
match the X API v2 behavior when no bookmarks are returned; update any code that
assumes page.data is always present to handle undefined (e.g., use page.data ??
[] where iteration occurs) so callers like consumers of XBookmarkPage won't
crash when the API omits data.
```typescript
private async getPaginated(pathname: string): Promise<XBookmarkPage[]> {
  const pages: XBookmarkPage[] = [];
  let token: string | undefined;

  do {
    const query = new URLSearchParams({
      max_results: BOOKMARK_FIELDS.maxResults,
      "tweet.fields": BOOKMARK_FIELDS.tweetFields,
      expansions: BOOKMARK_FIELDS.expansions,
      "user.fields": BOOKMARK_FIELDS.userFields
    });
    if (token) {
      query.set("pagination_token", token);
    }

    const page = await this.request<XBookmarkPage>(pathname, query);
    pages.push(page);
    token = page.meta?.next_token;
  } while (token);

  return pages;
}
```
Add pagination safety to getPaginated to match getBookmarkFolders.
Unlike getBookmarkFolders (lines 46-73), this loop has no maxPages cap and no repeated-next_token detection. A misbehaving/response-cached upstream returning the same next_token — or a very large bookmark set — will cause an unbounded loop or unbounded memory growth on the request thread. Since this is invoked for both getBookmarksAll and getBookmarksByFolder during sync, the blast radius is the whole sync run.
🛡️ Proposed fix

```diff
 private async getPaginated(pathname: string): Promise<XBookmarkPage[]> {
   const pages: XBookmarkPage[] = [];
   let token: string | undefined;
+  const seenTokens = new Set<string>();
+  const maxPages = 1000;
+  let pageCount = 0;
   do {
+    if (pageCount >= maxPages) {
+      throw new Error(`Bookmark pagination exceeded limit (${maxPages}) for ${pathname}.`);
+    }
+    pageCount += 1;
     const query = new URLSearchParams({
       max_results: BOOKMARK_FIELDS.maxResults,
       "tweet.fields": BOOKMARK_FIELDS.tweetFields,
       expansions: BOOKMARK_FIELDS.expansions,
       "user.fields": BOOKMARK_FIELDS.userFields
     });
     if (token) {
       query.set("pagination_token", token);
     }
     const page = await this.request<XBookmarkPage>(pathname, query);
     pages.push(page);
-    token = page.meta?.next_token;
+    const nextToken = page.meta?.next_token;
+    if (nextToken && seenTokens.has(nextToken)) {
+      throw new Error(`Bookmark pagination repeated a next_token for ${pathname}. Stopping to avoid loop.`);
+    }
+    if (nextToken) seenTokens.add(nextToken);
+    token = nextToken;
   } while (token);
   return pages;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/x-api/client.ts` around lines 88-109, `getPaginated` lacks the safety checks present in `getBookmarkFolders` and can loop indefinitely; update `getPaginated` (used by `getBookmarksAll` and `getBookmarksByFolder`) to enforce a max page cap and detect repeated pagination tokens: introduce a `MAX_PAGES` constant (same or similar value used in `getBookmarkFolders`), track the previous token(s) and break if the `next_token` is unchanged or if the page count exceeds `MAX_PAGES`, and surface/log a clear error or throw when the cap is hit so upstream sync can handle the failure gracefully.
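The guarded loop can be exercised in isolation. The following is a standalone sketch, not the PR's code: `BookmarkPage`, `FetchPage`, and `paginateAll` are illustrative stand-ins for the client's private `request`/`getPaginated` pair.

```typescript
interface BookmarkPage {
  data?: unknown[];
  meta?: { next_token?: string };
}

// Illustrative stand-in for the real client's request(); not from the PR.
type FetchPage = (paginationToken?: string) => Promise<BookmarkPage>;

async function paginateAll(fetchPage: FetchPage, maxPages = 1000): Promise<BookmarkPage[]> {
  const pages: BookmarkPage[] = [];
  const seenTokens = new Set<string>();
  let token: string | undefined;
  let pageCount = 0;
  do {
    // Hard cap: bounds both loop iterations and memory held in `pages`.
    if (pageCount >= maxPages) {
      throw new Error(`Pagination exceeded limit (${maxPages}).`);
    }
    pageCount += 1;
    const page = await fetchPage(token);
    pages.push(page);
    const nextToken = page.meta?.next_token;
    // A token we have already seen means the upstream is cycling; bail out.
    if (nextToken && seenTokens.has(nextToken)) {
      throw new Error("Pagination repeated a next_token; stopping to avoid a loop.");
    }
    if (nextToken) seenTokens.add(nextToken);
    token = nextToken;
  } while (token);
  return pages;
}

// A stub that always returns the same next_token trips the repeated-token guard.
const stuck: FetchPage = async () => ({ data: [], meta: { next_token: "abc" } });
paginateAll(stuck).catch((err) => console.log((err as Error).message));
```

A well-behaved upstream (one that eventually omits `next_token`) terminates normally and returns every fetched page in order.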
`pnpm build --ignore-workspace` forwards --ignore-workspace as an arg to the underlying `vite build` invocation, which errors out with `Unknown option --ignoreWorkspace`. To treat it as a pnpm flag it has to precede the subcommand: `pnpm --ignore-workspace run build`. This bug was inherited unchanged from main during PR #22 conflict resolution; confirmed reproducible locally (vite 5.4.21, pnpm 10.33.0).
PR #22 introduced --no-frozen-lockfile to work around a drift between package.json and pnpm-lock.yaml on its branch. With the lockfile regenerated cleanly in cc8a369 the drift is gone, so we can restore the lockfile consistency gate. --frozen-lockfile makes CI fail fast when package.json moves without a matching lockfile update, preventing silent dependency re-resolution. Verified locally: clean `rm -rf node_modules && pnpm install --frozen-lockfile` succeeds; `pnpm test` still 120/120. Addresses chatgpt-codex-connector[bot] review on PR #23.
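The restored install step in `.github/workflows/ci.yml` would look roughly like this sketch (the step name is illustrative, not copied from the PR):

```yaml
# Sketch of the CI install step; fails fast if pnpm-lock.yaml
# drifts out of sync with package.json instead of silently re-resolving.
- name: Install dependencies
  run: pnpm install --frozen-lockfile
```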
Resolve PR #22 conflicts + fix pre-existing workspace typecheck breakage

Motivation

`apps/auth` and `apps/sync` can share logic.

Description

- Workspace setup (`pnpm-workspace.yaml`) and new packages: `packages/core` plus apps `apps/auth` and `apps/sync`, with TypeScript build and typecheck configs (`tsconfig.*`, `tsconfig.base.json`).
- Shared core utilities: env config (`config/env.ts`), PKCE (`auth/pkce.ts`), token storage (`auth/token-storage.ts`), OAuth client (`auth/x-oauth-client.ts`), X API client (`x-api/client.ts`), path resolver (`path/x-bookmark-path-resolver.ts`), markdown builders (`markdown/markdown-builder.ts`), folder grouping (tokenizer/stopwords/detector/renderer/writer), and FS helpers (`fs/file-utils.ts`).
- `apps/auth` HTTP PKCE auth server (`src/server.ts`) and `apps/sync` CLI scripts (`src/sync.ts`, `src/propose-grouping.ts`, `src/approve-grouping.ts`) to obtain tokens and sync bookmarks into the Obsidian vault using the core utilities.
- Workspace scripts (`dev:auth`, `sync`, `propose:grouping`, `approve:grouping`, `build`, `typecheck`), `.env.example` and README documentation for the X Bookmarks workspace, `.gitignore` updates to allow workspace artifacts, an `x_folder_mapping.json` placeholder, and adjusted `pnpm-lock.yaml` and GitHub CI (`.github/workflows/ci.yml`) install/build steps.

Testing

- Ran `pnpm test` (security tests) as configured in CI and the run completed successfully.
- Ran `pnpm exec tsc --noEmit`; it succeeded for the modified codebase.
- Ran `pnpm run build` in the `chrome-extension` workspace during CI without errors.
- Exercised the `dev:auth` and `sync` scripts (build -> run) during development.

Codex Task
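The PKCE module (`auth/pkce.ts`) itself is not shown in this thread. A minimal sketch of what verifier/challenge generation typically looks like with Node's `crypto` module follows; the function name `createPkcePair` is illustrative, not taken from the PR.

```typescript
import { createHash, randomBytes } from "node:crypto";

// RFC 7636 base64url: "+" -> "-", "/" -> "_", padding stripped.
function base64url(buf: Buffer): string {
  return buf
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Illustrative name; the PR's auth/pkce.ts may expose a different API.
export function createPkcePair(): { verifier: string; challenge: string } {
  // 32 random bytes -> 43 base64url chars, within RFC 7636's 43..128 range.
  const verifier = base64url(randomBytes(32));
  // S256 method: challenge = base64url(SHA-256(verifier)).
  const challenge = base64url(createHash("sha256").update(verifier).digest());
  return { verifier, challenge };
}
```

Per the sequence diagram above, the auth server would send the `challenge` in the `/authorize` redirect, persist the `verifier` alongside the `state`, and present the `verifier` in the later `/token` exchange.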
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores