
feat(sdk): add preview config to createFileSystemSerdes #523

Open
ParidelPooya wants to merge 10 commits into main from feat/filesystem-serdes-preview

@ParidelPooya
Contributor

Add optional preview configuration that stores a subset of the serialized value inline in the checkpoint envelope alongside the file pointer. This makes data visible in the console and API without reading the full file.

A preview is generated whenever data is written to a file: in ALWAYS mode, and in OVERFLOW mode when the payload exceeds the threshold. Inline payloads in OVERFLOW mode get no preview, since the full data is already in the checkpoint.

New types:

  • PreviewMode (INCLUDE_ALL | EXCLUDE_ALL): default visibility strategy
  • FieldMatchMode (ANYWHERE | PATH): how field names are matched
  • PreviewField: { name, match? } field selector
  • PreviewConfig: { mode, include?, exclude?, mask?, maskString?, maxPreviewBytes? }

Priority rules:

  • exclude always wins (even over mask)
  • mask implies visibility — masked fields are shown (with maskString) unless excluded
  • maxPreviewBytes (default 4096) caps the preview size

Also renames FileSystemSerdesConfig.mode to storageMode for clarity and adds full unit test coverage for all preview behaviors.
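As a rough illustration of how the new types and priority rules fit together, here is a sketch reconstructed from the description alone. The enum string values, the decide helper, and treating an unset match as ANYWHERE are assumptions; the PR's actual implementation may differ.

```typescript
// Hypothetical reconstruction of the preview types from the PR description.
// Enum string values are assumptions.
export enum PreviewMode {
  INCLUDE_ALL = "INCLUDE_ALL",
  EXCLUDE_ALL = "EXCLUDE_ALL",
}

export enum FieldMatchMode {
  ANYWHERE = "ANYWHERE",
  PATH = "PATH",
}

export interface PreviewField {
  name: string;
  match?: FieldMatchMode;
}

export interface PreviewConfig {
  mode: PreviewMode;
  include?: PreviewField[];
  exclude?: PreviewField[];
  mask?: PreviewField[];
  maskString?: string;
  maxPreviewBytes?: number; // default 4096 per the description
}

// PATH compares the full dotted path; ANYWHERE matches any single segment.
// Treating an unset match as ANYWHERE is an assumption.
function matches(path: string, field: PreviewField): boolean {
  if (field.match === FieldMatchMode.PATH) return path === field.name;
  return path.split(".").includes(field.name);
}

export type Decision = "show" | "mask" | "hide";

// Priority: exclude wins, then mask (which implies visibility), then the
// default mode. Under EXCLUDE_ALL only explicitly included fields are shown.
export function decide(path: string, cfg: PreviewConfig): Decision {
  const hit = (fields?: PreviewField[]) =>
    (fields ?? []).some((f) => matches(path, f));
  if (hit(cfg.exclude)) return "hide";
  if (hit(cfg.mask)) return "mask";
  if (cfg.mode === PreviewMode.INCLUDE_ALL) return "show";
  return hit(cfg.include) ? "show" : "hide";
}
```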

… keys

buildPreview now mirrors the original object structure in the preview,
creating intermediate objects as needed. This is more readable and
consistent with the original data shape.

Guard against dangerous keys (__proto__, constructor, prototype) in
setNestedValue to prevent prototype pollution when building preview objects
from user-controlled field names.

Check all path segments upfront before traversal, use hasOwnProperty
instead of 'in' operator, and create intermediate objects with
Object.create(null) to eliminate prototype chain entirely.
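A guarded setter along the lines of these commits might look like the sketch below. The body of setNestedValue is a hypothetical reconstruction (only the name and the three safeguards come from the commit messages), and it keeps the simple behavior of replacing a non-object value found at an intermediate segment.

```typescript
// Hypothetical reconstruction: only the helper name and the three safeguards
// (upfront segment check, hasOwnProperty, Object.create(null)) come from the commits.
const DANGEROUS_KEYS = new Set(["__proto__", "constructor", "prototype"]);

type Dict = Record<string, unknown>;

function isPlainObject(v: unknown): v is Dict {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Returns false (and writes nothing) if any path segment is dangerous.
export function setNestedValue(target: Dict, path: string, value: unknown): boolean {
  const parts = path.split(".");
  // Check every segment before touching the target.
  if (parts.some((p) => DANGEROUS_KEYS.has(p))) return false;

  let node: Dict = target;
  for (const part of parts.slice(0, -1)) {
    // Own-property lookup only, so inherited members never count as existing nodes.
    const existing = Object.prototype.hasOwnProperty.call(node, part)
      ? node[part]
      : undefined;
    if (isPlainObject(existing)) {
      node = existing;
    } else {
      // Missing or non-object value: replace it with a prototype-less object.
      const child: Dict = Object.create(null);
      node[part] = child;
      node = child;
    }
  }
  node[parts[parts.length - 1]] = value;
  return true;
}
```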
Replace imperative traversal with reduceRight to build nested structure
from inside out, then merge via deepMerge. This eliminates the dynamic
obj[userInput] assignment pattern that static analyzers flag as prototype
pollution risk, while the DANGEROUS_KEYS upfront check remains as defense
in depth.

Refactor buildPreview to collect flat (path, value) pairs first, then
build the nested result using reduceRight + JSON spread — no mutation,
no dynamic obj[key] assignment. Remove deepMerge and setNestedValue
helpers entirely. This eliminates all patterns that static analyzers
flag as prototype pollution risk.
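The reduceRight step can be sketched as follows: a single dotted path is turned into a nested object from the innermost key outward, with each level created as a fresh object literal rather than by assigning into an existing object. nestPath is a hypothetical name, and merging several such trees for sibling paths (e.g. user.id and user.email) is a separate step not shown here.

```typescript
// Hypothetical helper: nest one dotted path from the inside out.
// Each level is a fresh literal with a computed key; nothing is mutated.
export function nestPath(path: string, value: unknown): Record<string, unknown> {
  const parts = path.split(".");
  const leaf: Record<string, unknown> = { [parts[parts.length - 1]]: value };
  return parts
    .slice(0, -1)
    .reduceRight<Record<string, unknown>>((inner, key) => ({ [key]: inner }), leaf);
}
```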
…d of O(n²))

Track maxPreviewBytes budget incrementally using flat path as key estimate
instead of re-serializing the full accumulated object for each field.
Build the nested result once from accepted pairs at the end.

Replace reduce+JSON.parse with direct O(1)-per-field traversal.
Keys are safe at this point — dangerous keys were already filtered
during the collect phase.
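The incremental budget can be sketched as a single pass over the collected pairs, after which the nested preview would be built once from the accepted ones. The per-field cost estimate (flat path length, plus the value's JSON length, plus 4 characters for quotes, colon and comma), counting characters rather than UTF-8 bytes, and skipping an oversized field instead of stopping are all assumptions; acceptWithinBudget is a hypothetical name.

```typescript
// Hypothetical sketch of incremental budget tracking for the preview.
// Assumptions: cost = flat path + JSON value + 4 chars for quotes, colon, comma;
// characters stand in for bytes; fields that don't fit are skipped, not fatal.
type Pair = [path: string, value: unknown];

export function acceptWithinBudget(pairs: Pair[], maxPreviewBytes = 4096): Pair[] {
  const accepted: Pair[] = [];
  let used = 2; // the enclosing "{}"
  for (const [path, value] of pairs) {
    // At runtime JSON.stringify returns undefined for e.g. undefined or functions.
    const valueJson = JSON.stringify(value) ?? "null";
    const cost = path.length + valueJson.length + 4;
    if (used + cost > maxPreviewBytes) continue;
    used += cost;
    accepted.push([path, value]);
  }
  return accepted;
}
```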
@ParidelPooya marked this pull request as ready for review May 1, 2026 21:11

@yaythomas left a comment


This is relatively complicated.

I wonder if just letting the caller specify their own transform would allow the SDK to be less opinionated about this?

   export interface FileSystemSerdesConfig {
     storageMode?: FileSystemSerdesMode;
     preview?: (value: unknown) => unknown;
     maxPreviewBytes?: number;   // keep as sanity guard, default 4096
   }

Usage:

   createFileSystemSerdes("/mnt/s3", {
     preview: (v: any) => ({ id: v.id, status: v.status, email: "***" }),
   });

   // or with a utility:
   import { omit } from "lodash";
   createFileSystemSerdes("/mnt/s3", {
     preview: (v: any) => omit(v, ["password", "ssn"]),
   });

You could combine this with a helper for masking:

      export const maskFields = (keys: string[], maskString = "***") =>
        (v: any) => keys.reduce((acc, k) => ({ ...acc, [k]: maskString }), v);

e.g.

   maskFields(["ssn"])
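On the SDK side, the proposed callback would reduce to something like the sketch below: run the caller's transform, serialize the result, and enforce maxPreviewBytes as the sanity guard. computePreview and utf8ByteLength are hypothetical names, and dropping an oversized or failing preview (rather than truncating it or failing the write) is an assumed policy.

```typescript
// Hypothetical SDK-side handling of the proposed `preview` callback.
// Dropping an oversized or throwing preview is an assumed policy.
export type PreviewFn = (value: unknown) => unknown;

// UTF-8 length of a string, counted per code point.
export function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Returns the serialized preview, or undefined if there is none to store.
export function computePreview(
  value: unknown,
  preview: PreviewFn | undefined,
  maxPreviewBytes = 4096,
): string | undefined {
  if (!preview) return undefined;
  let json: string | undefined;
  try {
    json = JSON.stringify(preview(value));
  } catch {
    return undefined; // a bad transform should not break the checkpoint write
  }
  if (json === undefined) return undefined;
  return utf8ByteLength(json) <= maxPreviewBytes ? json : undefined;
}
```

With this shape, masking, omitting, and picking fields are all the caller's choice, and the SDK only owns serialization and the size cap.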

    * filesystem via S3 Files, enabling durable, shared state across invocations
    * and parallel function instances without checkpoint size constraints.
    *
   /** @internal */

stray /**

   if (
     obj[key] !== null &&
     typeof obj[key] === "object" &&
     !Array.isArray(obj[key])

Does this !Array.isArray check mean { items: [{ secret: "xyz" }] } would not be masked?
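To make the question concrete: a traversal that does descend into arrays could use the element index as a path segment, so the secret above would appear as items.0.secret and an ANYWHERE match on secret would still catch it. collectLeafPaths and the index-segment format are hypothetical, not the PR's behavior.

```typescript
// Hypothetical collector that descends into arrays as well as objects,
// using the element index as a path segment (an assumed format).
const DANGEROUS_KEYS = new Set(["__proto__", "constructor", "prototype"]);

type Leaf = [path: string, value: unknown];

export function collectLeafPaths(obj: unknown, prefix = "", out: Leaf[] = []): Leaf[] {
  if (obj === null || typeof obj !== "object") {
    if (prefix) out.push([prefix, obj]); // primitives are leaves
    return out;
  }
  const entries: [string, unknown][] = Array.isArray(obj)
    ? obj.map((v, i): [string, unknown] => [String(i), v])
    : Object.entries(obj);
  for (const [key, value] of entries) {
    if (DANGEROUS_KEYS.has(key)) continue;
    collectLeafPaths(value, prefix ? `${prefix}.${key}` : key, out);
  }
  return out;
}
```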

   if (obj === null || typeof obj !== "object") return;
   for (const key of Object.keys(obj)) {
     if (DANGEROUS_KEYS.has(key)) continue;
     const path = pathPrefix ? `${pathPrefix}.${key}` : key;

Both { user: { email: "x" } } and { "user.email": "x" } will end up with the path user.email?

So PATH and ANYWHERE match either one.

   if (mode === FieldMatchMode.PATH) {
     return path === field.name;
   }
   ...
   return path.split(".").includes(field.name);

What then happens on the rebuild?
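The collision is easy to reproduce with a small leaf-path walker: joining keys with "." gives both inputs the same flat path, while keeping the segments as an array tells them apart. leafPaths and joined are hypothetical helpers; if the rebuild then splits the flat path on ".", both inputs would come back as { user: { email: "x" } }.

```typescript
// Hypothetical walker returning each leaf's key segments; arrays and
// primitives are treated as leaves to keep the example small.
export function leafPaths(obj: unknown, segs: string[] = [], out: string[][] = []): string[][] {
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    out.push(segs);
    return out;
  }
  for (const [key, value] of Object.entries(obj)) {
    leafPaths(value, [...segs, key], out);
  }
  return out;
}

// Flat dotted paths, as used by PATH and ANYWHERE matching.
export const joined = (obj: unknown): string[] =>
  leafPaths(obj).map((segs) => segs.join("."));
```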

   let node = result;
   for (let i = 0; i < parts.length - 1; i++) {
     if (typeof node[parts[i]] !== "object" || node[parts[i]] === null) {
       node[parts[i]] = {};

If you had result.user = "arb" and then later user.email, this ends up as node['user'] = {}, and you lose "arb"?
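The loss can be reproduced by completing the quoted loop in the obvious way (the descent and leaf write after node[parts[i]] = {} are assumptions), and contrasted with one possible fix. rebuildOverwriting and rebuildFirstWins are hypothetical names, and first-writer-wins is only one conflict policy; last-writer-wins or keeping the conflicting entry under its flat key would be alternatives.

```typescript
type Dict = Record<string, unknown>;
type Pair = [path: string, value: unknown];

// The quoted rebuild loop, completed with an assumed descent and leaf write:
// a non-object value on an intermediate segment is replaced with {}.
export function rebuildOverwriting(pairs: Pair[]): Dict {
  const result: Dict = {};
  for (const [path, value] of pairs) {
    const parts = path.split(".");
    let node = result;
    for (let i = 0; i < parts.length - 1; i++) {
      if (typeof node[parts[i]] !== "object" || node[parts[i]] === null) {
        node[parts[i]] = {};
      }
      node = node[parts[i]] as Dict;
    }
    node[parts[parts.length - 1]] = value;
  }
  return result;
}

// One possible fix: first writer wins, and a pair that would clobber an
// existing value is dropped instead.
export function rebuildFirstWins(pairs: Pair[]): Dict {
  const result: Dict = Object.create(null);
  outer: for (const [path, value] of pairs) {
    const parts = path.split(".");
    let node = result;
    for (const key of parts.slice(0, -1)) {
      const existing = node[key];
      if (existing === undefined) {
        const child: Dict = Object.create(null);
        node[key] = child;
        node = child;
      } else if (typeof existing === "object" && existing !== null && !Array.isArray(existing)) {
        node = existing as Dict;
      } else {
        continue outer; // keep the value that is already there
      }
    }
    const leaf = parts[parts.length - 1];
    if (!Object.prototype.hasOwnProperty.call(node, leaf)) node[leaf] = value;
  }
  return result;
}
```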
