|
| 1 | +# JSON Schema to Zod Converter Architecture |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document describes the modular architecture for converting JSON Schema to Zod schemas. The architecture reflects JSON Schema's compositional nature, where each property adds independent constraints that combine to form the complete validation. |
| 6 | + |
| 7 | +## Core Concepts |
| 8 | + |
| 9 | +### Two-Phase Processing |
| 10 | + |
| 11 | +The converter operates in two distinct phases based on a fundamental architectural principle: |
| 12 | + |
| 13 | +**🔑 Key Principle: Refinement handlers should only be used when Zod doesn't support the operation natively.** |
| 14 | + |
| 15 | +1. **Primitive Phase**: Handlers that use Zod's built-in constraint methods (`.min()`, `.max()`, `.regex()`, etc.) |
| 16 | +2. **Refinement Phase**: Handlers that add custom validation logic through Zod's `.refine()` method for operations Zod cannot express natively |
| 17 | + |
| 18 | +### Type Schemas |
| 19 | + |
| 20 | +During the first phase, we maintain a `TypeSchemas` object that tracks the state of each possible type: |
| 21 | + |
| 22 | +```typescript |
| 23 | +interface TypeSchemas { |
| 24 | +  string?: z.ZodTypeAny | false; |
| 25 | +  number?: z.ZodTypeAny | false; // integers are numbers with .int() constraint |
| 26 | +  boolean?: z.ZodTypeAny | false; |
| 27 | +  null?: z.ZodNull | false; |
| 28 | +  array?: z.ZodArray<any> | false; |
| 29 | +  tuple?: z.ZodTuple<any> | false; |
| 30 | +  object?: z.ZodObject<any> | false; |
| 31 | +} |
| 32 | +``` |
| 33 | + |
| 34 | +Each type can be in one of three states: |
| 35 | +- `undefined`: Type is still allowed (no constraints have excluded it) |
| 36 | +- `false`: Type is explicitly disallowed |
| 37 | +- `z.Zod*`: Type with accumulated constraints (including literals, enums, and unions) |
| 38 | + |
| 39 | +## Architecture Components |
| 40 | + |
| 41 | +### 1. Primitive Handlers |
| 42 | + |
| 43 | +These handlers operate during the first phase and modify the `TypeSchemas`: |
| 44 | + |
| 45 | +```typescript |
| 46 | +interface PrimitiveHandler { |
| 47 | + apply(types: TypeSchemas, schema: JSONSchema): void; |
| 48 | +} |
| 49 | +``` |
| 50 | + |
| 51 | +#### Implemented Primitive Handlers: |
| 52 | +- **TypeHandler**: Sets every type not listed in the schema's `type` keyword to `false` |
| 53 | +- **ConstHandler**: Handles const values by creating literals |
| 54 | +- **EnumHandler**: Handles enum validation with appropriate Zod types |
| 55 | +- **ImplicitStringHandler**: Enables string type when string constraints are present without explicit type |
| 56 | +- **MinLengthHandler**: Applies `.min()` to string (Zod native support) |
| 57 | +- **MaxLengthHandler**: Applies `.max()` to string (Zod native support) |
| 58 | +- **PatternHandler**: Applies `.regex()` to string (Zod native support) |
| 59 | +- **MinimumHandler**: Applies `.min()` to number (Zod native support) |
| 60 | +- **MaximumHandler**: Applies `.max()` to number (Zod native support) |
| 61 | +- **ExclusiveMinimumHandler**: Applies `.gt()` to number (Zod native support) |
| 62 | +- **ExclusiveMaximumHandler**: Applies `.lt()` to number (Zod native support) |
| 63 | +- **MultipleOfHandler**: Applies `.multipleOf()` to number (Zod native support) |
| 64 | +- **MinItemsHandler**: Applies `.min()` to array (Zod native support) |
| 65 | +- **MaxItemsHandler**: Applies `.max()` to array (Zod native support) |
| 66 | +- **ItemsHandler**: Configures array element validation if arrays are still allowed |
| 67 | +- **TupleHandler**: Detects tuple arrays and marks them as tuple type |
| 68 | +- **PropertiesHandler**: Creates initial object schema with known properties |
| 69 | + |
| 70 | +### 2. Refinement Handlers |
| 71 | + |
| 72 | +These handlers operate during the second phase on the combined schema: |
| 73 | + |
| 74 | +```typescript |
| 75 | +interface RefinementHandler { |
| 76 | + apply(zodSchema: z.ZodTypeAny, schema: JSONSchema): z.ZodTypeAny; |
| 77 | +} |
| 78 | +``` |
| 79 | + |
| 80 | +#### Implemented Refinement Handlers: |
| 81 | + |
| 82 | +**✅ Legitimate Refinement Handlers (Zod doesn't support natively):** |
| 83 | +- **UniqueItemsHandler**: Custom validation for array uniqueness (Zod has no native unique constraint) |
| 84 | +- **NotHandler**: Complex logical negation validation (Zod has no native `.not()`) |
| 85 | +- **AllOfHandler**: Complex schema intersection logic |
| 86 | +- **AnyOfHandler**: Complex anyOf validation logic |
| 87 | +- **OneOfHandler**: Complex oneOf validation (exactly one must match) |
| 88 | +- **EnumComplexHandler**: Complex object/array equality in enums |
| 89 | +- **ConstComplexHandler**: Complex object/array equality for const values |
| 90 | +- **MetadataHandler**: Description and title annotations |
| 91 | + |
| 92 | +**⚠️ Edge Case Handlers (legitimate but specific):** |
| 93 | +- **ProtoRequiredHandler**: Special handler for `__proto__` security protection |
| 94 | +- **EmptyEnumHandler**: Handles empty enum arrays (always invalid) |
| 95 | +- **EnumNullHandler**: Handles null in enum when type doesn't include null |
| 96 | +- **PrefixItemsHandler**: Handles Draft 2020-12 prefixItems validation |
| 97 | +- **ObjectPropertiesHandler**: Object validation for union types (primitive work moved to PropertiesHandler) |
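To illustrate why `uniqueItems` needs custom logic, here is a minimal sketch of a predicate that a handler like UniqueItemsHandler could pass to `.refine()`. The names `deepEqual` and `hasUniqueItems` are hypothetical and not taken from the converter's source; the sketch assumes JSON-compatible values only (no `Date`, `Map`, or cyclic structures).

```typescript
// Structural equality for JSON-compatible values (hypothetical helper).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => deepEqual(item, b[i]));
  }
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  if (aKeys.length !== Object.keys(bObj).length) return false;
  // Key order is irrelevant; every key of a must exist in b with an equal value.
  return aKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(bObj, key) && deepEqual(aObj[key], bObj[key]),
  );
}

// Returns true for non-arrays because uniqueItems only constrains arrays.
function hasUniqueItems(value: unknown): boolean {
  if (!Array.isArray(value)) return true;
  for (let i = 0; i < value.length; i++) {
    for (let j = i + 1; j < value.length; j++) {
      if (deepEqual(value[i], value[j])) return false;
    }
  }
  return true;
}
```

A refinement handler could then return something like `zodSchema.refine(hasUniqueItems, { message: "Array items must be unique" })`. The pairwise comparison is quadratic in array length, which is a deliberate simplification for this sketch.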
| 98 | + |
| 99 | +## Processing Flow |
| 100 | + |
| 101 | +### Phase 1: Type-Specific Constraints |
| 102 | + |
| 103 | +1. Initialize empty `TypeSchemas` object |
| 104 | +2. Run all primitive handlers in sequence |
| 105 | +3. Each handler: |
| 106 | + - Checks if it has relevant constraints in the schema |
| 107 | + - For each type it affects, checks if that type is still allowed (`!== false`) |
| 108 | + - If allowed and constraint applies, either: |
| 109 | + - Creates initial type schema if `undefined` |
| 110 | + - Adds constraints to existing type schema |
| 111 | + |
| 112 | +### Phase 2: Build Union and Apply Refinements |
| 113 | + |
| 114 | +1. Convert remaining `undefined` types to their most permissive schemas: |
| 115 | + - `string` → `z.string()` |
| 116 | + - `number` → `z.number()` |
| 117 | + - `array` → `z.array(z.any())` |
| 118 | + - `tuple` → handled by TupleHandler |
| 119 | + - `object` → `z.object({}).passthrough()` |
| 120 | + - etc. |
| 121 | + |
| 122 | +2. Filter out `false` types and create union of allowed types: |
| 123 | + - 0 types → `z.never()` |
| 124 | + - 1 type → that type's schema |
| 125 | + - 2+ types → `z.union([...])` |
| 126 | + |
| 127 | +3. Run all refinement handlers on the resulting schema |
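Steps 1 and 2 above can be sketched as follows. To keep the example runnable without Zod installed, plain strings stand in for Zod schemas, so the function returns the Zod expression it would build rather than a real schema. The name `buildUnion`, the `PERMISSIVE` table, and the defaults for `boolean` and `null` are assumptions for illustration; `tuple` is left out because the list above handles it separately.

```typescript
type TypeName = "string" | "number" | "boolean" | "null" | "array" | "object";

// A string stands in for a Zod schema; false means "disallowed",
// undefined means "still allowed, no constraints yet".
type TypeState = string | false | undefined;

// Most permissive schema per type (boolean and null entries are assumed).
const PERMISSIVE: Record<TypeName, string> = {
  string: "z.string()",
  number: "z.number()",
  boolean: "z.boolean()",
  null: "z.null()",
  array: "z.array(z.any())",
  object: "z.object({}).passthrough()",
};

function buildUnion(types: Partial<Record<TypeName, TypeState>>): string {
  const allowed: string[] = [];
  for (const name of Object.keys(PERMISSIVE) as TypeName[]) {
    const state = types[name];
    if (state === false) continue; // step 2: drop disallowed types
    // step 1: untouched types fall back to their permissive schema
    allowed.push(state === undefined ? PERMISSIVE[name] : state);
  }
  const [only, ...rest] = allowed;
  if (only === undefined) return "z.never()"; // 0 types
  if (rest.length === 0) return only; // 1 type
  return `z.union([${allowed.join(", ")}])`; // 2+ types
}

console.log(
  buildUnion({
    string: "z.string().min(3)",
    number: false,
    boolean: false,
    null: false,
    array: false,
    object: false,
  }),
); // z.string().min(3)
```

In the real converter the same branching would presumably operate on `z.ZodTypeAny` values and call `z.never()` and `z.union([...])` directly.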
| 128 | + |
| 129 | +## Example: Processing a Complex Schema |
| 130 | + |
| 131 | +Given this JSON Schema: |
| 132 | +```json |
| 133 | +{ |
| 134 | + "type": ["string", "number"], |
| 135 | + "minimum": 5, |
| 136 | + "minLength": 3, |
| 137 | + "pattern": "^[A-Z]", |
| 138 | + "uniqueItems": true |
| 139 | +} |
| 140 | +``` |
| 141 | + |
| 142 | +**Phase 1 (Primitive Handlers):** |
| 143 | +1. TypeHandler: marks `boolean`, `null`, `array`, `object` as `false` |
| 144 | +2. MinimumHandler: sets `number` to `z.number().min(5)` |
| 145 | +3. MinLengthHandler: sets `string` to `z.string().min(3)` |
| 146 | +4. PatternHandler: updates `string` to `z.string().min(3).regex(/^[A-Z]/)` |
| 147 | + |
| 148 | +**Result after Phase 1:** |
| 149 | +```typescript |
| 150 | +{ |
| 151 | + string: z.string().min(3).regex(/^[A-Z]/), |
| 152 | + number: z.number().min(5), |
| 153 | + boolean: false, |
| 154 | + null: false, |
| 155 | + array: false, |
| 156 | + tuple: undefined, |
| 157 | + object: false |
| 158 | +} |
| 159 | +``` |
| 160 | + |
| 161 | +**Phase 2:** |
| 162 | +1. Create union: `z.union([z.string().min(3).regex(/^[A-Z]/), z.number().min(5)])` |
| 163 | +2. UniqueItemsHandler: Adds a refinement that only checks array values; since arrays are disallowed here, it never rejects a value |
| 164 | + |
| 165 | +## Implementation Status |
| 166 | + |
| 167 | +### Test Results |
| 168 | +- **Total tests**: 1355 (999 active, 356 skipped) |
| 169 | +- **Passing**: 999 tests |
| 170 | +- **Failing**: 0 tests |
| 171 | +- **Skipped**: 356 tests (JSON Schema features not supported by Zod) |
| 172 | + |
| 173 | +### Known Limitations |
| 174 | +1. **`__proto__` property validation**: Zod's `passthrough()` strips this property for security. Solved with ProtoRequiredHandler using `z.any()` when `__proto__` is required. |
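| 175 | +2. **Unicode grapheme counting**: JavaScript string length counts UTF-16 code units, not grapheme clusters, so `minLength`/`maxLength` can miscount strings containing characters outside the Basic Multilingual Plane. The affected test is on the skip list as a platform limitation. |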
| 176 | +3. **Complex schema combinations**: Some edge cases with deeply nested `allOf`, `anyOf`, `oneOf` combinations may not perfectly match JSON Schema semantics. |
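The Unicode limitation can be made concrete with a short sketch. It assumes that Zod's string `.min()`/`.max()` compare against `String.prototype.length`; the helper names are hypothetical.

```typescript
// Length in UTF-16 code units: what String.prototype.length reports.
function codeUnitLength(s: string): number {
  return s.length;
}

// Length in Unicode code points: Array.from iterates a string by code point.
function codePointLength(s: string): number {
  return Array.from(s).length;
}

const pile = "\uD83D\uDCA9"; // U+1F4A9, a single character outside the BMP
console.log(codeUnitLength(pile)); // 2
console.log(codePointLength(pile)); // 1

// Code points are still not grapheme clusters: "e" plus a combining acute
// accent renders as one visible character but is two code points.
const accented = "e\u0301";
console.log(codePointLength(accented)); // 2
```

So a schema with `"maxLength": 1` would reject `pile` under a code-unit count even though it is a single character.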
| 177 | + |
| 178 | +## Benefits |
| 179 | + |
| 180 | +1. **Modularity**: Each JSON Schema keyword is handled by a dedicated handler |
| 181 | +2. **Composability**: Handlers don't need to know about each other |
| 182 | +3. **Type Safety**: Type-specific constraints are only applied to appropriate types |
| 183 | +4. **Extensibility**: New keywords can be supported by adding new handlers |
| 184 | +5. **Maintainability**: Clear separation between constraint types |
| 185 | +6. **Correctness**: Reflects JSON Schema's additive constraint model |
| 186 | +7. **Testability**: Each handler can be tested independently |
| 187 | +8. **Performance**: Native Zod operations are faster than custom refinements |
| 188 | +9. **Better Type Inference**: Primitive handlers create proper Zod types with built-in validation |
| 189 | +10. **Architectural Clarity**: Clear distinction between schema construction vs. custom validation |
| 190 | + |
| 191 | +## Implementation Guidelines |
| 192 | + |
| 193 | +### Adding a New Primitive Handler |
| 194 | + |
| 195 | +**Use primitive handlers when Zod has native support for the constraint (e.g., `.min()`, `.max()`, `.regex()`).** |
| 196 | + |
| 197 | +1. Determine which type(s) the constraint affects |
| 198 | +2. Create handler that checks if those types are still allowed |
| 199 | +3. Apply constraints using Zod's built-in methods (prefer native over custom logic) |
| 200 | +4. Add type guards when working with `z.ZodTypeAny` to ensure type safety |
| 201 | +5. Consider if the constraint should enable a type implicitly (like `ImplicitStringHandler`) |
| 202 | + |
| 203 | +Example: |
| 204 | +```typescript |
| 205 | +export class MyConstraintHandler implements PrimitiveHandler { |
| 206 | +  apply(types: TypeSchemas, schema: JSONSchema.BaseSchema): void { |
| 207 | +    const mySchema = schema as JSONSchema.MySchema; |
| 208 | +    if (mySchema.myConstraint === undefined) return; |
| 209 | + |
| 210 | +    if (types.string !== false) { |
| 211 | +      const currentString = types.string || z.string(); |
| 212 | +      if (currentString instanceof z.ZodString) { |
| 213 | +        types.string = currentString.myMethod(mySchema.myConstraint); |
| 214 | +      } |
| 215 | +    } |
| 216 | +  } |
| 217 | +} |
| 218 | +``` |
| 219 | + |
| 220 | +### Adding a New Refinement Handler |
| 221 | + |
| 222 | +**Only use refinement handlers when Zod doesn't support the operation natively.** |
| 223 | + |
| 224 | +1. Use for constraints that: |
| 225 | + - **Cannot be expressed with Zod's built-in constraints** (e.g., uniqueItems, complex object equality) |
| 226 | + - Apply complex logical operations (e.g., not, anyOf, oneOf) |
| 227 | + - Require custom validation across multiple types |
| 228 | + - Handle edge cases or security concerns |
| 229 | +2. Handler receives the complete schema after type union |
| 230 | +3. Return schema with added `.refine()` validation |
| 231 | +4. **Avoid using refinements for operations Zod supports natively** (e.g., string length, number ranges) |
| 232 | + |
| 233 | +Example: |
| 234 | +```typescript |
| 235 | +export class MyRefinementHandler implements RefinementHandler { |
| 236 | +  apply(zodSchema: z.ZodTypeAny, schema: JSONSchema.BaseSchema): z.ZodTypeAny { |
| 237 | +    if (schema.myConstraint === undefined) return zodSchema; // 0 and false are still constraints |
| 238 | + |
| 239 | +    return zodSchema.refine( |
| 240 | +      (value: any) => { |
| 241 | +        // Custom validation logic |
| 242 | +        return validateMyConstraint(value, schema.myConstraint); |
| 243 | +      }, |
| 244 | +      { message: "Value does not satisfy myConstraint" } |
| 245 | +    ); |
| 246 | +  } |
| 247 | +} |
| 248 | +``` |
| 249 | + |
| 250 | +### Handler Order |
| 251 | + |
| 252 | +- **Primitive handlers**: Order matters for some handlers: |
| 253 | + - ConstHandler and EnumHandler should run before TypeHandler |
| 254 | + - ImplicitStringHandler should run before other string handlers |
| 255 | + - TupleHandler should run before ItemsHandler |
| 256 | + - Others can run in any order (they're independent) |
| 257 | + |
| 258 | +- **Refinement handlers**: Should be ordered by complexity/dependencies: |
| 259 | + - Special cases first (ProtoRequiredHandler, EmptyEnumHandler) |
| 260 | + - Logical combinations (AllOf, AnyOf, OneOf) |
| 261 | + - Type-specific refinements (PrefixItems, ObjectProperties) |
| 262 | + - General refinements (Not, UniqueItems) |
| 263 | + - Metadata handlers last |
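One way to make the ordering explicit is to keep handlers in plain arrays and run them in array order. The sketch below is generic over the schema representation so it runs without Zod; `runPipeline`, `PrimitiveLike`, and `RefinementLike` are hypothetical names, and the toy handlers only record the order in which they were called.

```typescript
interface PrimitiveLike<T, J> {
  apply(types: T, schema: J): void;
}

interface RefinementLike<S, J> {
  apply(current: S, schema: J): S;
}

// Runs primitives in array order (mutating `types`), combines the result,
// then threads the combined schema through refinements in array order.
function runPipeline<T, S, J>(
  schema: J,
  types: T,
  primitives: PrimitiveLike<T, J>[],
  combine: (types: T) => S,
  refinements: RefinementLike<S, J>[],
): S {
  for (const handler of primitives) handler.apply(types, schema);
  let result = combine(types);
  for (const handler of refinements) result = handler.apply(result, schema);
  return result;
}

// Toy run: strings stand in for schemas so the call order is visible.
const trace = runPipeline<string[], string, object>(
  {},
  [],
  [
    { apply: (types) => { types.push("Const"); } },
    { apply: (types) => { types.push("Type"); } },
  ],
  (types) => types.join(">"),
  [
    { apply: (current) => `${current}|Not` },
    { apply: (current) => `${current}|Metadata` },
  ],
);
console.log(trace); // Const>Type|Not|Metadata
```

With this shape, reordering handlers is a one-line change to the array, and the order constraints listed above can be checked in a unit test against the array contents.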
| 264 | + |
| 265 | +## Future Enhancements |
| 266 | + |
| 267 | +1. **Additional JSON Schema Keywords**: Support for more keywords like `dependencies`, `if/then/else`, `contentMediaType`, etc. |
| 268 | +2. **Performance Optimization**: Cache converted schemas for repeated conversions |
| 269 | +3. **Better Error Messages**: Provide more descriptive validation error messages |
| 270 | +4. **Schema Version Support**: Handle different JSON Schema draft versions |
| 271 | +5. **Bidirectional Conversion**: Improve Zod to JSON Schema conversion fidelity |
| 272 | + |
| 273 | +## Architectural Evolution |
| 274 | + |
| 275 | +### Key Insight: Native vs. Custom Validation |
| 276 | + |
| 277 | +During development, we discovered that **many operations initially implemented as refinement handlers should actually be primitive handlers** because Zod supports them natively. This led to a major architectural insight: |
| 278 | + |
| 279 | +**❌ Anti-pattern: Using refinements for Zod-native operations** |
| 280 | +```typescript |
| 281 | +// WRONG: Using refinement for string length (Zod supports .min() natively) |
| 282 | +return zodSchema.refine( |
| 283 | + (value: any) => typeof value !== "string" || value.length >= minLength, |
| 284 | + { message: "String too short" } |
| 285 | +); |
| 286 | +``` |
| 287 | + |
| 288 | +**✅ Correct pattern: Using primitive handlers for Zod-native operations** |
| 289 | +```typescript |
| 290 | +// CORRECT: Using Zod's native .min() method |
| 291 | +if (types.string !== false) { |
| 292 | + types.string = (types.string || z.string()).min(minLength); |
| 293 | +} |
| 294 | +``` |
| 295 | + |
| 296 | +### Migration Examples |
| 297 | + |
| 298 | +1. **String Constraints**: Moved from `StringConstraintsHandler` (refinement) to `ImplicitStringHandler` + existing primitive handlers |
| 299 | +2. **Array Items**: Moved from `ArrayItemsHandler` (refinement) to enhanced `ItemsHandler` (primitive) + `PrefixItemsHandler` (refinement for edge cases) |
| 300 | +3. **Tuple Handling**: Moved from `TupleItemsHandler` (refinement) to `TupleHandler` (primitive) |
| 301 | + |
| 302 | +### Benefits of This Evolution |
| 303 | + |
| 304 | +- **Performance**: Native Zod methods are faster than custom refinements |
| 305 | +- **Type Safety**: Better TypeScript inference with proper Zod types |
| 306 | +- **Maintainability**: Less custom validation code to maintain |
| 307 | +- **Coverage**: Eliminated unreachable code paths in refinement handlers |
| 308 | + |
| 309 | +## Conclusion |
| 310 | + |
| 311 | +The modular two-phase architecture gives each JSON Schema keyword its own independent handler. The key insight about **preferring native Zod operations over custom refinements** confines custom validation code to what Zod cannot express natively, which has improved the architecture's performance, maintainability, and correctness. |
| 312 | + |
| 313 | +This approach makes the codebase more maintainable, testable, and easier to extend with new JSON Schema features while leveraging Zod's full capabilities. |