Enhance identifier and string literal sanitization in BindingGenerator #1345
Pull request overview
This PR improves the safety of generated TypeScript bindings by sanitizing spec-provided identifiers and escaping spec-provided strings when emitting double-quoted string literals, with added integration coverage to prevent regressions.
Changes:
- Expanded `sanitizeIdentifier` to replace non-identifier characters and add a `_unnamed` fallback.
- Added `escapeStringLiteral` and applied it when emitting union tags and error-enum message strings.
- Added integration tests covering quotes in string literals and sanitization of struct/enum identifiers.
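The two helpers described above might look roughly like the following. This is a minimal sketch, not the PR's actual code: the `Sketch` function names, the abbreviated `RESERVED` set (standing in for `isNameReserved`), the leading-digit prefix, and the exact set of escaped characters are all assumptions made for illustration.

```typescript
// Hypothetical, abbreviated reserved-word list; the real isNameReserved
// implementation is not shown in this conversation.
const RESERVED = new Set(["class", "function", "return", "delete", "enum", "default"]);

function sanitizeIdentifierSketch(name: string): string {
  // Replace every character outside the ASCII identifier set with "_".
  let id = name.replace(/[^a-zA-Z0-9_$]/g, "_");
  // Assumption: an empty name falls back to "_unnamed".
  if (id.length === 0) return "_unnamed";
  // Assumption: identifiers cannot start with a digit, so prefix "_".
  if (/^[0-9]/.test(id)) id = "_" + id;
  // Append an underscore to reserved words, as in the reviewed diff.
  if (RESERVED.has(id)) id += "_";
  return id;
}

function escapeStringLiteralSketch(value: string): string {
  // Escape backslashes first so later escapes are not double-processed.
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Emitting a double-quoted literal from a spec-provided string:
const tag = 'say "hi"';
const emitted = `tag: "${escapeStringLiteralSketch(tag)}"`;
```

Escaping the backslash before the quote matters: reversing the order would re-escape the backslashes introduced for the quotes.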
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/bindings/utils.ts` | Updates identifier sanitization behavior and introduces string-literal escaping helper used by generators. |
| `src/bindings/types.ts` | Applies identifier sanitization to struct fields / enum case names and escapes spec-provided strings in emitted string literals. |
| `src/bindings/client.ts` | Sanitizes function input names (and `fromJSON` method keys) to avoid invalid TS identifiers in generated client surface. |
| `test/integration/bindings.test.ts` | Adds integration tests for escaping quotes and sanitizing identifiers across generated types. |
Comments suppressed due to low confidence (2)
src/bindings/types.ts:146
- Sanitizing struct field names can collapse distinct spec field names into the same identifier (e.g. differing punctuation/whitespace), which can lead to duplicate property declarations in the generated interface and makes it impossible to represent both fields accurately. Consider detecting collisions after `sanitizeIdentifier` and disambiguating (or using quoted property names for the original field name and mapping to a safe alias if needed).
const fields = struct
.fields()
.map((field) => {
const fieldName = sanitizeIdentifier(field.name().toString());
const fieldType = parseTypeFromTypeDef(field.type());
const fieldDoc = formatJSDocComment(field.doc().toString(), 2);
return `${fieldDoc} ${fieldName}: ${fieldType};`;
})
src/bindings/types.ts:198
- Sanitizing enum case names can introduce duplicate members when multiple spec case names normalize to the same identifier, which will cause a TypeScript compile error for the generated enum. Consider checking for duplicates after sanitization and disambiguating (or using string-literal enum member names where appropriate).
const members = enumEntry
.cases()
.map((enumCase) => {
const caseName = sanitizeIdentifier(enumCase.name().toString());
const caseValue = enumCase.value();
const caseDoc = enumCase.doc().toString() || `Enum Case: ${caseName}`;
return `${formatJSDocComment(caseDoc, 2)} ${caseName} = ${caseValue}`;
})
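One way to act on the two collision comments above is to run a de-duplication pass over the names after sanitization. The sketch below is hypothetical: `disambiguate` and its numeric-suffix scheme are not part of the PR, and it assumes the input names have already been sanitized.

```typescript
// Hypothetical helper: makes already-sanitized names unique by appending
// a numeric suffix to later duplicates. Not part of the reviewed PR.
function disambiguate(names: string[]): string[] {
  const used = new Set<string>();
  return names.map((name) => {
    let candidate = name;
    let n = 2;
    // Keep bumping the suffix until the candidate is unused.
    while (used.has(candidate)) {
      candidate = `${name}_${n}`;
      n += 1;
    }
    used.add(candidate);
    return candidate;
  });
}

// "my-field" and "my field" would both sanitize to "my_field".
const unique = disambiguate(["my_field", "my_field", "other"]);
```

The first occurrence keeps its name, so specs without collisions produce unchanged output. A generator could equally throw on collision instead of renaming; which behavior is preferable is a design decision for the maintainers.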
| if (isNameReserved(identifier)) { | ||
| // Append underscore to reserved | ||
| return identifier + "_"; | ||
| // Strip any characters that are not valid in JS/TS identifiers |
The comment says this strips characters that are not valid JS/TS identifier characters, but the implementation only allows ASCII [a-zA-Z0-9_$] and replaces everything else (including many valid Unicode identifier characters) with _. Either adjust the comment to reflect the ASCII-only behavior, or consider a Unicode-aware identifier check if preserving valid Unicode identifiers is desired.
| // Strip any characters that are not valid in JS/TS identifiers | |
| // Strip any characters that are not allowed by this ASCII-based identifier pattern |
The Soroban rust sdk only supports ascii identifiers so I think we should only cater to that encoding
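To make the trade-off in this thread concrete, the sketch below contrasts the ASCII-only pattern quoted in the review with a Unicode-aware alternative. Both regular expressions are illustrative assumptions rather than code from the PR; the Unicode variant uses the standard `ID_Start` / `ID_Continue` property escapes.

```typescript
// ASCII-only identifier check, matching the character set named in the review.
const asciiIdentifier = /^[a-zA-Z_$][a-zA-Z0-9_$]*$/;

// Unicode-aware alternative (assumption, not used by the PR). Built with
// the RegExp constructor so the "u" flag is resolved at runtime.
const unicodeIdentifier = new RegExp(
  "^[\\p{ID_Start}_$][\\p{ID_Continue}$\\u200c\\u200d]*$",
  "u",
);

const sample = "caf\u00e9"; // "café": a valid JS identifier, but not ASCII
const asciiOk = asciiIdentifier.test(sample);
const unicodeOk = unicodeIdentifier.test(sample);
```

Under the ASCII-only policy, a name such as `café` has its last character replaced with `_`, which is harmless if spec names are always ASCII, as the reply above states.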
Size Change: +41.3 kB (+0.09%) Total Size: 45.1 MB
- `sanitizeIdentifier` to strip non-identifier characters and handle edge cases.
- `escapeStringLiteral` for safe interpolation in string literals.