Avoid unnecessary string join and split #129370

Closed
@rebornix

Description

function convertStreamOutput(output: NotebookCellOutput): JupyterOutput {
    const outputs = output.items
        .filter((opit) => opit.mime === CellOutputMimeTypes.stderr || opit.mime === CellOutputMimeTypes.stdout)
        .map((opit) => convertOutputMimeToJupyterOutput(opit.mime, opit.data as Uint8Array) as string)
        .reduceRight<string[]>((prev, curr) => (Array.isArray(curr) ? prev.concat(...curr) : prev.concat(curr)), []);
    const streamType = getOutputStreamType(output) || 'stdout';
    return {
        output_type: 'stream',
        name: streamType,
        text: splitMultilineString(outputs.join(''))
    };
}

Currently, the stream output conversion (from VS Code types to Jupyter) performs an unnecessary string concatenation and split in V8, which slows down the conversion and increases memory usage and GC pressure:

  • Stream output is always either CellOutputMimeTypes.stderr or CellOutputMimeTypes.stdout, so convertOutputMimeToJupyterOutput will always return a string. Using prev.concat(curr) allocates a new array on every iteration.
  • splitMultilineString(outputs.join('')) can slow down the process significantly. It first joins all the strings and then splits the result by line breaks, which triggers V8 to flatten the concatenated string (outputs.join('')) and doubles the memory usage.
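To illustrate the first point: Array.prototype.concat returns a fresh array on each call, while push appends to the existing one. The sketch below is only a demonstration; the helper names collectWithConcat and collectWithPush are made up here and do not exist in the extension.

```typescript
// Hypothetical helpers, for illustration only.

// Accumulates with concat and counts how often a new array object is produced.
function collectWithConcat(items: string[]): { result: string[]; newArrays: number } {
    let result: string[] = [];
    let newArrays = 0;
    for (const item of items) {
        const next = result.concat(item); // concat never mutates `result`
        if (next !== result) {
            newArrays++;
        }
        result = next;
    }
    return { result, newArrays };
}

// Accumulates in place: a single array is reused for the whole loop.
function collectWithPush(items: string[]): string[] {
    const result: string[] = [];
    for (const item of items) {
        result.push(item);
    }
    return result;
}
```

Both helpers return the same contents, but the concat version creates one intermediate array per item.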

We can probably run splitMultilineString on each output and concatenate the last line of each output with the first line of the next output (split first, then join).
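A minimal sketch of the "split first, then join" idea, not the actual fix. It assumes splitMultilineString splits on '\n' and keeps the trailing newline on every line except possibly the last; splitLines below is a hypothetical stand-in with that behavior, and splitThenJoin is an illustrative name. It also assumes the chunks are processed in output order.

```typescript
// Hypothetical stand-in for splitMultilineString (assumed behavior):
// splits on '\n', keeps the newline on each complete line, drops empty pieces.
function splitLines(text: string): string[] {
    if (text.length === 0) {
        return [];
    }
    const parts = text.split('\n');
    return parts
        .map((part, i) => (i < parts.length - 1 ? `${part}\n` : part))
        .filter((part) => part.length > 0);
}

// Splits each chunk on its own, then stitches a trailing partial line
// onto the first line of the next chunk. No large joined string is built.
function splitThenJoin(chunks: string[]): string[] {
    const lines: string[] = [];
    for (const chunk of chunks) {
        const chunkLines = splitLines(chunk);
        if (chunkLines.length === 0) {
            continue;
        }
        const last = lines.length - 1;
        let start = 0;
        // A previous line without a trailing newline is incomplete: extend it.
        if (last >= 0 && !lines[last].endsWith('\n')) {
            lines[last] += chunkLines[0];
            start = 1;
        }
        for (let i = start; i < chunkLines.length; i++) {
            lines.push(chunkLines[i]);
        }
    }
    return lines;
}
```

For example, splitThenJoin(['a\nb', 'c\n', 'd']) yields ['a\n', 'bc\n', 'd'], the same as splitting the joined string 'a\nbc\nd', but only the short boundary lines are concatenated.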
