feat(Default Data Loader Node): Add default text splitter #15786

Merged
nikhilkuria merged 24 commits into master from feat-ado-3519-default-text-splitting-data-loaders on Jun 3, 2025

Conversation

nikhilkuria
Contributor

@nikhilkuria nikhilkuria commented May 28, 2025

Summary

This PR makes it easier for users to get started with the Simple Vector Store.
Currently, users have to set up the following nodes in succession:

  • Simple Vector Store
  • Document Loader
  • Text Splitter

This leaves users unsure which properties to pick in each of these nodes.

Screenshot 2025-05-30 at 15 20 18

To make it simpler, we add a new property to the Document Loaders.
Screenshot 2025-05-30 at 15 26 48

With the "Simple" option, we pick a Text Splitter by default.

Screenshot 2025-05-30 at 15 21 46

With the "Custom" option, users can pick a Text Splitter of their choice.
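
As a rough illustration of the "Simple" versus "Custom" behaviour described above, here is a minimal, self-contained sketch. It is not the code from this PR: the names `TextSplittingMode`, `FixedSizeSplitter` and `resolveSplitter`, the naive fixed-size chunking, and the chunk size and overlap values are all assumptions made for the example.

```typescript
// Hypothetical names throughout; this is an illustration, not the PR's code.
type TextSplittingMode = 'simple' | 'custom';

interface TextSplitter {
	splitText(text: string): string[];
}

// A deliberately naive fixed-size splitter with overlap. It stands in for
// whatever splitter the "Simple" option actually selects.
class FixedSizeSplitter implements TextSplitter {
	private readonly chunkSize: number;
	private readonly chunkOverlap: number;

	constructor(chunkSize: number, chunkOverlap: number) {
		if (chunkOverlap >= chunkSize) {
			throw new Error('chunkOverlap must be smaller than chunkSize');
		}
		this.chunkSize = chunkSize;
		this.chunkOverlap = chunkOverlap;
	}

	splitText(text: string): string[] {
		const chunks: string[] = [];
		const step = this.chunkSize - this.chunkOverlap;
		for (let start = 0; start < text.length; start += step) {
			chunks.push(text.slice(start, start + this.chunkSize));
			// Stop once the current chunk has reached the end of the text.
			if (start + this.chunkSize >= text.length) break;
		}
		return chunks;
	}
}

// Assumed demo values, chosen small so the behaviour is easy to follow.
const DEMO_CHUNK_SIZE = 10;
const DEMO_CHUNK_OVERLAP = 2;

// "simple": fall back to a built-in splitter.
// "custom": use the splitter the user connected, and fail if there is none.
function resolveSplitter(mode: TextSplittingMode, connected?: TextSplitter): TextSplitter {
	if (mode === 'simple') {
		return new FixedSizeSplitter(DEMO_CHUNK_SIZE, DEMO_CHUNK_OVERLAP);
	}
	if (!connected) {
		throw new Error('Custom mode requires a connected Text Splitter');
	}
	return connected;
}
```

With these assumed values, `resolveSplitter('simple').splitText(...)` produces overlapping 10-character chunks, while `resolveSplitter('custom', mySplitter)` simply hands back the user's splitter.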

Related Linear tickets, GitHub issues, and Community forum posts

https://linear.app/n8n/issue/ADO-3519/default-text-splitting-in-the-data-loaders

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

@n8n-assistant n8n-assistant bot added the n8n team Authored by the n8n team label May 28, 2025
@nikhilkuria nikhilkuria changed the title from "feat(DataLoader Node): add default text splitter" to "feat(Default Data Loader Node): add default text splitter" May 28, 2025
@nikhilkuria nikhilkuria changed the title from "feat(Default Data Loader Node): add default text splitter" to "feat(Default Data Loader Node): Add default text splitter" May 28, 2025

codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 46.66667% with 16 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...efaultDataLoader/DocumentDefaultDataLoader.node.ts | 46.66% | 7 Missing and 1 partial ⚠️
.../DocumentGithubLoader/DocumentGithubLoader.node.ts | 46.66% | 7 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@nikhilkuria nikhilkuria reopened this May 30, 2025
@nikhilkuria nikhilkuria marked this pull request as ready for review June 2, 2025 12:56
@nikhilkuria nikhilkuria requested review from a team and dariacodes June 2, 2025 13:10
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

cubic found 16 issues across 8 files. Review them in cubic.dev

React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai to give specific feedback.

dariacodes
dariacodes previously approved these changes Jun 3, 2025
Contributor

@dariacodes dariacodes left a comment

At a high level, the code looks good. I tested it locally and it works! 🚀

@shortstacked
Contributor

Workflow Test Results 📊 🔴 1 Failed, ⚠️ 4 Warnings, 👍 78 Successful out of 83 total workflows.

Detail: Workflows failing: 237: Workflow contains 1 deleted data. View full workflow run

Tested Ref: 7fcc7f591882fee2025a14e79247ffb2b24cdf2f by @dariacodes

❌ Failed Tests (1)

Workflow ID | Workflow Name | Reason
237 | BasicLLMChain:AzureChat | Workflow contains 1 deleted data.

⚠️ Warnings (4)

Workflow ID | Workflow Name | Reason
35 | Slack:User:getPresence info:UserProfile:get update... | Workflow contains new data that previously did not exist.
53 | ConvertKit:CustomField:create getAll update delete... | Workflow contains new data that previously did not exist.
257 | Agent:auto-fix:anthropic | Workflow contains new data that previously did not exist.
48 | Asana:Project:getAll get:Task:create update move g... | Workflow contains new data that previously did not exist.

Contributor

github-actions bot commented Jun 3, 2025

✅ All Cypress E2E specs passed

@nikhilkuria
Contributor Author

@dariacodes I had to make a change to how the nodes are versioned to fix a bug. Can you please review again?

dariacodes
dariacodes previously approved these changes Jun 3, 2025
@shortstacked
Contributor

Workflow Test Results 📊 🔴 1 Failed, ⚠️ 4 Warnings, 👍 78 Successful out of 83 total workflows.

Detail: Workflows failing: 237: Workflow contains 1 deleted data. View full workflow run

Tested Ref: c2ff14bd1e6b0b550cd58a7a83a957954edf0264 by @dariacodes

❌ Failed Tests (1)

Workflow ID | Workflow Name | Reason
237 | BasicLLMChain:AzureChat | Workflow contains 1 deleted data.

⚠️ Warnings (4)

Workflow ID | Workflow Name | Reason
35 | Slack:User:getPresence info:UserProfile:get update... | Workflow contains new data that previously did not exist.
53 | ConvertKit:CustomField:create getAll update delete... | Workflow contains new data that previously did not exist.
257 | Agent:auto-fix:anthropic | Workflow contains new data that previously did not exist.
48 | Asana:Project:getAll get:Task:create update move g... | Workflow contains new data that previously did not exist.

Contributor

github-actions bot commented Jun 3, 2025

✅ All Cypress E2E specs passed

Contributor

@Cadiac Cadiac left a comment

Looks good to me. I tested the functionality and it seemed to work. The "is the new field present" approach for the inputs, combined with version checks in supplyData, is definitely the way to go here. Splitting this into two classes would have been overkill; we usually handle smaller node changes the way you've done here. 👍
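
For readers following along, the approach praised above could look roughly like the sketch below. It is an assumption-laden illustration rather than the merged code: the parameter name `textSplittingMode`, the helper names `getLoaderInputs` and `usesDefaultSplitter`, and the exact conditions are hypothetical. Only the input's `displayName`, `maxConnections` and `type` values are taken from the snippet quoted later in this review.

```typescript
// Hypothetical parameter shape; `textSplittingMode` is an assumed name.
interface LoaderParameters {
	textSplittingMode?: 'simple' | 'custom';
}

interface InputDescriptor {
	displayName: string;
	maxConnections: number;
	type: string;
}

// Inputs side: decide by field presence. A node saved before the new field
// existed has no `textSplittingMode`, so it keeps its Text Splitter input.
function getLoaderInputs(parameters: LoaderParameters): InputDescriptor[] {
	const inputs: InputDescriptor[] = [];
	const mode = parameters.textSplittingMode;
	if (mode === undefined || mode === 'custom') {
		inputs.push({
			displayName: 'Text Splitter',
			maxConnections: 1,
			type: 'ai_textSplitter',
		});
	}
	return inputs;
}

// supplyData side: decide by node version first, so version 1 nodes never
// pick up the new default behaviour.
function usesDefaultSplitter(nodeVersion: number, parameters: LoaderParameters): boolean {
	return nodeVersion >= 1.1 && parameters.textSplittingMode === 'simple';
}
```

The point of the split is that existing workflows keep behaving exactly as before, while new nodes get the simpler default without needing a second node class.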

  export class DocumentDefaultDataLoader implements INodeType {
      description: INodeTypeDescription = {
          displayName: 'Default Data Loader',
          name: 'documentDefaultDataLoader',
          icon: 'file:binary.svg',
          group: ['transform'],
-         version: 1,
+         version: [1, 1.1],
+         defaultVersion: 1.1,

By the way, we don't usually define the defaultVersion when we're defining components like this; by default, the latest version is used!
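
To make the "latest version is used by default" remark concrete, here is a small sketch of how such a fallback could be computed. This is not n8n's actual resolution logic; `VersionedDescription` and `resolveDefaultVersion` are hypothetical names invented for the illustration.

```typescript
// Hypothetical, simplified view of a node description's version fields.
interface VersionedDescription {
	version: number | number[];
	defaultVersion?: number;
}

// If no defaultVersion is given, fall back to the highest listed version.
function resolveDefaultVersion(description: VersionedDescription): number {
	if (description.defaultVersion !== undefined) {
		return description.defaultVersion;
	}
	const { version } = description;
	return Array.isArray(version) ? Math.max(...version) : version;
}
```

Under this assumption, `{ version: [1, 1.1] }` and `{ version: [1, 1.1], defaultVersion: 1.1 }` resolve to the same value, which is why the explicit field is redundant here.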

inputs.push({
displayName: 'Text Splitter',
maxConnections: 1,
type: 'ai_textSplitter',

The downside of defining this as a function and calling toString() on it, instead of inlining it, is that you can't use enum values here, as you discovered.

This is why many of our nodes seem to just inline these,

https://github.com/n8n-io/n8n/pull/15915/files#diff-76afc25d81dbfea47a75d0ae4b1555db59b162ade6c650e6ff9baabb320a8ae6R53-R63

with that trick you can still use the enum values, as long as you pass those to the template literal.

But we're unlikely to ever change these enum values, and your solution has the benefit of keeping the function typed, so in my opinion we can keep this too. We seem to follow this pattern in the ChainLLM node as well, for instance. 👍
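
The trade-off discussed in this comment can be reproduced outside n8n with a few lines. The sketch below uses a hypothetical `ConnectionTypes` constant in place of the real enum, and hypothetical helper names. It shows that `toString()` serializes the function's source text, so the enum reference is carried along as an unresolved identifier, whereas a template literal interpolates the value when the string is built.

```typescript
// Hypothetical stand-in for the real enum of connection types.
const ConnectionTypes = { AiTextSplitter: 'ai_textSplitter' } as const;

// Approach 1: a typed function that is serialized with toString().
// The serialized source still says `ConnectionTypes.AiTextSplitter`, a name
// that does not exist wherever the expression string is evaluated later.
function buildInputsTyped(): Array<{ type: string }> {
	return [{ type: ConnectionTypes.AiTextSplitter }];
}
const fromFunction = `={{ (${buildInputsTyped.toString()})() }}`;

// Approach 2: inline the expression in a template literal. The enum value is
// interpolated now, so the resulting string contains the literal value.
const inlined = `={{ [{ type: "${ConnectionTypes.AiTextSplitter}" }] }}`;
```

This is why a serialized function has to hard-code string values such as 'ai_textSplitter', while the inlined form can keep referring to the enum.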


Nice that you added tests for this node!

@shortstacked
Contributor

Workflow Test Results 📊 🔴 1 Failed, ⚠️ 3 Warnings, 👍 79 Successful out of 83 total workflows.

Detail: Workflows failing: 237: Workflow contains 1 deleted data. View full workflow run

Tested Ref: 9a66636110445a8253534ef5d30354919fd4d4a1 by @Cadiac

❌ Failed Tests (1)

Workflow ID | Workflow Name | Reason
237 | BasicLLMChain:AzureChat | Workflow contains 1 deleted data.

⚠️ Warnings (3)

Workflow ID | Workflow Name | Reason
35 | Slack:User:getPresence info:UserProfile:get update... | Workflow contains new data that previously did not exist.
257 | Agent:auto-fix:anthropic | Workflow contains new data that previously did not exist.
48 | Asana:Project:getAll get:Task:create update move g... | Workflow contains new data that previously did not exist.

Contributor

github-actions bot commented Jun 3, 2025

✅ All Cypress E2E specs passed

@nikhilkuria nikhilkuria merged commit 40850c9 into master Jun 3, 2025
36 checks passed
@nikhilkuria nikhilkuria deleted the feat-ado-3519-default-text-splitting-data-loaders branch June 3, 2025 15:14
Alexandero89 pushed a commit to Alexandero89/n8n that referenced this pull request Jun 4, 2025
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Alexandero89 pushed a commit to Alexandero89/n8n that referenced this pull request Jun 4, 2025
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
@janober
Member

janober commented Jun 11, 2025

Got released with [email protected]

TianYi0217 pushed a commit to TianYi0217/n8n that referenced this pull request Jun 14, 2025
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Labels
n8n team Authored by the n8n team Released
5 participants