-
Notifications
You must be signed in to change notification settings - Fork 33.7k
feat(Default Data Loader Node): Add default text splitter #15786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(Default Data Loader Node): Add default text splitter #15786
Conversation
…ttps://github.com/n8n-io/n8n into feat-ado-3519-default-text-splitting-data-loaders
Codecov ReportAttention: Patch coverage is 📢 Thoughts on this report? Let us know! |
…ttps://github.com/n8n-io/n8n into feat-ado-3519-default-text-splitting-data-loaders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cubic found 16 issues across 8 files. Review them in cubic.dev
React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai
to give specific feedback.
...nodes/document_loaders/DocumentDefaultDataLoader/test/DocumentDefaultDataLoader.node.test.ts
Show resolved
Hide resolved
...nodes/document_loaders/DocumentDefaultDataLoader/test/DocumentDefaultDataLoader.node.test.ts
Outdated
Show resolved
Hide resolved
...nodes/document_loaders/DocumentDefaultDataLoader/test/DocumentDefaultDataLoader.node.test.ts
Show resolved
Hide resolved
...hain/nodes/document_loaders/DocumentDefaultDataLoader/V1/DocumentDefaultDataLoaderV1.node.ts
Outdated
Show resolved
Hide resolved
...hain/nodes/document_loaders/DocumentDefaultDataLoader/V1/DocumentDefaultDataLoaderV1.node.ts
Outdated
Show resolved
Hide resolved
...odes-langchain/nodes/document_loaders/DocumentGithubLoader/V1/DocumentGithubLoaderV1.node.ts
Outdated
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Outdated
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Outdated
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cubic found 16 issues across 8 files. Review them in cubic.dev
React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai
to give specific feedback.
...nodes/document_loaders/DocumentDefaultDataLoader/test/DocumentDefaultDataLoader.node.test.ts
Show resolved
Hide resolved
...nodes/document_loaders/DocumentDefaultDataLoader/test/DocumentDefaultDataLoader.node.test.ts
Outdated
Show resolved
Hide resolved
...nodes/document_loaders/DocumentDefaultDataLoader/test/DocumentDefaultDataLoader.node.test.ts
Show resolved
Hide resolved
...hain/nodes/document_loaders/DocumentDefaultDataLoader/V1/DocumentDefaultDataLoaderV1.node.ts
Outdated
Show resolved
Hide resolved
...hain/nodes/document_loaders/DocumentDefaultDataLoader/V1/DocumentDefaultDataLoaderV1.node.ts
Outdated
Show resolved
Hide resolved
...odes-langchain/nodes/document_loaders/DocumentGithubLoader/V1/DocumentGithubLoaderV1.node.ts
Outdated
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Outdated
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Outdated
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Show resolved
Hide resolved
...langchain/nodes/document_loaders/DocumentGithubLoader/test/DocumentGithubLoader.node.test.ts
Show resolved
Hide resolved
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
…ttps://github.com/n8n-io/n8n into feat-ado-3519-default-text-splitting-data-loaders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the high-level code looks good, tested it locally - it works! 🚀
Workflow Test Results 📊 🔴 1 Failed,
|
Workflow ID | Workflow Name | Reason |
---|---|---|
237 | BasicLLMChain:AzureChat | Workflow contains 1 deleted data. |
⚠️ Warnings (4)
Workflow ID | Workflow Name | Reason |
---|---|---|
35 | Slack:User:getPresence info:UserProfile:get update... | Workflow contains new data that previously did not exist. |
53 | ConvertKit:CustomField:create getAll update delete... | Workflow contains new data that previously did not exist. |
257 | Agent:auto-fix:anthropic | Workflow contains new data that previously did not exist. |
48 | Asana:Project:getAll get:Task:create update move g... | Workflow contains new data that previously did not exist. |
✅ All Cypress E2E specs passed |
@dariacodes I had to make a change to how the nodes are versioned to fix a bug. Can you please review again? |
Workflow Test Results 📊 🔴 1 Failed,
|
Workflow ID | Workflow Name | Reason |
---|---|---|
237 | BasicLLMChain:AzureChat | Workflow contains 1 deleted data. |
⚠️ Warnings (4)
Workflow ID | Workflow Name | Reason |
---|---|---|
35 | Slack:User:getPresence info:UserProfile:get update... | Workflow contains new data that previously did not exist. |
53 | ConvertKit:CustomField:create getAll update delete... | Workflow contains new data that previously did not exist. |
257 | Agent:auto-fix:anthropic | Workflow contains new data that previously did not exist. |
48 | Asana:Project:getAll get:Task:create update move g... | Workflow contains new data that previously did not exist. |
✅ All Cypress E2E specs passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, tested the functionality and it seemed to work. The "is the new field present" based approach on the inputs & version checks on supplyData is definitely the way to go here, splitting this into two classes would have been overkill here - we usually do smaller node changes like you've done here. 👍
export class DocumentDefaultDataLoader implements INodeType { | ||
description: INodeTypeDescription = { | ||
displayName: 'Default Data Loader', | ||
name: 'documentDefaultDataLoader', | ||
icon: 'file:binary.svg', | ||
group: ['transform'], | ||
version: 1, | ||
version: [1, 1.1], | ||
defaultVersion: 1.1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, we don't usually define the defaultVersion
when we're defining components like this, by default the latest version is used!
inputs.push({ | ||
displayName: 'Text Splitter', | ||
maxConnections: 1, | ||
type: 'ai_textSplitter', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Downside of defining this as a function and calling toString() on it instead of inlining it is that you can't use enum values here, like you discovered.
This is why many of our nodes seem to just inline these,
with that trick you can still use the enum values, as long as you pass those to the template literal.
But we're unlikely to ever change these enum values and your solution has the benefit of having the function typed so in my opinion we can keep this too, we seem to follow this pattern at ChainLLM node for instance too. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice that you added tests for node this!
Workflow Test Results 📊 🔴 1 Failed,
|
Workflow ID | Workflow Name | Reason |
---|---|---|
237 | BasicLLMChain:AzureChat | Workflow contains 1 deleted data. |
⚠️ Warnings (3)
Workflow ID | Workflow Name | Reason |
---|---|---|
35 | Slack:User:getPresence info:UserProfile:get update... | Workflow contains new data that previously did not exist. |
257 | Agent:auto-fix:anthropic | Workflow contains new data that previously did not exist. |
48 | Asana:Project:getAll get:Task:create update move g... | Workflow contains new data that previously did not exist. |
✅ All Cypress E2E specs passed |
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Got released with |
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Summary
This PR makes it easy for users to start with Simple Vector Store.
Currently, users have to set up the following nodes in succession
This makes the users confused on which properties to pick in each of these
To make it simpler, we add a new property to the Document Loaders.

With the "Simple" option, we pick a Text Splitter by default.
With the "Custom" option, the users can pick a Text Splitter of their choice
Related Linear tickets, Github issues, and Community forum posts
https://linear.app/n8n/issue/ADO-3519/default-text-splitting-in-the-data-loaders
Review / Merge checklist
release/backport
(if the PR is an urgent fix that needs to be backported)