Deep Research to Generate Tables

This is a Deep Research agent that generates helpful tables of information, as opposed to the classic long-form report.

To visualize the graph in Studio, run the following command:

    npx @langchain/langgraph-cli@latest dev

GitHub Repo

Sample Questions

Methodology

One-Shot Base Row Generation

Limitations

Not enough rows are generated in one shot: only ~20 rows came back even when I asked for 100

Iterative Base Row Generation

Improvements

We can now generate hundreds of rows easily
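As a rough illustration of the iterative approach, the sketch below keeps requesting batches of rows and merges them by primary key until a target count is reached. It is a minimal sketch under assumptions, not the repo's actual code: `Row`, `GenerateBatch`, and `collectBaseRows` are hypothetical names, and `generateBatch` stands in for whatever search-plus-LLM call produces rows.

```typescript
// Hypothetical row shape: column name -> value.
type Row = Record<string, string | number | null>;

// Hypothetical stand-in for one search + row-extraction pass.
// It receives the rows found so far so it can look for new ones.
type GenerateBatch = (iteration: number, known: Row[]) => Promise<Row[]>;

async function collectBaseRows(
  generateBatch: GenerateBatch,
  primaryKey: string,
  targetRows: number,
  maxIterations: number,
): Promise<Row[]> {
  const byKey = new Map<string, Row>();

  for (let i = 0; i < maxIterations && byKey.size < targetRows; i++) {
    const batch = await generateBatch(i, Array.from(byKey.values()));
    for (const row of batch) {
      const raw = row[primaryKey];
      if (raw === null || raw === undefined) continue; // no key, cannot dedupe
      // Normalize the key so "Joe's Pizza" and "joe's pizza " collapse.
      const key = String(raw).trim().toLowerCase();
      if (key !== "" && !byKey.has(key)) byKey.set(key, row);
    }
  }
  return Array.from(byKey.values());
}
```

Note that the loop only checks the target between batches, so it can return more rows than requested.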

Limitations

Generated rows do not necessarily answer the prompt. Example: "What are the 10 best restaurants in NYC?" We return information about at least 10 restaurants, but we don't reason about which ones are the best, the cheapest, etc. We also might not answer with exactly 10 restaurants.

Increase Search Space and add Post Processing

Improvements

The quality of the final returned table is better: the LLM reasons about how to best answer the user's question
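To make the post-processing idea concrete, here is a minimal sketch of one possible post-processing tool: rank an over-collected set of rows by a criteria column and keep exactly the number the user asked for. The `PostProcessPlan` shape and `postProcess` function are assumptions for illustration; in the real agent the LLM decides how to filter through tool calls, and the tools may look quite different.

```typescript
type Row = Record<string, string | number | null>;

// Hypothetical plan the LLM might emit after reading the user's question.
interface PostProcessPlan {
  sortBy: string;             // criteria column, e.g. "rating"
  direction: "asc" | "desc";  // "desc" for "best", "asc" for "cheapest"
  limit: number;              // exact number of rows to return
}

function postProcess(rows: Row[], plan: PostProcessPlan): Row[] {
  const sign = plan.direction === "desc" ? -1 : 1;
  return rows
    // Rows without a numeric value in the criteria column cannot be ranked.
    .filter((row) => typeof row[plan.sortBy] === "number")
    .sort((a, b) => sign * ((a[plan.sortBy] as number) - (b[plan.sortBy] as number)))
    .slice(0, plan.limit);
}
```

Collecting more candidates than requested and trimming afterwards is what lets the final table contain exactly the requested number of rows.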

Limitations

The LLM's filtering and tool calling aren't perfect; the prompt and the set of tools in Post Processing need more iteration
The schema might be wrong; we need to add a HITL (human-in-the-loop) step there to make sure it is correct before proceeding

Add HITL in the Schema building process

Improvements

The human can now give natural language feedback on the schema before approving the LLM to continue with table generation
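The review loop can be sketched as follows. This is a simplified illustration under assumptions: `Column`, `Review`, and `refineSchema` are hypothetical names, `revise` stands in for the LLM call that rewrites the schema from feedback, and the callbacks are synchronous only to keep the sketch short (the real agent presumably pauses the graph while waiting for the human).

```typescript
// Hypothetical, simplified schema representation.
interface Column {
  name: string;
  type: "string" | "number" | "boolean";
  description: string;
}
type Schema = Column[];

// The human either approves or sends natural language feedback.
type Review = { kind: "approve" } | { kind: "feedback"; text: string };

function refineSchema(
  initial: Schema,
  askHuman: (schema: Schema) => Review,
  revise: (schema: Schema, feedback: string) => Schema, // stand-in for an LLM call
  maxRounds: number = 5,
): Schema {
  let schema = initial;
  for (let round = 0; round < maxRounds; round++) {
    const review = askHuman(schema);
    if (review.kind === "approve") return schema; // safe to start generating rows
    schema = revise(schema, review.text);
  }
  throw new Error("Schema not approved after " + maxRounds + " review rounds");
}
```

Capping the number of rounds is a design choice of this sketch; it keeps a misbehaving `revise` step from looping forever.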

Limitations

In the future, we might want to create a UI experience where a user can directly edit the schema (out of scope)

Ideas

Do more thinking and reasoning up front during schema generation
- Iterate with the user to pull out a primary key, the criteria columns used to directly answer the question (if any), and any additional columns of interest to the user.
More search tools
- Right now I only added Tavily
- Going to integrate with MCP and Arcade to allow other search tools (Google Search, GitHub search, etc.)

Features

All web search is through Tavily
All results returned from web search are first summarized with a cheap LLM before being provided as context for row generation
We generate a zod schema during the "Extract Schema" step, and this is used as a structured output guide for the row generators in the "Base Row Generator" and "Row Researcher"
The user can configure how many search iterations to run, the max concurrency, which models to use, etc.
Our Deep Research table generator can be exposed as a Tool beneath a higher level Chat interface
- We can add other tools to this higher-level interface, like true deep research report generation on one specific row!
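The max concurrency setting mentioned above could be enforced with a small helper like the one below, for example when summarizing many search results in parallel. This is a sketch under assumptions: `mapWithConcurrency` is a hypothetical helper, not something taken from the repo, and `worker` stands in for a call such as summarizing one search result with a cheap model.

```typescript
// Runs `worker` over all items with at most `maxConcurrency` calls in flight.
// Results come back in the same order as the input items.
async function mapWithConcurrency<T, R>(
  items: T[],
  maxConcurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane repeatedly claims the next item until none are left.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = Math.max(1, Math.min(maxConcurrency, items.length));
  await Promise.all(Array.from({ length: lanes }, () => runLane()));
  return results;
}
```

If any worker rejects, the returned promise rejects as well; a production version would probably want per-item error handling so one failed summary does not discard the rest.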

Next Steps

Improve response quality
- After n iterations of search without new base rows, you can stop searching
- Do a better "cardinality" analysis up front to determine the search method
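The first idea, stopping once n consecutive search iterations yield no new base rows, can be sketched with a small tracker. The class name `NoProgressTracker` and its `patience` parameter are hypothetical; this is one possible way to implement the check, not the repo's implementation.

```typescript
// Tracks primary keys seen so far and counts consecutive iterations
// that contributed nothing new.
class NoProgressTracker {
  private seen = new Set<string>();
  private staleIterations = 0;
  private patience: number;

  constructor(patience: number) {
    this.patience = patience; // the "n" in "after n iterations without new rows"
  }

  // Records the primary keys from one search iteration.
  // Returns true when searching should stop.
  record(keys: string[]): boolean {
    let added = 0;
    for (const key of keys) {
      if (!this.seen.has(key)) {
        this.seen.add(key);
        added++;
      }
    }
    // Any new row resets the counter; an empty-handed iteration increments it.
    this.staleIterations = added === 0 ? this.staleIterations + 1 : 0;
    return this.staleIterations >= this.patience;
  }
}
```

A search loop would call `record` once per iteration with the keys it found and break as soon as it returns true.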


Repository: langchain-ai/generic-researcher
