@Saswatsusmoy Saswatsusmoy commented Oct 9, 2025

Description

Added page-level tracking and citation. References returned by retrieval now show the exact page, or range of pages, from which each chunk was retrieved.

Related Issues

closes #2142

Changes Made

  • Added optional page tracking fields (start_page, end_page, pages) to TextChunkSchema.
  • Updated the LightRAG class to handle page metadata during document processing.
  • Implemented validation of LLM responses to ensure only valid reference IDs are used.
  • Enhanced the chunking functions to include page data for better context management.
  • Improved reference generation to include page ranges in citations (mostly by tweaking the prompts related to reference generation).
  • Added PDF extraction methods that capture page-level data using PyPDF2 and Docling.
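
To illustrate the reference-ID validation item above, here is a minimal sketch. It is not the code from this PR: the function name `strip_invalid_references` and the `[n]` citation marker format are assumptions made for illustration. It drops any citation marker whose ID is not among the retrieved references and reports which IDs were rejected.

```python
import re

# Assumed citation marker format: "[1]", "[2]", ... (hypothetical, not taken from the PR).
_REF_PATTERN = re.compile(r"\[(\d+)\]")


def strip_invalid_references(answer: str, valid_ids: set[int]) -> tuple[str, list[int]]:
    """Remove citation markers whose ID is not among the retrieved references.

    Returns the cleaned answer and the sorted list of rejected IDs.
    """
    rejected: set[int] = set()

    def _keep_or_drop(match: re.Match) -> str:
        ref_id = int(match.group(1))
        if ref_id in valid_ids:
            return match.group(0)  # valid reference: keep the marker unchanged
        rejected.add(ref_id)
        return ""  # unknown reference: drop the marker

    cleaned = _REF_PATTERN.sub(_keep_or_drop, answer)
    # Tidy the whitespace left behind by dropped markers.
    cleaned = re.sub(r"\s+([.,;])", r"\1", cleaned)
    cleaned = re.sub(r" {2,}", " ", cleaned).strip()
    return cleaned, sorted(rejected)


cleaned, rejected = strip_invalid_references(
    "Page tracking is optional [1]. Ranges are used [7].", {1, 2}
)
print(cleaned)
print(rejected)
```

Returning the rejected IDs alongside the cleaned text makes it easy to log hallucinated references rather than silently discarding them.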

Checklist

  • Changes tested locally
  • Code reviewed

Additional Notes

Added debug output to the console for the retrieved chunks and their metadata; I can remove it if it is not required.

The metadata shows a range of pages because of the chunk size: a chunk frequently includes content from multiple pages, so a page range is needed to pinpoint which chunk was used as a reference.
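
The following rough sketch shows why a chunk can carry a page range. It is a simplification and not the implementation in this PR: it splits on whitespace-separated words rather than tokens and uses no overlap. The `start_page`, `end_page`, and `pages` field names come from the description above; `PageAwareChunk`, `chunk_pages`, and `page_label` are hypothetical names.

```python
from typing import List, Optional, Tuple, TypedDict


class PageAwareChunk(TypedDict):
    """Hypothetical chunk record carrying the optional page fields."""
    content: str
    start_page: Optional[int]
    end_page: Optional[int]
    pages: Optional[List[int]]


def chunk_pages(page_texts: List[str], chunk_size: int) -> List[PageAwareChunk]:
    """Split page texts into fixed-size word chunks, recording the source pages."""
    # Flatten to (word, page_number) pairs; pages are numbered from 1.
    tagged: List[Tuple[str, int]] = [
        (word, page_no)
        for page_no, text in enumerate(page_texts, start=1)
        for word in text.split()
    ]
    chunks: List[PageAwareChunk] = []
    for i in range(0, len(tagged), chunk_size):
        window = tagged[i : i + chunk_size]
        pages = sorted({page_no for _, page_no in window})
        chunks.append({
            "content": " ".join(word for word, _ in window),
            "start_page": pages[0],
            "end_page": pages[-1],
            "pages": pages,
        })
    return chunks


def page_label(chunk: PageAwareChunk) -> str:
    """Render a citation label: a single page or a page range."""
    start, end = chunk["start_page"], chunk["end_page"]
    if start is None or end is None:
        return ""  # page metadata is optional, e.g. for non-PDF sources
    return f"p. {start}" if start == end else f"pp. {start}-{end}"


chunks = chunk_pages(["a b c", "d e", "f g h i"], chunk_size=4)
for chunk in chunks:
    print(page_label(chunk), "|", chunk["content"])
```

With a chunk size of four words, the first two chunks each straddle a page boundary, so only a range identifies their source unambiguously.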

…lidation

- Added optional page tracking fields (start_page, end_page, pages) to TextChunkSchema.
- Updated LightRAG class to handle page metadata during document processing.
- Implemented validation for LLM responses to ensure only valid reference IDs are used.
- Enhanced chunking functions to include page data for better context management.
- Improved reference generation to include page ranges for citations.
- Added PDF extraction methods to capture page-level data using PyPDF2 and Docling.
@yrangana
Contributor

Any update on this?

@Saswatsusmoy
Author

> Any update on this?

The PR is open and waiting to be reviewed and merged.

@Albertchamberlain

Has this PR been merged?

Development

Successfully merging this pull request may close these issues.

[Feature Request]: Add page number metadata to chunks for citation