docs(extract-v2): complete walkthrough cookbook #61
Merged
eli-stewart merged 23 commits into main on Apr 1, 2026
Runnable Jupyter notebook covering all Extract V2 features:
- Basic extraction, complex multi-page document
- Citations with bounding box visualization
- Confidence scores with charts
- Agentic vs cost_effective comparison
- Per-page extraction, advanced options, job management
Uses SDK 2.0. Test files included (all public/synthetic).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 99c932c to df36a3a.
…pport
- Switch from create()+wait_for_completion() to run() as primary pattern
- Add Section 10: ExtractedData.from_extract_job for typed Pydantic results
- Install from PyPI (llama-cloud 2.0.0) instead of git branch
- Add poppler install for Colab compatibility
- Add getpass() fallback for API key input
- Fix config list endpoint (items not data)
- Add project_id note for multi-project setups
- Remove duplicate cells and stale references
- 11 sections, 53 cells, all tested against live API
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
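The getpass() fallback named in the commit above might look roughly like the sketch below. It is an illustration, not the notebook's actual cell: the environment variable name and the `resolve_api_key` helper are assumptions made for this example.

```python
import os
from getpass import getpass

# Assumption: illustrative variable name; the notebook may read a different one.
API_KEY_ENV_VAR = "LLAMA_CLOUD_API_KEY"


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the API key from the environment, prompting only when it is missing."""
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        # Fall back to an interactive, non-echoing prompt (works in Colab and locally).
        key = prompt(f"Enter {API_KEY_ENV_VAR}: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

Passing `env` and `prompt` as parameters keeps the helper testable without a terminal; a notebook cell would simply call `resolve_api_key()`.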
Colab only gets the notebook, not the repo file tree. Added urllib download step that fetches test PDFs and schemas from raw GitHub URLs on first run. Skips if files already exist locally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
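A minimal sketch of the download-on-first-run step described above, assuming a hypothetical `fetch_if_missing` helper (the notebook's own cell may be structured differently). It uses only `urllib.request.urlretrieve` from the standard library and skips files that already exist locally.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url, destination, retrieve=urllib.request.urlretrieve):
    """Download url to destination unless the file already exists.

    Returns True when a download happened, False when it was skipped.
    """
    destination = Path(destination)
    if destination.exists():
        return False  # Already present (local clone or earlier run): nothing to do.
    destination.parent.mkdir(parents=True, exist_ok=True)
    retrieve(url, str(destination))
    return True
```

In a notebook this would be called once per test PDF or schema file, with the raw GitHub URL as `url`.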
So the cookbook doesn't need updating after the branch merges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tries the feature branch URL first, falls back to main. Works during testing and after merge without changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
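The try-one-URL-then-fall-back behaviour from this commit can be sketched as follows. The `fetch_first_available` name is hypothetical and the candidate URLs are supplied by the caller; this is one way to do it, not necessarily how the notebook does.

```python
import urllib.error
import urllib.request


def fetch_first_available(candidate_urls, open_url=urllib.request.urlopen):
    """Return (url, content_bytes) for the first candidate that can be fetched."""
    last_error = None
    for url in candidate_urls:
        try:
            with open_url(url) as response:
                return url, response.read()
        except (urllib.error.URLError, OSError) as error:
            # HTTP 404 and network failures both land here; try the next candidate.
            last_error = error
    raise RuntimeError(f"No candidate URL could be fetched: {last_error}")
```

Ordering the list as feature branch first, then main, gives the behaviour described: it works during testing and keeps working after the merge without edits.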
Spell out abbreviations and name variables after what they represent: schema_from_prompt, cost_effective_job, confidence_metadata, etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Receipt has $10 for total/subtotal/amountPaid/unitPrice, so all bounding boxes highlight the same area. Patent has title, assignee, filing_date, grant_date etc in different parts of the cover page. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Abstract bbox covers half the page. Use patent_number, title, filing_date, grant_date, assignee, num_claims, primary_examiner for distinct bounding boxes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
patent_number, applicant, grant_date, num_claims have clean bounding boxes. Dropped title (was matching a cited reference) and assignee (mis-tagged). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
applicant, grant_date, num_claims all produce tight, accurate bboxes. Dropped fields that return no citation or oversized bounding boxes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused pydantic.Field import
- Add from __future__ import annotations for list[] syntax
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
stdlib -> third-party -> local, alphabetical within groups, blank lines between groups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Colab already has poppler. The apt-get call was hanging indefinitely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Files aren't on main yet. Try main first, fall back to feature branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Colab doesn't have poppler. Previous hang was likely apt-get without update first. Now runs apt-get update then install in one line. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…c pattern
- Rename document_input_value -> file_input throughout (per platform PR #16633)
- Remove Section 10 (ExtractedData) per Logan's feedback - internal API
- Add Pydantic model_validate pattern in Section 3 (simpler, user-facing)
- Renumber sections (now 1-10)
- Update summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove feature branch fallback. After merge, files live on main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test file URLs now use an immutable commit SHA instead of branch names. Notebook code evolves with main, but data files are pinned so existing Colab links never break if files are reorganized. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
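To illustrate the pinning idea, here is a small sketch that builds a raw GitHub URL from a full commit SHA and rejects branch names. The `pinned_raw_url` helper and any owner, repo, or path values passed to it are placeholders for this example, not the values used in the notebook.

```python
import re

# A full Git commit SHA is 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_raw_url(owner, repo, commit_sha, path):
    """Build a raw.githubusercontent.com URL that is pinned to one commit."""
    if not FULL_SHA.fullmatch(commit_sha):
        # Branch names such as "main" move over time, so they are refused here.
        raise ValueError("expected a full 40-character commit SHA, not a branch name")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{commit_sha}/{path.lstrip('/')}"
```

Because the SHA never changes, a URL built this way keeps resolving to the same file contents even if the files are later moved or edited on main.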
Georgehe4 approved these changes Apr 1, 2026
Summary
Complete Extract V2 cookbook as a runnable Jupyter notebook. Works locally and on Google Colab.
Sections (11)
Test files
Notes
- llama-cloud 2.0.0 from PyPI
- run() as primary pattern (requires SDK PR "fix: update SDK for V2 extract API" #66)
- document_input_value will need renaming to file_input after SDK regen (platform PR #16633)
- main branch (PR "fix: point test file downloads at main branch" #71 pending merge)
Test plan