Feat: Add Cookbook for Content-Based Recommendations with Gemini & Qdrant #700

andycandy · 2025-04-11T17:27:01Z

Status: Work in Progress / Seeking Feedback

This PR presents the initial implementation of the Gemini+Qdrant recommendation cookbook outlined in #690. While the core functionality is present, I am actively seeking feedback and collaborating with reviewers to refine the notebook's structure, clarity, and explanations before final merging. Please feel free to add comments and suggestions!

Description

This PR introduces a new cookbook example demonstrating how to build a scalable, content-based recommendation system using Google's Gemini API for semantic embeddings and Qdrant as an efficient vector database.

Motivation

As outlined in #690, existing examples often use toy datasets or focus on narrow use cases. This cookbook aims to provide a more practical, near-production-level demonstration showing how Large Language Model (LLM) embeddings, specifically from Gemini, can power recommendations across potentially diverse media types without relying on user interaction history (addressing cold-start scenarios).

Solution:

A new Jupyter notebook, Movie_Recommendation, has been added. This notebook guides the user through the following steps:

Setup: Installs necessary libraries (google-genai, qdrant-client, pandas), configures the Gemini API client, and initializes the Qdrant client.
Data Loading & Preparation:
- Loads a sample of the TMDB Movies Dataset.
  Note: The notebook uses a sample (e.g., 5000 items) for demonstration purposes due to the scale of the full dataset and API costs associated with embedding millions of items. Clear comments explain this and how to adapt for the full dataset.
- Selects relevant features (title, overview, genres, keywords, tagline, release_date).
- Performs data cleaning: handles missing titles, fills missing optional text fields, and extracts the release year.
- Constructs a combined text_for_embedding string for each movie, leveraging available metadata.
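The preparation step above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: it assumes each movie is a dict keyed by the feature names listed above, and the helper names (`extract_year`, `build_text_for_embedding`) and the label format are hypothetical.

```python
def _clean(value):
    """Return a stripped string, or '' for missing / non-string values."""
    return value.strip() if isinstance(value, str) else ""


def extract_year(release_date):
    """Return the year from a 'YYYY-MM-DD' string, or None if unavailable."""
    text = _clean(release_date)
    if len(text) >= 4 and text[:4].isdigit():
        return int(text[:4])
    return None


def build_text_for_embedding(movie):
    """Combine the available metadata into one string; None if no title."""
    title = _clean(movie.get("title"))
    if not title:
        return None  # rows without a title are dropped
    parts = [f"Title: {title}"]
    optional_fields = [
        ("Tagline", "tagline"),
        ("Overview", "overview"),
        ("Genres", "genres"),
        ("Keywords", "keywords"),
    ]
    for label, key in optional_fields:
        value = _clean(movie.get(key))
        if value:  # skip empty optional fields instead of embedding blanks
            parts.append(f"{label}: {value}")
    year = extract_year(movie.get("release_date"))
    if year is not None:
        parts.append(f"Year: {year}")
    return "\n".join(parts)


sample = {"title": "Alien", "tagline": "", "release_date": "1979-05-25"}
print(build_text_for_embedding(sample))
```

Skipping empty fields keeps the embedded text free of placeholder noise, which is one reasonable way to handle the "fills missing optional text fields" step.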
Embedding & Indexing with Qdrant:
- Defines a function (get_embeddings_batch) to efficiently generate embeddings for batches of text using the Gemini embedding-001 model, including retry logic.
- Creates a Qdrant collection with appropriate vector parameters (size 768, Cosine distance).
- Implements a batched upsert process:
  - Iterates through the data sample in batches.
  - Generates embeddings for each batch via the Gemini API.
  - Creates Qdrant PointStruct objects containing the vector, a unique ID (movie_id), and a structured payload (movie metadata like title, genre, year, overview).
  - Upserts these points into the Qdrant collection in batches for efficiency.
- Verifies the number of points indexed in Qdrant.
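The batching and retry flow described above might look roughly like the sketch below. It is an assumption-laden outline rather than the notebook's implementation: `embed_batch` and `upsert_batch` are hypothetical callables standing in for the Gemini embedding call and the Qdrant upsert, and points are shown as plain dicts with the same `id` / `vector` / `payload` shape as a Qdrant `PointStruct`.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))


def index_in_batches(rows, embed_batch, upsert_batch, batch_size=100,
                     payload_fields=("title", "genres", "year", "overview"),
                     sleep=time.sleep):
    """Embed and upsert rows batch by batch; return the number indexed.

    embed_batch(texts) -> list of vectors   (hypothetical Gemini wrapper)
    upsert_batch(points) -> None            (hypothetical Qdrant wrapper)
    """
    total = 0
    for batch in batched(rows, batch_size):
        texts = [row["text_for_embedding"] for row in batch]
        vectors = with_retries(lambda: embed_batch(texts), sleep=sleep)
        points = [
            {
                "id": row["movie_id"],
                "vector": vector,
                "payload": {k: row[k] for k in payload_fields if k in row},
            }
            for row, vector in zip(batch, vectors)
        ]
        upsert_batch(points)
        total += len(points)
    return total
```

Passing the two service calls in as functions keeps the loop easy to test offline; the returned count can be compared with the collection's point count for the verification step.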
Querying & Recommendation:
- Defines a recommend_movies function that:
  - Takes a natural language query (e.g., movie title, theme description).
  - Generates an embedding for the query using Gemini (task_type="RETRIEVAL_QUERY").
  - Performs a similarity search against the indexed movie vectors in Qdrant using client.search.
  - Returns the top K most similar movies, including their metadata (payload) and similarity scores.
- Includes example queries to demonstrate the recommendation functionality.
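To make the similarity step concrete, here is a brute-force sketch of what a cosine-distance search returns. It is only an illustration of the scoring: Qdrant performs this server-side with an approximate index, the query vector would come from the Gemini embedding call, and the function names and the dict-based point shape here are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vector, points, top_k=5):
    """Return the top_k points by cosine similarity, with payload and score."""
    hits = [
        {
            "score": cosine_similarity(query_vector, point["vector"]),
            "payload": point["payload"],
        }
        for point in points
    ]
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_k]


# Toy 2-D vectors stand in for the 768-dimensional embeddings.
toy_points = [
    {"vector": [1.0, 0.0], "payload": {"title": "A"}},
    {"vector": [0.0, 1.0], "payload": {"title": "B"}},
    {"vector": [1.0, 1.0], "payload": {"title": "C"}},
]
for hit in top_k_similar([1.0, 0.0], toy_points, top_k=2):
    print(hit["payload"]["title"], round(hit["score"], 3))
```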

Disclaimer

This PR uses the TMDb Movie Dataset for non-commercial purposes only, as per its licensing terms.

The dataset is licensed under CC BY-NC 4.0, which restricts usage to non-commercial projects.
Proper attribution has been provided to TMDb as required by the license.

By submitting this PR, I confirm that:

This dataset is used solely for educational and demonstration purposes.
Any further use of this dataset must comply with its licensing terms, and users are responsible for ensuring compliance.

review-notebook-app · 2025-04-11T17:27:07Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

andycandy · 2025-04-11T17:33:39Z

@markmcd @Giom-V Here is my first rough draft. Could you go through it once?

examples/qdrant/Movie_Recommendation.ipynb

Giom-V · 2025-06-04T09:26:21Z

@andycandy Overall I like the example, and I think it would be a great addition to the cookbook. I just think it needs way more explanations (but that was a first draft so that was expected).
Also, don't forget to create a readme in the folder, update the one in examples/, and maybe update the "what's next" section of the embedding notebooks (and maybe other related examples) to mention this one.

andycandy · 2025-06-04T22:45:10Z

Thanks! I've completed all the required updates: added the README in the qdrant/ folder, updated the main examples/ README, and included this notebook in the "What's next" and related examples sections of the relevant notebooks. I also expanded the explanations throughout the notebook for clarity.

examples/qdrant/Movie_Recommendation.ipynb

andycandy added 2 commits April 11, 2025 22:15

Add files via upload

c35e7bc

nbfmt

2d314a5

andycandy marked this pull request as draft April 11, 2025 17:27

github-actions bot added the status:awaiting review and component:examples labels Apr 11, 2025

andycandy added 2 commits April 11, 2025 22:57

nbfmt

339c348

lint checked

3cdb007

Giom-V marked this pull request as ready for review June 4, 2025 09:13

Giom-V reviewed Jun 4, 2025

Giom-V assigned Giom-V and andycandy and unassigned Giom-V Jun 4, 2025

andycandy added 7 commits June 5, 2025 01:46

Merge branch 'google-gemini:main' into qdrant-movie-recommendation

80f1e59

Implemented suggestions

2fa4ce1

we => you

c3c937d

readme update

12dd3c2

next steps

187d75c

qdrant readme update

3ac5103

nbfmt

070dc0f

Giom-V reviewed Jun 6, 2025

andycandy added 4 commits June 9, 2025 04:38

improved suggestions

bfc5644

we => you

e5f8667

nbfmt

e1c643b

Merge branch 'google-gemini:main' into qdrant-movie-recommendation

88c7c2c

Giom-V approved these changes Jun 16, 2025

Giom-V merged commit 8d7b26b into google-gemini:main Jun 16, 2025
5 checks passed

andycandy deleted the qdrant-movie-recommendation branch June 24, 2025 09:16
