Feat: Add Cookbook for Content-Based Recommendations with Gemini & Qdrant #700
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
@andycandy Overall I like the example, and I think it would be a great addition to the cookbook. I just think it needs way more explanations (but that was a first draft, so that was expected).
Thanks! I've completed all the required updates: added the README in the qdrant/ folder, updated the main examples/ README, and included this notebook in the "What's next" and related examples sections of the relevant notebooks. I also expanded the explanations throughout the notebook for clarity.
Status: Work in Progress / Seeking Feedback
This PR presents the initial implementation of the Gemini+Qdrant recommendation cookbook outlined in #690. While the core functionality is present, I am actively seeking feedback and collaborating with reviewers to refine the notebook's structure, clarity, and explanations before final merging. Please feel free to add comments and suggestions!
Description
This PR introduces a new cookbook example demonstrating how to build a scalable, content-based recommendation system using Google's Gemini API for semantic embeddings and Qdrant as an efficient vector database.
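For reviewers less familiar with the approach, here is a minimal sketch of the core idea: rank items by how close their embedding vectors are to a query vector. It is not the notebook's code. It assumes cosine similarity as the metric (the notebook's Qdrant collection may be configured differently), uses a brute-force scan where Qdrant would use an index, and the name `rank_by_similarity` and the toy 2-D vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, catalog, top_k=3):
    """Return the top_k (item_id, score) pairs, most similar first.

    `catalog` maps a hypothetical item id to its embedding vector.
    A vector database replaces this linear scan with an index.
    """
    scored = [
        (item_id, cosine_similarity(query_vector, vector))
        for item_id, vector in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-D "embeddings"; real Gemini embeddings have many more dimensions.
toy_catalog = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
top_two = rank_by_similarity([1.0, 0.1], toy_catalog, top_k=2)
```

Because the ranking depends only on item content, a brand-new movie with no interaction history can still be recommended as soon as it has an embedding.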
Motivation
As outlined in #690, existing examples often use toy datasets or focus on narrow use cases. This cookbook aims to provide a more practical, near-production-level demonstration showing how Large Language Model (LLM) embeddings, specifically from Gemini, can power recommendations across potentially diverse media types without relying on user interaction history (addressing cold-start scenarios).
Solution:
A new Jupyter notebook, Movie_Recommendation, has been added. This notebook guides the user through the following steps:
- Installs the required libraries (`google-genai`, `qdrant-client`, `pandas`), configures the Gemini API client, and initializes the Qdrant client.
- Note: The notebook uses a sample (e.g., 5000 items) for demonstration purposes due to the scale of the full dataset and API costs associated with embedding millions of items. Clear comments explain this and how to adapt for the full dataset.
- Works with the relevant metadata columns (`title`, `overview`, `genres`, `keywords`, `tagline`, `release_date`).
- Builds a `text_for_embedding` string for each movie, leveraging available metadata.
- Defines a helper (`get_embeddings_batch`) to efficiently generate embeddings for batches of text using the Gemini `embedding-001` model, including retry logic.
- Creates `PointStruct` objects containing the vector, a unique ID (`movie_id`), and a structured `payload` (movie metadata like title, genre, year, overview).
- Defines a `recommend_movies` function that:
  - embeds the query (`task_type="RETRIEVAL_QUERY"`).
  - searches Qdrant with `client.search`.
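To make the text-preparation and batched-embedding steps listed above concrete, here is a rough sketch of how they could look. It is not taken from the notebook: `build_text_for_embedding`, `embed_in_batches`, the `field: value` joining format, and the default batch size and retry counts are all assumptions, and the actual Gemini call is abstracted behind an `embed_fn` callable so the sketch runs without an API key.

```python
import time
from typing import Callable, Sequence

Vector = list[float]
EmbedFn = Callable[[Sequence[str]], list[Vector]]

# Field names come from the PR description; how they are joined is an assumption.
TEXT_FIELDS = ("title", "tagline", "overview", "genres", "keywords", "release_date")


def build_text_for_embedding(movie: dict) -> str:
    """Join the non-empty metadata fields of one movie into a labelled string."""
    parts = []
    for field in TEXT_FIELDS:
        value = movie.get(field)
        if value is None or str(value).strip() == "":
            continue  # skip missing metadata rather than embedding "None"
        parts.append(f"{field}: {str(value).strip()}")
    return "\n".join(parts)


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: EmbedFn,
    batch_size: int = 100,   # assumed default, tune to the API's limits
    max_retries: int = 3,    # assumed default
    base_delay: float = 1.0,
) -> list[Vector]:
    """Embed texts batch by batch, retrying a failed batch with exponential backoff."""
    vectors: list[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for attempt in range(max_retries + 1):
            try:
                # In the notebook this would wrap the Gemini embedding call.
                result = embed_fn(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                time.sleep(base_delay * 2 ** attempt)
        if len(result) != len(batch):
            raise ValueError("embed_fn returned a different number of vectors than texts")
        vectors.extend(result)
    return vectors
```

Passing the embedding call in as a function keeps the retry logic testable offline; a stub such as `lambda batch: [[0.0] for _ in batch]` is enough to exercise the batching.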
Disclaimer
This PR uses the TMDb Movie Dataset for non-commercial purposes only, as per its licensing terms.
By submitting this PR, I confirm that: