-
Notifications
You must be signed in to change notification settings - Fork 202
Agentic Retreival for sub-queries generation #10667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@radhgupta please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request enhances the retrieval process by generating multiple subqueries from the original issue details and aggregating deduplicated results for issues and documents. Key changes include:
- Adding a new method GenerateSubqueriesAsync to derive subqueries from prompts.
- Introducing configuration support for subqueries generation.
- Updating the OpenAiAnswerService to process results from multiple subqueries and merge them with the original query results.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tools/issue-labeler/src/IssueLabelerService/TraigeRag.cs | Adds the GenerateSubqueriesAsync method to generate subqueries. |
tools/issue-labeler/src/IssueLabelerService/Configuration/RepositoryConfiguration.cs | Includes a new configuration property for subqueries prompt. |
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs | Implements subquery handling, aggregation, deduplication, and updates result formatting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub will ignore approval if I approve now and you make changes so LMK
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
double scoreThreshold = double.Parse(_config.ScoreThreshold); | ||
double solutionThreshold = double.Parse(_config.SolutionThreshold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double scoreThreshold = double.Parse(_config.ScoreThreshold); | |
double solutionThreshold = double.Parse(_config.SolutionThreshold); | |
var scoreThreshold = double.Parse(_config.ScoreThreshold); | |
var solutionThreshold = double.Parse(_config.SolutionThreshold); |
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
var uniqueIssues = new HashSet<Issue>(new IssueIdComparer()); | ||
var uniqueDocs = new HashSet<Document>(new DocumentUrlComparer()); | ||
|
||
await RetrieveAndAggregateSubqueryResults( | ||
subqueries, | ||
issueIndexName, | ||
issueSemanticName, | ||
issueFieldName, | ||
documentIndexName, | ||
documentSemanticName, | ||
documentFieldName, | ||
top, | ||
scoreThreshold, | ||
labels, | ||
uniqueIssues, | ||
uniqueDocs | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider reducing the number of parameters passed to RetrieveAndAggregateSubqueryResults. Many of these values (like issueIndexName, documentIndexName, semantic names, field names, etc.) are already available via _config. Pulling them inside the method would improve readability and reduce boilerplate when calling the method.
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Outdated
Show resolved
Hide resolved
foreach (var originalIssue in originalIssues) | ||
uniqueIssues.Add(originalIssue); | ||
foreach (var originalDoc in originalDocs) | ||
uniqueDocs.Add(originalDoc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using your helper method AddUniqueItems here to keep deduplication consistent and reusable.
var originalIssues = await _ragService.SearchIssuesAsync(issueIndexName, issueSemanticName, issueFieldName, query, top, scoreThreshold, labels); | ||
var originalDocs = await _ragService.SearchDocumentsAsync(documentIndexName, documentSemanticName, documentFieldName, query, top, scoreThreshold, labels); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider extracting each step into a dedicated private method. This would significantly improve the readability and structure of the method. Something like:
// Step 1: Generate subqueries
var subqueries = await GenerateSubqueriesAsync(query, modelName);
// Step 2: Retrieve and dedupe subquery results
var (uniqueIssues, uniqueDocs) = await RetrieveResultsAsync(subqueries, top, scoreThreshold, labels);
// Step 3: Always include original query results
await AddOriginalQueryResultsAsync(query, top, scoreThreshold, labels, uniqueIssues, uniqueDocs);
ModelName: config.AnswerModelName, | ||
IssueIndexName: config.IssueIndexName, | ||
IssueSemanticName: config.IssueSemanticName, | ||
IssueFieldName: config.IssueIndexFieldName, | ||
DocumentIndexName: config.DocumentIndexName, | ||
DocumentSemanticName: config.DocumentSemanticName, | ||
DocumentFieldName: config.DocumentIndexFieldName, | ||
Top: int.Parse(config.SourceCount), | ||
ScoreThreshold: double.Parse(config.ScoreThreshold), | ||
SolutionThreshold: double.Parse(config.SolutionThreshold), | ||
SubqueriesPromptTemplate: config.SubqueriesGenerationPrompt, | ||
SolutionInstructions: config.SolutionInstructions, | ||
SuggestionInstructions: config.SuggestionInstructions, | ||
SolutionUserPrompt: config.SolutionUserPrompt, | ||
SuggestionUserPrompt: config.SuggestionUserPrompt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ModelName: config.AnswerModelName, | |
IssueIndexName: config.IssueIndexName, | |
IssueSemanticName: config.IssueSemanticName, | |
IssueFieldName: config.IssueIndexFieldName, | |
DocumentIndexName: config.DocumentIndexName, | |
DocumentSemanticName: config.DocumentSemanticName, | |
DocumentFieldName: config.DocumentIndexFieldName, | |
Top: int.Parse(config.SourceCount), | |
ScoreThreshold: double.Parse(config.ScoreThreshold), | |
SolutionThreshold: double.Parse(config.SolutionThreshold), | |
SubqueriesPromptTemplate: config.SubqueriesGenerationPrompt, | |
SolutionInstructions: config.SolutionInstructions, | |
SuggestionInstructions: config.SuggestionInstructions, | |
SolutionUserPrompt: config.SolutionUserPrompt, | |
SuggestionUserPrompt: config.SuggestionUserPrompt | |
ModelName: config.AnswerModelName, | |
IssueIndexName: config.IssueIndexName, | |
IssueSemanticName: config.IssueSemanticName, | |
IssueFieldName: config.IssueIndexFieldName, | |
DocumentIndexName: config.DocumentIndexName, | |
DocumentSemanticName: config.DocumentSemanticName, | |
DocumentFieldName: config.DocumentIndexFieldName, | |
Top: int.Parse(config.SourceCount), | |
ScoreThreshold: double.Parse(config.ScoreThreshold), | |
SolutionThreshold: double.Parse(config.SolutionThreshold), | |
SubqueriesPromptTemplate: config.SubqueriesGenerationPrompt, | |
SolutionInstructions: config.SolutionInstructions, | |
SuggestionInstructions: config.SuggestionInstructions, | |
SolutionUserPrompt: config.SolutionUserPrompt, | |
SuggestionUserPrompt: config.SuggestionUserPrompt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general convention, C# likes to indent line continuations. The named parameter use is wonderful for readability.
var modelName = _config.AnswerModelName; | ||
var issueIndexName = _config.IssueIndexName; | ||
var documentIndexName = _config.DocumentIndexName; | ||
string query = $"{issue.Title} {issue.Body}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string query = $"{issue.Title} {issue.Body}"; | |
var query = $"{issue.Title} {issue.Body}"; |
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double highestScore = _ragService.GetHighestScore(issues, docs, issue.RepositoryName, issue.IssueNumber); | ||
bool solution = highestScore >= solutionThreshold; | ||
var highestScore = _ragService.GetHighestScore(uniqueIssues, uniqueDocs, issue.RepositoryName, issue.IssueNumber); | ||
var solution = highestScore >= _serviceConfiguration.SolutionThreshold; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var solution = highestScore >= _serviceConfiguration.SolutionThreshold; | |
var isSolution = highestScore >= _serviceConfiguration.SolutionThreshold; |
nit: C# convention is generally to name boolean instances using an is
of has
or similar prefix that invokes a "yes or no" connotation.
{ "PrintableDocs", printableDocs }, | ||
{ "PrintableIssues", printableIssues } | ||
}; | ||
var replacementsUserPrompt = BuildUserPromptData(uniqueIssues, uniqueDocs, issue); | ||
|
||
string instructions, userPrompt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string instructions, userPrompt; | |
string instructions; | |
string userPrompt; |
nit: The current form is legal syntax, but generally our convention is to explicitly declare each on their own line.
} | ||
else | ||
{ | ||
instructions = _config.SuggestionInstructions; | ||
userPrompt = AzureSdkIssueLabelerService.FormatTemplate(_config.SuggestionUserPrompt, replacementsUserPrompt, _logger); | ||
instructions = _serviceConfiguration.SuggestionInstructions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
userPrompt = isSolution
? _serviceConfiguration.SolutionUserPrompt, replacementsUserPrompt
: _serviceConfiguration.SuggestionUserPrompt, replacementsUserPrompt;
userPrompt = AzureSdkIssueLabelerService.FormatTemplate(userPrompt, replacementsUserPrompt, _logger);
Unless I'm misreading, the only difference in the if/else branches is the config parameter. I think we can simplify a bit.
HashSet<Document> uniqueDocs | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashSet<Document> uniqueDocs | |
) | |
HashSet<Document> uniqueDocs) |
nit: C# convention is generally to throw the closing parens on the same line as the last parameter.
refactored subqueries generation minor refactoring for agentic retrival removed redudant data structure of issues and doc refactoring code subquery generation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a couple more C# discussion items and formatting nits.
@@ -10,6 +10,7 @@ namespace IssueLabelerService | |||
{ | |||
public class OpenAiAnswerService : IAnswerService | |||
{ | |||
private readonly ServiceConfiguration _serviceConfiguration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private readonly ServiceConfiguration _serviceConfiguration; | |
private readonly ServiceConfiguration ServiceConfiguration; |
This one gets inconsistently applied, but the _name
pattern indicates a class-level variable. Since this is a member that is intended to be constant after the instance is created, we generally name these like we would a constant.
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Show resolved
Hide resolved
tools/issue-labeler/src/IssueLabelerService/Answers/OpenAiAnswerService.cs
Show resolved
Hide resolved
@@ -22,6 +22,10 @@ public class TriageRag | |||
private static SearchIndexClient s_searchIndexClient; | |||
private ILogger<TriageRag> _logger; | |||
|
|||
private const int MinSubqueries = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we move these to the first class members? Generally, C# conventions call for ordering by scope/visibility:
- const (which is static)
- static readonly
- static
and each type by visibility
- private
- protected
- internal
- public
The service now generates multiple subqueries based on the original issue’s title and body, retrieves relevant issues and documents for each subquery, and aggregates them with deduplication. This approach increases the diversity and relevance of retrieved context, improving the quality of answers and suggestions.
GenerateSubqueriesAsync
: Generates Subqueries using RAG agent. The number of subqueries depends on the length of the original query.SearchIssuesAsync
andSearchDocumentsAsync
to retrieve relevant issues and documents.SendMessageQnaAsync
to generate the Open AI response.