ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/question_answering/QuestionAnsweringModel.java

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/question_answering/SentenceHighlightingQATranslator.java
Lines changed: 32 additions & 1 deletion
@@ -65,7 +65,20 @@
 
 /**
  * Translator for sentence highlighting question answering model.
- *
+ *
+ * This translator processes input for semantic sentence highlighting models that identify
+ * relevant sentences within a context document based on a user query or question.
+ *
+ * The translator performs the following key functions:
+ * 1. Tokenizes the question and context using Hugging Face tokenizer
+ * 2. Segments the context into sentences
+ * 3. Maps tokens to their corresponding sentence IDs
+ * 4. Handles chunking for long contexts that exceed the model's maximum token length
+ * 5. Processes model outputs to identify and highlight sentences that answer the question
+ *
+ * The highlighted sentences are returned with their text and position information within
+ * the original context, which allows for easy visualization and extraction of relevant
+ * information from the document.
  */
 @Log4j2
 @Getter
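
The Javadoc above outlines the translator's pipeline. As a rough illustration of steps 2 and 3 (segmenting the context into sentences and mapping token positions to sentence IDs), the standalone sketch below uses java.text.BreakIterator; the class, record, and method names are hypothetical, and the actual translator's segmentation logic and data structures may differ.

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: segment a context into sentences and map a token's
// character offset to the sentence that contains it.
public class SentenceMappingSketch {

    // Character range each sentence covers in the original context.
    record Sentence(int id, int start, int end, String text) {}

    static List<Sentence> segment(String context) {
        List<Sentence> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(context);
        int id = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(new Sentence(id++, start, end, context.substring(start, end).trim()));
        }
        return sentences;
    }

    // Map a token's start offset (e.g. from the tokenizer's char spans) to a sentence ID,
    // or -1 for special tokens that have no position in the context.
    static int sentenceIdFor(int tokenStart, List<Sentence> sentences) {
        for (Sentence s : sentences) {
            if (tokenStart >= s.start() && tokenStart < s.end()) {
                return s.id();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String context = "OpenSearch is a search engine. It supports ML features. Highlighting finds answers.";
        List<Sentence> sentences = segment(context);
        sentences.forEach(s -> System.out.println(s.id() + ": " + s.text()));
        // A token starting at offset 35 ("supports ...") falls in sentence 1.
        System.out.println("Token at offset 35 -> sentence " + sentenceIdFor(35, sentences));
    }
}

Recording each sentence's character range is what allows the translator to return highlighted sentences together with their position in the original context, as described in the Javadoc.
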
@@ -166,6 +179,17 @@ public void setArguments(Map<String, ?> arguments) {
     // No arguments needed for this translator
 }
 
+    /**
+     * Prepares the translator by initializing the tokenizer with the appropriate configuration.
+     *
+     * The tokenizer is configured to handle chunking for long contexts that exceed the model's
+     * maximum token length. Even when processing individual chunks, the full context is always
+     * passed to the model in the input stage, ensuring that sentence segmentation and token-to-sentence
+     * mapping is consistent across all chunks.
+     *
+     * @param ctx The translator context which provides access to the model path
+     * @throws IOException If there is an error loading the tokenizer
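
For the prepare step documented above, the following is a minimal sketch of how a tokenizer might be initialized from the model path using DJL's HuggingFaceTokenizer builder; the class name, field, and option values (max length, truncation, padding) are illustrative assumptions rather than the repository's actual configuration.

import java.io.IOException;
import java.nio.file.Path;

import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;
import ai.djl.translate.TranslatorContext;

// Illustrative sketch of a prepare step that loads a Hugging Face tokenizer
// from the model directory via DJL. Option values are assumptions.
public class PrepareSketch {

    private HuggingFaceTokenizer tokenizer;

    public void prepare(TranslatorContext ctx) throws IOException {
        // The translator context gives access to the model path, where the
        // tokenizer files (e.g. tokenizer.json) are assumed to live.
        Path modelPath = ctx.getModel().getModelPath();
        tokenizer = HuggingFaceTokenizer
            .builder()
            .optTokenizerPath(modelPath)
            .optMaxLength(512)       // assumed maximum token length for the model
            .optTruncation(true)     // contexts over the limit are handled by chunking
            .optPadding(true)
            .build();
    }
}

Setting a maximum token length is what triggers the chunking path described in the Javadoc; the full context remains available to the translator so that sentence segmentation and token-to-sentence mapping stay consistent across chunks.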