Commit 324bb96

Add explainer for contextual biasing (#167)
Co-authored-by: Evan Liu <[email protected]>
1 parent 59b1992 commit 324bb96

1 file changed

+89
-0
lines changed

explainers/contextual-biasing.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Explainer: Contextual Biasing for the Web Speech API

## Introduction

The Web Speech API provides speech recognition capabilities to web applications. However, general-purpose speech recognition models can struggle with domain-specific terminology, proper nouns, and other words that rarely appear in general conversation. The result is a frustrating experience in which the user's intent is frequently misrecognized.

To address this, we introduce **contextual biasing** to the Web Speech API. This feature lets developers provide "hints" to the speech recognition engine in the form of a list of phrases, each with a boost value. By biasing the recognizer towards these phrases, applications can improve accuracy for the vocabulary that matters in their specific context.

## Why Use Contextual Biasing?

### 1. **Improved Accuracy**
By providing a list of likely phrases, developers can increase the probability that those phrases are recognized correctly. This is especially useful for words that are acoustically similar to more common words.

### 2. **Enhanced User Experience**
When speech recognition "just works" for the user's context, interaction is smoother, faster, and less frustrating. Users don't have to repeat themselves or manually correct transcription errors.

### 3. **Enabling Specialized Applications**
Contextual biasing makes the Web Speech API a more viable option for specialized applications in fields like medicine, law, science, or gaming, where precise and often uncommon terminology is essential.

## Example Use Cases

### 1. Voice-controlled Video Game
A video game might have characters with unique names like "Zoltan," "Xylia," or "Grog." Without contextual biasing, a command like "Attack Zoltan" might be misheard as "Attack Sultan." By providing a list of character and location names, the game can make it more likely that commands are understood correctly.

### 2. E-commerce Product Search
An online store can bias the speech recognizer towards its product catalog. When a user says "Show me Fujifilm cameras," the recognizer is more likely to correctly identify "Fujifilm" instead of a more common but similar-sounding word.
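
As a rough illustration of the catalog idea, the sketch below derives phrase entries from a product list. The catalog shape (`{ brand: ... }`), the helper name `buildPhraseEntries`, and the default boost of `2.0` are all assumptions made for this example, not part of the proposal.

```javascript
// Hypothetical helper: turn a product catalog into { phrase, boost } entries.
// Assumes each product has a string `brand` field; this shape is illustrative only.
function buildPhraseEntries(catalog, boost = 2.0) {
  const seen = new Set();
  const entries = [];
  for (const product of catalog) {
    const brand = product.brand.trim();
    const key = brand.toLowerCase();
    // Skip empty names and case-insensitive duplicates.
    if (brand === '' || seen.has(key)) continue;
    seen.add(key);
    entries.push({ phrase: brand, boost });
  }
  return entries;
}

const entries = buildPhraseEntries([
  { brand: 'Fujifilm' },
  { brand: 'fujifilm ' },
  { brand: 'Leica' },
]);
```

Each resulting entry could then be passed to `new SpeechRecognitionPhrase(entry.phrase, entry.boost)`, as in the usage example in this explainer.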

### 3. Medical Dictation
A web-based application for doctors could be biased towards complex medical terms, drug names, and procedure names, allowing accurate and efficient voice-based note-taking.

## New API Components

Contextual biasing is exposed through a new `phrases` attribute on the `SpeechRecognition` interface, which uses two new supporting interfaces: `SpeechRecognitionPhrase` and `SpeechRecognitionPhraseList`.

### 1. `SpeechRecognition.phrases` attribute
This attribute is assigned a `SpeechRecognitionPhraseList` object to provide contextual hints for the recognition session.

### 2. `SpeechRecognitionPhrase` interface
Represents a single phrase and its associated boost value.

- `constructor(DOMString phrase, optional float boost = 1.0)`: Creates a new phrase object.
- `phrase`: The text string to be boosted.
- `boost`: A float between 0.0 and 10.0. Higher values make the phrase more likely to be recognized.
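
Because boost values often come from application data, it can help to normalize them before constructing phrases. The sketch below is one way to do that; `normalizeBoost` is a hypothetical helper, and it assumes the 0.0 to 10.0 range is inclusive. How an out-of-range value is handled by the constructor is not stated here, so normalizing up front avoids depending on that behavior.

```javascript
// Hypothetical helper: coerce an arbitrary value into the documented boost range.
// Assumption: the range 0.0..10.0 is inclusive at both ends.
const MIN_BOOST = 0.0;
const MAX_BOOST = 10.0;
const DEFAULT_BOOST = 1.0; // Matches the constructor's documented default.

function normalizeBoost(value) {
  // Fall back to the default for missing or non-numeric input.
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    return DEFAULT_BOOST;
  }
  return Math.min(MAX_BOOST, Math.max(MIN_BOOST, value));
}
```

Clamping is a design choice for this sketch; an application might prefer to reject invalid values instead so that data errors surface early.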

### 3. `SpeechRecognitionPhraseList` interface
Represents a collection of `SpeechRecognitionPhrase` objects. It can be created with an array of phrases and managed dynamically with `addItem()` and `removeItem()` methods.
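
Dynamic management could look like the sketch below, which swaps the contents of an existing list, for example when a game moves to a new level with different character names. This explainer names `addItem()` and `removeItem()` but does not give their signatures, so the sketch assumes that `addItem()` takes a phrase object, that `removeItem()` takes a zero-based index, and that the list exposes a `length` property. `replacePhrases` is a hypothetical helper.

```javascript
// Hypothetical helper: replace every phrase in a list with a new set.
// Assumptions (not confirmed by this explainer): `length` exists,
// `removeItem(index)` removes by zero-based index, `addItem(phrase)` appends.
function replacePhrases(phraseList, newPhrases) {
  // Remove from the end so earlier indices stay valid while deleting.
  for (let i = phraseList.length - 1; i >= 0; i--) {
    phraseList.removeItem(i);
  }
  for (const phrase of newPhrases) {
    phraseList.addItem(phrase);
  }
}
```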

### Example Usage

```javascript
// A list of phrases relevant to our application's context.
const phraseData = [
  { phrase: 'Zoltan', boost: 3.0 },
  { phrase: 'Grog', boost: 2.0 },
];

// Create SpeechRecognitionPhrase objects.
// (A distinct name is used here: redeclaring a `const` is a SyntaxError.)
const phraseObjects = phraseData.map(p => new SpeechRecognitionPhrase(p.phrase, p.boost));

// Create a SpeechRecognitionPhraseList.
const phraseList = new SpeechRecognitionPhraseList(phraseObjects);

const recognition = new SpeechRecognition();
// Assign the phrase list to the recognition instance.
recognition.phrases = phraseList;

// Some user agents (e.g. Chrome) might only support on-device contextual biasing.
recognition.processLocally = true;

recognition.onresult = (event) => {
  // Read the top alternative of the first result.
  const transcript = event.results[0][0].transcript;
  console.log(`Result: ${transcript}`);
};

recognition.onerror = (event) => {
  if (event.error === 'phrases-not-supported') {
    console.warn('Contextual biasing is not supported by this browser/service.');
  }
};

// Start recognition when the user clicks a button.
document.getElementById('speak-button').onclick = () => {
  recognition.start();
};
```
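
The example above assumes the new interfaces exist. A page that must also run in browsers without them might check first, along the lines of the sketch below. `getBiasingSupport` is a hypothetical helper, and the `webkitSpeechRecognition` fallback reflects the prefixed constructor that some browsers expose. This check only shows that the interfaces are present; whether the recognition service actually honors the phrases is still reported through the `phrases-not-supported` error.

```javascript
// Hypothetical helper: detect whether the contextual biasing interfaces are exposed.
// `scope` defaults to the global object and is a parameter only to ease testing.
function getBiasingSupport(scope = globalThis) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  const hasRecognition = typeof Recognition === 'function';
  const hasPhraseTypes =
    typeof scope.SpeechRecognitionPhrase === 'function' &&
    typeof scope.SpeechRecognitionPhraseList === 'function';
  // Interface attributes are exposed as accessors on the prototype.
  const hasPhrasesAttribute = hasRecognition && 'phrases' in Recognition.prototype;
  return {
    Recognition: hasRecognition ? Recognition : null,
    biasing: hasRecognition && hasPhraseTypes && hasPhrasesAttribute,
  };
}
```

When `biasing` is `false`, the application can skip assigning `recognition.phrases` and fall back to unbiased recognition.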

## Conclusion

Contextual biasing gives developers finer control over speech recognition in the Web Speech API. By letting applications provide context-specific hints, it improves accuracy for important vocabulary, creates a better user experience, and makes voice-enabled web applications more practical for a wider range of specialized use cases.
