This file provides context for AI assistants working on the OpenType codebase.
Hotkey → captureContext (async) → record → stopRecording
│ │
│ [Batch] record the full clip [Streaming] stream over WebSocket while recording
│ │
└─── pipeline:process ────┘
│
├─ STT (switch on protocol: openai-batch / dashscope-batch / realtime)
├─ LLM post-processing (dynamic prompt: dictionary + context + tone)
└─ typeAtCursor → paste at cursor
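Five files every newcomer should read first:
- `src/types/config.ts`: central type hub (`AppConfig`, `PROVIDERS`, STT protocols, helper functions)
- `electron/ipc-handlers.ts`: IPC dispatch entry point; `pipeline:process` is the core
- `electron/stt-service.ts`: dispatch across the 5 STT protocols (switch on protocol)
- `electron/llm-service.ts`: LLM polishing (prompt building, context truncation, VLM)
- `src/hooks/useRecorder.ts`: frontend recording state machine (idle→recording→processing→idle)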
OpenType is an Electron desktop app for intelligent voice dictation. It captures microphone audio, transcribes it via STT APIs, then polishes the text using LLM post-processing (removing fillers, fixing repetitions, detecting self-corrections, adding punctuation). It also captures rich context (active window, focused field, clipboard, screen OCR) to improve transcription quality.
- Frontend: React 18 + TypeScript + Vite + Tailwind CSS 3 + Zustand 4
- Desktop: Electron 32 (CommonJS output in `dist-electron/`)
- Build: electron-builder 25 for cross-platform packaging
- CI/CD: GitHub Actions (`ci.yml` for checks, `release.yml` for packaging on `v*` tags)
- i18n: Custom lightweight React Context-based system (no external i18n library)
electron/ → Electron main process (CommonJS, compiled to dist-electron/)
main.ts → App lifecycle, creates windows/tray, registers shortcuts
preload.ts → contextBridge exposing electronAPI to renderer
app-state.ts → Shared mutable state singleton (windows, services, flags)
config-store.ts → JSON file persistence (~/Library/Application Support/OpenType/config.json)
ipc-handlers.ts → All ipcMain.handle() registrations (config, pipeline, media, context, etc.)
stt-service.ts → STT: protocol-driven dispatch (5 protocols), batch REST + realtime WebSocket
auto-dict-utils.ts → Pure functions for auto-dict (skip logic, prompt building) — no Electron deps
llm-service.ts → Server-side LLM API calls, prompt builder, smart truncation, VLM calls, term extraction
context-capture.ts → CapturedContext interface, macOS/Win/Linux context capture, screen OCR (native screencapture + sips)
shortcut-manager.ts → Global hotkey registration, toggleRecording() logic, context capture trigger
auto-dict.ts → LLM-driven dictionary learning: pipeline extraction, user edit detection, term persistence
window-manager.ts → createMainWindow() + createOverlayWindow()
tray-manager.ts → System tray icon + menu
audio-control.ts → macOS system audio mute/restore during recording
auto-updater.ts → electron-updater integration
fn-monitor.ts → macOS Fn key monitoring (child process)
utils.ts → errMsg() helper — type-safe error message extraction (no Electron deps)
src/ → React renderer (ESM, bundled by Vite to dist/)
types/config.ts → Central types: ProviderConfig, ProviderMeta, STTModelDef, STTProtocol, PROVIDERS, AppConfig, DEFAULT_CONFIG, helper functions (getProviderConfig, getSTTProviderOpts, getLLMProviderOpts, getSTTModelDef, getSTTModelMode, getDefaultBatchProtocol)
types/electron.d.ts → Type declarations for window.electronAPI (must match preload.ts)
stores/configStore.ts → Zustand store: load, set, update, history CRUD, dictionary CRUD, cross-window sync
services/ → Dual-mode services (Electron IPC or direct fetch)
audioRecorder.ts → WebAudio recording, webm → WAV conversion
sttService.ts → Speech-to-Text API calls
llmService.ts → LLM post-processing API calls
pipeline.ts → Full pipeline orchestrator (STT + LLM)
hooks/useRecorder.ts → Recording state machine (idle → recording → processing → idle)
utils/ → Shared pure utilities
wordCount.ts → CJK/Latin/mixed word counting
errMsg.ts → Type-safe error message extraction (ESM equivalent of electron/utils.ts)
friendlyError.ts → Map technical errors to localized user-friendly messages
i18n/ → Internationalization
index.ts → I18nProvider, useTranslation hook, detectLocale
locales/en.json → English strings
locales/zh.json → Chinese strings (must mirror en.json structure exactly)
components/
ui/ → Primitives: Button, Input, Select, Toggle, Slider, HotkeyCapture, SettingRow, SettingSection
layout/ → TitleBar, Sidebar, PageHeader
recording/ → RecordButton, ResultPanel
UpdateNotification.tsx → Auto-update notification banner
pages/ → DashboardPage, DictationPage, HistoryPage, DictionaryPage, OverlayPage
settings/ → SettingsLayout + sub-panels: Provider, General, Hotkey, ToneRules, Context, Advanced, Privacy
scripts/ → Test scripts
# Unit tests (run via `npm test`, no API keys needed)
test-config-helpers.ts → Provider resolution, STT model mode/protocol, defaults (44 tests)
test-migration.ts → Config migration: all edge cases, idempotency (17 tests)
test-llm-helpers.ts → Truncation, cursor markers, term parsing, friendlyErrorMessage (58 tests)
test-auto-dict.ts → Skip logic, prompt building (24 tests)
test-word-count.ts → CJK/Latin/mixed word counting (23 tests)
test-i18n.ts → resolve, interpolate, locale file structure (21 tests)
test-pipeline-e2e.ts → STT config builders, protocol dispatch, prompt assembly, parseApiError + real API integration (31 tests)
# Integration tests (require API keys / network, not in `npm test`)
test-api.ts → API connectivity test
test-stt.ts → STT transcription test
test-pipeline.ts → Full pipeline test (STT + LLM)
test-realtime-providers.ts → Provider config resolution for all STT providers
test-realtime-stt.ts → Realtime STT WebSocket streaming test
test-stt-connection.ts → STT connection test (batch + streaming)
test-paraformer-realtime.ts → Paraformer native inference protocol test
test-fixtures/ → Test audio files (not committed, .gitignore'd)
angry.wav → 3.2s Chinese speech "你是不是觉得我很好欺负" ("Do you think I'm easy to push around?") (48kHz PCM16 mono)
build/ → entitlements.mac.plist (microphone + accessibility permissions)
| Config | Module | Resolution | Output | Purpose |
|---|---|---|---|---|
| tsconfig.json | ESNext | bundler | noEmit (Vite handles) | Frontend React code |
| tsconfig.electron.json | CommonJS | node | dist-electron/ | Electron main process |
Always run both when type-checking: npm run typecheck (runs tsc --noEmit && tsc -p tsconfig.electron.json --noEmit)
Running tsc --noEmit alone only checks frontend — Electron errors will be missed.
npm run dev # Vite dev server (frontend only, http://localhost:5173)
npm run electron:dev # Full Electron dev mode (Vite + Electron)
npm run typecheck # Check BOTH frontend + electron TypeScript
npm test # Run all 218 unit tests (7 suites)
npm run check # typecheck + all unit tests (use before committing)
npm run build # Build frontend (vite build) + compile electron (tsc)
npm run electron:build # Full package (build + electron-builder, auto-detects platform)
# API integration tests (require env vars, not run by `npm test`)
SILICONFLOW_KEY=sk-xxx npm run test:api
SILICONFLOW_KEY=sk-xxx npm run test:stt
SILICONFLOW_KEY=sk-xxx OPENROUTER_KEY=sk-or-xxx npm run test:pipeline
Dev tips:
- `npm run dev` (browser-only): STT/LLM API calls work via direct fetch, but there are no global hotkeys, overlay, context capture, or auto-type. Good for UI development.
- `npm run electron:dev`: Full app with all features. Electron code changes require a restart; there is no HMR for `electron/` files.
- `test-fixtures/angry.wav` is gitignored. Integration tests that need it will be skipped if it is missing.
All frontend services check window.electronAPI first. If present (running in Electron), they delegate to IPC. Otherwise, they make direct fetch calls. This allows developing the UI in a browser without Electron.
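A minimal sketch of that dual-mode check. The IPC method name `llmProcess` and the OpenAI-style `/chat/completions` request are assumptions for illustration (the real method names are declared in `electron.d.ts`). The bridge and `fetch` are looked up on `globalThis` so the sketch also runs outside a browser:

```typescript
// Shape of the preload bridge this sketch assumes (hypothetical method name).
interface ElectronAPILike {
  llmProcess(text: string): Promise<string>;
}

// Minimal fetch signature so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ChatResponse {
  choices: { message: { content: string } }[];
}

async function polishText(text: string, baseUrl: string, apiKey: string, model: string): Promise<string> {
  const g = globalThis as unknown as { electronAPI?: ElectronAPILike; fetch: FetchLike };

  // Running inside Electron: delegate to the main process over IPC.
  if (g.electronAPI) {
    return g.electronAPI.llmProcess(text);
  }

  // Running in a plain browser (npm run dev): call the API directly.
  const res = await g.fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model, messages: [{ role: "user", content: text }] }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as ChatResponse;
  return data.choices[0]?.message.content ?? "";
}
```

Keeping the branch inside each service means callers never need to know which environment they are in.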
src/types/config.ts is the single source of truth — defines AppConfig, DEFAULT_CONFIG, PROVIDERS, and all helper functions. The Zustand store (configStore.ts) loads from Electron IPC or localStorage, and persists changes back on every set() call. The Electron-side config-store.ts imports DEFAULT_CONFIG from the same file (no separate copy). Config is stored at ~/Library/Application Support/OpenType/config.json.
The app has two BrowserWindows with separate renderer processes:
- Main window — hosts Dashboard, History, Dictionary, Settings pages
- Overlay window — transparent, always-on-top pill that shows recording status
Both windows use useRecorder() independently. The Zustand stores are NOT shared between windows. To keep history in sync, the main process broadcasts config:history-updated events via IPC when any window writes history. The configStore.load() registers a listener for this event.
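The main-process side of that sync can be sketched as a loop over all windows. In Electron the list would come from `BrowserWindow.getAllWindows()` and delivery from `webContents.send()`; here a minimal `WindowLike` interface (an assumption) stands in so the logic runs without Electron, and the `HistoryItem` shape is simplified:

```typescript
// Minimal stand-in for Electron's BrowserWindow (only what this sketch uses).
interface WindowLike {
  isDestroyed(): boolean;
  webContents: { send(channel: string, ...args: unknown[]): void };
}

interface HistoryItem {
  id: string;
  text: string;
}

// Called by the main process after any window writes history.
// Returns how many windows were notified.
function broadcastHistoryUpdated(windows: WindowLike[], history: HistoryItem[]): number {
  let sent = 0;
  for (const win of windows) {
    // A closed window can no longer receive messages.
    if (win.isDestroyed()) continue;
    win.webContents.send("config:history-updated", history);
    sent++;
  }
  return sent;
}
```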
Audio recordings and screenshots are stored as separate files in ~/Library/Application Support/OpenType/media/, NOT inline in config.json. History items store file paths (audioPath, screenshotPath). IPC channels:
- `media:save`: write base64 to file, returns path
- `media:read`: read file, returns base64
- `media:delete`: delete file
Five providers defined in PROVIDERS array (src/types/config.ts): SiliconFlow (STT+LLM), OpenRouter (LLM only), OpenAI (STT+LLM), DashScope (STT only, batch+streaming), OpenAI-Compatible (STT+LLM, custom endpoint).
Each provider is a ProviderMeta entry with defaultConfig, model lists, extraHeaders, etc. Per-user config lives in AppConfig.providers: Record<string, ProviderConfig> where ProviderConfig = { apiKey, baseUrl, sttModel, llmModel }.
STT Model Modes: Each STT model is STTModelDef = { id, mode, protocol, label?, sampleRate? }:
- `mode: 'batch' | 'streaming'`: determines recording behavior (batch = record-then-send, streaming = send-while-recording)
- `protocol: STTProtocol`: determines which code path handles the API call:
  - `'openai-batch'`: POST `/audio/transcriptions` (multipart)
  - `'dashscope-batch'`: POST `/compatible-mode/v1/chat/completions` (`input_audio`)
  - `'openai-realtime'`: WSS OpenAI Realtime API
  - `'qwen-asr-realtime'`: WSS DashScope Qwen-ASR (OpenAI-compatible)
  - `'paraformer-realtime'`: WSS DashScope native inference (Paraformer/FunASR/Gummy)
getSTTModelDef(providerId, modelId) resolves full model definition. getSTTModelMode() resolves mode. getDefaultBatchProtocol(providerId) handles custom models not in PROVIDERS. All dispatch in stt-service.ts is switch(protocol) — zero provider-specific if-else.
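The dispatch style can be sketched as an exhaustive `switch` over `STTProtocol`. The protocol names and endpoints come from the list above; the returned strings are placeholders standing in for the real batch and realtime handlers, which this sketch does not reproduce. The `never` assignment makes TypeScript flag any protocol added to the union but not handled:

```typescript
type STTProtocol =
  | "openai-batch"
  | "dashscope-batch"
  | "openai-realtime"
  | "qwen-asr-realtime"
  | "paraformer-realtime";

interface STTModelDef {
  id: string;
  mode: "batch" | "streaming";
  protocol: STTProtocol;
  label?: string;
  sampleRate?: number;
}

// Placeholder handlers: each case returns a description of the real code path.
function describeDispatch(model: STTModelDef): string {
  switch (model.protocol) {
    case "openai-batch":
      return "POST /audio/transcriptions (multipart)";
    case "dashscope-batch":
      return "POST /compatible-mode/v1/chat/completions (input_audio)";
    case "openai-realtime":
      return "WSS OpenAI Realtime API";
    case "qwen-asr-realtime":
      return "WSS DashScope Qwen-ASR";
    case "paraformer-realtime":
      return "WSS DashScope native inference";
    default: {
      // Compile-time exhaustiveness check: unreachable when every protocol is handled.
      const unhandled: never = model.protocol;
      throw new Error(`Unknown STT protocol: ${String(unhandled)}`);
    }
  }
}
```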
Helper functions (getProviderConfig, getSTTProviderOpts, getLLMProviderOpts) resolve config with zero if-else — all driven by the PROVIDERS array and PROVIDER_MAP. Adding a new provider only requires adding an entry to PROVIDERS.
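A minimal sketch of the data-driven lookup, assuming simplified `ProviderMeta`/`ProviderConfig` shapes (the real ones carry more fields, such as model lists) and assuming per-user values override the provider's `defaultConfig`. The provider entries are placeholders, and `getRequestHeaders` is a hypothetical helper showing how `extraHeaders` stays data rather than code:

```typescript
interface ProviderConfig {
  apiKey: string;
  baseUrl: string;
  sttModel: string;
  llmModel: string;
}

// Simplified ProviderMeta; the real type also carries model lists.
interface ProviderMeta {
  id: string;
  name: string;
  supportsSTT: boolean;
  supportsLLM: boolean;
  defaultConfig: ProviderConfig;
  extraHeaders?: Record<string, string>;
}

// Placeholder entries, not the app's real defaults.
const PROVIDERS: ProviderMeta[] = [
  {
    id: "example-stt", name: "Example STT", supportsSTT: true, supportsLLM: false,
    defaultConfig: { apiKey: "", baseUrl: "https://stt.example.com/v1", sttModel: "stt-1", llmModel: "" },
  },
  {
    id: "example-llm", name: "Example LLM", supportsSTT: false, supportsLLM: true,
    defaultConfig: { apiKey: "", baseUrl: "https://llm.example.com/v1", sttModel: "", llmModel: "llm-1" },
    extraHeaders: { "X-Title": "OpenType" },
  },
];

const PROVIDER_MAP = new Map<string, ProviderMeta>(PROVIDERS.map((p): [string, ProviderMeta] => [p.id, p]));

// User-saved values override the provider's defaults; no provider-specific branches.
function getProviderConfig(userProviders: Record<string, Partial<ProviderConfig>>, providerId: string): ProviderConfig {
  const meta = PROVIDER_MAP.get(providerId);
  if (!meta) throw new Error(`Unknown provider: ${providerId}`);
  return { ...meta.defaultConfig, ...userProviders[providerId] };
}

// Hypothetical helper: auth header plus any provider-specific extras.
function getRequestHeaders(providerId: string, apiKey: string): Record<string, string> {
  const meta = PROVIDER_MAP.get(providerId);
  return { Authorization: `Bearer ${apiKey}`, ...(meta?.extraHeaders ?? {}) };
}
```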
The system prompt is dynamically built from:
- Toggle states (filler removal, repetition, self-correction, auto-formatting)
- Output language preference
- Personal dictionary as "Hot Word Table" — LLM prefers these spellings for similar-sounding words
- Active app → tone rule matching
- Context: selected text, field content, clipboard, recent transcriptions, screen OCR
- Smart truncation (`smartTruncate()`) applied to all context fields to cap prompt length
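A minimal sketch of how such a prompt could be assembled. The option and context shapes and all section wording are invented for illustration. The character caps are the project's documented limits (`selectedText` 500, `fieldText` 1500, `clipboardText` 500, `screenContext` 400, `recentTranscription` 200 each, max 3); the `truncate` helper simply keeps the first N characters and appends an ellipsis, which may differ from what the real `smartTruncate()` keeps:

```typescript
interface PromptOptions {
  removeFillers: boolean;
  removeRepetitions: boolean;
  detectSelfCorrections: boolean;
  outputLanguage?: string;
  dictionary: string[];
  toneRule?: string;
}

interface PromptContext {
  selectedText?: string;
  fieldText?: string;
  clipboardText?: string;
  screenContext?: string;
  recentTranscriptions?: string[]; // assumed newest first
}

// Head-only truncation; a stand-in for the real smartTruncate().
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max) + "…";
}

function buildSystemPrompt(opts: PromptOptions, ctx: PromptContext): string {
  const parts: string[] = ["Clean up the dictated text below."];
  if (opts.removeFillers) parts.push("- Remove filler words.");
  if (opts.removeRepetitions) parts.push("- Remove accidental repetitions.");
  if (opts.detectSelfCorrections) parts.push("- Keep only the corrected version when the speaker corrects themselves.");
  if (opts.outputLanguage) parts.push(`- Write the output in ${opts.outputLanguage}.`);
  if (opts.dictionary.length > 0) {
    parts.push(`Hot Word Table (prefer these spellings for similar-sounding words): ${opts.dictionary.join(", ")}`);
  }
  if (opts.toneRule) parts.push(`Tone: ${opts.toneRule}`);

  // Every context field is capped before it enters the prompt.
  if (ctx.selectedText) parts.push(`Selected text: ${truncate(ctx.selectedText, 500)}`);
  if (ctx.fieldText) parts.push(`Field content: ${truncate(ctx.fieldText, 1500)}`);
  if (ctx.clipboardText) parts.push(`Clipboard: ${truncate(ctx.clipboardText, 500)}`);
  if (ctx.screenContext) parts.push(`Screen: ${truncate(ctx.screenContext, 400)}`);
  for (const t of (ctx.recentTranscriptions ?? []).slice(0, 3)) {
    parts.push(`Recent: ${truncate(t, 200)}`);
  }
  return parts.join("\n");
}
```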
Context is captured at hotkey press time (in toggleRecording() in shortcut-manager.ts), BEFORE the overlay steals focus. This preserves the correct active window info. OCR runs in the background while the user speaks: on macOS it uses the native `screencapture -R` to capture the region of the display the cursor is on, then compresses the image with `sips --resampleWidth 1280`; the cross-platform fallback is Electron's `desktopCapturer`. The OCR result is awaited and merged in the pipeline's `resolveContext()`.
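The capture-early, await-late pattern can be sketched as follows. `pending`, `onHotkey`, and the injected capture functions are hypothetical names; the point is that the promises are created at hotkey time and only awaited inside `resolveContext()`, so OCR overlaps with speaking. Turning an OCR failure into "no screen context" is an assumption about error handling:

```typescript
interface CapturedContext {
  appName: string;
  fieldText?: string;
  screenContext?: string;
}

// Hypothetical shared state, standing in for app-state.ts.
const pending: { context?: Promise<CapturedContext>; ocr?: Promise<string> } = {};

// Called on hotkey press, before the overlay can take focus.
function onHotkey(
  captureContext: () => Promise<CapturedContext>,
  captureScreenAndOcr: (() => Promise<string>) | undefined,
): void {
  // Start both captures now; do not await them here.
  pending.context = captureContext();
  pending.ocr = captureScreenAndOcr?.();
}

// Called from the pipeline once recording has stopped.
async function resolveContext(): Promise<CapturedContext | undefined> {
  const [ctx, ocr] = await Promise.all([
    pending.context,
    // An OCR failure should not fail the whole pipeline.
    pending.ocr?.catch(() => undefined),
  ]);
  if (!ctx) return undefined;
  return ocr ? { ...ctx, screenContext: ocr } : ctx;
}
```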
Hotkey pressed → shortcut-manager.ts: toggleRecording()
│
├── prepareEditDetection() ← snapshot of the last output state (non-blocking)
├── state.isRecording = true
├── overlay.send('toggle-recording') ← tell the overlay to start recording
│
├── async, after 50ms:
│ ├── captureFullContext() ← osascript fetches window / input-field context
│ │ └── when done: runEditDetection() ← reuses the fetched context (fire-and-forget)
│ └── if OCR: captureScreenAndOcr() ← 🔄 background screenshot + VLM (screencapture + sips compression)
│
│ ... user is speaking (OCR runs in the background) ...
│
Stop recording → useRecorder.ts: stopRecording()
│
├── recorder.stop() → audioBuffer
├── media:save stores the audio file
│
└── ⚡ Promise.all([ ← parallel in the renderer
├── runPipeline() → IPC pipeline:process
│ │
│ └── ⚡ Promise.all([ ← parallel in the main process
│ ├── sttService.transcribe() ← STT does not wait for OCR
│ └── resolveContext() ← awaits the context + OCR promises
│ ])
│ │
│ ├── llmService.process(raw, cfg, ctx) ← LLM polishing (blocking)
│ ├── schedulePostPipelineExtraction() ← 🔄 background dictionary learning via setImmediate
│ └── return result
│
└── getLastContext() ← fetch context for history
])
│
├── typeAtCursor(text) ← paste at cursor + recordTypedText()
└── addHistoryItem() ← save history + broadcast sync
Actual user wait time after recording stops = max(STT, remaining OCR) + LLM. In the best case OCR has already finished during recording, so the wait is STT + LLM.
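The main-process stage of the flow above can be sketched with the STT, context, and LLM steps injected as hypothetical async functions. Because `transcribe()` and `resolveContext()` share one `Promise.all`, the wait is roughly the slower of the two plus the LLM call, rather than the sum of all three:

```typescript
interface PipelineDeps<Ctx> {
  transcribe(audio: Uint8Array): Promise<string>;
  resolveContext(): Promise<Ctx>;
  polish(raw: string, ctx: Ctx): Promise<string>;
}

async function processRecording<Ctx>(
  audio: Uint8Array,
  deps: PipelineDeps<Ctx>,
): Promise<{ raw: string; text: string }> {
  // STT starts immediately; it does not wait for context or OCR.
  const [raw, ctx] = await Promise.all([deps.transcribe(audio), deps.resolveContext()]);
  // The LLM needs both the transcript and the context, so it runs after both.
  const text = await deps.polish(raw, ctx);
  return { raw, text };
}
```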
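Background tasks that never block the user: dictionary term extraction, edit detection, history broadcast.
Safety valve: the pipeline holds a mutex with a 60s timeout, so a hung API call cannot leave it permanently busy.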
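One way to build such a safety valve is a lock that expires lazily, sketched below. The class and method names are hypothetical and the 60s default comes from the text. Tokens are an added assumption: they stop a run that was force-unlocked from releasing a newer run's lock when it finally returns:

```typescript
class PipelineLock {
  private holder: { token: number; since: number } | null = null;
  private nextToken = 1;
  private readonly timeoutMs: number;
  private readonly now: () => number;

  constructor(timeoutMs = 60_000, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
  }

  // Returns a token if the caller may start a run, or null if busy.
  tryAcquire(): number | null {
    const t = this.now();
    if (this.holder && t - this.holder.since < this.timeoutMs) return null;
    // Free, or held past the timeout (e.g. a hung API call): force-unlock and take it.
    this.holder = { token: this.nextToken++, since: t };
    return this.holder.token;
  }

  // Only the current holder can release the lock.
  release(token: number): void {
    if (this.holder?.token === token) this.holder = null;
  }
}
```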
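LLM-driven smart dictionary learning uses 3 channels, all running asynchronously in the background so they never block the pipeline:
- Post-pipeline extraction: `schedulePostPipelineExtraction()` extracts terms in a single LLM call from the raw text, the processed text, and the screenshot (if any). With a screenshot it uses the VLM (`extractTermsWithImage`); without one it uses the text LLM (`extractTerms`). Execution is deferred with `setImmediate`. Extraction is skipped if raw and processed differ only in punctuation/whitespace.
- User edit detection: `prepareEditDetection()` + `runEditDetection()`. On the next hotkey press, compares the last output with the current content of the input field to detect terms the user corrected by hand. Reuses the `captureFullContext()` result to avoid concurrent osascript conflicts. Guarded by a 5-minute timeout plus a bundleId/fieldRole check.
- `recordTypedText()`: after `typeAtCursor` succeeds, stores the output text in `state.lastTypedText` for the next edit detection.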
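The punctuation/whitespace skip can be sketched as a pure function in the spirit of `auto-dict-utils.ts`. The function names and the exact character classes are assumptions: everything that is not a Unicode letter or number is ignored, which covers ASCII and full-width CJK punctuation alike. Case changes still count as a real difference:

```typescript
// Everything except letters and digits (any script).
const NON_ALNUM = new RegExp("[^\\p{L}\\p{N}]", "gu");

function normalizeForDiff(s: string): string {
  return s.replace(NON_ALNUM, "");
}

// True when the LLM only changed punctuation or spacing, so there is nothing to learn.
function shouldSkipExtraction(raw: string, processed: string): boolean {
  return normalizeForDiff(raw) === normalizeForDiff(processed);
}
```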
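Extraction principle: only learn words that are personal to the user and likely to be misrecognized by STT (internal project names, colleagues' names, niche terms). The default result is an empty array, with at most 3 terms per call. The prompt includes the existing dictionary to avoid duplicates.
Term source tags: `source: 'auto-llm'` (pipeline extraction), `'auto-diff'` (edit detection), `'manual'` (added by hand).
The `dictionary:auto-added` IPC event notifies the main window in real time so it can update the UI.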
- 10-minute auto-stop: `startRecording()` sets a 600s timeout that triggers `stopRecording()`
- Generation tracking: `generationRef` prevents stale pipeline results from updating the UI, but history is always saved regardless of staleness
- All outcomes saved: success, skipped (no speech), and error all create history entries
- LLM post-processing switch: `config.llmPostProcessing` is the master toggle; when off, the pipeline returns raw STT output directly
- Recorder ref isolation: `stopRecording` captures the recorder in a local variable before `await stop()`, so a concurrent `startRecording` is not orphaned
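Generation tracking can be sketched outside React as a counter: each run captures the generation when it starts and only updates the UI if no newer run has started since, while history is saved unconditionally. In `useRecorder.ts` the counter lives in `generationRef`; `GenerationTracker`, `finishRun`, and the callbacks here are hypothetical stand-ins:

```typescript
interface RunResult {
  status: "success" | "skipped" | "error";
  text: string;
}

class GenerationTracker {
  private current = 0;

  // Call when a new recording/pipeline run starts.
  begin(): number {
    return ++this.current;
  }

  isCurrent(gen: number): boolean {
    return gen === this.current;
  }
}

function finishRun(
  tracker: GenerationTracker,
  gen: number,
  result: RunResult,
  saveHistory: (r: RunResult) => void,
  updateUI: (r: RunResult) => void,
): void {
  // Every outcome (success, skipped, error) is recorded, even if stale.
  saveHistory(result);
  // Only the newest run may touch the visible result.
  if (tracker.isCurrent(gen)) updateUI(result);
}
```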
- STT batch: 30s `AbortController` timeout (`STT_REQUEST_TIMEOUT_MS` in `stt-service.ts`)
- LLM call: 30s `AbortController` timeout (in `llm-service.ts`)
- WebSocket connect: 10s timeout with a settled flag (prevents a double-reject race)
- WebSocket commit: 30s timeout, closes the WebSocket on expiry (prevents a resource leak)
- getLastContext: 10s `Promise.race` timeout (prevents an osascript hang from leaving the app permanently in processing)
- Pipeline mutex: 60s safety valve with force-unlock
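Two of these patterns can be sketched as small helpers. `withTimeout` is a `Promise.race` timeout of the kind described for `getLastContext`, and `connectOnce` shows a settled flag that keeps a connect promise from being resolved or rejected twice. Both names are hypothetical, and `connectOnce` takes callbacks instead of a real WebSocket so it runs standalone:

```typescript
// Reject if `p` has not settled within `ms`; always clear the timer.
async function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// `open` starts a connection and reports the outcome through callbacks,
// e.g. by wiring them to a WebSocket's open/error events.
function connectOnce(open: (onOpen: () => void, onFail: (err: Error) => void) => void, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;
    const settle = (fn: () => void) => {
      if (settled) return; // a late error after success (or timeout) is ignored
      settled = true;
      clearTimeout(timer);
      fn();
    };
    const timer = setTimeout(() => settle(() => reject(new Error(`connect timed out after ${ms}ms`))), ms);
    open(
      () => settle(() => resolve()),
      (err) => settle(() => reject(err)),
    );
  });
}
```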
- Friendly error messages: `friendlyErrorMessage()` in `src/utils/friendlyError.ts` maps technical errors (401/429/5xx) to localized user-friendly messages. Used by ResultPanel, DashboardPage, and HistoryPage. `parseApiError()` in `stt-service.ts` extracts a human-readable message from JSON error bodies.
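A minimal sketch of both helpers, assuming OpenAI-style JSON error bodies (`{"error": {"message": ...}}`) and invented i18n keys under `errors.*`; the real key names and matching rules live in `friendlyError.ts` and `stt-service.ts`. The translation function `t` is passed in so the sketch does not depend on React:

```typescript
// Pull a readable message out of a JSON error body; fall back to the raw text.
function parseApiError(status: number, body: string): string {
  try {
    const data = JSON.parse(body) as { error?: { message?: string } | string; message?: string };
    const msg = typeof data.error === "string" ? data.error : data.error?.message ?? data.message;
    if (msg) return `HTTP ${status}: ${msg}`;
  } catch {
    // Not JSON (or not an object): use the body as-is below.
  }
  return `HTTP ${status}: ${body.slice(0, 200)}`;
}

// Map technical errors to localized, user-facing messages (hypothetical keys).
function friendlyErrorMessage(err: string, t: (key: string) => string): string {
  if (/\b401\b/.test(err)) return t("errors.invalidApiKey");
  if (/\b429\b/.test(err)) return t("errors.rateLimited");
  if (/\b5\d\d\b/.test(err)) return t("errors.serverError");
  if (/timed? ?out|abort/i.test(err)) return t("errors.timeout");
  return err;
}
```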
- Media path traversal protection: `assertMediaPath()` in `ipc-handlers.ts` validates that all file paths are under `mediaDir` before read/write/delete
- Atomic config save: `config-store.ts` writes to `.tmp` then calls `renameSync` (prevents partial writes on crash). It backs up to `.bak` before overwriting, and `load()` attempts `.bak` recovery if the main file is corrupted.
- Dictionary cap: `addDictionaryWord()` limits the dictionary to 2000 entries, evicting the oldest auto-learned words first (manual words are preserved)
- Single instance lock: `app.requestSingleInstanceLock()` prevents multiple app instances
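Two of these guards can be sketched with Node's `path` and `fs` modules. The function names `assertMediaPath` and `load` echo the ones above, but the bodies are assumptions, and `atomicWriteJson`/`loadJson` are hypothetical names. `assertMediaPath` resolves the path and rejects anything whose relative path from `mediaDir` climbs out of it (symlinks are not resolved). `atomicWriteJson` copies the old file to `.bak`, writes `.tmp`, then renames:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Throw unless `p` resolves to a location strictly inside `mediaDir`.
function assertMediaPath(mediaDir: string, p: string): string {
  const base = path.resolve(mediaDir);
  const target = path.resolve(base, p);
  const rel = path.relative(base, target);
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path outside media directory: ${p}`);
  }
  return target;
}

// Write JSON so a crash never leaves a half-written file at `file`.
function atomicWriteJson(file: string, data: unknown): void {
  // Keep the previous good version as a fallback for loading.
  if (fs.existsSync(file)) fs.copyFileSync(file, file + ".bak");
  const tmp = file + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2), "utf8");
  // On POSIX filesystems, rename() atomically replaces the target.
  fs.renameSync(tmp, file);
}

function loadJson<T>(file: string, fallback: T): T {
  for (const candidate of [file, file + ".bak"]) {
    try {
      return JSON.parse(fs.readFileSync(candidate, "utf8")) as T;
    } catch {
      // Missing or corrupted: try the next candidate.
    }
  }
  return fallback;
}
```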
To add a new IPC channel, update these three files in lockstep:
- `electron/ipc-handlers.ts`: register the handler: `ipcMain.handle('namespace:action', ...)`
- `electron/preload.ts`: expose the method: `action: (...) => ipcRenderer.invoke('namespace:action', ...)`
- `src/types/electron.d.ts`: add the type: `action: (...) => Promise<ReturnType>`
IPC channel naming convention: namespace:action (e.g., config:get, stt:transcribe, pipeline:process).
To add a new config field, update these files:
- `src/types/config.ts`: add the field to the `AppConfig` interface and set a default in `DEFAULT_CONFIG` (single source of truth, shared by both frontend and electron)
- Settings UI: add a control in the appropriate settings sub-panel
- Any service that reads the field (e.g., `llm-service.ts`, `ipc-handlers.ts`)
To add a new provider, only one file needs editing:
- `src/types/config.ts`: add an entry to the `PROVIDERS` array with `ProviderMeta` (id, name, supportsSTT/LLM, model lists, `defaultConfig`). If the provider needs custom headers, set `extraHeaders`. If `supportsSTT`/`supportsLLM` introduces a new id, add it to the `STTProviderID`/`LLMProviderID` union type.
Everything else (settings UI, helper functions, config resolution) is data-driven and works automatically.
To add a new context field, update these files:
- `electron/context-capture.ts`: add it to the `CapturedContext` interface and populate it in the platform-specific capture functions
- `src/types/config.ts`: add it to the `HistoryContext` interface
- `src/types/electron.d.ts`: add it to the `getLastContext` return type
- `src/hooks/useRecorder.ts`: save the new field in the history item's `context` object (in `buildContext()`)
- `electron/llm-service.ts`: include it in prompt construction (with truncation)
- `src/pages/HistoryPage.tsx`: display it in the `DetailModal` component
To add i18n strings, update both locale files with an identical key structure:
- `src/i18n/locales/en.json`: English strings
- `src/i18n/locales/zh.json`: Chinese strings
Missing keys fall back to English; missing from both shows the raw key string.
i18n key convention: section.subsection.key (e.g., settings.providers.apiKey).
Usage: const { t } = useTranslation(); t('history.clipboard', { count: 5 }).
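The fallback rule can be sketched as a dot-path lookup over two locale objects plus placeholder interpolation. The `{count}` placeholder syntax and the helper names are assumptions (the real `resolve`/`interpolate` live in `src/i18n/index.ts`), and the locale objects in any usage are small inline samples, not the real files:

```typescript
type Messages = { [key: string]: string | Messages };

// Walk "a.b.c" through nested objects; undefined if a segment is missing or not a string leaf.
function resolve(messages: Messages, key: string): string | undefined {
  let node: string | Messages | undefined = messages;
  for (const part of key.split(".")) {
    if (typeof node !== "object") return undefined;
    node = node[part];
  }
  return typeof node === "string" ? node : undefined;
}

// Replace {name} placeholders; unknown placeholders are left intact.
function interpolate(template: string, vars: Record<string, string | number> = {}): string {
  return template.replace(/\{(\w+)\}/g, (match, name: string) => (name in vars ? String(vars[name]) : match));
}

function makeT(locale: Messages, fallback: Messages) {
  return (key: string, vars?: Record<string, string | number>): string => {
    // Current locale, then English, then the raw key string.
    const template = resolve(locale, key) ?? resolve(fallback, key) ?? key;
    return interpolate(template, vars);
  };
}
```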
| What | Convention | Example |
|---|---|---|
| React components | PascalCase files + named export | Button.tsx → export const Button |
| Hooks | camelCase with use prefix | useRecorder.ts |
| Services | camelCase files | sttService.ts, pipeline.ts |
| Electron files | kebab-case | config-store.ts, llm-service.ts |
| IPC channels | namespace:action | config:get, stt:transcribe |
| Config fields | camelCase | contextL0Enabled, providers |
| CSS variables | kebab-case | --slider-track |
| i18n keys | dot-notation | settings.providers.apiKey |
- All UI components use `forwardRef` + set `displayName`
- Named exports only (no default exports)
- Variant-based styling via `Record<Variant, string>` objects
- Select uses `createPortal` for its dropdown (avoids z-index stacking issues)
- Event listeners from preload must return cleanup functions
- Custom colors only: use `brand-*` (blue, primary) and `surface-*` (warm gray), NOT default Tailwind colors
- Dark mode: class-based (`dark:` prefix), toggled via `<html class="dark">`
- All components must have both light and dark styles: `text-surface-800 dark:text-surface-200`
- No responsive breakpoints (desktop-only app)
- Z-index: portals/dropdowns use `z-[9999]`, modals use `z-50`
- Drag region: `.drag-region` for draggable areas, `.no-drag` for interactive elements inside
brand-500 → #3b82f6 (primary blue — buttons, focus rings, active states)
surface-50 → #faf8f5 (light background)
surface-850 → #201e1c (dark component backgrounds)
surface-900 → #181715 (dark page backgrounds)
- `electron/` uses CommonJS (`module: "CommonJS"`): no `import.meta`, no top-level `export default`
- Preload event listeners (`on*` methods) must return cleanup functions to prevent memory leaks
- Context is captured at hotkey time, not pipeline execution time
- Overlay window is transparent, always-on-top, unfocusable, positioned at bottom-center of the active display
- `typeAtCursor` saves/restores the clipboard around paste simulation
- IPC handlers are in `ipc-handlers.ts`, NOT in `main.ts`
- Forgetting one of the 3 IPC files: TypeScript may pass, but runtime will crash
- Only running `tsc --noEmit`: misses electron/ errors. Always use `npm run typecheck`
- Hardcoding provider logic: never add if-else for specific providers; add an entry to the `PROVIDERS` array instead
- Using `import.meta` in electron/: will fail, because electron uses CommonJS
- Not updating both locale files: Chinese users see the English fallback or raw key strings
- Capturing context too late: capture must happen before the overlay shows, or you get the overlay's window info
- Missing `displayName` on forwardRef components: React DevTools shows "ForwardRef" instead of the name
- Using default Tailwind colors: must use the `brand-*` and `surface-*` custom scales
- History item ID collisions: IDs must include randomness, e.g. `Date.now().toString(36) + Math.random().toString(36).slice(2, 6)`
- Stale closures in useEffect: event listener callbacks must track the correct dependencies
- Storing large data in config.json: audio/screenshots must use media files (`media:save` IPC), NOT inline base64
- Cross-window state not syncing: the overlay and main window have separate Zustand stores; history changes must be broadcast via the `config:history-updated` IPC event
- Adding IPC handlers in main.ts: all handlers belong in `ipc-handlers.ts`; `main.ts` only handles app lifecycle
- Config is stored at `~/Library/Application Support/OpenType/config.json` (Electron `userData`), NOT `~/.opentype/`
- Media files (audio WAV, screenshot JPG) are stored in `~/Library/Application Support/OpenType/media/`
- `build/entitlements.mac.plist` grants microphone access (`com.apple.security.device.audio-input`) for macOS
- Audio recording converts webm to WAV (PCM 16-bit, 16kHz) in the browser before sending to STT APIs
- The overlay window is transparent, always-on-top, and unfocusable. It is shown/hidden alongside recording state
- The frameless window uses `-webkit-app-region: drag` via the `.drag-region` CSS class. Interactive elements need `.no-drag`
- Config persistence is immediate (no debouncing): each `set()` writes to disk
- History is capped at 500 items in the Zustand store
- Recording auto-stops after 10 minutes
- Smart truncation in LLM prompts: `selectedText` 500 chars, `fieldText` 1500, `clipboardText` 500, `screenContext` 400, `recentTranscription` 200 each (max 3)
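The WAV step can be sketched as a header writer plus float-to-PCM16 conversion. This assumes the audio has already been decoded and resampled to 16 kHz mono `Float32Array` samples (for example via the Web Audio API), which the sketch does not show; the 44-byte header follows the standard RIFF/WAVE PCM layout. `encodeWav` is a hypothetical name, not necessarily what `audioRecorder.ts` uses:

```typescript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate = 16000): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));

  const writeAscii = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp and scale each sample to a signed 16-bit integer (little-endian).
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}
```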