Commit 76f5219

Phase 4: Review status indicators and update functionality (#5)
## Summary
- Complete review management system with visual status indicators, progress tracking, and update functionality
- Professional interface for conducting human evaluations of the prompt testing framework

## Test plan
- [ ] Load existing review results and verify status indicators appear correctly (green ✅ for reviewed, red ⚪ for pending)
- [ ] Verify progress bar shows accurate completion percentage in header
- [ ] Test form pre-population with existing review data when clicking on reviewed cases
- [ ] Confirm "Save Review" vs "Update Review" button text changes appropriately
- [ ] Test real-time status updates when saving new reviews or updating existing ones
- [ ] Verify review persistence across page refreshes

🤖 Generated with [Claude Code](https://claude.ai/code)
2 parents bff83ab + 59d0475 commit 76f5219

5 files changed

+227
-16
lines changed


claude_explain.md

Lines changed: 15 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -95,6 +95,10 @@ The prompt testing framework feeds back into the main service by:
9595
- AWS deployment (handled by Compiler Explorer infrastructure)
9696
- S3 caching for responses with configurable HTTP Cache-Control headers
9797
- Cache bypass option for fresh responses
98+
- Human review integration with web interface for prompt evaluation
99+
- Interactive review management system with status indicators and progress tracking
100+
- Automated prompt improvement pipeline using human + AI feedback
101+
- Version tracking and comparison system for prompt iterations
98102

99103
### 🔄 In Progress
100104
- Production API key management
@@ -104,7 +108,6 @@ The prompt testing framework feeds back into the main service by:
104108
### 📋 TODO
105109
- Prompt caching for cost optimization
106110
- Production monitoring dashboards
107-
- User feedback collection mechanism
108111

109112
## Design Decisions
110113

@@ -274,10 +277,17 @@ Matt's notes:
274277
- Pass the explanation type, description, and the audience too (along with explanation) to claude reviewer
275278
- would be great to validate the YAML and error on broken/missing/extra fields. probably make most fields required and get rid of all the fallbacks like audience etc too
276279
- probably use pydantic to wrap with a strong type?
277-
- HTML review needs UX work; can't see the comment box at the same time as the thing we're commenting on
278-
- HTML review nice way to tick off things already done
279-
- HTML review should use localStorage to save reviewer name and/or get from git
280-
- UX on HTML review - view the automated output too? (like the nuanced opinion not just numbers)
280+
COMPLETED ✅:
281+
- HTML review UX work - completed comprehensive review interface improvements:
282+
* Side-by-side code display for better space usage
283+
* localStorage integration for reviewer name persistence
284+
* 1-5 scale metrics alignment with human evaluation standards
285+
* Line-separated input format (more natural than comma-separated)
286+
* Visual status indicators showing reviewed vs unreviewed cases
287+
* Progress tracking with animated completion bar
288+
* Form pre-population with existing review data
289+
* Update functionality for modifying existing reviews
290+
* Real-time status updates and review management
281291

282292
--- before v4 ---
283293

prompt_testing/README.md

Lines changed: 11 additions & 1 deletion
@@ -340,11 +340,21 @@ uv run prompt-test run --prompt v1_baseline --compare current --categories basic
340340
uv run prompt-test run --prompt current --output my_test_results.json
341341
```
342342

343-
2. Review results interactively:
343+
2. Review results interactively via web interface:
344344
```bash
345345
uv run prompt-test review --results-file prompt_testing/results/my_test_results.json
346346
```
347347

348+
**Features:**
349+
- Visual status indicators (✅ reviewed, ⚪ pending) with colored borders
350+
- Progress tracking with animated completion bar
351+
- Side-by-side source code and assembly display
352+
- Form pre-population with existing review data
353+
- Update functionality for modifying reviews
354+
- localStorage persistence for reviewer information
355+
- 1-5 scale metrics aligned with human evaluation standards
356+
- Line-separated input format for natural feedback entry
357+
348358
3. Analyze review data:
349359
```bash
350360
uv run prompt-test analyze

prompt_testing/WHATS_NEXT.md

Lines changed: 7 additions & 1 deletion
@@ -8,11 +8,17 @@ This document outlines the next steps for improving the prompt testing framework
88

99
### Web Review Interface (Latest)
1010
- **Fixed HTML review interface** - Replaced string concatenation with Flask + Jinja2
11-
- **Added markdown rendering** - AI responses now display with proper formatting using python-markdown
11+
- **Added markdown rendering** - AI responses now display with proper formatting using client-side marked.js
1212
- **Fixed template errors** - Resolved "dict has no attribute request" by enriching results with test case data
1313
- **Improved result descriptions** - Clear labels like "Current Production Prompt - 12 cases" instead of "unknown"
1414
- **Added CSS styling** - Proper code block, header, and list formatting
1515
- **Interactive web server** - `uv run prompt-test review --interactive` launches Flask app on localhost:5001
16+
- **COMPLETED: Quality of Life Improvements**
17+
* Phase 1: localStorage reviewer persistence + 1-5 metrics scale alignment
18+
* Phase 2: Side-by-side source/assembly code display with responsive grid
19+
* Phase 3: Line-separated input format (more natural than comma-separated)
20+
* Phase 4: Review status indicators + progress tracking + update functionality
21+
* Professional review management system with visual status, form pre-population, and real-time updates
1622

1723
### Prompt Improvement System Audit & Fixes
1824
- **Fixed critical "current" prompt loading bug** - PromptOptimizer now handles "current" → `app/prompt.yaml` mapping
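
Phase 3 in the list above switches the review form from comma-separated to line-separated input, and the template in this commit joins stored lists with `\n` when pre-populating a textarea. The save-side parsing is not part of this diff, so the following is only a minimal sketch of how such a round trip could work. The function names are hypothetical, and trimming whitespace and dropping blank lines are assumptions rather than confirmed behaviour.

```python
from __future__ import annotations


def parse_lines(text: str) -> list[str]:
    """Split textarea content into items, one per non-empty line (assumed rule)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def format_lines(items: list[str]) -> str:
    """Join stored items back into textarea content, one item per line."""
    return "\n".join(items)


# Hypothetical reviewer input: a blank line, stray spaces, and a comma inside an item.
raw = "Clear register usage\n\n  Mentions loop unrolling, vectorisation  \n"
items = parse_lines(raw)
round_trip = parse_lines(format_lines(items))
```

One reason a line-based format reads more naturally is visible here: an item that itself contains a comma stays a single item, which a comma-separated format would split in two.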

prompt_testing/templates/review.html

Lines changed: 175 additions & 9 deletions
@@ -18,6 +18,13 @@
1818
.case-card {
1919
background: white; margin-bottom: 30px; border-radius: 8px;
2020
box-shadow: 0 2px 4px rgba(0,0,0,0.1); overflow: hidden;
21+
position: relative;
22+
}
23+
.case-card.reviewed {
24+
border-left: 5px solid #16a34a;
25+
}
26+
.case-card.not-reviewed {
27+
border-left: 5px solid #dc2626;
2128
}
2229
.case-header {
2330
background: #f8f9fa; padding: 15px 20px;
@@ -26,6 +33,24 @@
2633
}
2734
.case-id { font-weight: 600; color: #333; }
2835
.case-meta { font-size: 0.9em; color: #666; }
36+
.review-status {
37+
display: inline-flex; align-items: center; gap: 8px;
38+
padding: 4px 12px; border-radius: 16px; font-size: 0.8em; font-weight: 500;
39+
}
40+
.review-status.reviewed {
41+
background: #dcfce7; color: #166534;
42+
}
43+
.review-status.not-reviewed {
44+
background: #fef2f2; color: #dc2626;
45+
}
46+
.progress-bar {
47+
background: #f3f4f6; border-radius: 8px; height: 8px; margin-top: 10px;
48+
overflow: hidden;
49+
}
50+
.progress-fill {
51+
background: linear-gradient(90deg, #16a34a, #22c55e);
52+
height: 100%; transition: width 0.3s ease;
53+
}
2954
.case-content { padding: 20px; }
3055
.section { margin-bottom: 25px; }
3156
.section h4 {
@@ -166,6 +191,13 @@ <h1>📝 Review: {{ data.summary.prompt_version if data.summary else 'Unknown' }
166191
{{ "%.2f"|format(data.summary.average_metrics.overall_score) }}</p>
167192
{% endif %}
168193
{% endif %}
194+
195+
<div id="review-progress" style="display: none;">
196+
<p><strong>Review Progress:</strong> <span id="progress-text">Loading...</span></p>
197+
<div class="progress-bar">
198+
<div class="progress-fill" id="progress-fill" style="width: 0%"></div>
199+
</div>
200+
</div>
169201
</div>
170202

171203
<div class="reviewer-info">
@@ -185,11 +217,16 @@ <h4>👤 Reviewer Information</h4>
185217
{% if result.success %}
186218
<div class="case-card">
187219
<div class="case-header">
188-
<div class="case-id">{{ result.case_id }}</div>
189-
<div class="case-meta">
190-
{{ result.test_case.language }} | {{ result.test_case.compiler }} |
191-
Audience: {{ result.test_case.audience }} |
192-
Type: {{ result.test_case.explanation_type }}
220+
<div>
221+
<div class="case-id">{{ result.case_id }}</div>
222+
<div class="case-meta">
223+
{{ result.test_case.language }} | {{ result.test_case.compiler }} |
224+
Audience: {{ result.test_case.audience }} |
225+
Type: {{ result.test_case.explanation_type }}
226+
</div>
227+
</div>
228+
<div class="review-status not-reviewed" id="status-{{ result.case_id }}">
229+
<span></span> Not Reviewed
193230
</div>
194231
</div>
195232

@@ -351,6 +388,10 @@ <h4>✏️ Your Review</h4>
351388
breaks: true
352389
});
353390

391+
// Global state for existing reviews
392+
let existingReviews = {};
393+
let reviewProgress = { total: 0, reviewed: 0 };
394+
354395
// Initialize page on load
355396
document.addEventListener('DOMContentLoaded', function() {
356397
// Load saved reviewer name from localStorage
@@ -364,6 +405,9 @@ <h4>✏️ Your Review</h4>
364405
localStorage.setItem('reviewerName', this.value);
365406
});
366407

408+
// Load existing reviews for this prompt version
409+
loadExistingReviews();
410+
367411
// Render all markdown responses
368412
{% for result in data.results %}
369413
{% if result.success %}
@@ -376,6 +420,116 @@ <h4>✏️ Your Review</h4>
376420
{% endfor %}
377421
});
378422

423+
async function loadExistingReviews() {
424+
try {
425+
const promptVersion = '{{ data.summary.prompt_version if data.summary else "unknown" }}';
426+
const response = await fetch(`/api/reviews/prompt/${encodeURIComponent(promptVersion)}`);
427+
const data = await response.json();
428+
429+
existingReviews = data.reviews_by_case || {};
430+
431+
// Update UI for each case
432+
const caseCards = document.querySelectorAll('.case-card');
433+
reviewProgress.total = caseCards.length;
434+
reviewProgress.reviewed = 0;
435+
436+
caseCards.forEach(card => {
437+
const caseId = extractCaseIdFromCard(card);
438+
if (caseId && existingReviews[caseId]) {
439+
updateCaseReviewStatus(caseId, true, existingReviews[caseId]);
440+
reviewProgress.reviewed++;
441+
} else if (caseId) {
442+
updateCaseReviewStatus(caseId, false);
443+
}
444+
});
445+
446+
updateProgressIndicator();
447+
448+
} catch (error) {
449+
console.error('Failed to load existing reviews:', error);
450+
}
451+
}
452+
453+
function extractCaseIdFromCard(card) {
454+
const statusElement = card.querySelector('[id^="status-"]');
455+
if (statusElement) {
456+
return statusElement.id.replace('status-', '');
457+
}
458+
return null;
459+
}
460+
461+
function updateCaseReviewStatus(caseId, isReviewed, reviewData = null) {
462+
const card = document.querySelector(`#status-${caseId}`).closest('.case-card');
463+
const statusElement = document.getElementById(`status-${caseId}`);
464+
const saveButton = document.querySelector(`button[onclick="saveReview('${caseId}')"]`);
465+
466+
if (isReviewed) {
467+
// Mark as reviewed
468+
card.classList.remove('not-reviewed');
469+
card.classList.add('reviewed');
470+
statusElement.classList.remove('not-reviewed');
471+
statusElement.classList.add('reviewed');
472+
statusElement.innerHTML = '<span>✅</span> Reviewed';
473+
474+
if (saveButton) {
475+
saveButton.textContent = `Update Review for ${caseId}`;
476+
}
477+
478+
// Pre-populate form if review data available
479+
if (reviewData) {
480+
populateReviewForm(caseId, reviewData);
481+
}
482+
} else {
483+
// Mark as not reviewed
484+
card.classList.remove('reviewed');
485+
card.classList.add('not-reviewed');
486+
statusElement.classList.remove('reviewed');
487+
statusElement.classList.add('not-reviewed');
488+
statusElement.innerHTML = '<span>⚪</span> Not Reviewed';
489+
490+
if (saveButton) {
491+
saveButton.textContent = `Save Review for ${caseId}`;
492+
}
493+
}
494+
}
495+
496+
function populateReviewForm(caseId, reviewData) {
497+
// Populate numeric scores
498+
const fields = ['accuracy', 'relevance', 'conciseness', 'insight', 'appropriateness'];
499+
fields.forEach(field => {
500+
const input = document.querySelector(`input[name="${field}"][data-case="${caseId}"]`);
501+
if (input && reviewData[field]) {
502+
input.value = reviewData[field];
503+
}
504+
});
505+
506+
// Populate text areas
507+
const textFields = ['strengths', 'weaknesses', 'suggestions', 'overall_comments'];
508+
textFields.forEach(field => {
509+
const textarea = document.querySelector(`textarea[name="${field}"][data-case="${caseId}"]`);
510+
if (textarea && reviewData[field]) {
511+
if (Array.isArray(reviewData[field])) {
512+
textarea.value = reviewData[field].join('\n');
513+
} else {
514+
textarea.value = reviewData[field];
515+
}
516+
}
517+
});
518+
}
519+
520+
function updateProgressIndicator() {
521+
const progressContainer = document.getElementById('review-progress');
522+
const progressText = document.getElementById('progress-text');
523+
const progressFill = document.getElementById('progress-fill');
524+
525+
if (reviewProgress.total > 0) {
526+
const percentage = Math.round((reviewProgress.reviewed / reviewProgress.total) * 100);
527+
progressText.textContent = `${reviewProgress.reviewed}/${reviewProgress.total} cases reviewed (${percentage}%)`;
528+
progressFill.style.width = `${percentage}%`;
529+
progressContainer.style.display = 'block';
530+
}
531+
}
532+
379533
async function saveReview(caseId) {
380534
const reviewerName = document.getElementById('reviewer-name').value.trim();
381535
if (!reviewerName) {
@@ -429,11 +583,21 @@ <h4>✏️ Your Review</h4>
429583
const result = await response.json();
430584

431585
if (result.success) {
432-
messageEl.textContent = '✅ Saved!';
586+
const wasUpdate = existingReviews[caseId] !== undefined;
587+
messageEl.textContent = wasUpdate ? '✅ Updated!' : '✅ Saved!';
433588
messageEl.className = 'success-msg';
434589

435-
const inputs = document.querySelectorAll(`[data-case="${caseId}"]`);
436-
inputs.forEach(input => input.disabled = true);
590+
// Update the global state
591+
existingReviews[caseId] = formData;
592+
593+
// Update UI to show reviewed status
594+
if (!wasUpdate) {
595+
reviewProgress.reviewed++;
596+
}
597+
updateCaseReviewStatus(caseId, true, formData);
598+
updateProgressIndicator();
599+
600+
// Don't disable inputs for updates, allow further editing
437601
} else {
438602
throw new Error(result.error || 'Failed to save review');
439603
}
@@ -443,7 +607,9 @@ <h4>✏️ Your Review</h4>
443607
messageEl.className = 'error-msg';
444608
} finally {
445609
button.disabled = false;
446-
button.textContent = `Save Review for ${caseId}`;
610+
// Update button text based on current review status
611+
const isReviewed = existingReviews[caseId] !== undefined;
612+
button.textContent = isReviewed ? `Update Review for ${caseId}` : `Save Review for ${caseId}`;
447613
}
448614
}
449615
</script>
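
The bookkeeping that `saveReview` and `updateProgressIndicator` perform in the template above can be restated outside the browser. The sketch below is a Python rendering of that logic for illustration only, not code from the project: `progress_label` and `record_save` are hypothetical names, and the dictionaries stand in for the `existingReviews` and `reviewProgress` globals.

```python
from __future__ import annotations

import math
from typing import Optional


def progress_label(reviewed: int, total: int) -> Optional[str]:
    """Return the header text, or None when there are no cases to show."""
    if total <= 0:
        return None
    # floor(x + 0.5) mirrors JavaScript's Math.round for non-negative values;
    # Python's built-in round() rounds halves to the nearest even number instead.
    percentage = math.floor(reviewed / total * 100 + 0.5)
    return f"{reviewed}/{total} cases reviewed ({percentage}%)"


def record_save(existing: dict[str, dict], reviewed_count: int,
                case_id: str, form_data: dict) -> tuple[int, str]:
    """Apply a successful save; return the new reviewed count and button text."""
    was_update = case_id in existing
    existing[case_id] = form_data
    if not was_update:
        # Only a first-time save moves the progress bar forward.
        reviewed_count += 1
    return reviewed_count, f"Update Review for {case_id}"


existing: dict[str, dict] = {}
count = 0
count, button = record_save(existing, count, "basic_loop", {"accuracy": 4})
count, button = record_save(existing, count, "basic_loop", {"accuracy": 5})
label = progress_label(count, 8)
```

Saving the same case twice leaves the count at one, which is why the template checks `wasUpdate` before incrementing `reviewProgress.reviewed`.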

prompt_testing/web_review.py

Lines changed: 19 additions & 0 deletions
@@ -228,6 +228,25 @@ def get_reviews(case_id):
228228
reviews = self.review_manager.load_reviews(case_id=case_id)
229229
return jsonify({"reviews": [review.__dict__ for review in reviews]})
230230

231+
@self.app.route("/api/reviews/prompt/<prompt_version>")
232+
def get_reviews_for_prompt(prompt_version):
233+
"""Get all existing reviews for a specific prompt version."""
234+
reviews = self.review_manager.load_reviews(prompt_version=prompt_version)
235+
# Group reviews by case_id for easier frontend consumption
236+
#
237+
# Design Note: We use direct assignment (latest review wins) rather than arrays
238+
# because this UI is designed for single-reviewer prompt iteration workflows.
239+
# The underlying JSONL storage preserves all reviews if multi-reviewer support
240+
# is needed later. For current use case (prompt optimization), we want:
241+
# - Simple binary state (reviewed/not reviewed)
242+
# - Clear completion tracking without reviewer consensus complexity
243+
# - UI optimized for iteration speed rather than comprehensive evaluation
244+
# If multiple reviewers become common, this can be evolved to support arrays.
245+
reviews_by_case = {}
246+
for review in reviews:
247+
reviews_by_case[review.case_id] = review.__dict__ # Latest review per case
248+
return jsonify({"reviews_by_case": reviews_by_case, "total_reviews": len(reviews)})
249+
231250
def start(self, open_browser: bool = True):
232251
"""Start the web server."""
233252
if open_browser:
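
The "latest review wins" design note in `get_reviews_for_prompt` can be illustrated in isolation. This is a minimal sketch, not the project's code: the `Review` dataclass and its fields are hypothetical stand-ins for whatever `review_manager.load_reviews` returns, and it assumes reviews arrive in the order they were written to storage.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class Review:
    # Hypothetical stand-in for the project's review record.
    case_id: str
    reviewer: str
    accuracy: int


def group_latest_by_case(reviews: list[Review]) -> dict[str, dict]:
    """Keep one review per case; a later entry overwrites an earlier one."""
    reviews_by_case: dict[str, dict] = {}
    for review in reviews:
        reviews_by_case[review.case_id] = asdict(review)
    return reviews_by_case


reviews = [
    Review("basic_loop", "alice", 3),
    Review("inline_asm", "alice", 4),
    Review("basic_loop", "alice", 5),  # a later update to the same case
]
grouped = group_latest_by_case(reviews)
payload = {"reviews_by_case": grouped, "total_reviews": len(reviews)}
```

As in the endpoint above, `total_reviews` counts every stored review, so it can exceed the number of keys in `reviews_by_case` once a case has been updated.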
