Phase 4: Review status indicators and update functionality (#5)

mattgodbolt · web-flow · commit 76f5219e2e00 · 2025-06-05T15:11:23.000-05:00
## Summary - Complete review management system with visual status indicators, progress tracking, and update functionality - Professional interface for conducting human evaluations of the prompt testing framework ## Test plan - [ ] Load existing review results and verify status indicators appear correctly (green ✅ for reviewed, red ⚪ for pending) - [ ] Verify progress bar shows accurate completion percentage in header - [ ] Test form pre-population with existing review data when clicking on reviewed cases - [ ] Confirm "Save Review" vs "Update Review" button text changes appropriately - [ ] Test real-time status updates when saving new reviews or updating existing ones - [ ] Verify review persistence across page refreshes 🤖 Generated with [Claude Code](https://claude.ai/code)
diff --git a/claude_explain.md b/claude_explain.md
@@ -95,6 +95,10 @@ The prompt testing framework feeds back into the main service by:
 - AWS deployment (handled by Compiler Explorer infrastructure)
 - S3 caching for responses with configurable HTTP Cache-Control headers
 - Cache bypass option for fresh responses
+- Human review integration with web interface for prompt evaluation
+- Interactive review management system with status indicators and progress tracking
+- Automated prompt improvement pipeline using human + AI feedback
+- Version tracking and comparison system for prompt iterations
 
 ### 🔄 In Progress
 - Production API key management
@@ -104,7 +108,6 @@ The prompt testing framework feeds back into the main service by:
 ### 📋 TODO
 - Prompt caching for cost optimization
 - Production monitoring dashboards
-- User feedback collection mechanism
 
 ## Design Decisions
 
@@ -274,10 +277,17 @@ Matt's notes:
 - Pass the explanation type, description, and the audience too (along with explanation) to claude reviewer
 - would be great to validate the YAML and error on broken/missing/extra fields. probably make most fields required and get rid of all the fallbacks like audience etc too
   - probably use pydactic thing to wrap with a strong type?
-- HTML review needs UX work; can't see the comment box at the same time as the thing we're commenting on
-- HTML review nice way to tick off things already done
-- HTML review should use localStorage to save reviewer name and/or get from git
-- UX on HTML review - view the automated output too? (like the nuanced opinion not just numbers)
+COMPLETED ✅:
+- HTML review UX work - completed comprehensive review interface improvements:
+  * Side-by-side code display for better space usage
+  * localStorage integration for reviewer name persistence
+  * 1-5 scale metrics alignment with human evaluation standards
+  * Line-separated input format (more natural than comma-separated)
+  * Visual status indicators showing reviewed vs unreviewed cases
+  * Progress tracking with animated completion bar
+  * Form pre-population with existing review data
+  * Update functionality for modifying existing reviews
+  * Real-time status updates and review management
 
 --- before v4 ---
 
diff --git a/prompt_testing/README.md b/prompt_testing/README.md
@@ -340,11 +340,21 @@ uv run prompt-test run --prompt v1_baseline --compare current --categories basic
    uv run prompt-test run --prompt current --output my_test_results.json
    ```
 
-2. Review results interactively:
+2. Review results interactively via web interface:
    ```bash
    uv run prompt-test review --results-file prompt_testing/results/my_test_results.json
    ```
 
+   **Features:**
+   - Visual status indicators (✅ reviewed, ⚪ pending) with colored borders
+   - Progress tracking with animated completion bar
+   - Side-by-side source code and assembly display
+   - Form pre-population with existing review data
+   - Update functionality for modifying reviews
+   - localStorage persistence for reviewer information
+   - 1-5 scale metrics aligned with human evaluation standards
+   - Line-separated input format for natural feedback entry
+
 3. Analyze review data:
    ```bash
    uv run prompt-test analyze
diff --git a/prompt_testing/WHATS_NEXT.md b/prompt_testing/WHATS_NEXT.md
@@ -8,11 +8,17 @@ This document outlines the next steps for improving the prompt testing framework
 
 ### Web Review Interface (Latest)
 - **Fixed HTML review interface** - Replaced string concatenation with Flask + Jinja2
-- **Added markdown rendering** - AI responses now display with proper formatting using python-markdown
+- **Added markdown rendering** - AI responses now display with proper formatting using client-side marked.js
 - **Fixed template errors** - Resolved "dict has no attribute request" by enriching results with test case data
 - **Improved result descriptions** - Clear labels like "Current Production Prompt - 12 cases" instead of "unknown"
 - **Added CSS styling** - Proper code block, header, and list formatting
 - **Interactive web server** - `uv run prompt-test review --interactive` launches Flask app on localhost:5001
+- **COMPLETED: Quality of Life Improvements** ✅
+  * Phase 1: localStorage reviewer persistence + 1-5 metrics scale alignment
+  * Phase 2: Side-by-side source/assembly code display with responsive grid
+  * Phase 3: Line-separated input format (more natural than comma-separated)
+  * Phase 4: Review status indicators + progress tracking + update functionality
+  * Professional review management system with visual status, form pre-population, and real-time updates
 
 ### Prompt Improvement System Audit & Fixes
 - **Fixed critical "current" prompt loading bug** - PromptOptimizer now handles "current" → `app/prompt.yaml` mapping
diff --git a/prompt_testing/templates/review.html b/prompt_testing/templates/review.html
@@ -18,6 +18,13 @@
         .case-card {
             background: white; margin-bottom: 30px; border-radius: 8px;
             box-shadow: 0 2px 4px rgba(0,0,0,0.1); overflow: hidden;
+            position: relative;
+        }
+        .case-card.reviewed {
+            border-left: 5px solid #16a34a;
+        }
+        .case-card.not-reviewed {
+            border-left: 5px solid #dc2626;
         }
         .case-header {
             background: #f8f9fa; padding: 15px 20px;
@@ -26,6 +33,24 @@
         }
         .case-id { font-weight: 600; color: #333; }
         .case-meta { font-size: 0.9em; color: #666; }
+        .review-status {
+            display: inline-flex; align-items: center; gap: 8px;
+            padding: 4px 12px; border-radius: 16px; font-size: 0.8em; font-weight: 500;
+        }
+        .review-status.reviewed {
+            background: #dcfce7; color: #166534;
+        }
+        .review-status.not-reviewed {
+            background: #fef2f2; color: #dc2626;
+        }
+        .progress-bar {
+            background: #f3f4f6; border-radius: 8px; height: 8px; margin-top: 10px;
+            overflow: hidden;
+        }
+        .progress-fill {
+            background: linear-gradient(90deg, #16a34a, #22c55e);
+            height: 100%; transition: width 0.3s ease;
+        }
         .case-content { padding: 20px; }
         .section { margin-bottom: 25px; }
         .section h4 {
@@ -166,6 +191,13 @@ <h1>📝 Review: {{ data.summary.prompt_version if data.summary else 'Unknown' }
                {{ "%.2f"|format(data.summary.average_metrics.overall_score) }}</p>
             {% endif %}
             {% endif %}
+
+            <div id="review-progress" style="display: none;">
+                <p><strong>Review Progress:</strong> <span id="progress-text">Loading...</span></p>
+                <div class="progress-bar">
+                    <div class="progress-fill" id="progress-fill" style="width: 0%"></div>
+                </div>
+            </div>
         </div>
 
         <div class="reviewer-info">
@@ -185,11 +217,16 @@ <h4>👤 Reviewer Information</h4>
         {% if result.success %}
         <div class="case-card">
             <div class="case-header">
-                <div class="case-id">{{ result.case_id }}</div>
-                <div class="case-meta">
-                    {{ result.test_case.language }} | {{ result.test_case.compiler }} |
-                    Audience: {{ result.test_case.audience }} |
-                    Type: {{ result.test_case.explanation_type }}
+                <div>
+                    <div class="case-id">{{ result.case_id }}</div>
+                    <div class="case-meta">
+                        {{ result.test_case.language }} | {{ result.test_case.compiler }} |
+                        Audience: {{ result.test_case.audience }} |
+                        Type: {{ result.test_case.explanation_type }}
+                    </div>
+                </div>
+                <div class="review-status not-reviewed" id="status-{{ result.case_id }}">
+                    <span>⚪</span> Not Reviewed
                 </div>
             </div>
 
@@ -351,6 +388,10 @@ <h4>✏️ Your Review</h4>
             breaks: true
         });
 
+        // Global state for existing reviews
+        let existingReviews = {};
+        let reviewProgress = { total: 0, reviewed: 0 };
+
         // Initialize page on load
         document.addEventListener('DOMContentLoaded', function() {
             // Load saved reviewer name from localStorage
@@ -364,6 +405,9 @@ <h4>✏️ Your Review</h4>
                 localStorage.setItem('reviewerName', this.value);
             });
 
+            // Load existing reviews for this prompt version
+            loadExistingReviews();
+
             // Render all markdown responses
             {% for result in data.results %}
             {% if result.success %}
@@ -376,6 +420,116 @@ <h4>✏️ Your Review</h4>
             {% endfor %}
         });
 
+        async function loadExistingReviews() {
+            try {
+                const promptVersion = '{{ data.summary.prompt_version if data.summary else "unknown" }}';
+                const response = await fetch(`/api/reviews/prompt/${encodeURIComponent(promptVersion)}`);
+                const data = await response.json();
+
+                existingReviews = data.reviews_by_case || {};
+
+                // Update UI for each case
+                const caseCards = document.querySelectorAll('.case-card');
+                reviewProgress.total = caseCards.length;
+                reviewProgress.reviewed = 0;
+
+                caseCards.forEach(card => {
+                    const caseId = extractCaseIdFromCard(card);
+                    if (caseId && existingReviews[caseId]) {
+                        updateCaseReviewStatus(caseId, true, existingReviews[caseId]);
+                        reviewProgress.reviewed++;
+                    } else if (caseId) {
+                        updateCaseReviewStatus(caseId, false);
+                    }
+                });
+
+                updateProgressIndicator();
+
+            } catch (error) {
+                console.error('Failed to load existing reviews:', error);
+            }
+        }
+
+        function extractCaseIdFromCard(card) {
+            const statusElement = card.querySelector('[id^="status-"]');
+            if (statusElement) {
+                return statusElement.id.replace('status-', '');
+            }
+            return null;
+        }
+
+        function updateCaseReviewStatus(caseId, isReviewed, reviewData = null) {
+            const card = document.querySelector(`#status-${caseId}`).closest('.case-card');
+            const statusElement = document.getElementById(`status-${caseId}`);
+            const saveButton = document.querySelector(`button[onclick="saveReview('${caseId}')"]`);
+
+            if (isReviewed) {
+                // Mark as reviewed
+                card.classList.remove('not-reviewed');
+                card.classList.add('reviewed');
+                statusElement.classList.remove('not-reviewed');
+                statusElement.classList.add('reviewed');
+                statusElement.innerHTML = '<span>✅</span> Reviewed';
+
+                if (saveButton) {
+                    saveButton.textContent = `Update Review for ${caseId}`;
+                }
+
+                // Pre-populate form if review data available
+                if (reviewData) {
+                    populateReviewForm(caseId, reviewData);
+                }
+            } else {
+                // Mark as not reviewed
+                card.classList.remove('reviewed');
+                card.classList.add('not-reviewed');
+                statusElement.classList.remove('reviewed');
+                statusElement.classList.add('not-reviewed');
+                statusElement.innerHTML = '<span>⚪</span> Not Reviewed';
+
+                if (saveButton) {
+                    saveButton.textContent = `Save Review for ${caseId}`;
+                }
+            }
+        }
+
+        function populateReviewForm(caseId, reviewData) {
+            // Populate numeric scores
+            const fields = ['accuracy', 'relevance', 'conciseness', 'insight', 'appropriateness'];
+            fields.forEach(field => {
+                const input = document.querySelector(`input[name="${field}"][data-case="${caseId}"]`);
+                if (input && reviewData[field]) {
+                    input.value = reviewData[field];
+                }
+            });
+
+            // Populate text areas
+            const textFields = ['strengths', 'weaknesses', 'suggestions', 'overall_comments'];
+            textFields.forEach(field => {
+                const textarea = document.querySelector(`textarea[name="${field}"][data-case="${caseId}"]`);
+                if (textarea && reviewData[field]) {
+                    if (Array.isArray(reviewData[field])) {
+                        textarea.value = reviewData[field].join('\n');
+                    } else {
+                        textarea.value = reviewData[field];
+                    }
+                }
+            });
+        }
+
+        function updateProgressIndicator() {
+            const progressContainer = document.getElementById('review-progress');
+            const progressText = document.getElementById('progress-text');
+            const progressFill = document.getElementById('progress-fill');
+
+            if (reviewProgress.total > 0) {
+                const percentage = Math.round((reviewProgress.reviewed / reviewProgress.total) * 100);
+                progressText.textContent = `${reviewProgress.reviewed}/${reviewProgress.total} cases reviewed (${percentage}%)`;
+                progressFill.style.width = `${percentage}%`;
+                progressContainer.style.display = 'block';
+            }
+        }
+
         async function saveReview(caseId) {
             const reviewerName = document.getElementById('reviewer-name').value.trim();
             if (!reviewerName) {
@@ -429,11 +583,21 @@ <h4>✏️ Your Review</h4>
                 const result = await response.json();
 
                 if (result.success) {
-                    messageEl.textContent = '✅ Saved!';
+                    const wasUpdate = existingReviews[caseId] !== undefined;
+                    messageEl.textContent = wasUpdate ? '✅ Updated!' : '✅ Saved!';
                     messageEl.className = 'success-msg';
 
-                    const inputs = document.querySelectorAll(`[data-case="${caseId}"]`);
-                    inputs.forEach(input => input.disabled = true);
+                    // Update the global state
+                    existingReviews[caseId] = formData;
+
+                    // Update UI to show reviewed status
+                    if (!wasUpdate) {
+                        reviewProgress.reviewed++;
+                    }
+                    updateCaseReviewStatus(caseId, true, formData);
+                    updateProgressIndicator();
+
+                    // Don't disable inputs for updates, allow further editing
                 } else {
                     throw new Error(result.error || 'Failed to save review');
                 }
@@ -443,7 +607,9 @@ <h4>✏️ Your Review</h4>
                 messageEl.className = 'error-msg';
             } finally {
                 button.disabled = false;
-                button.textContent = `Save Review for ${caseId}`;
+                // Update button text based on current review status
+                const isReviewed = existingReviews[caseId] !== undefined;
+                button.textContent = isReviewed ? `Update Review for ${caseId}` : `Save Review for ${caseId}`;
             }
         }
     </script>
diff --git a/prompt_testing/web_review.py b/prompt_testing/web_review.py
@@ -228,6 +228,25 @@ def get_reviews(case_id):
             reviews = self.review_manager.load_reviews(case_id=case_id)
             return jsonify({"reviews": [review.__dict__ for review in reviews]})
 
+        @self.app.route("/api/reviews/prompt/<prompt_version>")
+        def get_reviews_for_prompt(prompt_version):
+            """Get all existing reviews for a specific prompt version."""
+            reviews = self.review_manager.load_reviews(prompt_version=prompt_version)
+            # Group reviews by case_id for easier frontend consumption
+            #
+            # Design Note: We use direct assignment (latest review wins) rather than arrays
+            # because this UI is designed for single-reviewer prompt iteration workflows.
+            # The underlying JSONL storage preserves all reviews if multi-reviewer support
+            # is needed later. For current use case (prompt optimization), we want:
+            # - Simple binary state (reviewed/not reviewed)
+            # - Clear completion tracking without reviewer consensus complexity
+            # - UI optimized for iteration speed rather than comprehensive evaluation
+            # If multiple reviewers become common, this can be evolved to support arrays.
+            reviews_by_case = {}
+            for review in reviews:
+                reviews_by_case[review.case_id] = review.__dict__  # Latest review per case
+            return jsonify({"reviews_by_case": reviews_by_case, "total_reviews": len(reviews)})
+
     def start(self, open_browser: bool = True):
         """Start the web server."""
         if open_browser: