Add AI-generated issue detection system with workflow and documentation

Copilot · blackpiglet · Copilot · commit 350494301925 · 2026-02-02T03:22:43.000Z
Co-authored-by: blackpiglet <59276555+blackpiglet@users.noreply.github.com>
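
Note on the scoring rule: the README added in this commit says each detected pattern adds one point and an issue is flagged at 3 or more points. The sketch below shows that threshold logic in plain JavaScript so it can be tried outside GitHub Actions. It is an illustration only: the function name `scoreIssue`, the `threshold` parameter, and the reduction to three of the eight checks are assumptions made for brevity, not part of the workflow file.

```javascript
// Hypothetical helper (not in the workflow): scores an issue body
// using three of the eight checks from ai-issue-detector.yml.
function scoreIssue(body, threshold = 3) {
  const patterns = {
    // More than 5 lines containing text between two pipe characters
    excessiveTables: (body.match(/\|.*\|/g) || []).length > 5,
    // More than 8 markdown headers anywhere in the body
    excessiveHeaders: (body.match(/^#{1,6}\s+/gm) || []).length > 8,
    // More than 4 occurrences of '---'
    excessiveFormatting: (body.match(/---/g) || []).length > 4,
  };

  // Each pattern that evaluates to true contributes one point.
  const detected = Object.keys(patterns).filter((name) => patterns[name]);

  return {
    score: detected.length,
    detected,
    flagged: detected.length >= threshold,
  };
}

// Example: a heavily structured body trips all three checks.
const structured =
  '| a | b |\n'.repeat(6) + '## H\n'.repeat(9) + '---\n'.repeat(5);
console.log(scoreIssue(structured).flagged);
console.log(scoreIssue('Backup fails with error X').flagged);
```

Keeping the checks in a plain object, as the workflow does, means a new pattern only needs one more key; the score and the list of detected pattern names are derived from the same object.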
diff --git a/.github/AI-DETECTION-README.md b/.github/AI-DETECTION-README.md
@@ -0,0 +1,80 @@
+# AI-Generated Content Detection
+
+This directory contains the AI-generated content detection system for Velero issues.
+
+## Overview
+
+The Velero project has implemented automated detection of potentially AI-generated issues to help maintain quality and ensure that issues describe real, verified problems.
+
+## How It Works
+
+### Detection Workflow
+
+The workflow (`.github/workflows/ai-issue-detector.yml`) runs automatically when:
+- A new issue is opened
+- An existing issue is edited
+
+### Detection Patterns
+
+The detector analyzes issues for several AI-generation patterns:
+
+1. **Excessive Tables** - More than 5 lines that look like markdown table rows (text between `|` characters)
+2. **Excessive Headers** - More than 8 markdown headers (`#` through `######`) anywhere in the issue body
+3. **Formal Phrases** - More than 4 of 7 formal phrases typical of AI output (e.g., "Root Cause Analysis", "Operational Impact", "Expected Permanent Solution")
+4. **Excessive Formatting** - More than 4 occurrences of `---`, or two horizontal rules separated only by a whitespace line
+5. **Future Dates** - A four-digit year that the detector considers to be in the future
+6. **Perfect Formatting** - Both a `| Parameter | Value |` table and an `| Aspect | Status | Impact |` table in the same issue
+7. **AI Section Headers** - More than 4 of 6 generic AI-style headers such as "## Critical Problem" and "## Resolution Attempts"
+8. **Generic Solutions** - The strings `auto-detect` and `configuration:` together with more than 2 YAML code blocks
+
+### Scoring System
+
+Each detected pattern adds to the AI score. If the score is 3 or higher (out of 8), the issue is flagged as potentially AI-generated.
+
+### Actions Taken
+
+When an issue is flagged:
+1. A `potential-ai-generated` label is added
+2. A `needs-triage` label is added
+3. An automated comment is posted explaining:
+   - Why the issue was flagged
+   - What patterns were detected
+   - Guidelines for contributors to follow
+   - Request for verification
+
+## For Contributors
+
+If your issue is flagged:
+
+1. **Don't panic** - This is not an accusation, just a request for verification
+2. **Review the guidelines** in our [Code Standards](../site/content/docs/main/code-standards.md#ai-generated-content)
+3. **Verify your content**:
+   - Ensure all version numbers are accurate
+   - Confirm error messages are from your actual environment
+   - Remove any placeholder or example content
+   - Simplify overly structured formatting
+4. **Update the issue** with corrections if needed
+5. **Comment to confirm** that the issue describes a real problem
+
+## For Maintainers
+
+When reviewing flagged issues:
+
+1. Check if the technical details are realistic and verifiable
+2. Look for signs of hallucinated content (fake version numbers, non-existent features)
+3. Engage with the issue author to verify the problem
+4. Remove the `potential-ai-generated` label once verified
+5. Close issues that cannot be verified or describe non-existent problems
+
+## Configuration
+
+The detection patterns and the threshold can be adjusted in the workflow file if needed. The threshold is currently 3 of 8 patterns, chosen to balance false positives against missed detections.
+
+## False Positives
+
+The detector may occasionally flag legitimate issues, especially those that are:
+- Very detailed and well-structured
+- Using formal technical documentation style
+- Reporting complex problems with extensive details
+
+This is intentional - we prefer to verify detailed issues rather than miss AI-generated ones.
diff --git a/.github/labels.yaml b/.github/labels.yaml
@@ -41,3 +41,4 @@ kind:
   - tech-debt
   - usage-error
   - voting
+  - potential-ai-generated
diff --git a/.github/workflows/ai-issue-detector.yml b/.github/workflows/ai-issue-detector.yml
@@ -0,0 +1,136 @@
+name: "Detect AI-Generated Issues"
+
+on:
+  issues:
+    types: [opened, edited]
+
+jobs:
+  detect-ai-content:
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      contents: read
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Analyze issue for AI-generated content
+        id: analyze
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const issue = context.payload.issue;
+            const issueBody = issue.body || '';
+            const issueTitle = issue.title || '';
+            
+            // AI detection patterns
+            const aiPatterns = {
+              // Overly structured markdown with extensive tables
+              excessiveTables: (issueBody.match(/\|.*\|/g) || []).length > 5,
+              
+              // More than 8 markdown headers anywhere in the body (not necessarily consecutive)
+              excessiveHeaders: (issueBody.match(/^#{1,6}\s+/gm) || []).length > 8,
+              
+              // Overly formal language patterns common in AI
+              formalPhrases: [
+                'Root Cause Analysis',
+                'Operational Impact',
+                'Expected Permanent Solution',
+                'Questions for Maintainers',
+                'Labels and Metadata',
+                'Reference Files',
+                'Steps to Reproduce'
+              ].filter(phrase => issueBody.includes(phrase)).length > 4,
+              
+              // Many horizontal rules, or two rules separated only by a whitespace line
+              excessiveFormatting: issueBody.includes('---\n \n---') || 
+                                   (issueBody.match(/---/g) || []).length > 4,
+              
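+              // Four-digit years later than the current year (a fixed 2026+ range
+              // would also match the current year and flag ordinary dates)
+              futureDates: (issueBody.match(/\b20\d{2}\b/g) || [])
+                .some(year => Number(year) > new Date().getFullYear()),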
+              
+              // Overly detailed technical specs with perfect formatting
+              perfectFormatting: issueBody.includes('| Parameter | Value |') &&
+                                issueBody.includes('| Aspect | Status | Impact |'),
+              
+              // Generic AI-style section headers
+              aiSectionHeaders: [
+                '## Description',
+                '## Critical Problem', 
+                '## Affected Environment',
+                '## Full Helm Configuration',
+                '## Resolution Attempts',
+                '## Related Information'
+              ].filter(header => issueBody.includes(header)).length > 4,
+              
+              // Unusual specificity combined with generic solutions
+              genericSolutions: issueBody.includes('auto-detect') && 
+                               issueBody.includes('configuration:') &&
+                               (issueBody.match(/```yaml/g) || []).length > 2
+            };
+            
+            // Calculate AI score
+            let aiScore = 0;
+            let detectedPatterns = [];
+            
+            for (const [pattern, detected] of Object.entries(aiPatterns)) {
+              if (detected) {
+                aiScore++;
+                detectedPatterns.push(pattern);
+              }
+            }
+            
+            console.log(`AI Score: ${aiScore}/8`);
+            console.log(`Detected patterns: ${detectedPatterns.join(', ')}`);
+            
+            // If AI score is high, add label and comment
+            if (aiScore >= 3) {
+              // Add label
+              try {
+                await github.rest.issues.addLabels({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: issue.number,
+                  labels: ['needs-triage', 'potential-ai-generated']
+                });
+                
+                // Add comment
+                const comment = `👋 Thank you for opening this issue!
+
+This issue has been flagged for review as it may contain AI-generated content (confidence: ${Math.round(aiScore/8 * 100)}%).
+
+**Detected patterns:** ${detectedPatterns.join(', ')}
+
+If this issue was created with AI assistance, please review our [AI contribution guidelines](https://github.com/${context.repo.owner}/${context.repo.repo}/blob/main/site/content/docs/main/code-standards.md#ai-generated-content).
+
+**Important:**
+- Please verify all technical details are accurate
+- Ensure version numbers, dates, and configurations reflect your actual environment
+- Remove any placeholder or example content
+- Confirm the issue is reproducible in your environment
+
+A maintainer will review this issue shortly. If this was flagged in error, please let us know!`;
+
+                await github.rest.issues.createComment({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: issue.number,
+                  body: comment
+                });
+                
+                core.setOutput('ai-detected', 'true');
+                core.setOutput('ai-score', aiScore);
+              } catch (error) {
+                console.log('Error adding label or comment:', error);
+              }
+            } else {
+              core.setOutput('ai-detected', 'false');
+              core.setOutput('ai-score', aiScore);
+            }
+            
+            return {
+              aiDetected: aiScore >= 3,
+              score: aiScore,
+              patterns: detectedPatterns
+            };
diff --git a/site/content/docs/main/code-standards.md b/site/content/docs/main/code-standards.md
@@ -42,6 +42,46 @@ A command to do this is `make new-changelog CHANGELOG_BODY="Changes you have mad
 
 If a PR does not warrant a changelog, the CI check for a changelog can be skipped by applying a `changelog-not-required` label on the PR. If you are making a PR on a release branch, you should still make a new file in the `changelogs/unreleased` folder on the release branch for your change. 
 
+## AI-Generated Content
+
+We welcome contributions from all developers, including those who use AI tools to assist in their work. However, to maintain code quality and ensure contributions are accurate and appropriate, please follow these guidelines:
+
+### Using AI Assistance
+
+**Acceptable use:**
+- Using AI tools (like GitHub Copilot, ChatGPT, Claude, etc.) to generate scaffolding or boilerplate code
+- Getting AI assistance for writing unit tests
+- Using AI to help understand complex code patterns
+- AI-assisted debugging and problem-solving
+- Using AI to help with documentation writing
+
+**Requirements when using AI:**
+1. **Always review and verify** AI-generated content before submitting
+2. **Test thoroughly** - ensure the code works as expected in your environment
+3. **Verify technical accuracy** - check that all version numbers, configurations, and technical details are correct
+4. **Remove placeholders** - ensure there is no example or placeholder content
+5. **Understand the code** - be able to explain and defend your changes during code review
+6. **Disclose AI usage** - if a significant portion of your PR was AI-generated, mention it in the PR description
+
+### What to Avoid
+
+**Unacceptable practices:**
+- Submitting entirely AI-generated PRs or issues without review or verification
+- Including hallucinated information (false version numbers, non-existent APIs, etc.)
+- Copying AI-generated content with placeholder or example data
+- Submitting AI-generated issues describing problems you haven't actually experienced
+- Using AI to generate issues about features or bugs without verifying they exist
+
+### For Issues
+
+When creating issues with AI assistance:
+- Ensure the issue describes a **real problem** you have experienced
+- Verify all version numbers, error messages, and configurations are from your actual environment
+- Remove any AI-generated boilerplate or overly formal structure
+- Focus on clarity and accuracy over comprehensive formatting
+
+Issues that appear to be entirely AI-generated without proper verification may be labeled as `potential-ai-generated` and flagged for additional review.
+
 ## Copyright header
 
 Whenever a source code file is being modified, the copyright notice should be updated to our standard copyright notice. That is, it should read “Copyright the Velero contributors.”