Commit 3504943

Copilot and blackpiglet committed
Add AI-generated issue detection system with workflow and documentation
Co-authored-by: blackpiglet <59276555+blackpiglet@users.noreply.github.com>
1 parent acd4d5b commit 3504943

4 files changed

Lines changed: 257 additions & 0 deletions

.github/AI-DETECTION-README.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# AI-Generated Content Detection

This directory contains the AI-generated content detection system for Velero issues.

## Overview

The Velero project has implemented automated detection of potentially AI-generated issues to help maintain quality and ensure that issues describe real, verified problems.

## How It Works

### Detection Workflow

The workflow (`.github/workflows/ai-issue-detector.yml`) runs automatically when:
- A new issue is opened
- An existing issue is edited

### Detection Patterns

The detector analyzes issues for several AI-generation patterns:

1. **Excessive Tables** - More than 5 lines that look like markdown table rows (text between two `|` characters)
2. **Excessive Headers** - More than 8 markdown section headers anywhere in the issue body
3. **Formal Phrases** - More than 4 of the formal section titles listed in the workflow (e.g., "Root Cause Analysis", "Operational Impact", "Expected Permanent Solution")
4. **Excessive Formatting** - More than 4 occurrences of `---`, or two horizontal rules separated only by a line containing a single space
5. **Future Dates** - Any four-digit sequence from 2026 through 2039, treated as a sign of a future date or unrealistic version number
6. **Perfect Formatting** - Both the `| Parameter | Value |` and `| Aspect | Status | Impact |` table headers are present
7. **AI Section Headers** - More than 4 of the generic headers listed in the workflow (e.g., "## Critical Problem", "## Resolution Attempts")
8. **Generic Solutions** - The text contains `auto-detect` and `configuration:` together with more than 2 fenced YAML blocks
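
The count-based checks can be sketched as small pure functions. This is a minimal sketch that reuses the regular expressions from the workflow in this commit; the function names (`countTableRows`, `countHeaders`, `countListedPhrases`) and the sample text are hypothetical and do not appear in the workflow.

```javascript
// Hypothetical helpers; the names are illustrative, the regexes mirror the workflow.

// Counts lines that contain at least two pipe characters (table-row-like lines).
// Every row of a table counts, including the header and separator rows.
function countTableRows(body) {
  return (body.match(/\|.*\|/g) || []).length;
}

// Counts markdown headers: one to six '#' at the start of a line, then whitespace.
function countHeaders(body) {
  return (body.match(/^#{1,6}\s+/gm) || []).length;
}

// Counts how many of the given phrases occur verbatim (case-sensitive) in the body.
function countListedPhrases(body, phrases) {
  return phrases.filter((phrase) => body.includes(phrase)).length;
}

// Made-up sample text, not taken from a real issue.
const sample = [
  '## Environment',
  '| Parameter | Value |',
  '|-----------|-------|',
  '| Provider  | aws   |',
  '## Root Cause Analysis',
].join('\n');

console.log(countTableRows(sample)); // 3
console.log(countHeaders(sample)); // 2
console.log(countListedPhrases(sample, ['Root Cause Analysis', 'Operational Impact'])); // 1
```

Because every row is counted, a single table with a header, a separator, and four data rows is already enough to exceed the table limit of 5.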

### Scoring System

Each detected pattern adds one point to the AI score. If the score is 3 or higher (out of 8), the issue is flagged as potentially AI-generated.
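
The scoring rule can be sketched as follows. This is a minimal sketch, assuming the eight boolean pattern results have already been computed; `scoreIssue` and its `threshold` parameter are hypothetical names, not part of the workflow, which hard-codes both the threshold of 3 and the divisor of 8.

```javascript
// Hypothetical helper: counts the patterns that fired and applies a threshold.
function scoreIssue(patternResults, threshold = 3) {
  const detected = Object.entries(patternResults)
    .filter(([, hit]) => hit)
    .map(([name]) => name);
  const total = Object.keys(patternResults).length;
  return {
    score: detected.length,
    flagged: detected.length >= threshold,
    // Equals the workflow's Math.round(aiScore/8 * 100) when there are 8 patterns.
    confidence: Math.round((detected.length / total) * 100),
    detected,
  };
}

const result = scoreIssue({
  excessiveTables: true,
  excessiveHeaders: true,
  formalPhrases: false,
  excessiveFormatting: true,
  futureDates: false,
  perfectFormatting: false,
  aiSectionHeaders: false,
  genericSolutions: false,
});

console.log(result.score, result.flagged, result.confidence); // 3 true 38
```

The percentage is simply the share of patterns that matched, so an issue that just reaches the threshold is reported at 38%; it is not a calibrated probability.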

### Actions Taken

When an issue is flagged:
1. A `potential-ai-generated` label is added
2. A `needs-triage` label is added
3. An automated comment is posted explaining:
   - Why the issue was flagged
   - What patterns were detected
   - Guidelines for contributors to follow
   - A request for verification

## For Contributors

If your issue is flagged:

1. **Don't panic** - This is not an accusation, just a request for verification
2. **Review the guidelines** in our [Code Standards](../site/content/docs/main/code-standards.md#ai-generated-content)
3. **Verify your content**:
   - Ensure all version numbers are accurate
   - Confirm error messages are from your actual environment
   - Remove any placeholder or example content
   - Simplify overly structured formatting
4. **Update the issue** with corrections if needed
5. **Comment to confirm** that the issue describes a real problem

## For Maintainers

When reviewing flagged issues:

1. Check if the technical details are realistic and verifiable
2. Look for signs of hallucinated content (fake version numbers, non-existent features)
3. Engage with the issue author to verify the problem
4. Remove the `potential-ai-generated` label once verified
5. Close issues that cannot be verified or describe non-existent problems

## Configuration

The detection patterns and the threshold can be adjusted in the workflow file if needed. The threshold is currently set at 3 out of 8 patterns to balance false positives against missed detections.

## False Positives

The detector may occasionally flag legitimate issues, especially those that are:
- Very detailed and well-structured
- Using formal technical documentation style
- Reporting complex problems with extensive details

This is intentional - we prefer to verify detailed issues rather than miss AI-generated ones.
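
One concrete source of false positives is the Future Dates check. The sketch below applies the workflow's regular expression to a few made-up strings (they are hypothetical and not taken from real issues) to show that it matches any occurrence of the digits 2026 through 2039, including ones embedded in identifiers.

```javascript
// The pattern used by the workflow's futureDates check.
const futureDatePattern = /202[6-9]|203\d/;

// Hypothetical sample strings, not taken from real issues.
const samples = [
  'Backup failed on 2024-05-01', // ordinary past date
  'Planned upgrade in 2027', // a year inside the matched range
  'Pod backup-20371 keeps restarting', // an ID that happens to contain 2037
];

// One boolean per sample: does the pattern fire?
const hits = samples.map((text) => futureDatePattern.test(text));

console.log(hits); // [ false, true, true ]
```

The range is fixed in the pattern rather than derived from the current date, so genuine dates also start matching once the calendar reaches 2026.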

.github/labels.yaml

Lines changed: 1 addition & 0 deletions
@@ -41,3 +41,4 @@ kind:
 - tech-debt
 - usage-error
 - voting
+- potential-ai-generated

.github/workflows/ai-issue-detector.yml

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
name: "Detect AI-Generated Issues"

on:
  issues:
    types: [opened, edited]

jobs:
  detect-ai-content:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Analyze issue for AI-generated content
        id: analyze
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const issue = context.payload.issue;
            const issueBody = issue.body || '';
            const issueTitle = issue.title || '';

            // AI detection patterns
            const aiPatterns = {
              // Overly structured markdown: more than 5 table-row-like lines
              excessiveTables: (issueBody.match(/\|.*\|/g) || []).length > 5,

              // More than 8 markdown headers anywhere in the body
              excessiveHeaders: (issueBody.match(/^#{1,6}\s+/gm) || []).length > 8,

              // Overly formal language patterns common in AI
              formalPhrases: [
                'Root Cause Analysis',
                'Operational Impact',
                'Expected Permanent Solution',
                'Questions for Maintainers',
                'Labels and Metadata',
                'Reference Files',
                'Steps to Reproduce'
              ].filter(phrase => issueBody.includes(phrase)).length > 4,

              // Excessive use of horizontal rules
              excessiveFormatting: issueBody.includes('---\n \n---') ||
                (issueBody.match(/---/g) || []).length > 4,

              // Unrealistic version numbers or dates in the future (matches 2026-2039)
              futureDates: /202[6-9]|203\d/.test(issueBody),

              // Overly detailed technical specs with perfect formatting
              perfectFormatting: issueBody.includes('| Parameter | Value |') &&
                issueBody.includes('| Aspect | Status | Impact |'),

              // Generic AI-style section headers
              aiSectionHeaders: [
                '## Description',
                '## Critical Problem',
                '## Affected Environment',
                '## Full Helm Configuration',
                '## Resolution Attempts',
                '## Related Information'
              ].filter(header => issueBody.includes(header)).length > 4,

              // Unusual specificity combined with generic solutions
              genericSolutions: issueBody.includes('auto-detect') &&
                issueBody.includes('configuration:') &&
                (issueBody.match(/```yaml/g) || []).length > 2
            };

            // Calculate AI score: one point per detected pattern
            let aiScore = 0;
            let detectedPatterns = [];

            for (const [pattern, detected] of Object.entries(aiPatterns)) {
              if (detected) {
                aiScore++;
                detectedPatterns.push(pattern);
              }
            }

            console.log(`AI Score: ${aiScore}/8`);
            console.log(`Detected patterns: ${detectedPatterns.join(', ')}`);

            // If AI score is high, add label and comment
            if (aiScore >= 3) {
              // Add label
              try {
                await github.rest.issues.addLabels({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: issue.number,
                  labels: ['needs-triage', 'potential-ai-generated']
                });

                // Add comment
                const comment = `👋 Thank you for opening this issue!

            This issue has been flagged for review as it may contain AI-generated content (confidence: ${Math.round(aiScore/8 * 100)}%).

            **Detected patterns:** ${detectedPatterns.join(', ')}

            If this issue was created with AI assistance, please review our [AI contribution guidelines](https://github.com/${context.repo.owner}/${context.repo.repo}/blob/main/site/content/docs/main/code-standards.md#ai-generated-content).

            **Important:**
            - Please verify all technical details are accurate
            - Ensure version numbers, dates, and configurations reflect your actual environment
            - Remove any placeholder or example content
            - Confirm the issue is reproducible in your environment

            A maintainer will review this issue shortly. If this was flagged in error, please let us know!`;

                await github.rest.issues.createComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: issue.number,
                  body: comment
                });

                core.setOutput('ai-detected', 'true');
                core.setOutput('ai-score', aiScore);
              } catch (error) {
                console.log('Error adding label or comment:', error);
              }
            } else {
              core.setOutput('ai-detected', 'false');
              core.setOutput('ai-score', aiScore);
            }

            return {
              aiDetected: aiScore >= 3,
              score: aiScore,
              patterns: detectedPatterns
            };
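
Because the workflow also triggers on `edited`, an issue that still scores 3 or higher after an edit goes through the label and comment step again. One possible guard, which is not part of this commit, is to embed a marker in the comment body and skip posting when an existing comment already carries it. In the sketch below, `MARKER` and `alreadyCommented` are hypothetical names; the comment list is assumed to come from a call such as `github.rest.issues.listComments`, whose `data` array holds objects with a `body` field.

```javascript
// Hypothetical marker string that the detector would embed in its comment body.
const MARKER = '<!-- ai-issue-detector -->';

// Returns true if any existing comment already carries the marker.
// `comments` is assumed to be an array of objects with a string `body`.
function alreadyCommented(comments, marker = MARKER) {
  return comments.some(
    (comment) => typeof comment.body === 'string' && comment.body.includes(marker)
  );
}

// Made-up comment lists for illustration.
const firstRun = [{ body: 'Thanks, looking into this.' }];
const secondRun = [...firstRun, { body: `${MARKER}\nThis issue has been flagged for review.` }];

console.log(alreadyCommented(firstRun)); // false
console.log(alreadyCommented(secondRun)); // true
```

An HTML comment is used as the marker here because it stays invisible in the rendered comment; matching on the comment author would be another option.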

site/content/docs/main/code-standards.md

Lines changed: 40 additions & 0 deletions
@@ -42,6 +42,46 @@ A command to do this is `make new-changelog CHANGELOG_BODY="Changes you have mad

If a PR does not warrant a changelog, the CI check for a changelog can be skipped by applying a `changelog-not-required` label on the PR. If you are making a PR on a release branch, you should still make a new file in the `changelogs/unreleased` folder on the release branch for your change.

## AI-Generated Content

We welcome contributions from all developers, including those who use AI tools to assist in their work. However, to maintain code quality and ensure contributions are accurate and appropriate, please follow these guidelines:

### Using AI Assistance

**Acceptable use:**
- Using AI tools (like GitHub Copilot, ChatGPT, Claude, etc.) to generate scaffolding or boilerplate code
- Getting AI assistance for writing unit tests
- Using AI to help understand complex code patterns
- AI-assisted debugging and problem-solving
- Using AI to help with documentation writing

**Requirements when using AI:**
1. **Always review and verify** AI-generated content before submitting
2. **Test thoroughly** - ensure the code works as expected in your environment
3. **Verify technical accuracy** - check that all version numbers, configurations, and technical details are correct
4. **Remove placeholders** - ensure there is no example or placeholder content
5. **Understand the code** - be able to explain and defend your changes during code review
6. **Disclose AI usage** - if a significant portion of your PR was AI-generated, mention it in the PR description

### What to Avoid

**Unacceptable practices:**
- Submitting entirely AI-generated PRs or issues without review or verification
- Including hallucinated information (false version numbers, non-existent APIs, etc.)
- Copying AI-generated content with placeholder or example data
- Submitting AI-generated issues describing problems you haven't actually experienced
- Using AI to generate issues about features or bugs without verifying they exist

### For Issues

When creating issues with AI assistance:
- Ensure the issue describes a **real problem** you have experienced
- Verify all version numbers, error messages, and configurations are from your actual environment
- Remove any AI-generated boilerplate or overly formal structure
- Focus on clarity and accuracy over comprehensive formatting

Issues that appear to be entirely AI-generated without proper verification may be labeled as `potential-ai-generated` and flagged for additional review.

## Copyright header

Whenever a source code file is being modified, the copyright notice should be updated to our standard copyright notice. That is, it should read “Copyright the Velero contributors.”
