compiler-explorer

mirror of https://github.com/compiler-explorer/compiler-explorer.git synced 2026-05-16 14:53:04 -04:00

Author	SHA1	Message	Date
Matt Godbolt	3fc36bb322	Tweaks	2025-10-07 16:50:14 -05:00

Matt Godbolt	fd39af7be8	Add AI-powered duplicate detection with strict filtering (#8176)	2025-10-07 16:29:12 -05:00

## Summary

Adds optional AI-powered duplicate detection using the Claude API to filter out false positives from string similarity matching. The AI analyzes candidate groups with strict rules and confirms only true duplicates.

## Problem

The tag-stripping improvement (#8175) significantly reduced false positives, but string similarity still produces noise:

- Different assemblers grouped together: "fasm", "YASM", "AsmX" (all different tools)
- Unrelated features: "language tooltips" vs "language detection" (different features)
- Specific vs general requests: "EWARM" vs "ARM execution" (specific toolchain vs general support)

Before AI filtering: 63 groups with a ~60% false-positive rate.

## Solution

### Two-phase detection:

1. Broadphase: string similarity (fast, high recall) creates candidate groups
2. AI refinement: Claude Sonnet 4 applies strict rules to confirm duplicates

### Strict AI rules:

- ✓ Duplicates: same tool with spelling/version variants ("NumPy" = "numpy", "GCC 13" = "GCC 13.1")
- ✗ NOT duplicates: differently named tools ("fasm" ≠ "YASM"), related features ("tooltips" ≠ "detection")

### Features:

- Optional `--use-ai` flag (requires `ANTHROPIC_API_KEY` in a `.env` file)
- Adjustable confidence threshold with `--ai-confidence` (default: 0.7)
- AI reasoning included in markdown reports for transparency
- Graceful fallback if the API key is not available

## Results

Testing on 843 open CE issues:

| Phase | Groups | Quality |
|-------|--------|---------|
| Broadphase (string similarity) | 63 | ~40% accurate |
| AI filtering | 5 | 100% accurate |

### Confirmed duplicates found:

1. Forth language requests (identical)
2. Documentation out-of-date reports (#5937 + #4906)
3. objdump tool requests (#4633 + #3139)
4. Haskell vector library requests
5. Make/webpack build issues (same bug)

### False positives eliminated:

- ✗ fasm/YASM/AsmX (different assemblers)
- ✗ Language tooltips vs detection (different features)
- ✗ ARM vs EWARM execution (general vs specific)
- ✗ Lua vs LUAU (related but different languages)
- ✗ OpenBLAS vs OpenSSL (different libraries)

## Cost Analysis

- 63 groups × ~400-500 tokens/group ≈ 25-32k tokens per run
- Cost: ~$0.15 per run with Sonnet 4 (based on actual usage)
- Runs only when the `--use-ai` flag is used
- Very affordable for occasional duplicate detection

## Configuration

Create a `.env` file in `etc/scripts/gh_tool/`:

```bash
ANTHROPIC_API_KEY=sk-ant-...
```

The `.env` file is gitignored for security.

## Example Usage

```bash
# Standard detection (no AI)
uv run gh_tool find-duplicates /tmp/report.md

# AI-powered detection
uv run gh_tool find-duplicates /tmp/report.md --use-ai

# Adjust AI confidence threshold
uv run gh_tool find-duplicates /tmp/report.md --use-ai --ai-confidence 0.8
```

## Dependencies Added

- `anthropic>=0.40.0` - Claude API SDK
- `python-dotenv>=1.0.0` - environment variable management

## Test Plan

- [x] All existing tests pass
- [x] Tested on 843 real CE issues; all 5 AI-confirmed groups are true duplicates
- [x] Graceful fallback when the API key is not available
- [x] AI reasoning included in markdown reports
- [x] Code passes ruff linting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
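
The two-phase flow described in #8176 works in two steps. First, string similarity proposes candidate groups. Then an AI verdict confirms each group, and `--ai-confidence` filters the results. The Python sketch below illustrates this under stated assumptions; it is not gh_tool's code. `Verdict`, `refine_groups`, and the `judge` callable are hypothetical names. The Claude API call is replaced by a pluggable function, so the sketch shows only the confidence filter and the no-API-key fallback.

```python
# Sketch: confirm broadphase candidate groups with an AI verdict.
# Assumption: the real tool asks Claude about each group; here the judge is a
# plain callable, so no API call is made. All names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Verdict:
    is_duplicate: bool
    confidence: float  # 0.0-1.0, as reported by the judge
    reasoning: str     # kept so a report can explain the decision


# A judge receives the titles in one candidate group and returns a verdict.
Judge = Callable[[Sequence[str]], Verdict]


def refine_groups(
    groups: list[list[str]],
    judge: Optional[Judge],
    min_confidence: float = 0.7,  # same default as the documented --ai-confidence
) -> list[tuple[list[str], Optional[Verdict]]]:
    """Keep only groups the judge confirms with at least min_confidence.

    With no judge (e.g. ANTHROPIC_API_KEY is unset), fall back to returning
    the broadphase groups unchanged instead of failing.
    """
    if judge is None:
        return [(group, None) for group in groups]
    confirmed = []
    for group in groups:
        verdict = judge(group)
        if verdict.is_duplicate and verdict.confidence >= min_confidence:
            confirmed.append((group, verdict))
    return confirmed


if __name__ == "__main__":
    # Toy judge: only case-insensitively identical titles count as duplicates.
    def toy_judge(titles: Sequence[str]) -> Verdict:
        same = len({t.casefold() for t in titles}) == 1
        return Verdict(same, 0.95, "same name" if same else "different tools")

    candidates = [["Add NumPy", "Add numpy"], ["Add fasm", "Add YASM"]]
    for group, verdict in refine_groups(candidates, toy_judge):
        print(group, "->", verdict.reasoning)
```

Keeping the judge behind a callable also makes the filtering logic testable without network access or API cost.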

Matt Godbolt	70df51df29	Improve duplicate issue detection by stripping category tags (#8175)	2025-10-07 15:44:56 -05:00

## Summary

Fixes the duplicate issue detection algorithm to strip `[TAGS]` from issue titles before calculating similarity. This eliminates the huge false-positive groups caused by shared tag prefixes such as `[LIB REQUEST]` or `[COMPILER REQUEST]`.

## Problem

The previous implementation created groups of 98+ completely unrelated issues just because they shared common tag prefixes. For example:

- `[LIB REQUEST] Add ULib Library`
- `[LIB REQUEST] musl vs glibc`
- `[REQUEST] Float explorer support`
- `[REQUEST] Support logging in`

These were all grouped together despite being completely different requests.

## Solution

- Strip `[TAGS]` before calculating text similarity, using a compiled regex pattern
- Compare only the actual content: "Add ULib Library" vs "musl vs glibc" → low similarity ✓

## Additional Changes

- Added `ruff` as a project dependency for consistent code quality
- Fixed linting issues (unused imports, updated to `datetime.UTC`)
- Updated tests to reflect the new tag-stripping behavior

## Results

Testing on actual CE issues shows a dramatic improvement:

- Before: 83 groups, with Group 1 containing 98 unrelated issues (98% false positives)
- After: 63 groups, with Group 1 containing 2 legitimate "Forth" duplicates

Most groups are now legitimate duplicates, such as:

- Three "Problem with [opcode]" bugs
- Two TI ARM compiler requests
- Multiple MSVC version requests

## Test Plan

- [x] All existing tests pass
- [x] Tested on real CE issue data, showing a 20-group reduction (83 → 63)
- [x] Code passes ruff linting and formatting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
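
The tag-stripping step from #8175 removes the bracketed prefix with a compiled regex before the `difflib.SequenceMatcher(...).ratio()` score is computed. Below is a minimal Python sketch. `TAG_PREFIX`, `strip_tags`, and `title_similarity` are hypothetical names. Stripping only *leading* tags is an assumption, not something the commit states; it means a title such as "Problem with [opcode]" keeps its brackets.

```python
# Sketch: remove leading "[TAG]" prefixes before scoring title similarity.
# Assumptions: TAG_PREFIX, strip_tags and title_similarity are hypothetical names,
# and only tags at the *start* of a title are stripped.
import re
from difflib import SequenceMatcher

# One or more leading "[...]" groups, each optionally followed by whitespace.
TAG_PREFIX = re.compile(r"^\s*(?:\[[^\]]*\]\s*)+")


def strip_tags(title: str) -> str:
    """Drop leading [TAGS]; brackets later in the title are kept."""
    return TAG_PREFIX.sub("", title).strip()


def title_similarity(a: str, b: str, ignore_tags: bool = True) -> float:
    """SequenceMatcher ratio (0.0-1.0) of two titles, optionally without tags."""
    if ignore_tags:
        a, b = strip_tags(a), strip_tags(b)
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    a = "[LIB REQUEST] Add ULib Library"
    b = "[LIB REQUEST] musl vs glibc"
    print(f"with tags:     {title_similarity(a, b, ignore_tags=False):.2f}")
    print(f"tags stripped: {title_similarity(a, b):.2f}")
```

For the "ULib" and "musl vs glibc" pair, the shared `[LIB REQUEST] ` prefix accounts for most of the matched characters. Once it is removed, the score for that pair drops sharply.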

Matt Godbolt	4cb1416c2a	Add gh_tool CLI for GitHub repository automation (#8170)	2025-10-07 14:50:22 -05:00

This PR adds a new Python CLI tool for automating GitHub repository management tasks.

## Overview

The initial implementation provides duplicate issue detection using text similarity analysis. This is the first step toward automating repository triage tasks.

## Features

- Click-based CLI with subcommands for future extensibility
- `find-duplicates` command for detecting duplicate issues using text similarity
- Uses the gh CLI for GitHub API access (no token management needed)
- Text similarity using `difflib.SequenceMatcher` (ratio-based algorithm)
- Configurable similarity threshold (default: 0.6)
- Progress bar for long-running comparisons
- Age filtering support (`--min-age` parameter)
- Standard Python src-layout with uv for dependency management
- Comprehensive test suite with pytest (integrated into CI)

## Project Structure

```
etc/scripts/gh_tool/
├── src/gh_tool/              # Main package
│   ├── cli.py                # Click-based CLI interface
│   └── duplicate_finder.py   # Core duplicate detection logic
├── tests/                    # Test suite
│   └── test_duplicate_finder.py
├── docs/                     # Documentation
│   ├── TRIAGE-CRITERIA.md    # Triage guidelines from manual review
│   └── PHASE1-FINDINGS.md    # Historical analysis of 855 issues
├── pyproject.toml            # Package configuration
└── README.md                 # Usage documentation
```

## Usage

```bash
cd etc/scripts/gh_tool
uv sync
uv run gh_tool find-duplicates /tmp/report.md
```

Options:

- `--threshold FLOAT` - Similarity threshold 0-1 (default: 0.6)
- `--state {all,open,closed}` - Issue state to check (default: open)
- `--min-age DAYS` - Only check issues older than N days (default: 0)
- `--limit INTEGER` - Maximum number of issues to fetch (default: 1000)
- `--repo TEXT` - GitHub repository in owner/repo format (default: compiler-explorer/compiler-explorer)

Example:

```bash
# Find high-confidence duplicates in open issues
uv run gh_tool find-duplicates /tmp/report.md --threshold 0.85

# Check all issues older than 30 days
uv run gh_tool find-duplicates /tmp/report.md --state all --min-age 30
```

## Testing

The tool includes comprehensive test coverage:

- Unit tests for similarity calculation
- Integration tests for duplicate detection
- Edge-case handling (transitive grouping, age filtering, threshold sensitivity)
- Report generation validation

Run tests:

```bash
cd etc/scripts/gh_tool
uv run pytest -v
```

Tests are integrated into CI and run on every push.

## Documentation

- `README.md`: complete usage guide with examples
- `docs/TRIAGE-CRITERIA.md`: comprehensive triage guidelines developed during manual review of 22+ issues
- `docs/PHASE1-FINDINGS.md`: historical context from the initial review of 855 issues

## CI Integration

The tool is integrated into the GitHub Actions workflow:

- `uv` is installed via `astral-sh/setup-uv@v6`
- Tests run automatically on every push
- Ensures the tool remains functional as the codebase evolves

## Next Steps

Future enhancements planned for follow-up PRs:

- GitHub Action for automatic duplicate detection on new issues
- Additional automation tools (upstream health checker, label validator, etc.)
- Automated triage reports

## Changes in this PR

- ✅ Core duplicate detection implementation
- ✅ Comprehensive test suite (192 lines)
- ✅ CI integration
- ✅ Complete documentation
- ✅ Example triage criteria and findings

---------

Co-authored-by: Claude <noreply@anthropic.com>
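
#8170 describes the core of `find-duplicates`: pairwise `SequenceMatcher` ratios compared against `--threshold`, transitive grouping, and `--min-age` filtering. The sketch below illustrates that flow and is not gh_tool's code. `Issue` and `find_duplicate_groups` are hypothetical names. Issues are assumed to be already fetched; the real tool gets them through the gh CLI. Union-find is just one possible way to implement transitive grouping, not necessarily the one gh_tool uses.

```python
# Sketch: pairwise title similarity, transitive grouping, and a minimum-age filter.
# Assumptions: `Issue` and `find_duplicate_groups` are hypothetical names, issues are
# already fetched (the real tool uses the gh CLI), and union-find is just one way to
# implement transitive grouping.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Issue:
    number: int
    title: str
    created_at: datetime  # timezone-aware


def find_duplicate_groups(
    issues: list[Issue],
    threshold: float = 0.6,  # documented --threshold default
    min_age_days: int = 0,   # documented --min-age default
    now: Optional[datetime] = None,
) -> list[list[int]]:
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    candidates = [i for i in issues if i.created_at <= cutoff]

    # Union-find: if A~B and B~C, all three share one root, even when
    # A and C are not directly similar (transitive grouping).
    parent = {i.number: i.number for i in candidates}

    def find(n: int) -> int:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    # Every pair is compared: n * (n - 1) / 2 ratio() calls.
    for a, b in combinations(candidates, 2):
        if SequenceMatcher(None, a.title, b.title).ratio() >= threshold:
            parent[find(a.number)] = find(b.number)

    groups: dict[int, list[int]] = {}
    for i in candidates:
        groups.setdefault(find(i.number), []).append(i.number)
    # Singletons are not duplicates; sort for a stable report order.
    return sorted((sorted(g) for g in groups.values() if len(g) > 1), key=lambda g: g[0])
```

Because grouping is transitive in this sketch, a single borderline pair can merge two otherwise separate clusters. Lowering `threshold` therefore tends to grow groups quickly, not just add new ones.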

4 Commits