This PR adds a new Python CLI tool for automating GitHub repository
management tasks.
## Overview
The initial implementation provides duplicate issue detection using text
similarity analysis. This is the first step toward automating repository
triage tasks.
## Features
- **Click-based CLI** with subcommands for future extensibility
- **find-duplicates command** for detecting duplicate issues using text
similarity
- Uses **gh CLI** for GitHub API access (no token management needed)
- Text similarity using `difflib.SequenceMatcher` (ratio-based
algorithm)
- Configurable similarity threshold (default: 0.6)
- Progress bar for long-running comparisons
- Age filtering support (`--min-age` parameter)
- Standard Python src-layout with **uv** for dependency management
- **Comprehensive test suite** with pytest (integrated into CI)
## Project Structure
```
etc/scripts/gh_tool/
├── src/gh_tool/ # Main package
│ ├── cli.py # Click-based CLI interface
│ └── duplicate_finder.py # Core duplicate detection logic
├── tests/ # Test suite
│ └── test_duplicate_finder.py
├── docs/ # Documentation
│ ├── TRIAGE-CRITERIA.md # Triage guidelines from manual review
│ └── PHASE1-FINDINGS.md # Historical analysis of 855 issues
├── pyproject.toml # Package configuration
└── README.md # Usage documentation
```
## Usage
```bash
cd etc/scripts/gh_tool
uv sync
uv run gh_tool find-duplicates /tmp/report.md
```
**Options:**
- `--threshold FLOAT` - Similarity threshold 0-1 (default: 0.6)
- `--state {all,open,closed}` - Issue state to check (default: open)
- `--min-age DAYS` - Only check issues older than N days (default: 0)
- `--limit INTEGER` - Maximum number of issues to fetch (default: 1000)
- `--repo TEXT` - GitHub repository in owner/repo format (default:
compiler-explorer/compiler-explorer)
**Example:**
```bash
# Find high-confidence duplicates in open issues
uv run gh_tool find-duplicates /tmp/report.md --threshold 0.85
# Check all issues older than 30 days
uv run gh_tool find-duplicates /tmp/report.md --state all --min-age 30
```
## Testing
The tool includes comprehensive test coverage:
- Unit tests for similarity calculation
- Integration tests for duplicate detection
- Edge case handling (transitive grouping, age filtering, threshold
sensitivity)
- Report generation validation
**Run tests:**
```bash
cd etc/scripts/gh_tool
uv run pytest -v
```
Tests are integrated into CI and run on every push.
## Documentation
- **`README.md`**: Complete usage guide with examples
- **`docs/TRIAGE-CRITERIA.md`**: Comprehensive triage guidelines
developed during manual review of 22+ issues
- **`docs/PHASE1-FINDINGS.md`**: Historical analysis context from
initial 855 issue review
## CI Integration
The tool is integrated into the GitHub Actions workflow:
- `uv` is installed via `astral-sh/setup-uv@v6`
- Tests run automatically on every push
- Ensures tool remains functional as codebase evolves
## Next Steps
Future enhancements planned for follow-up PRs:
- GitHub Action for automatic duplicate detection on new issues
- Additional automation tools (upstream health checker, label validator,
etc.)
- Automated triage reports
## Changes in this PR
- ✅ Core duplicate detection implementation
- ✅ Comprehensive test suite (192 lines)
- ✅ CI integration
- ✅ Complete documentation
- ✅ Example triage criteria and findings
---------
Co-authored-by: Claude <noreply@anthropic.com>
There's nothing special with these anymore. They are the same both
locally and in CI now, because our Vitest and Biome setups don't behave
differently there
[](https://renovatebot.com)
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| [actions/setup-node](https://togithub.com/actions/setup-node) | action
| major | `v3` -> `v4` |
---
### Release Notes
<details>
<summary>actions/setup-node (actions/setup-node)</summary>
### [`v4`](https://togithub.com/actions/setup-node/compare/v3...v4)
[Compare
Source](https://togithub.com/actions/setup-node/compare/v3...v4)
</details>
---
### Configuration
📅 **Schedule**: Branch creation - "before 8pm on monday" in timezone
America/Chicago, Automerge - At any time (no schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/compiler-explorer/compiler-explorer).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4xNzMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjE3My4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9-->
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
[](https://renovatebot.com)
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| [actions/github-script](https://togithub.com/actions/github-script) |
action | major | `v6` -> `v7` |
---
### Release Notes
<details>
<summary>actions/github-script (actions/github-script)</summary>
### [`v7`](https://togithub.com/actions/github-script/compare/v6...v7)
[Compare
Source](https://togithub.com/actions/github-script/compare/v6...v7)
</details>
---
### Configuration
📅 **Schedule**: Branch creation - "before 8pm on monday" in timezone
America/Chicago, Automerge - At any time (no schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/compiler-explorer/compiler-explorer).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4xNzMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjE3My4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9-->
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
- Update the actions in the GitHub Actions workflows to their latest versions.
- Run the workflows on the latest versions of the platforms where appropriate.
- Set the `check-latest` field of the `actions/setup-node` action to `true` to use the latest release of the specified Node.js version instead of using the cached one when there's a new release available.
- Explicitly specify the action versions instead of using the `latest` tag.
- Renames orphancompiler.py script to propscheck.py, the name fits better
- Fixes Go props to make them easier to maintain
- Alphabetically sorts some listings in the code
- Adds tests to the props checker script (No idea how to integrate those so it's manual for now)
- Adds 2 missing limit cases for Motd filtering
* The Grand Reformat
- everything made prettier...literally
- some tweaks to include a few more files, including documentation
- minor changes to format style
- some tiny `// prettier-ignore` changes to keep a few things the way we like them
- a couple of super minor tweaks to embedded document types to ensure they format correctly
* Group some files to their own folders
In etc/scripts/, added disasms/, docenizers/, and util/ folders
In lib/, added mapfiles/, and parsers/ folders (+moved google.js to
shortener)
In static/, added widgets/ folder
Added cypress folder to .gitignore
* Address Matt's PR reviews
* Move new Pane renaming to folder
* New build and deploy setup
* kills a lot of makefile complexity
* users will only ever use node-env=development for all practical
purposes. Hopefully fixes issues where something node_modules
is missing non-dev dependencies
* script building now pulled into a bash file