# Migrating Compiler Explorer to TOML

This document outlines a proposal for migrating Compiler Explorer's configuration system from the current `.properties`
format to TOML (Tom's Obvious, Minimal Language).

## Current System Limitations

Our current properties-based configuration system has several limitations:

1. **Awkward List Handling**: Lists are encoded by splitting strings on `:` and `|` characters. This is error-prone,
   because a value that itself contains the separator cannot be expressed, and it is inflexible. It also produces very
   long, difficult-to-read lines in configuration files.

2. **Poor Group Inheritance**: As noted in
   issue [#7150](https://github.com/compiler-explorer/compiler-explorer/issues/7150), group properties don't properly
   cascade across environment files, making configuration reuse difficult.

3. **Limited Data Structure Support**: The format can only represent flat key-value pairs. Hierarchy is encoded in
   dotted key names, and there is no native support for nested structures, arrays, or tables.

4. **Readability Issues**: Long property values (particularly lists) become unwieldy and hard to maintain, as noted in
   issue [#7341](https://github.com/compiler-explorer/compiler-explorer/issues/7341).

5. **No Format Standard**: Our `.properties` dialect, with its custom list separators, lacks standardized tooling for
   validation, formatting, and editor support.

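The first limitation is easy to reproduce. The TypeScript sketch below uses `splitColonList`, a hypothetical helper that mimics naive colon splitting; it is not CE's actual parsing code. A list of IDs splits cleanly, but a list of Windows-style paths does not, because the drive-letter colons cannot be told apart from the list separator.

```typescript
// Hypothetical helper that mimics naive colon-list splitting (not CE's actual code).
function splitColonList(value: string): string[] {
    return value.split(':').filter(item => item.length > 0);
}

// A list of compiler IDs splits as intended.
const compilers = splitColonList('g10:g11:gdefault');

// A list of two Windows-style paths does not: the drive-letter colons are
// treated as separators, yielding four fragments instead of two paths.
const paths = splitColonList('C:/tools/lib:D:/other/lib');

console.log(compilers); // [ 'g10', 'g11', 'gdefault' ]
console.log(paths); // [ 'C', '/tools/lib', 'D', '/other/lib' ]
```

In TOML the same data is an array of quoted strings, for example `paths = ["C:/tools/lib", "D:/other/lib"]`, so no character has to be reserved as a separator.
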
## Why TOML?

TOML provides several advantages:

1. **Human-Friendly**: TOML is designed to be easy to read and write, with clear semantics.

2. **Native Array Support**: TOML has built-in support for arrays, eliminating the need for custom string splitting.

3. **Table Support**: TOML's tables provide natural grouping of related properties.

4. **Widely Adopted**: TOML is used by, among others, Rust's Cargo (`Cargo.toml`) and Python packaging
   (`pyproject.toml`). As a result, editor support and validation tools are readily available.

5. **Backward-Compatible Path**: TOML tables and dotted keys correspond closely to the existing dotted property names
   (`compiler.g11.exe=...` becomes `exe = ...` under `[compiler.g11]`). TOML files can therefore be flattened into the
   key-value form the existing codebase expects.

6. **Type System**: TOML natively supports strings, integers, floats, booleans, date-times, arrays, and tables.

7. **Multi-line Strings**: TOML supports multi-line strings, making long values more readable.

## Array Usage Audit

The current configuration system uses different separators for different kinds of arrays. Below is an audit of the
common array patterns found in the configuration files.

### Colon-Separated (`:`) Arrays

| Property Pattern | Description | Example | Typical Length |
|-----------------------|------------------------------|--------------------------------------------------------------------------------|------------------------------|
| `compilers` | List of compiler IDs | `compilers=&gcc:&clang` | Short to Medium (5-15 items) |
| `group.X.compilers` | List of compilers in a group | `group.gcc.compilers=g10:g11:gdefault` | Medium (5-15 items) |
| `tools` | List of tool IDs | `tools=clangquerydefault:clangtidydefault:clangquery7:...` | Medium to Long (10-30 items) |
| `libs` | List of library IDs | `libs=boost:eigen:gsl:...` | Very Long (20-50+ items) |
| `group.X.includeFlag` | List of include paths | `group.libs.includeFlag=-isystem/path1:-isystem/path2` | Medium (3-10 items) |
| `group.X.versions` | Available versions | `group.boost.versions=164:165:166:167:168:169:170:171:172:173:174:175` | Medium to Long (5-20 items) |
| `group.X.options` | List of compiler options | `group.clang10.options=-std=c++98:-std=c++11:-std=c++14:-std=c++17:-std=c++20` | Medium (3-10 items) |

### Pipe-Separated (`|`) Arrays

| Property Pattern | Description | Example | Typical Length |
|----------------------------|-------------------------|----------------------------------------------------------------------|-------------------|
| `ldPath` | Library search paths | `ldPath=${exePath}/../lib\|${exePath}/../lib32\|${exePath}/../lib64` | Short (2-5 items) |
| `compiler.X.demanglerArgs` | Arguments for demangler | `demanglerArgs=-n\|-C\|--no-verbose` | Short (2-5 items) |
| `compiler.X.objdumperArgs` | Arguments for objdumper | `objdumperArgs=-d\|--no-show-raw-insn\|--no-leading-addr` | Short (2-5 items) |
| `group.X.libPath` | Library paths | `libPath=/path/to/lib1\|/path/to/lib2` | Short (2-5 items) |

### Key Patterns and Usage Insights

1. **Command-line Arguments**: Pipe-separated (`|`) lists are consistently used for command-line arguments and path
   lists that might contain colons. Colons often appear in paths (for example, Windows drive letters) and in
   command-line options.

2. **Entity Lists**: Colon-separated (`:`) lists are used for entity IDs such as compilers, compiler versions, or tools.
   These lists tend to be longer and are the ones most in need of better readability.

3. **Length Patterns**:
   - Pipe-separated lists tend to be short (2-5 items).
   - Colon-separated lists are often longer, and some (such as library lists) become extremely long and difficult to
     maintain.

4. **Particularly Problematic Examples**:
   - The `tools` property in language configs often becomes very long.
   - Library version lists (`group.X.versions`) are frequently long and hard to read.
   - The `libs` property in production files can have dozens of entries on a single line.

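An audit like the one above could be refreshed mechanically. The following TypeScript sketch scans `key=value` lines, guesses each value's separator, and ranks keys by item count. `auditLists` and `ListUsage` are hypothetical names. The sketch is a heuristic that assumes one property per line and no line continuations. It also counts any value containing `:` as a colon list, so URLs and similar values will appear as false positives. Those false positives are themselves a symptom of the separator problem.

```typescript
// Hypothetical audit helper: report which list separator each value appears to use
// and how many items it would split into. Heuristic: any ':' counts as a colon list.
interface ListUsage {
    key: string;
    separator: ':' | '|';
    items: number;
}

function auditLists(propertiesText: string): ListUsage[] {
    const results: ListUsage[] = [];
    for (const rawLine of propertiesText.split('\n')) {
        const line = rawLine.trim();
        if (line === '' || line.startsWith('#')) continue; // skip blanks and comments
        const eq = line.indexOf('=');
        if (eq < 0) continue;
        const key = line.slice(0, eq).trim();
        const value = line.slice(eq + 1).trim();
        // Check '|' first: pipe-separated values may legitimately contain colons.
        const separator = value.includes('|') ? '|' : value.includes(':') ? ':' : null;
        if (separator === null) continue;
        results.push({key, separator, items: value.split(separator).length});
    }
    // Longest lists first: these are the best candidates for multi-line TOML arrays.
    return results.sort((a, b) => b.items - a.items);
}

const sample = [
    'compilers=&gcc:&clang',
    'group.boost.versions=164:165:166:167:168',
    'compiler.clang.demanglerArgs=-n|-C|--no-verbose',
    'compiler.g11.name=g++ 11.x',
].join('\n');

// Reports group.boost.versions (5 items), then demanglerArgs (3), then compilers (2);
// the name line contains no separator and is skipped.
console.log(auditLists(sample));
```
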
### Examples of Particularly Long Arrays

From `c++.amazon.properties`:

```properties
tools=clangformat:clangquery:clangquerytrunk:clang-apply-replacements:clang-tidy:clang-tidy-13:clang-tidy-trunk:pahole:llvm-mcatrunk:readelf:strings:ldd:llvm-objdump:llvm-objdump-13:llvm-objdump-trunk:llvm-readobj:nm:llvm-cov-trunk:llvm-cov-13:include-what-you-use:include-what-you-use-trunk:llvm-dwarfdump-trunk:llvm-dwarfdump:x86to6502:sonarqube-gcc:sonarqube-clang:microsoft-analyzer:pvs-studio:objdump:readobj:nm-mp:llc:llc1_0:llc1_1:llc1_2:opt-trunk:bronto-trunk
```

From `compiler-explorer.amazon.properties`:

```properties
storageBucketSessions=compiler-explorer-sessions
sessionsExpirationInDays=30:40:60:180:365
```

From `c++.amazon.properties` (library versions):

```properties
group.boost.versions=164:165:166:167:168:169:170:171:172:173:174:175:176:177:178:179:180:181:182:183
```

## Migration Approach

### 1. Add TOML Support While Maintaining Backward Compatibility

1. Add a TOML parser dependency to the project.
2. Create a new configuration loader that can read both TOML and properties files.
3. Implement a compatibility layer that converts TOML structures to the current properties format internally.

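Below is a minimal sketch of the compatibility layer in step 3. It assumes the TOML library returns a plain nested object, which is flattened into the dotted `key=value` strings the existing code consumes. `flattenToProperties`, `TomlValue`, and `PIPE_KEYS` are hypothetical names. In particular, the choice of which keys are joined with `|` rather than `:` is an assumption based on the audit above; a real implementation would need the complete list.

```typescript
// Sketch of a TOML -> properties compatibility shim (hypothetical names).
// A parsed TOML document is modelled as plain nested objects, arrays and scalars.
type TomlValue = string | number | boolean | TomlValue[] | {[key: string]: TomlValue};

// Assumption: keys whose legacy form is pipe-separated; all other arrays use ':'.
const PIPE_KEYS = new Set(['ldPath', 'demanglerArgs', 'objdumperArgs', 'libPath']);

function flattenToProperties(
    table: {[key: string]: TomlValue},
    prefix = '',
    out: Map<string, string> = new Map(),
): Map<string, string> {
    for (const [key, value] of Object.entries(table)) {
        const fullKey = prefix ? `${prefix}.${key}` : key;
        if (Array.isArray(value)) {
            // Re-join arrays with the separator the legacy code will split on.
            out.set(fullKey, value.map(String).join(PIPE_KEYS.has(key) ? '|' : ':'));
        } else if (typeof value === 'object') {
            // Nested table such as [group.gcc]: recurse with an extended dotted prefix.
            flattenToProperties(value, fullKey, out);
        } else {
            // Scalars become strings, as if they had been read from a .properties file.
            out.set(fullKey, String(value));
        }
    }
    return out;
}

const flat = flattenToProperties({
    compilers: ['&gcc', '&clang'],
    group: {gcc: {compilers: ['g10', 'g11', 'gdefault'], groupName: 'GCC'}},
    compiler: {clang: {demanglerArgs: ['-n', '-C', '--no-verbose'], supportsBinary: true}},
});
console.log(flat.get('group.gcc.compilers')); // g10:g11:gdefault
console.log(flat.get('compiler.clang.demanglerArgs')); // -n|-C|--no-verbose
```

Joining arrays back into separator strings reintroduces the ambiguity that TOML removes. This shim therefore only makes sense during the transition, until consumers can read arrays directly.
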
### 2. Property Mapping Model

Properties would map from the current format to TOML as follows:

| Current Format | TOML Representation |
|-----------------------------|------------------------------------------------|
| `key=value` | `key = "value"` |
| `key=true` | `key = true` |
| `key=42` | `key = 42` |
| `list=a:b:c` | `list = ["a", "b", "c"]` |
| `args=a\|b\|c` | `args = ["a", "b", "c"]` |
| `compiler.xyz.name=Foo` | `[compiler.xyz]`<br>`name = "Foo"` |
| `group.abc.compilers=x:y:z` | `[group.abc]`<br>`compilers = ["x", "y", "z"]` |

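In the other direction, a conversion script has to decide each value's TOML type. The table above cannot be applied purely syntactically. `compiler.g11.options=-Wall -Wextra` is a single string, while `group.gcc.compilers=g10:g11:gdefault` is a list. Nothing in a value alone reliably distinguishes a list from a string that happens to contain a colon, such as a URL. The TypeScript sketch below (hypothetical names throughout) therefore decides whether a value is a list from a table of known array-valued keys. Only then does it fall back to boolean and integer detection. The key table is an assumption, not an exhaustive inventory.

```typescript
// Sketch of value typing for a hypothetical .properties -> TOML converter.
type Converted = string | number | boolean | string[];

// Assumption: array-valued keys (matched on the last dotted component) and their
// legacy separators. A real converter would need the complete inventory.
const LIST_SEPARATORS = new Map<string, ':' | '|'>([
    ['compilers', ':'],
    ['tools', ':'],
    ['libs', ':'],
    ['versions', ':'],
    ['ldPath', '|'],
    ['demanglerArgs', '|'],
    ['objdumperArgs', '|'],
    ['libPath', '|'],
]);

function convertValue(fullKey: string, raw: string): Converted {
    const leaf = fullKey.split('.').pop() ?? fullKey;
    const separator = LIST_SEPARATORS.get(leaf);
    if (separator !== undefined) {
        // Known list key: split on its legacy separator, dropping empty items.
        return raw.split(separator).filter(item => item !== '');
    }
    if (raw === 'true') return true;
    if (raw === 'false') return false;
    // Only plain decimal integers become numbers; everything else stays a string.
    if (/^-?\d+$/.test(raw)) return Number(raw);
    return raw;
}

console.log(convertValue('group.gcc.compilers', 'g10:g11:gdefault')); // [ 'g10', 'g11', 'gdefault' ]
console.log(convertValue('compiler.clang.demanglerArgs', '-n|-C|--no-verbose')); // [ '-n', '-C', '--no-verbose' ]
console.log(convertValue('compiler.g11.supportsBinary', 'true')); // true
console.log(convertValue('compiler.g11.options', '-Wall -Wextra')); // -Wall -Wextra (still a string)
```

Values that merely look numeric but are really identifiers may also need to stay strings, which is another reason to drive the conversion from per-key rules.
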
### 3. Group References

The current `&group` syntax can be mapped to TOML by keeping the references as plain strings with the `&` prefix. TOML
itself attaches no meaning to the prefix, so the loader still has to resolve these references:

```toml
# Current: compilers=&gcc:&clang
compilers = ["&gcc", "&clang"]

[group.gcc]
compilers = ["g7", "g8", "g9"]
groupName = "GCC"
```

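Resolution could then operate directly on the parsed document. The TypeScript sketch below replaces each `&name` entry with the `compilers` array of `[group.name]`. `expandCompilers` and `GroupTable` are hypothetical names. The sketch also lets groups reference other groups and rejects cycles. Support for nested group references is an assumption of this sketch, not something this proposal specifies.

```typescript
// Sketch of `&group` expansion over a parsed TOML document (hypothetical names).
interface GroupTable {
    compilers?: string[];
    [key: string]: unknown;
}

function expandCompilers(
    ids: string[],
    groups: Record<string, GroupTable>,
    path: Set<string> = new Set(), // groups currently being expanded, for cycle detection
): string[] {
    const result: string[] = [];
    for (const id of ids) {
        if (!id.startsWith('&')) {
            result.push(id); // plain compiler ID
            continue;
        }
        const name = id.slice(1);
        if (path.has(name)) throw new Error(`Cyclic group reference: ${id}`);
        // Own-property check so names like "constructor" are not found on Object.prototype.
        const group = Object.prototype.hasOwnProperty.call(groups, name) ? groups[name] : undefined;
        if (group === undefined) throw new Error(`Unknown group: ${id}`);
        path.add(name);
        result.push(...expandCompilers(group.compilers ?? [], groups, path));
        path.delete(name); // allow the same group to appear again in a different branch
    }
    return result;
}

const groups = {
    gcc: {compilers: ['g7', 'g8', 'g9'], groupName: 'GCC'},
    clang: {compilers: ['clang11', 'clangdefault']},
};
console.log(expandCompilers(['&gcc', '&clang'], groups));
// [ 'g7', 'g8', 'g9', 'clang11', 'clangdefault' ]
```
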
### 4. Migration Process

1. **Phase 1: Dual Support**
   - Add a TOML parser
   - Create a property loader that reads both formats
   - Create a TOML-to-properties converter
   - Add a test suite to verify equivalent behavior

2. **Phase 2: Conversion of Existing Files**
   - Create a conversion script to transform `.properties` files to `.toml`
   - Convert default configuration files first
   - Validate equivalence with the test suite
   - Update documentation

3. **Phase 3: New Features**
   - Enhance the properties system to leverage TOML's richer types
   - Improve the group inheritance system
   - Add validation tools

4. **Phase 4: Complete Migration**
   - Deprecate `.properties` support
   - Complete the migration to TOML
   - Remove legacy code

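The equivalence test suite in Phases 1 and 2 could be built around a simple diff of flat key-value maps. In this TypeScript sketch, `parseProperties` and `diffProperties` are hypothetical helpers. The parser is deliberately simplified: one `key=value` per line, `#` comments, and no continuations or escapes. The map for the TOML side is assumed to come from flattening the parsed TOML file.

```typescript
// Simplified .properties reader (hypothetical): one key=value per line, '#' comments.
function parseProperties(text: string): Map<string, string> {
    const props = new Map<string, string>();
    for (const rawLine of text.split('\n')) {
        const line = rawLine.trim();
        if (line === '' || line.startsWith('#')) continue;
        const eq = line.indexOf('=');
        if (eq < 0) continue;
        props.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
    }
    return props;
}

// Report keys that are missing, changed, or extra in the converted map.
function diffProperties(legacy: Map<string, string>, converted: Map<string, string>): string[] {
    const problems: string[] = [];
    for (const [key, value] of legacy) {
        if (!converted.has(key)) problems.push(`missing: ${key}`);
        else if (converted.get(key) !== value) problems.push(`changed: ${key} (${value} -> ${converted.get(key)})`);
    }
    for (const key of converted.keys()) {
        if (!legacy.has(key)) problems.push(`extra: ${key}`);
    }
    return problems;
}

const legacy = parseProperties('# defaults\ncompilers=&gcc:&clang\ngroup.gcc.groupName=GCC\n');
const fromToml = new Map([
    ['compilers', '&gcc:&clang'],
    ['group.gcc.groupName', 'GNU'], // deliberately wrong, to show a reported difference
]);
console.log(diffProperties(legacy, fromToml)); // [ 'changed: group.gcc.groupName (GCC -> GNU)' ]
```

An empty result for every converted file would be the pass criterion.
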
## Sample Conversions

### Example 1: Basic Compiler Configuration

**Current (.properties):**

```properties
compiler.g11.exe=/usr/bin/g++-11
compiler.g11.name=g++ 11.x
compiler.g11.options=-Wall -Wextra
compiler.g11.supportsBinary=true
```

**TOML:**

```toml
[compiler.g11]
exe = "/usr/bin/g++-11"
name = "g++ 11.x"
options = "-Wall -Wextra"
supportsBinary = true
```

### Example 2: Group Configuration

**Current (.properties):**

```properties
compilers=&gcc:&clang
group.gcc.compilers=g10:g11:gdefault
group.gcc.groupName=GCC
group.gcc.compilerType=gcc
group.clang.compilers=clang11:clang12:clangdefault
group.clang.intelAsm=-mllvm --x86-asm-syntax=intel
group.clang.compilerType=clang
```

**TOML:**

```toml
compilers = ["&gcc", "&clang"]

[group.gcc]
compilers = ["g10", "g11", "gdefault"]
groupName = "GCC"
compilerType = "gcc"

[group.clang]
compilers = ["clang11", "clang12", "clangdefault"]
intelAsm = "-mllvm --x86-asm-syntax=intel"
compilerType = "clang"
```

### Example 3: Tool Configuration with Long Lists

**Current (.properties):**

```properties
tools=clangquerydefault:clangtidydefault:clangquery7:clangquery8:clangquery9:clangquery10:clangquery11:clangquery12:strings:ldd:readelf:nm:llvmdwarfdumpdefault
tools.clangquerydefault.exe=/usr/bin/clang-query
tools.clangquerydefault.name=clang-query (default)
tools.clangquerydefault.type=independent
tools.clangquerydefault.class=clang-query-tool
tools.clangquerydefault.stdinHint=Query commands
```

**TOML:**

```toml
tools = [
    "clangquerydefault", "clangtidydefault",
    "clangquery7", "clangquery8", "clangquery9",
    "clangquery10", "clangquery11", "clangquery12",
    "strings", "ldd", "readelf", "nm", "llvmdwarfdumpdefault"
]

[tools.clangquerydefault]
exe = "/usr/bin/clang-query"
name = "clang-query (default)"
type = "independent"
class = "clang-query-tool"
stdinHint = "Query commands"
```

As written, this snippet is not valid TOML. A key cannot be defined as an array (`tools = [...]`) and then also as a
table (`[tools.clangquerydefault]`). The same clash arises wherever a list key is also the prefix of other keys, so the
conversion rules will need to rename one side, and the compatibility layer will need to map it back.

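The multi-line array layout above is a formatting choice rather than a TOML requirement, and a conversion script would have to produce it. The TypeScript sketch below shows one approach; the function name and the 80-column default are assumptions. It emits the array on one line if it fits and otherwise packs quoted items into indented rows. TOML permits a trailing comma after the last array element, which keeps every row uniform. `JSON.stringify` is used for quoting on the assumption that values are plain ASCII identifiers and paths, for which a JSON string literal is also a valid TOML basic string.

```typescript
// Sketch: emit a list as a TOML array, wrapping long lists across lines (hypothetical helper).
function formatTomlArray(key: string, items: string[], maxWidth = 80, indent = '    '): string {
    // Assumption: items are plain ASCII, so a JSON string literal is also a valid TOML basic string.
    const quoted = items.map(item => JSON.stringify(item));
    const oneLine = `${key} = [${quoted.join(', ')}]`;
    if (oneLine.length <= maxWidth) return oneLine;

    // Too long: pack items into indented rows of at most maxWidth characters
    // (a single over-long item still gets a row of its own).
    const rows: string[] = [];
    let row = '';
    for (const q of quoted) {
        const candidate = row === '' ? `${indent}${q},` : `${row} ${q},`;
        if (row !== '' && candidate.length > maxWidth) {
            rows.push(row);
            row = `${indent}${q},`;
        } else {
            row = candidate;
        }
    }
    if (row !== '') rows.push(row);
    // TOML allows a trailing comma after the last element, so every row ends with ','.
    return [`${key} = [`, ...rows, ']'].join('\n');
}

console.log(formatTomlArray('compilers', ['&gcc', '&clang'])); // compilers = ["&gcc", "&clang"]
const tools = 'clangquerydefault:clangtidydefault:clangquery7:clangquery8:clangquery9:strings:ldd:readelf:nm'.split(':');
console.log(formatTomlArray('tools', tools, 60));
```
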
### Example 4: Command Arguments

**Current (.properties):**

```properties
compiler.clang.demanglerArgs=-n|-C|--no-verbose
```

**TOML:**

```toml
[compiler.clang]
demanglerArgs = ["-n", "-C", "--no-verbose"]
```

### Example 5: Library Path with Variable Substitution

**Current (.properties):**

```properties
ldPath=${exePath}/../lib|${exePath}/../lib32|${exePath}/../lib64
```

**TOML:**

```toml
ldPath = ["${exePath}/../lib", "${exePath}/../lib32", "${exePath}/../lib64"]
```

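TOML has no variable interpolation, so `${exePath}` remains literal text inside each string, and the loader must still expand it after parsing. The difference is that substitution now runs on each array element instead of after a `|` split. Below is a minimal TypeScript sketch, assuming variables are supplied as a plain record. `substituteVars` and the example path are hypothetical. Unknown variables are left in place rather than silently removed.

```typescript
// Minimal sketch of post-parse variable substitution (hypothetical helper).
function substituteVars(value: string, vars: Record<string, string>): string {
    // Replace each ${name}; leave unknown variables untouched so mistakes stay visible.
    return value.replace(/\$\{(\w+)\}/g, (match: string, name: string) =>
        Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
    );
}

// The array as it would come out of the TOML parser: `${exePath}` is still literal text.
const ldPath = ['${exePath}/../lib', '${exePath}/../lib32', '${exePath}/../lib64'];
// Hypothetical compiler location; each element is substituted independently.
const resolved = ldPath.map(entry => substituteVars(entry, {exePath: '/opt/example/bin'}));
console.log(resolved);
```
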
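## Implementation Details

### TOML Parsing Library

For TypeScript/JavaScript, we can use one of:

- **@iarna/toml**: Full-featured TOML parser and serializer with good TypeScript support
- **@ltd/j-toml**: TOML v1.0.0 parser with good performance
- **toml**: Simple, widely used parser, but it targets an older version of the TOML specification and cannot serialize

Before choosing, we should check which TOML specification version each library supports. The Phase 2 conversion script
also needs to write TOML, which rules out parse-only packages for that step.
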
### Configuration System Changes

1. **Properties Loader**: Modify `properties.ts` to support both formats
2. **Interface Adapters**: Create an adapter layer to normalize between formats
3. **Test Cases**: Create comprehensive tests to verify equivalence

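One way the dual-format loader could look is sketched below in TypeScript. The sketch assumes both parsers return the same flat key-value map, so callers do not change. `loadPropertiesFile`, `PropertyParser`, and `Parsers` are hypothetical names, not the current API of `properties.ts`, and the stub parsers in the usage lines stand in for real ones.

```typescript
// Sketch of a dual-format loader (hypothetical names, not the current properties.ts API).
type PropertyMap = Map<string, string>;
type PropertyParser = (text: string) => PropertyMap;

interface Parsers {
    properties: PropertyParser;
    toml: PropertyParser; // e.g. a TOML library's parse step followed by flattening
}

function loadPropertiesFile(fileName: string, text: string, parsers: Parsers): PropertyMap {
    // Dispatch on extension; both parsers yield the same flat map, so callers are unchanged.
    if (fileName.endsWith('.toml')) return parsers.toml(text);
    if (fileName.endsWith('.properties')) return parsers.properties(text);
    throw new Error(`Unsupported configuration file: ${fileName}`);
}

// Stub parsers that only record which format was selected.
const stub = (format: string): PropertyParser => () => new Map([['format', format]]);
const parsers: Parsers = {properties: stub('properties'), toml: stub('toml')};

console.log(loadPropertiesFile('c++.defaults.toml', '', parsers).get('format')); // toml
console.log(loadPropertiesFile('c++.defaults.properties', '', parsers).get('format')); // properties
```

A real loader would also need a rule for the case where both `foo.properties` and `foo.toml` exist, for example treating it as an error. This sketch leaves that open.
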
### Property Resolution Logic

The hierarchical cascade system would remain largely unchanged, but would be enhanced to:

1. Better handle group property inheritance across files
2. Leverage TOML's native types
3. Provide clearer error messages for configuration issues

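The TypeScript sketch below illustrates all three goals at once. A compiler property is looked up as `compiler.<id>.<key>`, then `group.<group>.<key>`, then a top-level `<key>`, across layers ordered from highest to lowest precedence. A value of the wrong TOML type produces an error naming the file and the key. The layer file names, the way the group name is supplied, and the precedence order (key specificity first, then file precedence) are assumptions for illustration. They do not describe CE's current cascade.

```typescript
// Sketch of typed, cascading property lookup (hypothetical names and precedence rules).
type Value = string | number | boolean | Value[] | {[key: string]: Value};

interface Layer {
    name: string; // file the data came from, used in error messages
    data: {[key: string]: Value};
}

// Walk a dotted path through nested tables; undefined if any step is missing.
function lookupPath(data: {[key: string]: Value}, path: string[]): Value | undefined {
    let node: Value | undefined = data;
    for (const part of path) {
        if (node === undefined || typeof node !== 'object' || Array.isArray(node)) return undefined;
        node = Object.prototype.hasOwnProperty.call(node, part) ? node[part] : undefined;
    }
    return node;
}

function getCompilerProp(
    layers: Layer[], // highest precedence first
    compilerId: string,
    groupName: string,
    key: string,
    expected: 'string' | 'number' | 'boolean',
): string | number | boolean | undefined {
    // Most specific key first; for each key, search the layers in precedence order.
    const candidates = [['compiler', compilerId, key], ['group', groupName, key], [key]];
    for (const path of candidates) {
        for (const layer of layers) {
            const value = lookupPath(layer.data, path);
            if (value === undefined) continue;
            if (typeof value !== expected) {
                const actual = Array.isArray(value) ? 'array' : typeof value;
                throw new Error(`${layer.name}: ${path.join('.')} should be a ${expected}, got ${actual}`);
            }
            return value as string | number | boolean;
        }
    }
    return undefined;
}

const layers: Layer[] = [
    {name: 'c++.local.toml', data: {group: {gcc: {supportsBinary: 'yes'}}}},
    {name: 'c++.defaults.toml', data: {compiler: {g11: {name: 'g++ 11.x'}}, group: {gcc: {supportsBinary: true}}}},
];

console.log(getCompilerProp(layers, 'g11', 'gcc', 'name', 'string')); // g++ 11.x
try {
    getCompilerProp(layers, 'g11', 'gcc', 'supportsBinary', 'boolean');
} catch (e) {
    // c++.local.toml: group.gcc.supportsBinary should be a boolean, got string
    console.log((e as Error).message);
}
```
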
## Pros and Cons

### Pros

1. **Better Readability**: TOML's structure makes configuration more readable and maintainable
2. **Native Array Support**: Eliminates custom string parsing for lists
3. **Improved Structure**: Better organization of related settings
4. **Editor Support**: Better tooling and syntax highlighting
5. **Type Safety**: TOML's typed values let the loader detect mismatches, such as a string where a boolean is expected
6. **Standardization**: Using a standard format improves maintainability

### Cons

1. **Migration Effort**: Requires converting all configuration files
2. **Learning Curve**: Team members must learn TOML (though it's designed to be simple)
3. **Backward Compatibility**: Both parsers must be maintained during the transition
4. **Custom Logic**: Some CE-specific features (like `&group` references and `${exePath}` substitution) still need
   custom handling

## Timeline and Resources

1. **Planning & Design**: 1-2 weeks
   - Finalize conversion specifications
   - Create test plan and compatibility tests

2. **Implementation**: 2-3 weeks
   - Implement parser integration
   - Create conversion utilities
   - Update documentation

3. **Testing & Validation**: 1-2 weeks
   - Test with various configuration scenarios
   - Verify backward compatibility

4. **Rollout**: Phased approach
   - Convert default configuration files
   - Allow users to opt in to TOML
   - Eventually deprecate `.properties`

## Conclusion

Migrating to TOML offers significant benefits for the maintainability and readability of Compiler Explorer's
configuration. The structured approach outlined above allows for a smooth transition while maintaining backward
compatibility, with clear improvements for both users and developers maintaining the configuration files.

The migration addresses the specific issues raised in #7150 and #7341, providing a more robust and flexible
configuration system that can grow with Compiler Explorer's needs.