Migrating Compiler Explorer to TOML
This document outlines a proposal for migrating Compiler Explorer's configuration system from the current .properties
format to TOML (Tom's Obvious, Minimal Language).
Current System Limitations
Our current properties-based configuration system has several limitations:
- Awkward List Handling: The system relies on splitting strings on `:` and `|` characters, which is error-prone and inflexible. This creates very long, difficult-to-read lines in configuration files.
- Poor Group Inheritance: As noted in issue #7150, group properties don't properly cascade across environment files, making configuration reuse difficult.
- Limited Data Structure Support: The system can only represent flat key-value pairs without proper support for nested structures, arrays, or tables.
- Readability Issues: Long property values (particularly lists) become unwieldy and hard to maintain, as noted in issue #7341.
- No Format Standard: The `.properties` format lacks standardized tooling for validation, formatting, and editor support.
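The first limitation above can be made concrete with a short TypeScript sketch. The helper `splitList` is hypothetical and is not taken from Compiler Explorer's code; it only mimics separator-based parsing to show why a value that contains the separator cannot be represented.

```typescript
// Hypothetical helper: split a raw property value on a single-character separator.
function splitList(raw: string, separator: ":" | "|"): string[] {
    return raw === "" ? [] : raw.split(separator);
}

// A list of compiler IDs splits as intended.
const ids = splitList("g10:g11:gdefault", ":");

// A Windows-style path contains ':' itself, so a colon-separated list breaks it apart.
const paths = splitList("C:/tools/include:/usr/include", ":");

console.log(ids.length);   // 3
console.log(paths.length); // 3, although only two paths were intended
```

With a native array, each element is delimited by the format itself, so no character has to be reserved as a separator.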
Why TOML?
TOML (Tom's Obvious, Minimal Language) provides several advantages:
- Human-Friendly: TOML is designed to be easy to read and write, with clear semantics.
- Native Array Support: TOML has built-in support for arrays, eliminating the need for custom string splitting.
- Table Support: TOML's tables provide natural grouping of related properties.
- Widely Adopted: Many projects use TOML, resulting in good editor support and validation tools.
- Backward Compatible Path: TOML tables map directly onto today's dotted keys (a `[compiler.g11]` table with an `exe` key corresponds to `compiler.g11.exe`), so the existing codebase can keep working during the transition.
- Type System: TOML natively supports strings, integers, floats, booleans, dates, and arrays.
- Multi-line Strings: TOML supports multi-line strings, making long values more readable.
Array Usage Audit
The current configuration system uses different separators for different types of arrays. Below is an audit of the common array patterns found in the configuration files:
Colon-Separated (:) Arrays
| Property Pattern | Description | Example | Typical Length |
|---|---|---|---|
| `compilers` | List of compiler IDs | `compilers=&gcc:&clang` | Short to Medium (5-15 items) |
| `group.X.compilers` | List of compilers in a group | `group.gcc.compilers=g10:g11:gdefault` | Medium (5-15 items) |
| `tools` | List of tool IDs | `tools=clangquerydefault:clangtidydefault:clangquery7:...` | Medium to Long (10-30 items) |
| `libs` | List of library IDs | `libs=boost:eigen:gsl:...` | Very Long (20-50+ items) |
| `group.X.includeFlag` | List of include paths | `group.libs.includeFlag=-isystem/path1:-isystem/path2` | Medium (3-10 items) |
| `group.X.versions` | Available versions | `group.boost.versions=164:165:166:167:168:169:170:171:172:173:174:175` | Medium to Long (5-20 items) |
| `group.X.options` | List of compiler options | `group.clang10.options=-std=c++98:-std=c++11:-std=c++14:-std=c++17:-std=c++20` | Medium (3-10 items) |
Pipe-Separated (|) Arrays
| Property Pattern | Description | Example | Typical Length |
|---|---|---|---|
| `ldPath` | Library search paths | `ldPath=${exePath}/../lib\|${exePath}/../lib32\|${exePath}/../lib64` | Short (2-5 items) |
| `compiler.X.demanglerArgs` | Arguments for demangler | `demanglerArgs=-n\|-C\|--no-verbose` | Short (2-5 items) |
| `compiler.X.objdumperArgs` | Arguments for objdumper | `objdumperArgs=-d\|--no-show-raw-insn\|--no-leading-addr` | Short (2-5 items) |
| `group.X.libPath` | Library paths | `libPath=/path/to/lib1\|/path/to/lib2` | Short (2-5 items) |
Key Patterns and Usage Insights
- Command-line Arguments: Pipe-separated (`|`) is consistently used for command-line arguments and path lists that might contain colons. This is because colons often appear in paths (especially on Windows) and in command-line options.
- Entity Lists: Colon-separated (`:`) is used for lists of entity IDs like compilers, compiler versions, or tools. These lists tend to be longer and are the ones most in need of better readability.
- Length Patterns:
  - Pipe-separated lists tend to be shorter (2-5 items)
  - Colon-separated lists are often longer, with some (like library lists) becoming extremely long and difficult to maintain
- Particularly Problematic Examples:
  - The `tools` property in language configs often becomes very long
  - Library version lists (`group.X.versions`) are frequently long and hard to read
  - The `libs` property in production files can have dozens of entries on a single line
Examples of Particularly Long Arrays
From c++.amazon.properties:
tools=clangformat:clangquery:clangquerytrunk:clang-apply-replacements:clang-tidy:clang-tidy-13:clang-tidy-trunk:pahole:llvm-mcatrunk:readelf:strings:ldd:llvm-objdump:llvm-objdump-13:llvm-objdump-trunk:llvm-readobj:nm:llvm-cov-trunk:llvm-cov-13:include-what-you-use:include-what-you-use-trunk:llvm-dwarfdump-trunk:llvm-dwarfdump:x86to6502:sonarqube-gcc:sonarqube-clang:microsoft-analyzer:pvs-studio:objdump:readobj:nm-mp:llc:llc1_0:llc1_1:llc1_2:opt-trunk:bronto-trunk
From compiler-explorer.amazon.properties:
storageBucketSessions=compiler-explorer-sessions
sessionsExpirationInDays=30:40:60:180:365
From c++.amazon.properties (library versions):
group.boost.versions=164:165:166:167:168:169:170:171:172:173:174:175:176:177:178:179:180:181:182:183
Migration Approach
1. Add TOML Support While Maintaining Backward Compatibility
- Add a TOML parser dependency to the project.
- Create a new configuration loader that can read both TOML and properties files.
- Implement a compatibility layer that converts TOML structures to the current properties format internally.
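The compatibility layer in the last step could look roughly like the following TypeScript sketch. It is illustrative only: `flattenToProperties` and `PIPE_SEPARATED_KEYS` are hypothetical names, the set of pipe-separated keys is assumed from the audit above, and `parsed` is a hand-written stand-in for the output of whichever TOML parser is chosen.

```typescript
// Values expected from a parsed TOML document (dates omitted for brevity).
type TomlValue = string | number | boolean | TomlValue[] | { [key: string]: TomlValue };

// Assumption: these key names use '|' in .properties files; every other list uses ':'.
const PIPE_SEPARATED_KEYS = new Set(["ldPath", "libPath", "demanglerArgs", "objdumperArgs"]);

// Flatten a parsed TOML table into dotted keys with string values,
// which is the shape the existing properties code already consumes.
function flattenToProperties(
    table: { [key: string]: TomlValue },
    prefix = "",
    out: Record<string, string> = {},
): Record<string, string> {
    for (const key of Object.keys(table)) {
        const value = table[key];
        const fullKey = prefix ? `${prefix}.${key}` : key;
        if (Array.isArray(value)) {
            // Arrays are joined back into the separator-based form.
            const separator = PIPE_SEPARATED_KEYS.has(key) ? "|" : ":";
            out[fullKey] = value.map(String).join(separator);
        } else if (typeof value === "object" && value !== null) {
            // Nested tables extend the dotted prefix.
            flattenToProperties(value, fullKey, out);
        } else {
            out[fullKey] = String(value);
        }
    }
    return out;
}

// Stand-in for the result of parsing a TOML file.
const parsed = {
    compilers: ["&gcc", "&clang"],
    group: {gcc: {compilers: ["g10", "g11", "gdefault"], groupName: "GCC"}},
    compiler: {clang: {demanglerArgs: ["-n", "-C", "--no-verbose"], supportsBinary: true}},
};

const flat = flattenToProperties(parsed);
console.log(flat["group.gcc.compilers"]);          // g10:g11:gdefault
console.log(flat["compiler.clang.demanglerArgs"]); // -n|-C|--no-verbose
```

Because the output is the same flat string map the current loader produces, the rest of the codebase would not need to change during the dual-support phase.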
2. Property Mapping Model
Properties would map from the current format to TOML as follows:
| Current Format | TOML Representation |
|---|---|
| `key=value` | `key = "value"` |
| `key=true` | `key = true` |
| `key=42` | `key = 42` |
| `list=a:b:c` | `list = ["a", "b", "c"]` |
| `args=a\|b\|c` | `args = ["a", "b", "c"]` |
| `compiler.xyz.name=Foo` | `[compiler.xyz]`<br>`name = "Foo"` |
| `group.abc.compilers=x:y:z` | `[group.abc]`<br>`compilers = ["x", "y", "z"]` |
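As a rough illustration of the mapping table, the TypeScript sketch below renders a single `.properties` value as a TOML value. The function `toTomlValue` and the two key sets are hypothetical; a real converter would need the complete list of list-valued keys from the audit rather than the handful assumed here.

```typescript
// Assumption: keys ending in these names hold lists; the sets are incomplete on purpose.
const COLON_LIST_KEYS = new Set(["compilers", "tools", "libs", "versions"]);
const PIPE_LIST_KEYS = new Set(["ldPath", "libPath", "demanglerArgs", "objdumperArgs"]);

// Quote text as a TOML basic string, escaping backslashes and double quotes.
function quote(text: string): string {
    return `"${text.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Render one .properties value as a TOML value, following the mapping table.
function toTomlValue(key: string, raw: string): string {
    const leaf = key.split(".").pop() ?? key;
    if (COLON_LIST_KEYS.has(leaf)) return `[${raw.split(":").map(s => quote(s)).join(", ")}]`;
    if (PIPE_LIST_KEYS.has(leaf)) return `[${raw.split("|").map(s => quote(s)).join(", ")}]`;
    if (raw === "true" || raw === "false") return raw; // booleans stay bare
    if (/^-?(0|[1-9]\d*)$/.test(raw)) return raw;      // integers without leading zeros
    return quote(raw);                                 // everything else is a string
}

console.log(toTomlValue("group.abc.compilers", "x:y:z")); // ["x", "y", "z"]
console.log(toTomlValue("compiler.xyz.name", "Foo"));     // "Foo"
console.log(toTomlValue("key", "42"));                    // 42
```

Deciding list-ness by key name rather than by content is a deliberate choice in this sketch: a value such as `-std=c++11` contains no separator, and a name such as `g++ 11.x` should never be split, so the content alone is not a reliable signal.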
3. Group References
The current &group syntax can be mapped to TOML as follows:
# Current: compilers=&gcc:&clang
compilers = ["&gcc", "&clang"]
[group.gcc]
compilers = ["g7", "g8", "g9"]
groupName = "GCC"
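Since `&group` entries remain plain strings in TOML, the loader still has to expand them. The TypeScript sketch below shows one way that expansion might work on already-parsed data; `expandCompilers`, the `Group` interface, and the `clang` group contents are illustrative assumptions rather than existing Compiler Explorer code.

```typescript
interface Group {
    compilers: string[];
    groupName?: string;
}

// Expand "&name" entries into the compilers of the named group, recursively.
function expandCompilers(
    entries: string[],
    groups: Record<string, Group>,
    seen: Set<string> = new Set(),
): string[] {
    const result: string[] = [];
    for (const entry of entries) {
        if (!entry.startsWith("&")) {
            result.push(entry); // a plain compiler ID
            continue;
        }
        const name = entry.slice(1);
        const group = groups[name];
        if (!group) throw new Error(`Unknown group: ${name}`);
        if (seen.has(name)) throw new Error(`Circular group reference: ${name}`);
        // Track the groups on the current path so cycles are reported instead of looping.
        result.push(...expandCompilers(group.compilers, groups, new Set(seen).add(name)));
    }
    return result;
}

const groups: Record<string, Group> = {
    gcc: {compilers: ["g7", "g8", "g9"], groupName: "GCC"},
    clang: {compilers: ["clang11", "clang12"]}, // hypothetical second group
};

const expanded = expandCompilers(["&gcc", "&clang"], groups);
console.log(expanded.join(",")); // g7,g8,g9,clang11,clang12
```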
4. Migration Process
- Phase 1: Dual Support
  - Add TOML parser
  - Create property loader that reads both formats
  - Create TOML-to-properties converter
  - Add test suite to verify equivalent behavior
- Phase 2: Conversion of Existing Files
  - Create a conversion script to transform .properties to .toml
  - Convert default configuration files first
  - Validate equivalence with test suite
  - Documentation update
- Phase 3: New Features
  - Enhance properties system to leverage TOML's richer types
  - Improve group inheritance system
  - Add validation tools
- Phase 4: Complete Migration
  - Deprecate .properties support
  - Full migration to TOML
  - Removal of legacy code
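The equivalence check used in Phases 1 and 2 can be as simple as comparing the flat key-value maps produced by both loaders. The following TypeScript sketch assumes both formats have already been loaded into such maps; `diffProperties` is a hypothetical helper and the sample data is made up for illustration.

```typescript
type Properties = Record<string, string>;

// Report every key whose value differs between the two loaders' outputs.
function diffProperties(fromProperties: Properties, fromToml: Properties): string[] {
    const keys = new Set(Object.keys(fromProperties).concat(Object.keys(fromToml)));
    const problems: string[] = [];
    for (const key of Array.from(keys).sort()) {
        const a: string | undefined = fromProperties[key];
        const b: string | undefined = fromToml[key];
        if (a !== b) {
            problems.push(`${key}: properties=${a ?? "<missing>"} toml=${b ?? "<missing>"}`);
        }
    }
    return problems;
}

// Made-up example: the converted file accidentally dropped one compiler.
const fromPropertiesFile: Properties = {
    "group.gcc.compilers": "g10:g11:gdefault",
    "group.gcc.groupName": "GCC",
};
const fromTomlFile: Properties = {
    "group.gcc.compilers": "g10:g11",
    "group.gcc.groupName": "GCC",
};

const differences = diffProperties(fromPropertiesFile, fromTomlFile);
console.log(differences);
```

An empty result for every converted file would be the signal that a `.properties` file can be retired.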
Sample Conversions
Example 1: Basic Compiler Configuration
Current (.properties):
compiler.g11.exe=/usr/bin/g++-11
compiler.g11.name=g++ 11.x
compiler.g11.options=-Wall -Wextra
compiler.g11.supportsBinary=true
TOML:
[compiler.g11]
exe = "/usr/bin/g++-11"
name = "g++ 11.x"
options = "-Wall -Wextra"
supportsBinary = true
Example 2: Group Configuration
Current (.properties):
compilers=&gcc:&clang
group.gcc.compilers=g10:g11:gdefault
group.gcc.groupName=GCC
group.gcc.compilerType=gcc
group.clang.compilers=clang11:clang12:clangdefault
group.clang.intelAsm=-mllvm --x86-asm-syntax=intel
group.clang.compilerType=clang
TOML:
compilers = ["&gcc", "&clang"]
[group.gcc]
compilers = ["g10", "g11", "gdefault"]
groupName = "GCC"
compilerType = "gcc"
[group.clang]
compilers = ["clang11", "clang12", "clangdefault"]
intelAsm = "-mllvm --x86-asm-syntax=intel"
compilerType = "clang"
Example 3: Tool Configuration with Long Lists
Current (.properties):
tools=clangquerydefault:clangtidydefault:clangquery7:clangquery8:clangquery9:clangquery10:clangquery11:clangquery12:strings:ldd:readelf:nm:llvmdwarfdumpdefault
tools.clangquerydefault.exe=/usr/bin/clang-query
tools.clangquerydefault.name=clang-query (default)
tools.clangquerydefault.type=independent
tools.clangquerydefault.class=clang-query-tool
tools.clangquerydefault.stdinHint=Query commands
TOML:
tools = [
"clangquerydefault", "clangtidydefault",
"clangquery7", "clangquery8", "clangquery9",
"clangquery10", "clangquery11", "clangquery12",
"strings", "ldd", "readelf", "nm", "llvmdwarfdumpdefault"
]
[tools.clangquerydefault]
exe = "/usr/bin/clang-query"
name = "clang-query (default)"
type = "independent"
class = "clang-query-tool"
stdinHint = "Query commands"
Example 4: Command Arguments
Current (.properties):
compiler.clang.demanglerArgs=-n|-C|--no-verbose
TOML:
[compiler.clang]
demanglerArgs = ["-n", "-C", "--no-verbose"]
Example 5: Library Path with Variable Substitution
Current (.properties):
ldPath=${exePath}/../lib|${exePath}/../lib32|${exePath}/../lib64
TOML:
ldPath = ["${exePath}/../lib", "${exePath}/../lib32", "${exePath}/../lib64"]
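With an array, variable substitution can be applied per element instead of to one long string. The TypeScript sketch below is only an approximation of that idea: `substitute` is a hypothetical helper, the value given for `exePath` is invented, and the actual substitution rules in Compiler Explorer may differ.

```typescript
// Replace ${name} placeholders using the supplied variables; unknown names are left untouched.
function substitute(value: string, variables: Record<string, string>): string {
    return value.replace(/\$\{(\w+)\}/g, (match: string, name: string) =>
        Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match,
    );
}

// The array as it would come out of the TOML parser.
const ldPath = ["${exePath}/../lib", "${exePath}/../lib32", "${exePath}/../lib64"];

// Assumption: exePath is the directory of the compiler executable.
const resolved = ldPath.map(entry => substitute(entry, {exePath: "/opt/gcc/bin"}));
console.log(resolved[0]); // /opt/gcc/bin/../lib
```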
Implementation Details
TOML Parsing Library
For TypeScript/JavaScript, we can use one of:
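- @iarna/toml: Full-featured parser and serializer that ships TypeScript typings; its 2.x releases document TOML v0.5.0 compliance, so v1.0.0 coverage should be confirmed before adopting it
- @ltd/j-toml: TOML v1.0.0 parser with good performance
- toml: Simple, widely used parser; it targets the older TOML v0.4.0 specification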
Configuration System Changes
- Properties Loader: Modify `properties.ts` to support both formats
- Interface Adapters: Create adapter layer to normalize between formats
- Test Cases: Create comprehensive tests to verify equivalence
Property Resolution Logic
The hierarchical cascade system would remain largely unchanged, but would be enhanced to:
- Better handle group property inheritance across files
- Leverage TOML's native types
- Provide clearer error messages for configuration issues
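One possible reading of "better group inheritance across files" is that a compiler property falls back to its group's property no matter which file defines the group. The TypeScript sketch below illustrates that reading only; the layer ordering, the function names, and the sample values are assumptions, not a description of the current `properties.ts` behavior.

```typescript
type Layer = Record<string, string>;

// Return the first definition of `key`, searching layers from most to least specific.
function lookup(layers: Layer[], key: string): string | undefined {
    for (const layer of layers) {
        if (key in layer) return layer[key];
    }
    return undefined;
}

// Resolve a compiler property, falling back to its group's property in any layer.
function resolveCompilerProperty(
    layers: Layer[],
    compilerId: string,
    groupId: string,
    prop: string,
): string | undefined {
    return (
        lookup(layers, `compiler.${compilerId}.${prop}`) ??
        lookup(layers, `group.${groupId}.${prop}`)
    );
}

const layers: Layer[] = [
    // Environment-specific layer (hypothetical values).
    {"compiler.g11.exe": "/opt/gcc-11/bin/g++"},
    // Defaults layer: the group is defined only here.
    {"compiler.g11.exe": "/usr/bin/g++-11", "group.gcc.compilerType": "gcc"},
];

const exe = resolveCompilerProperty(layers, "g11", "gcc", "exe");
const compilerType = resolveCompilerProperty(layers, "g11", "gcc", "compilerType");
console.log(exe);          // /opt/gcc-11/bin/g++
console.log(compilerType); // gcc
```

The environment layer overrides `exe`, while `compilerType` is still inherited from the group defined in the defaults layer.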
Pros and Cons
Pros
- Better Readability: TOML's structure makes configuration more readable and maintainable
- Native Array Support: Eliminates custom string parsing for lists
- Improved Structure: Better organization of related settings
- Editor Support: Better tooling and syntax highlighting
- Type Safety: TOML's type system helps catch configuration errors
- Standardization: Using a standard format improves maintainability
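To illustrate the type-safety point, the TypeScript sketch below checks a parsed compiler table against a few expected types. It is a hypothetical validator: the function names and the rule that `exe` is required are assumptions made for the example.

```typescript
// Check that a parsed value is an array containing only strings.
function isStringArray(value: unknown): value is string[] {
    return Array.isArray(value) && value.every(item => typeof item === "string");
}

// Hypothetical validator for a parsed [compiler.<id>] table; returns readable problems.
function validateCompiler(id: string, table: Record<string, unknown>): string[] {
    const problems: string[] = [];
    if (typeof table["exe"] !== "string") {
        problems.push(`compiler.${id}.exe must be a string`);
    }
    if ("supportsBinary" in table && typeof table["supportsBinary"] !== "boolean") {
        problems.push(`compiler.${id}.supportsBinary must be a boolean`);
    }
    if ("demanglerArgs" in table && !isStringArray(table["demanglerArgs"])) {
        problems.push(`compiler.${id}.demanglerArgs must be an array of strings`);
    }
    return problems;
}

// A table where a boolean was written as a quoted string by mistake.
const problems = validateCompiler("g11", {exe: "/usr/bin/g++-11", supportsBinary: "true"});
console.log(problems); // one problem, about supportsBinary
```

With `.properties` files every value is a string, so a check like the `supportsBinary` one is not possible without guessing the intended type.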
Cons
- Migration Effort: Requires converting all configuration files
- Learning Curve: Team members must learn TOML (though it's designed to be simple)
- Backward Compatibility: Need to maintain both parsers during transition
- Custom Logic: Some CE-specific features (like `&group` references) still need custom handling
Timeline and Resources
- Planning & Design: 1-2 weeks
  - Finalize conversion specifications
  - Create test plan and compatibility tests
- Implementation: 2-3 weeks
  - Implement parser integration
  - Create conversion utilities
  - Update documentation
- Testing & Validation: 1-2 weeks
  - Test with various configuration scenarios
  - Verify backward compatibility
- Rollout: Phased approach
  - Convert default configuration files
  - Allow users to opt-in to TOML
  - Eventually deprecate .properties
Conclusion
Migrating to TOML offers significant benefits for maintainability and readability of Compiler Explorer's configuration. The structured approach outlined above allows for a smooth transition while maintaining backward compatibility, with clear improvements for both users and developers maintaining the configuration files.
The migration addresses the specific issues raised in #7150 and #7341, providing a more robust and flexible configuration system that can grow with Compiler Explorer's needs.