```python
# How often to perform maintenance operations (merge, prune)
MAINTENANCE_INTERVAL = 40

# Strategy selection thresholds
STRATEGY_CREATION_THRESHOLD = 0.7  # Higher threshold to avoid creating similar strategies
STRATEGY_MERGING_THRESHOLD = 0.6   # Lower threshold to merge more similar strategies
MIN_SUCCESS_RATE_FOR_INFERENCE = 0.4  # Minimum success rate for a strategy to be used during inference
```
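For intuition, here is a minimal sketch of how thresholds like these typically interact; the `similarity` and `merge` callables and the control flow are assumptions for illustration, not optillm's actual implementation:

```python
def maybe_create_strategy(candidate, library, similarity):
    # Only add the candidate if it isn't too close to anything we already have
    # (a higher creation threshold means fewer near-duplicate strategies).
    nearest = max((similarity(candidate, s) for s in library), default=0.0)
    if nearest < STRATEGY_CREATION_THRESHOLD:
        library.append(candidate)

def maintenance_pass(library, similarity, merge):
    # Run every MAINTENANCE_INTERVAL requests: fold highly similar strategies
    # together (a lower merging threshold means more aggressive consolidation).
    merged = []
    for s in library:
        match = next((m for m in merged
                      if similarity(s, m) >= STRATEGY_MERGING_THRESHOLD), None)
        if match is not None:
            merged[merged.index(match)] = merge(match, s)
        else:
            merged.append(s)
    # Prune: only strategies with a decent track record are used at inference.
    return [s for s in merged
            if s.success_rate >= MIN_SUCCESS_RATE_FOR_INFERENCE]
```

The interesting design choice is the gap between the two thresholds: a pair that is 0.6-0.7 similar won't spawn a new strategy, but also survives maintenance unmerged, leaving room for variants to diverge before being consolidated.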
The configs are all defined here - https://github.com/codelion/optillm/blob/main/optillm/plugin...
The MXFP4 quantization detail might be the sleeper feature here. Getting 20B running on a 16 GB consumer card, or 120B on a single H100/MI300X without multi-GPU orchestration headaches, could be a bigger enabler for indie devs and researchers than raw benchmark deltas. A lot of experimentation never happens simply because the friction of getting the model loaded is too high.
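The back-of-envelope math supports this. MXFP4 stores 4-bit FP4 values in blocks of 32 with one shared 8-bit scale per block, so roughly 4.25 bits per weight. The sketch below assumes every parameter is quantized; in gpt-oss only the MoE weights are MXFP4, so real footprints differ somewhat:

```python
# ~4.25 bits/weight: 4-bit elements + one 8-bit scale shared across 32 values.
BITS_PER_WEIGHT = 4 + 8 / 32

def footprint_gb(n_params: float) -> float:
    # Idealized weight-only footprint, ignoring activations and KV cache.
    return n_params * BITS_PER_WEIGHT / 8 / 1e9

print(f"20B  -> ~{footprint_gb(20e9):.1f} GB")   # ~10.6 GB, fits a 16 GB card
print(f"120B -> ~{footprint_gb(120e9):.1f} GB")  # ~63.8 GB, fits one 80 GB H100
```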
One open question: given gpt-oss's design bias toward reasoning (and away from encyclopedic recall), will we start seeing a formal split in open-weight model development into specialized "reasoners" that rely on tool use for facts, and "knowledge bases" tuned for retrieval-heavy work? That separation could change how we architect the systems that wrap these models.
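If that split happens, the wrapping layer might end up looking like a thin router between the two roles. A minimal sketch under that assumption; every name here is hypothetical, not any existing API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SplitStack:
    """Hypothetical wrapper pairing a reasoning model with a
    retrieval-tuned one, instead of expecting one model to do both."""
    reasoner: Callable[[str], str]        # e.g. a gpt-oss-style model
    knowledge_base: Callable[[str], str]  # e.g. a retrieval-heavy model or RAG stack

    def answer(self, query: str) -> str:
        # Let the reasoner drive, but route factual lookups out as tool calls
        # rather than trusting its parametric recall.
        plan = self.reasoner(f"List the facts you would need to answer: {query}")
        facts = self.knowledge_base(plan)
        return self.reasoner(f"Using these retrieved facts:\n{facts}\n\nAnswer: {query}")
```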