CCL: Categorical Configuration Language

That's a lot of words to describe a very simple syntax: name=value pairs, with line continuation using whitespace.

Basically RFC822 email headers, or the Debian Control File Format [0], but with "=" instead of ":" and without a dedicated comment character.

The biggest problem with this format is that a lot of things are left for the app, so each app will have its own way to implement lists, bools, line wrap support. Even something like "value override" is left to the program's implementation. Don't expect YAML/JSON/XML-style automated validators/linters; each program will need its own bespoke parser/generator.

[0] https://www.debian.org/doc/debian-policy/ch-controlfields.ht...
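To make the "name=value pairs, with line continuation using whitespace" description concrete, here is a minimal sketch in Python. It is not the CCL parser (that one is written in OCaml and also handles nesting); the function name `parse_pairs` and the choice to join continuation lines with a newline are assumptions made for illustration only.

```python
def parse_pairs(text):
    """Parse 'name=value' lines; indented lines continue the previous value."""
    pairs = []  # list of (name, value): order and duplicates are preserved
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and pairs:
            # Leading whitespace marks a continuation of the previous value.
            name, value = pairs[-1]
            pairs[-1] = (name, value + "\n" + line.strip())
        else:
            # Split on the first '=' only, so values may themselves contain '='.
            name, _, value = line.partition("=")
            pairs.append((name.strip(), value.strip()))
    return pairs


if __name__ == "__main__":
    sample = "a = 1\nb = x\n  y\na = 2\n"
    print(parse_pairs(sample))
```

Note that nothing here decides what a repeated key means, or whether "1" is a number or a string. That is exactly the part the format leaves to the application.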

atoav · 8 months ago

Not to discredit the author, there are some smart thoughts in there... but I can't help but feel like: yeah of course this is very elegant — but the complexity is not gone, it is elsewhere. And they are not showing that elsewhere.

Namely the parsing code.

earnestinger · 8 months ago

Yup.

If syntactic simplicity is the only goal, we can take it one step further.

I present you with EP, easy properties: any UTF-8 encoded file is a valid configuration file. You’re the boss, not the language. You can concatenate two files, you can add comments in any format you want, you can choose any syntax you want!

(Somebody will need to parse it eventually though)

sn9 · 8 months ago

You can literally just look at it: https://github.com/chshersh/ccl/tree/main/lib

diggan · 8 months ago

> The biggest problem with this format is that a lot of things are left for the app, so each app will have its own way to implement lists, bools, line wrap support

That seems to be one of the explicit goals:

> Configuration is specific to a particular application. What you want is to follow the rule of the least surprise and utility functions to parse strings.

Configuration is specific to a particular program, so its interpretation should be too; that seems to be what the author is getting at.

Personally, what puts me off this particular configuration language is this part, hidden behind collapsed text:

> In fact, CCL is indentation-sensitive.

Programming/configuring stuff with invisible characters isn't my idea of fun, and it sounds especially cumbersome if everyone is using it differently, since the configuration language leaves a lot up to the users of the configuration.

cies · 8 months ago

I think indentation sensitivity is very well suited for configs: you want little line noise and the complexity is low. I do understand the trade-off TOML made in this case.

Some languages prohibit the TAB character, and only allow spaces at the start of the line in groups of 2 or 4: so it is always clear how indentation is to be understood.
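A rule like "no TABs, spaces only in groups of 2 or 4" is cheap to enforce. The sketch below is a hypothetical checker, not taken from any particular language; the name `indent_level` and the default width of 2 are assumptions.

```python
def indent_level(line, width=2):
    """Return the nesting level of a line, rejecting ambiguous indentation."""
    stripped = line.lstrip(" \t")
    prefix = line[: len(line) - len(stripped)]  # the leading whitespace only
    if "\t" in prefix:
        raise ValueError("TAB characters are not allowed in indentation")
    if len(prefix) % width != 0:
        raise ValueError(f"indentation must be a multiple of {width} spaces")
    return len(prefix) // width


if __name__ == "__main__":
    print(indent_level("    key = value"))
```

With such a check in place, two files that look the same in an editor also parse the same, which addresses most of the "invisible characters" complaint.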

Everyone is entitled to scratch their own itch, but this seems like the most useless configuration language I've ever seen.

Take the "fixed point" example, where you have a boolean setting which one file says should be "yes" and the other says should be "no", and the language semantics composes that into a list with both values. For what boolean setting does this make sense?

The article says "Overrides are not a problem because you keep both values. And you can decide what to do with them: keep only the first, keep only the last or use some smart logic to combine both of them. You’re the boss."

If you need custom logic in your application to determine which setting to use, how is this language helping you?

evujumenuk · 8 months ago

I think this is probably the best place within these comments to note that one thing some people expect of a configuration format is to be able to hide information from the consuming piece of software.

It is often useful for a program to receive all the configuration from all sources. ("This flag is normally set to TRUE, has been set to FALSE on this system, has been set to TRUE by the user, and now there's an environment variable that says one thing and a command line flag that says something else.") Sometimes, integrating several incoherent settings into one depends on the consumer, or even on the setting itself. Sometimes, you would like to be able to debug how different settings interact with one another. Sometimes, different settings can be merged without issue.

CCL exposes everything to the program receiving the config, which is something (some) people seem to abhor. I can see how wanting to hide information can be both useful and detrimental, so I'm wondering if this issue is actually orthogonal to configuration languages, meaning CCL, and others, shouldn't even concern themselves with it.
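One way to picture "receiving all the configuration from all sources" is to keep every value together with the layer it came from. This is a hypothetical sketch in Python; the layer names and the functions `collect` and `effective` are assumptions for illustration, not part of CCL.

```python
def collect(layers):
    """layers: list of (source_name, settings dict), lowest priority first.

    Returns name -> [(source_name, value), ...] so nothing is hidden.
    """
    history = {}
    for source, settings in layers:
        for name, value in settings.items():
            history.setdefault(name, []).append((source, value))
    return history


def effective(history):
    """One possible policy: the last layer that mentions a setting wins."""
    return {name: entries[-1][1] for name, entries in history.items()}


if __name__ == "__main__":
    layers = [
        ("default", {"flag": "TRUE"}),
        ("system", {"flag": "FALSE"}),
        ("user", {"flag": "TRUE"}),
    ]
    print(collect(layers))
    print(effective(collect(layers)))
```

A program that only wants the final answer calls `effective`; one that wants to debug how settings interact inspects the full history. Whether the format or the application should make that choice is the open question in this comment.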

cies · 8 months ago

Reading this I think of all the programming languages that have comments with whole languages inside of them. That is beyond the complex documentation I found.

ulbu · 8 months ago

you apply another monoid operation.

one possible one is to return first or last element. makes sense in layered configuration of, for example, a text editor, where you might override a colour.

another possible one is to return error on duplicate. makes sense in flat configuration of, for example, a build system.

your application knows which operation fits its intended structure. your application documents the behaviour, just as it normally would.
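The three operations mentioned above (first, last, error on duplicate) can be sketched as interchangeable strategies over a list of pairs. This is an illustration in Python under the assumption that the parser hands the application a list of `(name, value)` tuples; the function names are made up for this example.

```python
def keep_first(values):
    return values[0]


def keep_last(values):
    return values[-1]


def reject_duplicates(values):
    if len(values) > 1:
        raise ValueError(f"duplicate values: {values}")
    return values[0]


def resolve(pairs, strategy):
    """Group values by name (preserving order), then collapse each group."""
    grouped = {}
    for name, value in pairs:
        grouped.setdefault(name, []).append(value)
    return {name: strategy(values) for name, values in grouped.items()}


if __name__ == "__main__":
    pairs = [("colour", "red"), ("font", "mono"), ("colour", "blue")]
    print(resolve(pairs, keep_last))   # layered editor config: override wins
    print(resolve(pairs, keep_first))
```

A build system would pass `reject_duplicates` instead and fail loudly on the repeated `colour` key.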

theamk · 8 months ago

mightyham · 8 months ago

I really like the conciseness of this syntax. The language seems very well thought through.

That being said, I've been working with NixOS recently and it's made me reconsider what is useful for a configuration language. In many reasonably large software projects, where configs become very complex, config reuse (in other words templating or meta-configuration) becomes an increasingly helpful feature. Nix configs are great because Nix is not just a config format, but a full-blown purely functional language for manipulating the config. It's intuitive and powerful once you get the hang of it, and I sometimes find myself wishing I could use it when I have to work with YAML, JSON, etc.

one-punch · 8 months ago

You might be interested in nickel (https://nickel-lang.org/), which is a modern take on configuration management based on the experience of Nix/NixOS configurations: purely functional configuration, built-in validation (types & contracts), reusable (functions, modules, defaults), and in addition exports to Yaml, Json, etc.

To integrate nickel with nix, see how organist (https://github.com/nickel-lang/organist) does DevShell management.

nickm12 · 8 months ago

4ad · 8 months ago

Compositionality is paramount and category theory guarantees compositionality, but the author's criteria for what entails a good configuration language are woefully naïve.

Configuration is not about describing data, it's about control. Control over a system made of impure, effectful parts.

Configuration is a matter of programming a mutable computer, i.e. a way to specify the composition of effects that you want.

The configuration language is agnostic over the systems it controls, therefore it must provide semantics that preserve morphisms in any of its interpretations. The language must be rich enough to accommodate this. It is not enough to have one semantics.

Moreover, it must be rich enough to describe its own models. Yes, the interpretation of it by arbitrary systems must be expressible in the language itself in order to be meaningful and to preserve consistency with regard to its interpretations. In practice, this is done through types.

Additionally, configuration is a global activity, it's applied to the whole system, with many people changing conflicting aspects of it. Just like with any large evolving program, abstraction and typing are required for software engineering reasons alone.

Coincidentally, CUE is also a monoid, but it is more than that: it is a complete Heyting algebra (or a complete Boolean algebra under the closed-world assumption). These objects also form very rich categories.

Another way to look at CUE is to view it as a semantic domain for the denotation of arbitrary types of arbitrary languages. It's suitable for this because it's a coherence space (Girard). All CUE operations are closed, preserving the structure of the space.

One interesting aspect of the author's effort is that even though he was so naïve, category theory led him down a path that is correct. What he did is incomplete: a monoid does not suffice for a configuration language, but a monoid is required. This is saying something.

teleforce · 7 months ago

This is a very insightful comment, +1

trelliscoded · 8 months ago

The equal sign is a required character for anything base64 encoded, which includes some things you’d expect to be in a config file, like ssh public keys and x509 certs.
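Whether "=" inside a value is a problem depends on how the line is split. As a sketch, assuming a parser that splits on the first "=" only (an assumption for this example, not a statement about any specific implementation), base64 padding survives intact. The helper name `split_pair` is hypothetical.

```python
import base64


def split_pair(line):
    """Split 'name = value' on the first '=' only; later '=' stay in the value."""
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError("missing '=' separator")
    return name.strip(), value.strip()


if __name__ == "__main__":
    encoded = base64.b64encode(b"hello").decode("ascii")  # ends with '=' padding
    name, value = split_pair("token = " + encoded)
    print(name, value, base64.b64decode(value))
```

A parser that split on every "=" or on the last one would corrupt such values, so this is worth checking in any implementation.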

efitz · 8 months ago

“Data” has syntax (structure), semantics (meaning), and often needs references (to other parts of itself or other data).

There does not exist a perfect configuration language because whether and to what extent each of these capabilities is supported is a subjective trade-off, and reasonable people with different problems might reasonably want different trade-offs.

I like config languages that allow variables and references, so that eg if I change the root path, I just have to change the $ROOT variable near the start of the file and 20 other sub-paths just reference the new $ROOT.
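The `$ROOT` idea can be layered on top of any flat key-value format. Here is a minimal sketch using Python's standard `string.Template`; the function name `expand` and the rule that a reference may only point at a setting defined earlier in the file are assumptions chosen to keep the example short.

```python
from string import Template


def expand(settings):
    """Expand $NAME references using settings defined earlier in the mapping."""
    resolved = {}
    for name, raw in settings.items():
        # substitute() raises KeyError if a reference is not defined yet.
        resolved[name] = Template(raw).substitute(resolved)
    return resolved


if __name__ == "__main__":
    config = {
        "ROOT": "/srv/app",
        "logs": "$ROOT/logs",
        "cache": "$ROOT/cache",
    }
    print(expand(config))
```

Changing `ROOT` in one place then updates every sub-path that references it.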

I also like semantics with my syntax, because a lot of the time I care about dstip but not srcip or vice versa; IP lets me parse for accuracy but not for meaning/usage.

I hate encoding meaning in whitespace; it trades away robustness in duplication in favor of being more human readable. This probably comes from lots of NNTP and XMODEM and 7-bit ASCII battle scars. But reasonable people can disagree.

On the other hand, I think it is a valuable learning exercise to write your own DSL for some common problem space and share it, IF you listen to and internalize the feedback others write about it rather than just filter out anything that isn’t adulation.

Bost · 8 months ago

Have a look at "Code is Data / Data is Code" https://en.m.wikipedia.org/wiki/Homoiconicity And then see how it's done in real life: https://guix.gnu.org/

Nice! I really like a fresh take on anything.

It's been said in the comments here to be like RFC822 or the Debian Control File Format; I'd add that it is also like x-www-form-urlencoded. At work I use this a lot, as it is what browsers submit. It's List<String, List<String>>, so keys may occur more than once. We standardized on a little language for the keys that allows us to submit structured forms. (Many libraries prescribe a language for this, Rails does too; our keys look like ".location.space[2].name" for "{location:{space:[null,null,{name:VALUE_AS_STRING}]}}" in JSON.)
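A key language like ".location.space[2].name" can be unpacked with a small tokenizer and a nested assignment. The sketch below is a guess at how such a scheme might work, written in Python; it is not the commenter's implementation, and the names `parse_key` and `assign` are hypothetical. It does not handle a path that uses the same position once as a list and once as an object.

```python
import re

# '.name' selects an object field, '[N]' selects a list position.
TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def parse_key(key):
    """Turn '.a.b[2].c' into ['a', 'b', 2, 'c']."""
    tokens, pos = [], 0
    while pos < len(key):
        match = TOKEN.match(key, pos)
        if match is None:
            raise ValueError(f"bad key at offset {pos}: {key!r}")
        name, index = match.groups()
        tokens.append(name if name is not None else int(index))
        pos = match.end()
    return tokens


def assign(root, key, value):
    """Store value under the path in key, creating dicts and lists as needed."""
    tokens = parse_key(key)
    node = root
    for i, token in enumerate(tokens):
        last = i == len(tokens) - 1
        if last:
            child = value
        else:
            # The next token decides whether the new container is a list or dict.
            child = [] if isinstance(tokens[i + 1], int) else {}
        if isinstance(token, int):
            while len(node) <= token:
                node.append(None)  # pad skipped positions with None
            if last or node[token] is None:
                node[token] = child
        elif last or token not in node:
            node[token] = child
        node = node[token]
    return root


if __name__ == "__main__":
    print(assign({}, ".location.space[2].name", "VALUE_AS_STRING"))
```

Applied to every submitted pair in turn, this rebuilds the structured document while all values stay strings, as in the comment's example.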

Some years ago I wrote a TOML parser in Haskell, because parsers are fun to write in Haskell and I needed one.

Since we deploy with AWS/Fargate (Docker) the config is passed as JSON k-v pairs that are then set as ENV VARs in the container (following one of those 12factor principles). So it seems I cannot dictate the config file format.
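For the "JSON k-v pairs that become ENV VARs" setup, the application side can be as small as the sketch below. It assumes a flat JSON object whose values are all strings; the actual shape of the Fargate configuration is not shown in this comment, and the function name `env_from_json` is hypothetical.

```python
import json


def env_from_json(document, environ):
    """Copy a flat JSON object of string values into an environment mapping."""
    settings = json.loads(document)
    if not isinstance(settings, dict):
        raise TypeError("expected a JSON object at the top level")
    for name, value in settings.items():
        if not isinstance(value, str):
            raise TypeError(f"{name}: environment values must be strings")
        environ[name] = value
    return environ


if __name__ == "__main__":
    # A plain dict stands in for os.environ here to keep the example side-effect free.
    print(env_from_json('{"DB_HOST": "db.internal", "DB_PORT": "5432"}', {}))
```

As with CCL, everything arrives as strings, so turning "5432" into a port number remains the application's job.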