Melody – a language that compiles to regular expressions

  digit: charset "1234567890"
  if parse "0.2.5" [
      opt "v"
      copy major [some digit] "."
      copy minor [some digit] "."
      copy patch [some digit]
  ][
      print [major minor patch]
  ]
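For comparison, here is roughly the same match as a Python regex. This is a sketch, not something from the thread. `parse_version` is an illustrative name. As with `parse`, the whole string must match. `[0-9]` is used instead of `\d` because in Python 3 `\d` also matches non-ASCII digits, while the Rebol charset above is ASCII only.

```python
import re

# Optional "v", then three dot-separated runs of ASCII digits, each captured.
# fullmatch mirrors Rebol's parse, which succeeds only if the whole input is consumed.
SEMVER = re.compile(r"v?([0-9]+)\.([0-9]+)\.([0-9]+)")

def parse_version(text):
    """Return (major, minor, patch) as strings, or None if text doesn't match."""
    m = SEMVER.fullmatch(text)
    return m.groups() if m else None

print(parse_version("0.2.5"))    # ('0', '2', '5')
print(parse_version("v10.0.1"))  # ('10', '0', '1')
print(parse_version("1.2"))      # None
```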

I made a wiki page for similar alternative regex syntax projects:

https://github.com/oilshell/oil/wiki/Alternative-Regex-Synta...

(including my own https://www.oilshell.org/release/latest/doc/eggex.html which is built into Oil)

Banana699 · 4 years ago

You might want to add the Wolfram Language's String Patterns [1].

Perl 6 (which is now actually called Raku, for what it's worth) has BNF-style grammars as a first-class citizen of the language (a virtually unheard-of thing, AFAIK), and programmers are encouraged to use them for complex parsing tasks instead of regex. I don't know whether that falls under "alternative syntax for regex"; grammars are very closely related to regexes, are actually much more powerful and readable, and include regexes as a subset. But adding them might drag you into adding every context-free/parsing expression grammar tool out there, things like Bison, YACC and ANTLR.

The language SNOBOL [2][3][4] is one of the earliest languages with text matching and processing as a first-class citizen (in fact, the only citizen). Because it was designed (1962) before the first software implementation of regular expressions* (1968), its pattern language is not based on regular expressions, and in some cases actually exceeds them (e.g. matching balanced parentheses, which mathematical regular expressions can't do, but some variants of practical regexes can with special non-regular constructs).
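To make the balanced-parentheses point concrete: recognizing arbitrarily deep nesting needs an unbounded counter (or a stack). A finite automaton lacks one, and so does a mathematical regular expression. A minimal Python sketch of the counter approach (`balanced` is an illustrative name):

```python
def balanced(text):
    """True if every '(' in text is closed by a later ')' in proper nesting order."""
    depth = 0  # unbounded counter: the extra state a finite automaton doesn't have
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' before it
                return False
    return depth == 0  # every '(' was eventually closed

print(balanced("(a(b)c)"))  # True
print(balanced("(()"))      # False
print(balanced(")("))       # False
```

Any single regex without non-regular extensions can only track nesting up to some fixed depth, which is why the "special non-regular constructs" mentioned above are needed.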

This thread [5] on the Retrocomputing Stack Exchange discusses which was the earliest language with string pattern matching capabilities, and turns up hidden gems in the process.

[1] https://reference.wolfram.com/language/guide/StringPatterns....

[2] https://en.wikipedia.org/wiki/SNOBOL

[3] https://dl.acm.org/doi/10.1145/800025.1198417

[4] https://www.snobol4.org/

[5] https://retrocomputing.stackexchange.com/questions/658/what-...

* : As opposed to the mathematical definition of regular expression, which dates back to 1956 and perhaps even before.

chubot · 4 years ago

Thanks for the links, I added Wolfram.

I think SNOBOL might count, but it's a bit different in that I think it's a "dead" language now? e.g. it doesn't appear to have a user base or implementations. But feel free to add it or anything else if there are good links.

There's definitely a fuzzy line between things like LPeg or Rosie and YACC/ANTLR. I suppose it's less about the power of the language and more about what kinds of tasks people use it for: whether it's "scripting friendly".

nefitty · 4 years ago

That's so weird. I literally just started my own exploration of alternative Regex syntax this morning. The simulation rears her head again. The Dude abides.

Some thoughts I had for the developer:

Does it (or are there plans to) reverse compile? If I could input my regex and output Melody script, one could create an excellent interactive learning tool. It would also, more selfishly, help with adoption in teams with crusty old devs like me who like our magic rituals and prefer typing our regex by hand.

Also are there plans to support runtime compiling in JS? Something like...

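    // Hypothetical API: neither initialiseMelody nor toRegexp exists in Melody
    const someMelodyObject = initialiseMelody(/* configuration */);
    const result = someString.replace(someMelodyObject.toRegexp(), replacement);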

I think this would make it a compelling library to include in projects, assuming it were fairly efficient and lightweight. I'm not sure how, or whether, you'd have to deal with performance and caching, but it would probably go a long way toward improving adoption, among web developers at least.
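On the performance and caching question, one option is to memoize the compile step so that each distinct pattern source is translated only once. Below is a Python sketch. `melody_to_regex` and `compile_melody` are hypothetical stand-ins. The translator understands only a toy `N of "literal";` statement and is not Melody's real grammar, compiler, or API.

```python
import re
from functools import lru_cache

# Hypothetical toy subset of a Melody-like syntax: one statement per line,
# each of the form   N of "literal";
STATEMENT = re.compile(r'\s*([0-9]+) of "([^"]*)";\s*')

def melody_to_regex(source):
    """Translate the toy subset into a regex string (illustrative, not Melody's compiler)."""
    parts = []
    for line in source.strip().splitlines():
        m = STATEMENT.fullmatch(line)
        if m is None:
            raise ValueError(f"unsupported statement: {line!r}")
        count, literal = m.groups()
        parts.append(f"(?:{re.escape(literal)}){{{count}}}")
    return "".join(parts)

@lru_cache(maxsize=256)
def compile_melody(source):
    """Translate and compile once per distinct source string; repeats hit the cache."""
    return re.compile(melody_to_regex(source))

src = '2 of "na";\n1 of "batman";'
pattern = compile_melody(src)
print(pattern.pattern)                        # (?:na){2}(?:batman){1}
print(bool(pattern.fullmatch("nanabatman")))  # True
print(compile_melody(src) is pattern)         # True: served from the cache
```

Python's `re` module already keeps a small internal cache of compiled patterns, so the memoization here mainly saves the translation step.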

Anyway, good luck with the project. Regex is often considered a dark art when it's actually fairly concise and expressive; opening it up to more people at a higher level could lead to a greater understanding of regex in general. And what an interesting project to undertake: definitely a nontrivial challenge, all told.

yoav_lavi · 4 years ago

Author here,

1. A reverse compiler is one of the 'maybe' features (see the table in the README). It's something I'd like, but it would essentially be an entire compiler, so it's non-trivial.

2. The plan is to make Melody available as a compile step (like Sass) with no runtime overhead, or as a Rust crate. You could do the compilation at runtime, but other than including variables in the pattern, I'm not sure it would have any benefit over compile-time transforming, and it would have a performance impact.

Thank you!

parksy · 4 years ago

No dramas. Understood re 1; it's no doubt far from trivial to create a bidirectional transpiler. I wouldn't even know where to begin, so good work on what you've managed so far.

Re 2, that shouldn't be a problem, as we already have build processes in place. Most projects I work on have npm build steps. I'm not sure how that fits in with Rust (I really need to get off my butt and check it out sometime), but if it could be pulled in as an npm dependency, that would work. If it could be done inline, even better (e.g. Melody written inline in a JS file that compiles to the expression in place...).

Anyway, good job so far again. I've followed the repo; all the best once more!

draegtun · 4 years ago

I like using `parse` in Rebol / Red - http://www.rebol.com/r3/docs/functions/parse.html

Here's the parse rule for Batman:

  [
      16 "na"
      2 [space "batman"]
  ]
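The same rule written as a Python regular expression, for comparison. This is a sketch. In Rebol's `parse`, `space` matches a single space character, so a literal space is used here.

```python
import re

# 16 repetitions of "na", then 2 repetitions of " batman".
# fullmatch, like parse, requires the rule to consume the entire input.
BATMAN = re.compile(r"(?:na){16}(?: batman){2}")

text = "na" * 16 + " batman" * 2
print(bool(BATMAN.fullmatch(text)))         # True
print(bool(BATMAN.fullmatch("na batman")))  # False
```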

And a complete example for the semantic version pattern:

blahgeek · 4 years ago

Emacs Lisp provides a similar tool (with better syntax, IMO): https://www.gnu.org/software/emacs/manual/html_node/elisp/Rx...
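Emacs `rx` and Melody share the idea of building a regex from structured pieces instead of writing it as one opaque string. As a rough illustration of that idea, not of `rx`'s actual forms, a few Python helpers can do the same. The names `lit`, `seq`, `repeat` and `optional` are made up for this sketch.

```python
import re

def lit(text):
    """Match text literally (regex metacharacters escaped)."""
    return re.escape(text)

def seq(*parts):
    """Match each part in order."""
    return "".join(parts)

def repeat(n, part):
    """Match part exactly n times."""
    return f"(?:{part}){{{n}}}"

def optional(part):
    """Match part zero or one time."""
    return f"(?:{part})?"

HEX = "[0-9a-fA-F]"
# "#RRGGBB" with an optional "AA" alpha pair
hex_color = re.compile(seq(lit("#"), repeat(6, HEX), optional(repeat(2, HEX))))

print(bool(hex_color.fullmatch("#1a2B3c")))    # True
print(bool(hex_color.fullmatch("#1a2B3cFF")))  # True
print(bool(hex_color.fullmatch("#1a2B3")))     # False
```

Because each helper wraps its input in a non-capturing group, composing them never changes the meaning of the pieces, a property that string concatenation of raw regexes does not guarantee.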

Author here, thanks for posting Melody! This is my first attempt at a language and I'm learning Rust, so any input would be appreciated.

fouc · 4 years ago

Minor comment: I personally find < > harder to type than : because the < > keys are a lot closer to the Shift key and cause more wrist strain.

Have you considered borrowing the :emoji: convention that Slack/Discord/GitHub use? :space:, :feed:, etc.
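A sketch of how the suggested `:name:` tokens could be expanded into regex fragments before the rest of compilation. The token table below is an assumption for illustration, not Melody's symbol list. Unknown names raise an error rather than passing through silently.

```python
import re

# Assumed token table, for illustration only.
TOKENS = {
    "space": " ",
    "tab": r"\t",
    "newline": r"\n",
    "feed": r"\f",
    "digit": r"\d",
}

def expand_tokens(source):
    """Replace each :name: token with its regex fragment; raise on unknown names."""
    def replace(m):
        name = m.group(1)
        if name not in TOKENS:
            raise KeyError(f"unknown token :{name}:")
        return TOKENS[name]  # a function's return value is inserted verbatim by re.sub
    return re.sub(r":([a-z]+):", replace, source)

print(expand_tokens(":digit::space::digit:"))  # \d \d
```

One trade-off: with colons as delimiters, a literal colon in the pattern would need some form of escaping, which `<...>` largely avoids.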

fire · 4 years ago

(Not OP, and not disregarding the issue at hand.) Have you tried practicing using the Shift key on the opposite side from the key you want to press? Learning to make that change was hard for me, but over the years it has been one of the largest improvements, if not the largest, in both my typing speed and my general typing comfort.

zdragnar · 4 years ago

Assuming a standard QWERTY layout, shouldn't you be using the left Shift when typing < or >? It'll significantly reduce the contortion in your right hand. The same goes for most chords: use opposite hands for the modifier and the symbol key.

(Oddly enough, I don't bother doing this with Ctrl/Cmd + A/S/F/Z/X/C/V, but I think that's mostly because the keys to the right of the space bar vary so much between laptops and keyboards that I never bothered trying to stick with it.)

dotancohen · 4 years ago

Hello Yoav! In my opinion the match keyword is not needed: when the parser reaches an opening brace, that alone should trigger whatever the match keyword does. As a heavy regex user, I understand that you want consistency with the behaviour of the capture keyword. But if we assume that the user is a programmer who does not know regex, it makes more sense to view {<space>;"batman";} as an array (delimited by curly braces).

In fact, you might want to go a step further and use [] for match and {} for capture (eliminating the capture keyword as well). Using [] for match would feel natural to JavaScript programmers.
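For reference, the proposed match/capture split maps onto two regex constructs: a plain grouping `(?:...)` and a capturing group `(...)`. Here is a short Python illustration of the difference. The bracket-to-group mapping is the suggestion above, not current Melody syntax.

```python
import re

# "match" ~ non-capturing group: groups the tokens but records nothing.
match_only = re.fullmatch(r"(?: batman){2}", " batman batman")
# "capture" ~ capturing group: also records the text (of the last repetition).
captured = re.fullmatch(r"( batman){2}", " batman batman")

print(match_only.groups())  # ()
print(captured.groups())    # (' batman',)
```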

dokem · 4 years ago

I feel like there should be a way to group a portion of the regex, by name, for extraction later. Otherwise I like it.

    group $first_name { some of <letter> }
    1 of <space>
    group $last_name { some of <letter> }
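In plain regex terms, this idea corresponds to named capture groups. Here is a Python sketch of the same three lines. It assumes that "letter" means an ASCII letter and that `1 of <space>` means a single space character.

```python
import re

NAME = re.compile(r"(?P<first_name>[A-Za-z]+) (?P<last_name>[A-Za-z]+)")

m = NAME.fullmatch("Ada Lovelace")
print(m.group("first_name"))  # Ada
print(m.group("last_name"))   # Lovelace
print(m.groupdict())          # {'first_name': 'Ada', 'last_name': 'Lovelace'}
```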

ozzmotik · 4 years ago

capture name { ... } is already one of the listed features.


maximilianroos · 4 years ago

A bit orthogonal but something I would love to see:

A library that takes a regex and shows some examples that pass and some that fail. I would find that the easiest way to understand a regex, rather than changing the language itself. (Though Melody looks very promising and I'm keen to see it develop.)

It wouldn't be trivial to build, particularly the "fail" examples: you'd want them fairly close to passing. For example, with `(/*\.csv\.gz)` you'd want `foo.csv.gz` rather than `aoseutn` as an example of a failure.

all2 · 4 years ago

There's a Python library called xeger [0] that generates strings from regular expressions. I've used it at work to generate large quantities of "valid" test data.

[0] https://pypi.org/project/xeger/
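The following is not xeger's implementation, but the core idea of generating positive examples fits in a few lines: walk a small pattern tree and make a random choice at each node. The node types (`Lit`, `Chars`, `Rep`, `Seq`) are made up for this sketch. A real tool would build them by parsing regex syntax. Each generated string is checked against the equivalent regex.

```python
import random
import re
import string
from dataclasses import dataclass
from typing import Union

@dataclass
class Lit:
    text: str       # matches exactly this text

@dataclass
class Chars:
    alphabet: str   # matches any one character from alphabet

@dataclass
class Rep:
    node: "Node"
    lo: int         # minimum repetitions
    hi: int         # maximum repetitions (inclusive)

@dataclass
class Seq:
    nodes: list     # matches each node in order

Node = Union[Lit, Chars, Rep, Seq]

def generate(node, rng):
    """Produce one random string matched by node."""
    if isinstance(node, Lit):
        return node.text
    if isinstance(node, Chars):
        return rng.choice(node.alphabet)
    if isinstance(node, Rep):
        count = rng.randint(node.lo, node.hi)  # randint is inclusive at both ends
        return "".join(generate(node.node, rng) for _ in range(count))
    if isinstance(node, Seq):
        return "".join(generate(n, rng) for n in node.nodes)
    raise TypeError(f"unknown node: {node!r}")

# Tree equivalent to the regex [a-z]{1,8}\.csv\.gz
tree = Seq([Rep(Chars(string.ascii_lowercase), 1, 8), Lit(".csv.gz")])
regex = re.compile(r"[a-z]{1,8}\.csv\.gz")

rng = random.Random(0)  # seeded so runs are reproducible
samples = [generate(tree, rng) for _ in range(5)]
print(all(regex.fullmatch(s) for s in samples))  # True
```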

This looks great as a password generator, username generator, etc. Thank you!

tgv · 4 years ago

The fail bit is harder indeed, especially for larger regexps, but not totally impossible. The easiest way towards that goal seems to be constructing the DFA first and then generating illegal single edits (insertion, deletion, substitution). Generating a positive example is possible without it.
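The DFA route is the principled one. A much cruder substitute, sketched below in Python, skips the DFA entirely. It takes one known positive example, enumerates every single-character insertion, deletion and substitution over a small alphabet, and keeps the edits that `re.fullmatch` rejects. `near_misses` and its `alphabet` parameter are illustrative choices. The cost grows with the example's length times the alphabet size, and edits using characters outside the alphabet are never tried.

```python
import re

def near_misses(pattern, positive, alphabet):
    """Strings one edit away from `positive` that the pattern does NOT fully match."""
    regex = re.compile(pattern)
    if not regex.fullmatch(positive):
        raise ValueError("expected a matching example to start from")
    candidates = set()
    for i in range(len(positive) + 1):
        for ch in alphabet:
            candidates.add(positive[:i] + ch + positive[i:])       # insertion
    for i in range(len(positive)):
        candidates.add(positive[:i] + positive[i + 1:])            # deletion
        for ch in alphabet:
            candidates.add(positive[:i] + ch + positive[i + 1:])   # substitution
    return sorted(c for c in candidates if not regex.fullmatch(c))

misses = near_misses(r"[a-z]+\.csv\.gz", "foo.csv.gz", alphabet="az.")
print("foo.csv.g" in misses)    # True: deleting the final "z" breaks the match
print("afoo.csv.gz" in misses)  # False: that insertion still matches
```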

nathancahill · 4 years ago

Like American Fuzzy Lop for regex. I dig it.

ZeroGravitas · 4 years ago

I always liked Perl-style commented regexes; it would be nice if this could generate those, though I guess that needs language support.

https://stackoverflow.com/questions/15463257/commenting-regu...

Some interesting workarounds are mentioned there that might pair well with Melody-type languages.
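Python's `re.VERBOSE` flag (also spelled `re.X`) supports the same Perl-style `/x` commenting. In verbose mode, whitespace in the pattern is ignored and `#` starts a comment, so a generated regex could carry its explanation inline. A small example with a date pattern:

```python
import re

DATE = re.compile(
    r"""
    (?P<year>[0-9]{4})  -   # four-digit year, then a hyphen
    (?P<month>[0-9]{2}) -   # two-digit month, then a hyphen
    (?P<day>[0-9]{2})       # two-digit day
    """,
    re.VERBOSE,  # whitespace is ignored and '#' starts a comment
)

print(DATE.fullmatch("2022-03-14").groupdict())  # {'year': '2022', 'month': '03', 'day': '14'}
```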

A way to specify example strings that match or don't as a mini unit test would be cool too.
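Here is a sketch of that idea in Python: keep the pattern next to lists of strings it should and should not match, and check them in a test. `EXAMPLES` and `check_examples` are hypothetical names. The same check could run under any test runner.

```python
import re

# Each pattern alongside strings it must accept and strings it must reject.
EXAMPLES = {
    r"v?[0-9]+\.[0-9]+\.[0-9]+": {
        "match": ["1.2.3", "v0.2.5", "10.20.30"],
        "no_match": ["1.2", "v1.2.3-beta", "a.b.c"],
    },
}

def check_examples(examples):
    """Return a description of every example that behaves unexpectedly."""
    failures = []
    for pattern, cases in examples.items():
        regex = re.compile(pattern)
        for s in cases["match"]:
            if not regex.fullmatch(s):
                failures.append(f"{pattern!r} should match {s!r}")
        for s in cases["no_match"]:
            if regex.fullmatch(s):
                failures.append(f"{pattern!r} should not match {s!r}")
    return failures

print(check_examples(EXAMPLES))  # []
```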

kdtop · 4 years ago

I have never taken the time to learn regex. It seems like it would be great if I could keep all the syntax in my head, so the idea of Melody seems great. I don't like that the GitHub description says it's currently unstable. I hope this project continues and flourishes.

Author here, thank you! The reason I stated that Melody is unstable is that the project is very young (days), so some of the syntax is still being considered and may change (although the general idea and direction will remain), and not everything is implemented yet. I'm also considering changing the way the parsing works (but that wouldn't affect end users in terms of expected results for valid code).

thedevelopnik · 4 years ago

Even as someone who has invested time in learning to write regexp, they are still hard to read and maintain. This project looks super cool!

2OEH8eoCRo0 · 4 years ago

You could learn it in a few hours. There aren't many rules.

gowld · 4 years ago

Just use one of the many generators:

manual: https://regexr.com/

AI: https://regex-generator.olafneumann.org/

nixpulvis · 4 years ago

I'm a big fan of https://rubular.com.