tolmasky · a year ago
I implemented something similar to the compositional regular expressions feature described here for JavaScript a while ago (independently, so semantics may not be the same), and it is one of the libraries I find myself most often bringing into other projects years later. It gets you a tiny bit closer to feeling like you have a first-class parser in the language. Here is an example of implementing media type parsing with regexes using it: https://runkit.com/tolmasky/media-type-parsing-with-template...

"templated-regular-expression" on npm, GitHub: https://github.com/tolmasky/templated-regular-expression

To be clear, programming languages should just have actual parsers and you shouldn't use regular expressions for parsers. But if you ARE going to use a regular expression, man is it nice to break it up into smaller pieces.
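To give a flavor of the idea in Python (a hypothetical sketch of composing a regex from smaller named pieces, not the actual templated-regular-expression API):

```python
import re

# Hypothetical fragments: build a larger pattern out of smaller named pieces.
token = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"            # e.g. an HTTP "token"
media_type = rf"(?P<type>{token})/(?P<subtype>{token})"

m = re.fullmatch(media_type, "text/html")
print(m.group("type"), m.group("subtype"))  # text html
```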

b2gills · a year ago
"Actual parsers" aren't powerful enough to be used to parse Raku.

Raku regular expressions combined with grammars are far more powerful, and if written well, easier to understand than any "actual parser". In order to parse Raku with an "actual parser", it would have to allow you to add and remove rules as it is parsing. Raku's "parser" does this by subclassing the current grammar, adding or removing rules in the subclass, and then reverting back to the previous grammar at the end of the current lexical scope.

In Raku, a regular expression is another syntax for writing code. It just has a slightly different default syntax and behavior. It can have both parameters and variables. If the regular expression syntax isn't a good fit for what you are trying to do, you can embed regular Raku syntax to do whatever you need to do and return right back to regular expression syntax.

It also has a much better syntax for doing advanced things, as it was completely redesigned from first principles.

The following is an example of how to match at least one `A` followed by exactly that number of `B`s and exactly that number of `C`s.

(Note that bare square brackets [] are for grouping, not for character classes.)

  my $string = 'AAABBBCCC';

  say $string ~~ /
    ^

    # match at least one A
    # store the result in a named sub-entry
    $<A> = [ A+ ]

    {} # update result object

    # create a lexical var named $repetition
    :my $repetition = $<A>.chars(); # <- embedded Raku syntax

    # match B and then C exactly $repetition times
    $<B> = [ B ** {$repetition} ]
    $<C> = [ C ** {$repetition} ]
  
    $
  /;
Result:

  「AAABBBCCC」
  A => 「AAA」
  B => 「BBB」
  C => 「CCC」
The result is actually a very extensive object that has many ways to interrogate it. What you see above is just a built-in human readable view of it.

In most regular expression syntaxes, to match equal numbers of `A`s and `B`s you would need to recurse in between the `A`s and `B`s. That of course wouldn't allow you to also do it for `C`, and it wouldn't be anywhere near as easy to follow as the above. The above should run fairly fast because it never has to backtrack or recurse.
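For comparison, about the closest you can get with a classic regex engine is a sketch like this in Python: match the three runs and compare the lengths by hand, since the pattern itself can't count.

```python
import re

# Classic regexes can't enforce equal counts, so match the runs
# and compare their lengths afterwards.
def abc_match(s: str) -> bool:
    m = re.fullmatch(r"(A+)(B+)(C+)", s)
    return bool(m) and len(m[1]) == len(m[2]) == len(m[3])

print(abc_match("AAABBBCCC"))  # True
print(abc_match("AABBBCCC"))   # False
```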

When you combine them into a grammar, you will get a full parse-tree. (Actually you can do that without a grammar, it is just easier with one.)

To see an actual parser I often recommend people look at JSON::TINY::Grammar https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Gra...

Frankly, from my perspective, much of the design of "actual parsers" is a byproduct of limited RAM on early computers. The reason there is a separate tokenization stage was to reduce the amount of RAM used for the source code so that further stages had enough RAM to do the semantic analysis and eventual compilation of the code. It doesn't really do that much to simplify any of the further stages, in my view.

The JSON::Tiny module from above creates the native Raku data structure using an actions class, as the grammar is parsing. Meaning it is parsing and compiling as it goes.

ogogmad · a year ago
I imagine this could be understood as making use of a monad. Right?

The main problem with generalised regexes is that you can't match them in linear time worst-case. I'm wondering if this is addressed at all by Raku.

xigoi · a year ago
Why are they called regular expressions if they can parse non-regular languages?
tolmasky · a year ago
I don't think we disagree here. To clarify, my statement about using "actual parsers" over regexes was more directed at my own library than Raku. Since I had just posted a link on how to "parse" media types using my library, I wanted to immediately follow that with a word of caution of "But don't do that! You shouldn't be using (traditional) regexes to parse! They are the wrong tool for that. How unfortunate it is that most languages have a super simple syntax for (traditional/PCRE) regexes and not for parsing." I had seen in the article that Raku had some sort of "grammar" concept, so I was kind of saying "oh, it looks like Raku may be tackling that too."

Hopefully that clarifies that I was not necessarily making any statement about whether or not to use Raku regexes, which I don't pretend to know well enough to qualify to give advice around. Just for the sake of interesting discussion however, I do have a few follow up comments to what you wrote:

1. Aside from my original confusing use of the term "regexes" to actually mean "PCRE-style regexes", I recognize I also left a fair amount of ambiguity by referring to "actual parsers". Given that there is no "true" requirement to be a parser, what I was attempting to say is something along the lines of: a tool designed to transform text into some sort of structured data, as opposed to a tool designed to match patterns. Again, from this alone, seems like Raku regexes qualify just fine.

2. That being said, I do have a separate issue with using regexes for anything, which is that I do not think it is trivial to reason about the performance characteristics of regexes. IOW, the syntax "doesn't scale". This has already been discussed plenty of course, but suffice it to say that backtracking has proven undeniably popular, and so it seems an essential part of what most people consider regexes. Unfortunately this can lead to surprises when long strings are passed in later. Relatedly, I think regexes are just difficult to understand in general (for most people). No one seems to actually know them all that well. They venture very close to "write-only languages". Then people are scared to ever make a change in them. All of this arguably is a result of the original point that regexes are optimized for quick and dirty string matching, not to power gcc's C parser. This is all of course exacerbated by the truly terrible ergonomics, including not being able to compose regexes out of the box, etc. Again, I think you make a case here that Raku is attempting to "elevate" the regex to solve some if not all of these problems (clearly not only composable but also "modular", as well as being able to control backtracking, etc.) All great things!

I'd still be apprehensive about the regex "atoms" since I do think that regexes are not super intuitive for most people. But perhaps I've reversed cause and effect, and the reason they're not intuitive is the state in which they currently exist in most languages; if you could write them with Raku's advanced features, regexes would be no more unintuitive than any other language feature, since you aren't forced to create one long uninterrupted 500-character regex for anything interesting. In other words, perhaps the "confusing" aspects of regexes are much more incidental to their "API" vs. an essential consequence of the way they describe and match text.
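The backtracking surprise mentioned above is easy to demonstrate; a classic Python example (nothing Raku-specific) is a nested quantifier that goes exponential on input that almost matches:

```python
import re

# Nested quantifiers force exponential backtracking on near-matches:
# every way of splitting the run of "a"s gets tried before failing.
evil = re.compile(r"^(a+)+$")

print(bool(evil.match("aaaaa")))  # True -- matches quickly
print(bool(evil.match("aaab")))   # False -- already explores several splits
# evil.match("a" * 30 + "b") would take on the order of 2**30 attempts;
# this is exactly the "long strings passed in later" surprise.
```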

3. I'd like to separately point out that many of the capabilities you mentioned being added to regexes could be added to other kinds of parsers as well. IOW, "actual parsers" could theoretically parse Raku, if said "actual parsers" supported the discussed extensions. For example, there's no reason PEG parsers couldn't allow you to fall into dynamic sub-languages. Perhaps you did not mean to imply that this couldn't be the case, but I just wanted to point out that the extensions you mention appear to be much more generally applicable than they are perhaps given credit for by being "a part of regexes in Raku" (or maybe that's not the case at all and it was just presented this way in this comment for brevity; totally possible since I don't know Raku).

I'll certainly take a closer look at the full Raku grammar stuff since I've written lots of parser extensions that I'd be curious have analogues in Raku or might make sense to add to it, or alternatively interesting other ideas that can be taken from Raku. I will say that RakuAST is something I've always wanted languages to have, so that alone is very exciting!

mempko · a year ago
I use Raku in production. It's the best language for dealing with text because building parsers is so damn nice. I'm shocked this isn't the top language to create an LLM text pipeline.
bloopernova · a year ago
Very late to the thread, but I was wondering if you knew of a good example of Raku calling an API over https, polling the API until it returns a specific value?
antononcube · a year ago
Here is one way:

    use HTTP::UserAgent;
    use JSON::Fast;

    my $url = 'https://api.coindesk.com/v1/bpi/currentprice.json';

    my $ua = HTTP::UserAgent.new;

    my $total-time = 0;
    loop {
        my $response = $ua.get($url);
        if $response.is-success {
            my $data = from-json $response.content;
            my $rate = $data<chartName>;
            say "Current chart name: $rate";
            #last if $rate eq 'Bitcoin';
            last if $total-time ≥ 16;
        }
        else {
            say "Failed to fetch data: {$response.status-line}";
        }
        sleep 3; # Poll every 3 seconds
        $total-time += 3;
    }
(Tweak / uncomment / rename the $rate variable assignments and checks.)

antononcube · a year ago
Do you use any of Raku's LLM packages? If yes, which ones?
mempko · a year ago
I have not used any. What are some good ones to look at?
christophilus · a year ago
Wow. Sign me up for leaving the industry before I ever have to maintain a Raku codebase.
agumonkey · a year ago
Funny, cause reading that blog post made me want to quit my job and find a raku team to work with. Maybe I'm still too naive :)
TOGoS · a year ago
Same. And it's a bit funny because I'm usually against unnecessary complexity, and here's a language that seems to have embraced it and become a giant castle of language features that I could spend weeks studying.

Maybe it's because Raku's features were actually well thought out, unlike the incidental "doesn't actually buy me anything, just makes the code hard to deal with" complexity I have to deal with at work day in and day out.

Maybe if Java had a few of these features back in the day people wouldn't've felt the need to construct these monstrous annotation soup frameworks in it.

IshKebab · a year ago
Some of them seem ok, e.g. ignoring whitespace in regexes by default is a great move, and `*` as a shorthand for a single lambda argument is neat.

But trust me, if you ever have to actually *work* with them you will find yourself cursing whoever decided to riddle their code with `<<+>>`, or the person that decided `* + ` isn't the same as `2 *` (that parser must have been fun to write!)
fuzztester · a year ago
yeah, you're right.

your entire comment is syntactically valid raku, or can be made so, because raku syntax is so powerful and flexible.

raku grammars ftw!

https://docs.raku.org/language/grammars

https://docs.raku.org/language/grammar_tutorial

;)

even that last little fella above is, or can be made, syntactically valid.

7thaccount · a year ago
That's a fair reaction to the post if you haven't looked at any normal Raku code.

If you look at any of the introductory Raku books, it seems a LOT like Python with a C-like syntax. By that I mean the syntax is more curly-brace oriented, but the ease of use and built-in data structures and OO features are all very high level stuff. I think if you know any other high level scripting language you would find Raku pretty easy to read for comparable scripts. I find it pretty unlikely that the majority of people would use the really unusual stuff in normal every day code. Raku is more flexible (more than one way to do things), but it isn't arcane looking for the normal stuff I've seen. I hope that helps.

maleldil · a year ago
Sure, but the fact that weird stuff is possible means that someone, at some point, will try to use it in your codebase. This might be prevented if you have a strong code review culture, but if the lead dev in the project wants to use something unusual, chances are no one will stop them. And once you start...
rjh29 · a year ago
Same as Perl: nobody wants to maintain it, but it's extremely fun to write. It's a very expressive language.

You can see that in Raku's ability to define keyword arguments with a shorthand (e.g. `:global(:$g)`, as well as assuming a value of `True`, so you can just call `match(/foo/, :g)` to get a global regex match). Perl has tons of this stuff too, all aimed at making the language quicker and more fun to write, but less readable for beginners.

b2gills · a year ago
Many of the features that make Perl harder to write cleanly have been improved in Raku.

Frankly I would absolutely love to maintain a Raku codebase.

I would also like to update a Perl codebase into being more maintainable. I'm not sure how much I would like to actually maintain a Perl codebase because I have been spoiled by Raku. So I also wouldn't like to maintain one in Java, Python, C/C++, D, Rust, Go, etc.

Imagine learning how to use both of your arms as if they were your dominant arm, and doing so simultaneously. Then imagine going back to only using one arm for most tasks. That's about how I feel about using languages other than Raku.

calvinmorrison · a year ago
It's not that reading Perl is hard; the _intent_ of the operations is often unclear.

Yes, it's nice to write dense, fancy code. However, something very boring to write like PHP is a lot of "loop over this bucket and do something with the fish in the bucket; afterwards, take the bucket and throw it onto the bucket pile", which mirrors a 'human follows these steps' style.

BeFlatXIII · a year ago
Perl reminds me of that job I had writing ANSI MUMPS.
wruza · a year ago
it's extremely fun to write

Then your contrarian phase ends and you regret that you didn’t learn something useful in that time.

kamaal · a year ago
It's strange that people are saying the same about maintaining code bases written with AI assistance.

I'm guessing it's going to be a generational thing now. A whole older generation of programmers will just find themselves out of place in what is a normal work setup for the current generation.

zokier · a year ago
I don't think Raku is intended for "the industry".
lmm · a year ago
Some of these are halfway familiar. Hyper sounds like a more ad-hoc version of something from recursion-schemes, and * as presented is somewhat similar to Scala _ (which I love for lambdas and think every language should adopt something similar).
klibertp · a year ago
I think this is the closest equivalent for hyper: https://groovy-lang.org/operators.html#_spread_operator
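Groovy's `*.` spread, like Raku's `>>.` hyper method call, maps a method call over a collection; a plain Python rendering of the same idea:

```python
words = ["alpha", "beta", "gamma"]

# words*.size() in Groovy / @words>>.chars in Raku, as an ordinary map:
lengths = [len(w) for w in words]
print(lengths)  # [5, 4, 5]
```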
hexane360 · a year ago
It's also quite similar to Thread and MapThread in Mathematica
emmelaich · a year ago
> (2, 30, 4, 50).map(* + *) returns (32, 45)

Should it be `returns (32, 54)` ? i.e. 4+50 for the 2nd term.

Maybe this is a consequence (head translation) of some countries saying e.g. vierenvijftig (four and fifty) instead of the English fifty-four.

agumonkey · a year ago
checked in rakudo, it does return (32 54), author fingers slipped
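For reference, `* + *` is a two-argument lambda, so `map` consumes the list two elements at a time; a rough Python equivalent:

```python
vals = [2, 30, 4, 50]

# A two-argument mapping consumes the list in non-overlapping pairs:
# (2 + 30, 4 + 50)
sums = [a + b for a, b in zip(vals[0::2], vals[1::2])]
print(sums)  # [32, 54]
```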

jimberlage · a year ago
So I guess Perl is a gateway drug for the APL family of languages now?
riffraff · a year ago
yes, and the post didn't even touch on metaoperators e.g.

    # use the reduce metaoperator [ ] with infix + to do "sum all"
    [+] 1, 2, 3

b2gills · a year ago
That's nothing; use it to calculate the sum of a range of values:

  say [+] 1..10000000000000000000000000000000000000000000
Which will result in you getting this back in a fraction of a second

50000000000000000000000000000000000000000005000000000000000000000000000000000000000000

(It actually cheats because that particular operator gets substituted for `sum` which knows how to calculate the sum of a Range object.)
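The "cheat" amounts to the closed-form Gauss sum, sum(1..n) = n(n+1)/2, which is why the answer comes back instantly regardless of the size of the range. A quick sanity check in Python (a sketch of the idea, not Rakudo's actual implementation):

```python
from functools import reduce
from operator import add

# Naive left-fold vs. the closed form: n * (n + 1) // 2
n = 10_000
assert reduce(add, range(1, n + 1)) == n * (n + 1) // 2
print(n * (n + 1) // 2)  # 50005000
```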

cutler · a year ago
Speed is still a major issue with Raku. Parsing a log file with a regex is Perl's forte but the latest Raku still takes 6.5 times as long as Python 3.13 excluding startup time.
donaldihunter · a year ago
You'd need to qualify that with an example. In my experience some things are faster in Raku and some are slower, so declaring that "Raku takes 6.5 times as long as Python 3.13" is pretty meaningless without seeing what it's slower at.
cutler · a year ago
I specified the use case so why "meaningless"? Here's the code:

    Raku
    for 'logs1.txt'.IO.lines -> $_ { .say if $_ ~~ /<<\w ** 15>>/; }

    Python
    from re import search
    with open('logs1.txt', 'r') as fh:
        for line in fh:
            if search(r'\b\w{15}\b', line): print(line, end='')

jddj · a year ago

  > (2,4,8...*)[17]
  262144
This one genuinely surprised me

quink · a year ago
My brain immediately reached for the word 'horrifying', with phrases 'terrible consequences' and 'halting problem' soon after, but to each their own.
labster · a year ago
I believe that it only tries to DWIM with arithmetic and geometric sequences, and gives up otherwise. Of course there’s nothing keeping you from writing a module that would override `infix:<...>` in the local scope with a lookup in OEIS.
mst · a year ago
You get either a compile time instantiated infinite lazy sequence, or a compilation error.

My personal bar for "amount of clever involved" is fairly high when the clever either does exactly what you'd expect or fails, and even higher when it does so at compile time.

(personal bars, and personal definitions of "exactly what you'd expect" will of course vary, but I think your brain may have miscalibrated the level of risk before it got as far as applying your preferences in this particular case)

b2gills · a year ago
This has nothing to do with the halting problem. And I have no idea why you think there would be 'terrible consequences'.

  2, 4, 8 ... *
The `...` operator only deduces arithmetic, or geometric changes for up-to the previous 3 values.

Basically the above becomes

  2, 4, 8, * × 2 ... *
Since each value is just double the previous one, it can figure that out.

If ... can't deduce the sequence, it will error out.

  2, 4, 9 ... *

  Unable to deduce arithmetic or geometric sequence from: 2, 4, 9
  Did you really mean '..'?
  in block at ./example.raku line 1
So I really don't understand how you would be horrified.
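The deduced sequence is just a lazy stream; sketched in Python with `itertools` (an analogy for intuition, not how Rakudo implements `...`):

```python
from itertools import count, islice

# A lazy geometric sequence like (2, 4, 8 ... *): term n is 2 * 2**n.
powers = (2 * 2 ** n for n in count())

print(list(islice(powers, 5)))  # [2, 4, 8, 16, 32]
print(2 * 2 ** 17)              # 262144, i.e. (2,4,8...*)[17]
```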

emmelaich · a year ago
The detecting of increments has been in Perl6 for ages, so that's not new. [edit: Perl6, not Perl]

I guess (apart from the Whatever), the laziness is new since Perl6/Raku.

JadeNB · a year ago
> The detecting of increments has been in Perl for ages so that's not new.

But this is detecting that the increment is multiplicative, rather than additive. It might seem like a natural next step, but, for example, I somewhat suspect (and definitely hope) that Raku wouldn't know that `(1, 1, 2...*)` is a (shifted) list of Fibonacci numbers.

agumonkey · a year ago
very cohesive,

    [7] > (1,3,9...*)[4,5]
    (81 243)
    [8] > (1,3,9...*)[(1..3)]
    (3 9 27)
and nestable

    [0] > (1,2,4...*)[(1,2,4...*)[1,2,3]]

justinator · a year ago
These are all very clever, but what's the use case? I'm not saying there isn't one, I just don't know what it is! Not to speak of the dead, but Perl was utilitarian: it was built to solve problems. From my point of view, these are solutions to problems I've never had.