The resources from Jan Goyvaerts / Just Great Software are great! His guides, and to some extent his tools, are how I learned it too. Today I tend to be the regex go-to guy among colleagues, all seemingly because I properly got the hang of the basics via Jan's resources.
I read Mastering Regular Expressions by Jeffrey Friedl 20 years ago when I was in middle school, front to back, and it's probably been the best investment of reading time I've ever made.
I had read on some tech site, some years ago, that Friedl worked at Yahoo for a while, and IIRC in a role involving a lot of text munging, which would probably have involved a lot of regular expression usage, maybe across the many web properties they had in that period, which included Yahoo Search, Yahoo Mail, Yahoo Groups, Yahoo Finance, and many more.
Found that interesting.
I had bought his book around that time or sometime later, but never read it fully, partly because I used to go cross-eyed from reading the text with all the italics and other highlighting (of the regexes in action) in a small font, which was probably needed to explain regex concepts, but still ...
But after reading this thread, I feel motivated to dive into regex again, at least at the shallow end of the pool, although I have dabbled in it and used it now and then in my work before now.
I mean, the highlighting was maybe useful, but having it in the small font made it unnecessarily difficult to read. Squiggly italics of, say, one or two characters in length are harder to distinguish from non-italic characters when set in quite a small font.
If I remember correctly, the local public library was selling books off for ultra cheap for some reason, and I added it to the pile of books my parents bought to fill the empty built-in bookshelves in our new family home (along with a bunch of Reader's Digest anthologies). I scanned through the first part of it and it seemed super powerful (I was just getting into programming at the time), so it captivated me.
I'm really surprised by how few people learned by trying, rather than by reading whole books or manuals. I had some code or whatnot that needed mass-replacing and used the built-in regex find-and-replace (I think it was EditPlus in those days). First learned how to match the exact string, then extrapolate from there using {}, (), replacing, etc. It's a lot easier to learn when you need to solve a practical, immediate problem.
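For anyone who hasn't done that kind of mass-replace, here is a minimal sketch of the idea in Python rather than EditPlus. The sample text and the date format are made up for illustration: `{}` sets a repetition count, `()` captures, and the replacement refers back to the captures.

```python
import re

# Invented sample text: dates written as DD/MM/YYYY.
text = "Released 03/11/1997, reprinted 15/07/2002."

# {2} and {4} are repetition counts; each (...) is a capture group.
pattern = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

# \3, \2, \1 in the replacement refer to the captured year, month, day.
result = pattern.sub(r"\3-\2-\1", text)
print(result)
```

Starting from the exact string and then loosening one piece at a time (a literal `03` becomes `\d{2}`, and so on) is roughly the "extrapolate from there" workflow.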
That's how I learned. I'm a self taught dev so I pretty much just took the same approach. Read documentation, try it out, read more docs, try it out, read some examples, search for how other people do it, etc. At a certain point, you just know it and can solve 90% of your needs without looking at docs. Although, tbh, I haven't written a very complicated regex in probably a decade and would need to do some warm up reps if I needed to today.
I agree you don’t need “whole books or manuals”, but how do you learn by trying alone?
The search space is enormous, and even if you stumble on a code fragment that appears to work, how do you know your code actually does what you think it does, and how do you know there isn’t a more efficient or readable way to do what you want to do? Case in point, you wrote:
> then extrapolate from there using {}, (), replacing, etc.
How did you find out about those, if not from reading (likely followed by some trying to check your understanding)? I think you have to read, not “whole books”, but ‘just’ the right documentation, where ‘right’ depends on the tool you use. For example, `man regex` may be sufficient. You can read that in a few minutes.
Yeah this is wild to me, maybe it’s a generational thing? I never “learned” regex. I’ve written hundreds of them but I figured out what I needed and then I moved on.
"Learned" in University but it wasn't until Jeff Friedl's Perl Conference talks that I really became one with the regex engine. He taught you how to think like the regex engine and thus how regular expressions would be interpreted and thus how to write them. Then I got a master class in RE from Tom Christiansen when we were writing the Perl Cookbook.
Jeff wrote "Mastering Regular Expressions", which grew from that talk. You probably want a copy even though it was first released in 1997. For the mindset of RE, you can't beat it.
Learning REs is a roll through:
* how matching happens (advancing, matching, backtracking)
* using * ? and {} to match repetitions
* greediness and stinginess within the RE
* character classes, both [manual] and escapes like \s \W etc
* anchors and "what a line is"
* grouping and backreferences
* accessing groups outside the RE
* substitution and accessing backrefs in substitutions
* find ALL the matches
* complex parsing (just don't, it's rare not to regret it)
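A few of the items above, sketched in Python (my choice of language; the sample strings are invented, and this is an illustration rather than anything taken from the book or the talk):

```python
import re

# Greediness vs. stinginess: .+ grabs as much as it can, .+? as little.
html = "<b>bold</b>"
greedy = re.search(r"<.+>", html).group()
lazy = re.search(r"<.+?>", html).group()

# Grouping and a backreference: \1 must repeat whatever group 1 matched.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo")
word = doubled.group(1)  # accessing the group outside the RE

# Anchors and "what a line is": with re.MULTILINE, ^ and $ work per line.
log = "ok: start\nerr: disk\nok: done\nerr: net"
errors = re.findall(r"^err: (\w+)$", log, flags=re.MULTILINE)  # ALL matches

# Backrefs in a substitution: turn key=value into value=key.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "a=1 b=2")

print(greedy, lazy, word, errors, swapped)
```

The greedy case is the backtracking story in miniature: `.+` first runs to the end of the string, then gives characters back until the final `>` can match.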
And then it's an absolutely epic deep-dive into the minutiae: what line starts and ends might be, Unicode and regex, code to be executed from within the regex engine, using code to BUILD regexes and worrying about when escaping happens or doesn't, denial-of-service regexes, etc. That will take you through ASCII, various Unix tool chains over time, and a bunch of other fun stuff.
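Two of those rabbit holes are easy to show. A small sketch, assuming Python's `re` module; `literal_finder` is a hypothetical helper name of mine, not a library function:

```python
import re

def literal_finder(needle: str) -> re.Pattern:
    """Build a pattern that matches `needle` literally (hypothetical helper)."""
    return re.compile(re.escape(needle))

# Building a regex from data: forget to escape and "." becomes a wildcard.
needle = "a.b"
unescaped = re.compile(needle)
escaped = literal_finder(needle)

dot_is_wildcard = unescaped.search("axb") is not None
escaped_rejects = escaped.search("axb") is None
escaped_finds = escaped.search("see a.b here") is not None

# Unicode: for str patterns, \d matches any Unicode decimal digit
# unless re.ASCII is given. These are Arabic-Indic digits 1, 2, 3.
arabic_digits = "\u0661\u0662\u0663"
unicode_match = re.fullmatch(r"\d+", arabic_digits) is not None
ascii_match = re.fullmatch(r"\d+", arabic_digits, flags=re.ASCII) is not None

print(dot_is_wildcard, escaped_rejects, escaped_finds, unicode_match, ascii_match)
```

Other engines make different default choices for `\d` and friends, which is part of why this turns into a deep-dive.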
I need to build a Regex a couple of times a year, and have always wondered whether others learn it and store it in their brain-cache, or whether they too need to look it up each time.
These days, I ask Claude/ChatGPT to create the regex and usually I know enough to be able to verify it. To double check, I'll start a new conversation and ask it what the regex does and verify it that way.
You can also ask it to create unit tests with edge cases. It might not catch every edge case, but usually it will create edge cases that you might not think of when writing unit tests yourself.
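As a rough idea of what such edge-case tests can look like, here is a table-driven sketch in Python. The pattern (a 24-hour `HH:MM` time) and the cases are hypothetical examples of mine, not something an LLM produced:

```python
import re

# Hypothetical pattern under test: 24-hour time such as "09:30".
TIME_RE = re.compile(r"(?:[01]\d|2[0-3]):[0-5]\d")

# (input, should it match?) pairs, including the boring boundaries.
CASES = [
    ("00:00", True),
    ("23:59", True),
    ("24:00", False),    # hour out of range
    ("9:30", False),     # missing leading zero
    ("12:60", False),    # minute out of range
    ("12:30\n", False),  # trailing newline
    ("", False),
]

def failures(cases):
    """Return the cases where the pattern disagrees with the expectation."""
    return [
        (text, expected)
        for text, expected in cases
        if (TIME_RE.fullmatch(text) is not None) != expected
    ]

# The classic trap: without re.MULTILINE, $ still matches just before a
# trailing newline, so a ^...$ version accepts "12:30\n".
loose = re.compile(r"^" + TIME_RE.pattern + r"$")
loose_accepts_newline = loose.match("12:30\n") is not None

print(failures(CASES), loose_accepts_newline)
```

Using `fullmatch` instead of `^...$` is a design choice here; the trailing-newline case is exactly the kind of edge case that is easy to miss when writing the tests by hand.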
Learned regex in the '90s from the Perl documentation, or possibly one of the O'Reilly Perl references. That was a time when printed language references were more convenient than searching the internet. Perl still includes a shell component for accessing its documentation, which was invaluable in those ancient times. Perl's regex documentation is rather fantastic.
Practice: the more you use them, the easier they become. I never studied them but knew when to use them, then just tinkered and iterated until the pattern did what I needed it to. After a while you can mostly just write and read them without much tinkering.
I only used the bare minimum for years.
I also hung out on a #regex IRC channel, so I got exposed to questions and answers by many people.
Later I read up on https://www.regular-expressions.info/ which has a lot of very good explanations.
The #regex IRC channel had an IRC bot with a quiz with 28 levels.
All sensibility ended after level 14 or so. At that point it was just "how deep does the PCRE rabbit-hole go?"
But there was a lot of useful, non-trivial stuff, too. Most specifically: lookaheads/lookbehinds, non-greedy matching, back-references, named capture groups, character classes, anchors.
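For readers who haven't met the less common ones, a quick sketch in Python (the quiz itself was PCRE; the strings below are invented for illustration):

```python
import re

prices = "cost $15, or 20 EUR, or $7"

# Lookbehind: digits that are preceded by "$", without consuming the "$".
dollars = re.findall(r"(?<=\$)\d+", prices)

# Lookahead: digits that are followed by " EUR", without consuming it.
euros = re.findall(r"\d+(?= EUR)", prices)

# Named capture groups: refer to parts of the match by name, not number.
m = re.search(r"(?P<user>[\w.]+)@(?P<host>[\w.]+)", "mail alice@example.org now")
user, host = m.group("user"), m.group("host")

print(dollars, euros, user, host)
```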
When I learned jq, I went much the same way: Started hanging out on #jq IRC channels and started trying to answer jq questions on StackOverflow. Sadly, I got outperformed the first six months, until it finally clicked.
That being said - regex is a superpower.
`perldoc perlre` from your terminal.
or https://perldoc.perl.org/perlre
A simple way to test a regex you're building is this website, which offers immediate parsing and documentation of your regex, lets you test it against various inputs, and lets you choose which language's regex parser you are targeting.
https://regexr.com/
Regex101 is an excellent tool.
https://github.com/qntm/greenery
So, you can observe what kind of state machine is produced from any given regular expression. You can also use it to merge and otherwise manipulate state machines, or to simplify regular expressions.
Quite helpful.
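To make the "regex as state machine" idea concrete without relying on any particular library, here is a hand-written sketch in Python. The transition table is my own, built by hand for the pattern `ab*c`; it is not output from greenery:

```python
import re

# Hand-built DFA for ab*c: state 0 --a--> 1, 1 --b--> 1, 1 --c--> 2 (accept).
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
ACCEPTING = {2}

def dfa_accepts(text: str) -> bool:
    """Run the DFA over text; a missing transition means rejection."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

# Compare the DFA with the regex engine on a handful of inputs.
samples = ["ac", "abc", "abbbc", "", "a", "ab", "bc", "abca", "acc"]
agreement = all(
    dfa_accepts(s) == (re.fullmatch(r"ab*c", s) is not None) for s in samples
)
print(agreement)
```

A tool that derives the machine for you saves doing this by hand, and makes operations like merging two machines practical.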