Regex.ai is an AI-powered tool that generates regular expressions matching specific patterns in text. Whether you're a novice or an expert, Regex.ai's intuitive interface makes it easy to input sample text and generate complex regular expressions quickly. Overall, Regex.ai is a game-changer that will save you time and streamline your workflow.
Or, just write regular expressions?
> ... Regex.ai's intuitive interface makes it easy to input sample text and generate complex regular expressions quickly and efficiently.
See: https://www.ibm.com/topics/overfitting
Inputting the sample text:
And highlighting the first "baz" produced patterns that all included "[A-Z][a-z]*@libertylabs\\.ai", presumably due to the default inclusions. Removing those and highlighting the second "baz" produced "<Agent B>" as the result in one case.
There is no explanation of any patterns generated. If a person is to use one of the generated patterns and Regex.ai is supposed to "save you time and streamline your workflow", no matter "[w]hether you're a novice or an expert", then some form of verification and/or explanation must exist.
Otherwise, a person must know how to formulate regular expressions in order to determine which, if any, of the presented options are applicable. And if a person knows how to formulate regular expressions, then why would they use Regex.ai?
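Absent any explanation from the tool, that verification has to happen on the user's side. A minimal sketch of what that looks like, using hypothetical sample strings (none of this comes from Regex.ai itself):

```python
import re

def verify(pattern, should_match, should_not_match):
    """Check a candidate regex against labeled positive/negative examples."""
    compiled = re.compile(pattern)
    matches_all = all(compiled.search(s) for s in should_match)
    rejects_all = not any(compiled.search(s) for s in should_not_match)
    return matches_all and rejects_all

# Hypothetical intent: match "baz" as a word, but not inside "bazaar".
print(verify(r"\bbaz\b", ["foo baz qux"], ["bazaar"]))  # True
print(verify(r"baz", ["foo baz qux"], ["bazaar"]))      # False
```

Of course, writing the labeled examples already requires knowing what the pattern should do — which is the point.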
Well guess what, LLM-generated code is someone else’s code: an amalgamation derived from many peoples’ code. Except those people are ‘helpfully’ “abstracted away” from you by the middleman, so you can’t know their original intents and choices. What’s worse, it’s someone else’s code that will be treated as your code—unlike working with a legacy system that everyone knows was written by some guy, in this case any bugs will be squarely on you.
It's all fun and games until they burn down your house.
> ... I need to understand the intent, the whys behind the choices.
As do I.
And that is something ChatGPT-X (for any given X) cannot provide, regardless of whether or not what is produced is correct. Perhaps with some form of backward chaining[0], a ChatGPT-X could someday explain how it arrived at what it produced.
But "the why" is the domain of people.
0 - https://en.wikipedia.org/wiki/Backward_chaining
With LLM-generated code, especially ChatGPT-style decoder models, none of that is true. All of the posts and comments I see about it here seem to be anecdotes along the lines of "it can do all of my job for me", yet asking it to write even the simplest code creates several issues on my end.
Personally I think a model geared towards code generation isn't an unsolvable task; the Spider dataset (a text-to-SQL task) was released some time ago, and the winning approach there involved no fanciness on the model side, but rather just testing all the output queries to ensure each was at least valid SQL. That alone got a 20%+ boost in accuracy.
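That validity-filtering idea can be sketched with sqlite3 — this is just an illustration of the principle, not the Spider winners' actual pipeline, and the schema and queries below are made up:

```python
import sqlite3

def is_valid_sql(query, schema_ddl):
    """Filter model outputs by checking they at least plan against
    the target schema, without executing them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_ddl)
    try:
        conn.execute("EXPLAIN " + query)  # plans the query, catches syntax/schema errors
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

schema = "CREATE TABLE users (id INTEGER, name TEXT);"
candidates = [
    "SELECT name FROM users WHERE id = 1",  # valid: kept
    "SELECT nmae FROM users",               # typo'd column: rejected
]
print([q for q in candidates if is_valid_sql(q, schema)])
```

The same cheap trick — reject outputs that fail a mechanical check before a human ever sees them — would apply to generated regexes too.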
At that point the plane's AI had better be 100% TRUSTWORTHY, because there's no safe fallback.
Also, consider how to express anchoring and/or grouping preferences in the UI, or weighting based on highlight positioning. These are often-used features of regex languages.
Try giving it examples where the data provides context cues.
\b(foo|bar|baz)\b
\w(foo|bar|baz)\w
\bbaz\b
[fF][oO][oO]|[Bb][Aa][Rr]|[Bb][Aa][Zz]
It only lacks a dice button which randomly selects the "correct" answer.
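Without an explanation, the only way to choose among such candidates is to test them yourself — each matches a different set of strings. A quick comparison against a few hypothetical inputs:

```python
import re

# The candidate patterns from above, minus \w(foo|bar|baz)\w.
candidates = [
    r"\b(foo|bar|baz)\b",                       # whole words only, case-sensitive
    r"\bbaz\b",                                 # just "baz" as a word
    r"[fF][oO][oO]|[Bb][Aa][Rr]|[Bb][Aa][Zz]",  # case-insensitive, no boundaries
]
samples = ["baz", "BAZ", "foobaz", "foo bar"]

for pattern in candidates:
    hits = [s for s in samples if re.search(pattern, s)]
    print(pattern, "->", hits)
```

Three plausible-looking candidates, three different answers — which is exactly why a user who can't read them is stuck.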
https://arts.units.it/retrieve/handle/11368/2758954/57751/20...
https://arxiv.org/pdf/1908.03316
https://cs.stanford.edu/~minalee/pdf/gpce2016-alpharegex.pdf
It uses genetic programming to build the regular expression.
https://regex.ai/ was stuck with /9856|10190|9753|8883/ and confidently emitted /\d{4}/ as an alternative.
https://regex101.com/r/cAaV1z/1 confirms the former.
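The problem with the \d{4} "alternative" is easy to demonstrate — it accepts any four consecutive digits, not just the four IDs in the sample:

```python
import re

specific = re.compile(r"9856|10190|9753|8883")
general = re.compile(r"\d{4}")

# The generalization fires on IDs the samples never contained:
print(bool(general.search("1234")))    # True
print(bool(specific.search("1234")))   # False

# It also matches inside longer numbers:
print(general.findall("order 123456"))
```

Neither extreme — a literal alternation of the samples, or "any four digits" — captures the user's actual intent, and nothing in the tool's output tells you which way it erred.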
For complex inputs, use actual PEG parsers: https://docs.rs/peg/latest/peg/
For simpler inputs, express your intent with readable methods using a lib: https://github.com/sgreben/regex-builder/ & https://github.com/francisrstokes/super-expressive
There are certainly cases where different parsing methods/grammars are a better fit, but regex shines in many places.
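Even without a builder library, most engines offer a middle ground for keeping intent readable; Python's re.VERBOSE mode is one example of the same idea (the address pattern below is purely illustrative):

```python
import re

# A commented pattern for a simple US street address (illustrative only).
address = re.compile(r"""
    \d+                \s+   # house number
    [A-Z][A-Za-z]*     \s+   # street name
    (ST|AVE|LN|RD)\b         # street type
""", re.VERBOSE)

print(bool(address.search("123 Maple RD")))  # True
```

The whitespace and comments are ignored by the engine, so the compiled pattern is identical to the one-liner — only the maintainer benefits.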
I'd like to see some numbers on a tool like this. If a huge majority of people are seeing genuine improvements in their workflow with it, I won't be a luddite yelling at them. Rare, low-severity failures shouldn't hold us back.
But the potential cost of failure with (any) regex is very high, so I personally wouldn't want to trust anything remotely mission-critical to a person who doesn't understand regex well enough to write it themselves; and if they can write it on their own, that's often faster than debugging AI-generated regex.
Would you feel better if it generated a regex-builder expression instead of a regex?
Even if regex-builder generates a regex under the hood?
In any case, the regex itself is only an implementation detail.
There is an excellent HN comment that provides more reading material around regex generation: https://news.ycombinator.com/item?id=32037544
It looks like no one did that here. Even using the sample data provided, if you highlight a few of the addresses, it can't find the rest of them, mainly because it generates a regex with ST/AVE/LN in it, missing all the ones with RD. And if you add an RD sample, it just adds that to the list.
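That failure mode is the overfitting complaint from earlier: the tool enumerates the street types it has seen instead of generalizing. A hypothetical reconstruction of the behavior described (the pattern and addresses are made up, not the tool's literal output):

```python
import re

# Overfit to the highlighted samples: only the street types seen so far.
overfit = re.compile(r"\d+ \w+ (ST|AVE|LN)\b")

addresses = ["12 Oak ST", "34 Elm AVE", "56 Pine RD"]
print([a for a in addresses if overfit.search(a)])
# "56 Pine RD" is missed; highlighting an RD sample just grows the alternation.
```

The list of seen values never converges to the concept "street-type suffix" — each new sample just lengthens it.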
There's lots of great innovation coming with LLMs, but people are forgetting their "AI basics" when it comes to verifying them.
We tell AI what we want. AI produces a hyper-specific, but barely comprehensible result. We look over the result to make sure it’s all good.
Then execute.
Except... it made ONE ERROR that I just spent two hours tracking down and fixing in my JSON file and now in the Stripe dash. (I coincidentally found the error using ChatGPT lol).
It's probably still faster and less error-prone than I could have done it manually. But it's still error-prone...