Readit News logoReadit News
c0nstantine commented on Ask HN: What are you working on? (May 2025)    · Posted by u/david927
c0nstantine · 10 months ago
Working on trre - extension of regex for text editing. I'm redesigning the underlying engine to operate on deterministic automata (transducers) for most expressions. Theoretically, it should outperform AWK in complex text-processing tasks.

https://github.com/c0stya/trre

c0nstantine commented on Ask HN: What's the best implementation of Conway's Game of Life?    · Posted by u/jakemanger
c0nstantine · a year ago
I have a minimalistic one (7 lines in Python) using convolutions:

https://c0stya.github.io/articles/game_of_life.html

### The code ###

  import numpy as np
  from scipy.signal import convolve2d

  field = np.random.randint(0, 2, size=(100, 100)) # 100x100 field size
  kernel = np.ones((3, 3))
  
  for i in range(1000): # 1000 steps
      new_field = convolve2d(field, kernel, mode="same")
      field = (new_field == 3) + (new_field == 4) * field

c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
larodi · a year ago
In place replacing the text while parsing is very powerful approach. Fingers crosses this flies.

This, combined with probabilistic approach can be even more interesting. Probabilistic approaches regexes exist since 70s if memory serves right.

c0nstantine · a year ago
Thank you for your feedback. There is a bunch of deterministic methods to infer regex from samples (positive and negative). There are ml-based as well. But it is a different story.
c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
mordechai9000 · a year ago
I would probably use this in a text editor if it was available. I don't struggle with group logic, but I often forget which tools use \ to reference capture groups, and which use $. (If it's Microsoft or Microsoft-adjacent, it's probably $.)
c0nstantine · a year ago
Let me know if you need any help. Not it is still raw but I hope I'll polish it soon.
c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
kemiller · a year ago
This feels like it would be interesting in the Helix editor. They don't have a good solution for traditional regexp search-and-replace, but this would fill that niche in a more elegant way.
c0nstantine · a year ago
Hi. Missed your message initially. Helix is a great project. Let me know if/how I can help. The trre is a bit raw. But hope I can polish it within a month or two.
c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
simlevesque · a year ago
I'm using Linux.
c0nstantine · a year ago
Did it solve the problem? I guess the issue is the process substitution construction of bash "<()". Not all shells support this.
c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
kccqzy · a year ago
I do not understand the rules by which you inject the epsilon and I think this is a source of confusion for many people. I had thought that an epsilon could be injected anywhere REGEX can appear (effectively allowing epsilon as a REGEX) but of course that just leads to infinite number of parses. Manually injecting epsilon is a highly hacky thing to do; better to consider that when you design the grammar.

I would not worry about "cat:dog:mouse" because intuitively it is clearly correct and it means replacing cat with mouse. With parentheses it could be written as "((cat:dog):mouse)".

c0nstantine · a year ago
Epsilon injection appears whenever right or left side of ':' has no operand. E.g.

(:a)

(a:)

a:|b

a|b:

etc

I will try to change the precedence and see how it works. Btw what do you think about explicit operators '>' '<' where '<' works as usual regex matcher, and '>' as a generator. For example to change 'cat' to 'dog' there could be something like '<cat>dog' where '<cat' part is a parser and '>dog' is a generator. Thanks.

c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
froh · a year ago
hm maybe juxtapose a number of examples in one precedence and then the other? and share them to gather feedback?

also,

* colon as member of a character class (and dash and how they interact)

* posix character classes (and the colon in the syntax)

* word boundaries are really useful in practice

* think of ünicøde early and look at what russ cox did there

boundaries, what do you decide to exclude? for example back references and grouping have fun with DFAs (again see russ cox and re2)

composition is fantastic, and a shortcut to exponential land because grammars (as in Chomsky hierarchy) can easily be expressed with transducers, yay.

boundaries will also clarify use cases and allow to state: "if you want to do xyz please use re2 and a bit of code instead"

and one "technicality" that hit me once with re2 was a static buffer for resumable iteration. I'd loved to reallocate the buffer and drop it's beginning and extend it at the end but alas, the state of re2 has hard pointers into the buffer, not relative ones. I think this was when re2 hit the end of the buffer mid-expression during incremental parse. so you can't reallocate and instead you have to drop the hard earned state and resume.

anyhow, it's been quite a while so I'm no longer in the thicket of this :-D

what's your driver? curiosity? or some itch?

but I really enjoy seeing your project :-)

c0nstantine · a year ago
Thank you for the detailed comment.

So Unicode is something on a top of my TODO list. Boundaries is a very interesting topic. Maybe I'll extend the doc to include the details.

> what's your driver? curiosity? or some itch?

It's an itch :). 8 years ago I explored automata theory and found finite transducers to be a handy and natural way to transform texts. And as regex corresponds to FSA (finite state acceptors) I wanted to create a language that corresponds to FST (finite state transducers). There is a lesser-known algorithm for FST determinization and I want to applied it to make the transformation fast and efficient. It turned out to be not that simple as I expected initially.

c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
Timwi · a year ago
There’s no reason to say that “ca(t:d)og” is a “more correct” parsing than “(cat):(dog)”. You did hit the nail on the head insofar as that you realized that we as programmers have built strong habits and make assumptions on the basis of those habits. But you didn’t take it to the logical conclusion and didn’t realize that having a text-based syntax to represent regexes is also such a habit/assumption.

In pure theoretical computer science, regular expressions exist as an abstract concept independent from syntax or parsers. They are an “algebra”, which means they are composed of elements connected with operators, but they are not inherently tied to a syntax. In the most fundamental formulation of regular expressions (the one in the Chomsky hierarchy), the only operators are alteration (which modern syntaxes express as “|”), the Kleene star (“*”) and — notably — concatenation, which modern syntaxes simply omit, in a way comparable to how modern mathematics notation omits the multiplication operator when you write “2x”.

In the same way that maths needs rules to define whether “2x²” means “(2x)²” or “2(x²)”, regex syntax needs such rules too. This is called operator precedence. I’m sure you’ve heard that before, but you just might not have realized that the regular expression “ab” has an operator in it because it is typically not written.

Now I’m not going to argue that the operator precedence in maths notation is haphazard or without reason — but it is arbirary. It was arbitrarily chosen to be the most useful to mathematicians using the notation. And it turns out that giving exponentiation higher precedence than (invisible) multiplication (meaning: “2x²” means “2(x²)” rather than “(2x)²”) is more useful.

So coming back to the original example, whether “cat:dog” means “ca(t:d)og” or “(cat):(dog)” is simply a matter of defining the precedence of the “:” operator relative to the concatenation operator. You can argue (and I would agree with you) that one is more useful than the other, and therefore preferable (in the same way that “(cat)|(dog)” is more useful than “ca(t|d)og”), but neither of them is more fundamentally correct or primal or, as you put it, “supposed to formally expand to”.

c0nstantine · a year ago
I agree with the point that precedence is arbitrary. The current version looks like this:

1 Escaped characters

2 []

3 ()

4 * + ? {m,n}

5 :

6 . (implicit concatenation)

7 |

I have some reasons to put it that way. I want : to be somewhat 'atomic'. If you think about '*' or '+' they can be lower in the table as well. Anyway, I will try to put : lower in the next version and see how it goes.

c0nstantine commented on Show HN: Transductive regular expressions for text editing   github.com/c0stya/trre... · Posted by u/c0nstantine
johnnymellor · a year ago
If I understand correctly the following ttre expression does what you're asking for:

  ":'(':(\\')|[^"'])*":'

c0nstantine · a year ago
Thank you for doing my work! :)

u/c0nstantine

KarmaCake day144March 4, 2023
About
I am a machine learning engineer and researcher with a focus on representation learning, information theory, and inductive reasoning. Please feel free to reach out - I’d love to connect.

https://c0stya.github.io/about.html

View Original