How to think about the OpenAI Q rumors*
I have never come across the word "loyle" before, and googling suggests it does not exist. Does anyone know what the reference here is? Or what this is a typo for?
http://luc.devroye.org/fonts-44934.html
It was a bit of a "novelty" font for typewriters. Almost like the Comic Sans of its day. It wasn't meant to be used as italics, it was for typing a fun invitation or more "personal" letter or something like that.
I can't find any information on it specifically though, whether it was the first. It wouldn't surprise me if it were, however.
Oof, my hubris.
import Prelude hiding (null)
import Data.Set (Set, toList, fromList, empty, singleton, isSubsetOf, unions, null)
data Regex = Class [Char] -- character class
| Seq [Regex] -- sequence, ABC
| Choice [Regex] -- choice, A|B|C
| Star Regex -- zero or more, A*
deriving (Show)
-- The language of a regex is either finite or infinite.
-- We only care about the finite case.
data Lang = Finite (Set String) | Infinite deriving (Show, Eq)
zero = Finite empty
one = Finite (singleton "")
isEmpty (Finite s) = null s
isEmpty Infinite = False
cat :: Lang -> Lang -> Lang
cat x y | isEmpty x || isEmpty y = zero
cat (Finite s) (Finite t) = Finite $ fromList [x ++ y | x <- toList s, y <- toList t]
cat _ _ = Infinite
subsingleton :: Lang -> Bool
subsingleton Infinite = False
subsingleton (Finite s) = isSubsetOf s (fromList [""])
eval :: Regex -> Lang
eval (Class chars) = Finite $ fromList [[c] | c <- chars]
eval (Seq rs) = foldr cat one $ map eval rs
eval (Choice rs) | any (== Infinite) langs = Infinite
| otherwise = Finite $ unions [s | Finite s <- langs]
where langs = map eval rs
eval (Star r) | subsingleton (eval r) = one
| otherwise = InfiniteOof, my hubris.
(EDIT: this code is completely wrongheaded and does not work; it assumes that when sequencing regexes, you can take the product of their sizes to find the overall size. This is just not true. See reply, below, for an example.)
-- https://gist.github.com/rntz/03604e36888a8c6f08bb5e8c665ba9d0
import qualified Data.List as List
data Regex = Class [Char] -- character class
| Seq [Regex] -- sequence, ABC
| Choice [Regex] -- choice, A|B|C
| Star Regex -- zero or more, A*
deriving (Show)
data Size = Finite Int | Infinite deriving (Show, Eq)
instance Num Size where
abs = undefined; signum = undefined; negate = undefined -- unnecessary
fromInteger = Finite . fromInteger
Finite x + Finite y = Finite (x + y)
_ + _ = Infinite
Finite x * Finite y = Finite (x * y)
x * y = if x == 0 || y == 0 then 0 else Infinite
-- computes size & language (list of matching strings, if regex is finite)
eval :: Regex -> (Size, [String])
eval (Class chars) = (Finite (length cset), [[c] | c <- cset])
where cset = List.nub chars
eval (Seq regexes) = (product sizes, concat <$> sequence langs)
where (sizes, langs) = unzip $ map eval regexes
eval (Choice regexes) = (size, lang)
where (sizes, langs) = unzip $ map eval regexes
lang = concat langs
size = if elem Infinite sizes then Infinite
-- finite, so just count 'em. inefficient but works.
else Finite (length (List.nub lang))
eval (Star r) = (size, lang)
where (rsize, rlang) = eval r
size | rsize == 0 = 1
| rsize == 1 && List.nub rlang == [""] = 1
| otherwise = Infinite
lang = [""] ++ ((++) <$> [x | x <- rlang, x /= ""] <*> lang)
size :: Regex -> Size
size = fst . eval
NB. Besides the utter wrong-headedness of the `product` call, the generated string-sets may not be exhaustive for infinite languages, and the original version (I have since edited it) was wrong in several cases for Star (if the argument was nullable or empty).With a backtracking-search implementation of regexes, bounded iteration is pretty easy.
But the linked webpage appears to compile regexes to finite state machines (it shows you their finite-state-machine, for instance), and eg [a-z]{1,256} will have 256 states: 256 times the 1 state needed for [a-z]. If [a-z] were a complex regex, you could get a combinatorial explosion.
This alone probably isn't the issue? 256 is not a very large number. But I suspect there are follow-on algorithmic issues. This is just speculation, but I wouldn't be surprised if that 256-state machine were computed by applying DFA minimization, an algorithm with worst-case exponential running time, to a more naively generated machine.
Looks like it was the author's decision. +1 on the irony.