Readit News
osaariki commented on Response Healing: Reduce JSON defects by 80%+   openrouter.ai/announcemen... · Posted by u/numlocked
red2awn · 3 months ago
Very confused. When you enable structured output, the response should adhere to the JSON schema EXACTLY, not best effort, because the output is constrained via guided decoding. This is even documented in OpenRouter's structured output doc:

> The model will respond with a JSON object that strictly follows your schema

Gemini is listed as a model supporting structured output, and yet its failure rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output has a high performance cost, but advertising it as supported when in reality it's not is a massive red flag.

Worse yet, response healing only fixes JSON syntax errors, not schema adherence. This is only mentioned at the end of the article, which people are clearly not going to read.

WTF

osaariki · 3 months ago
You're exactly right. The llguidance library [1,2] seems to have emerged as the go-to solution for this by virtue of being >10X faster than its competition. It's work from some past colleagues of mine at Microsoft Research based on the theory of (regex) derivatives, which we previously used to ship a novel kind of regex engine for .NET. It's cool work and AFAIK should ensure full adherence to a JSON grammar.

llguidance is used in vLLM, SGLang, internally at OpenAI, and elsewhere. At the same time, I also see a non-trivial JSON error rate from Gemini models in large-scale synthetic generations, so perhaps Google hasn't seen the "llight" yet and is using something less principled.

1: https://guidance-ai.github.io/llguidance/llg-go-brrr 2: https://github.com/guidance-ai/llguidance

osaariki commented on Apple’s Persona technology uses Gaussian splatting to create 3D facial scans   cnet.com/tech/computing/a... · Posted by u/dmarcos
dangus · 4 months ago
It’s amazing tech, it’s just a solution looking for a problem.

It feels a bit like the original Segway’s over-engineered solution versus cheap Chinese hoverboards, then the scooters and e-bikes that took over afterwards.

Why would I be paying all this money for this realistic telepresence when my shitbox HP laptop from Walmart has a perfectly serviceable webcam?

osaariki · 4 months ago
I live halfway across the world from my folks so I don’t see them often. I’d love something that gives me a greater sense of presence than a video call can give.
osaariki commented on Gödel's theorem debunks the most important AI myth – Roger Penrose [video]   youtube.com/watch?v=biUfM... · Posted by u/Lockal
katabasis · a year ago
Many philosophical traditions which incorporate a meditation practice emphasize that your consciousness is distinct from the contents of your thoughts. Meditation (even practiced casually) can provide a direct experience of this.

When it comes to the various kinds of thought-processes that humans engage in (linguistic thinking, logic, math, etc.) I agree that you can describe things in terms of functions that have definite inputs and outputs. So human thinking is probably computable, and I think that LLMs can be said to "think" in ways that are analogous to what we do.

But human consciousness produces an experience (the experience of being conscious) as opposed to some definite output. I do not think it is computable in the same way.

I don’t necessarily think that you need to subscribe to dualism or religious beliefs to explain consciousness - it seems entirely possible (maybe even likely) that what we experience as consciousness is some kind of illusory side-effect of biological processes as opposed to something autonomous and “real”.

But I do think it’s still important to maintain a distinction between “thinking” (computable, we do it, AIs do it as well) and “consciousness” (we experience it, probably many animals experience it also, but it’s orthogonal to the linguistic or logical reasoning processes that AIs are currently capable of).

At some point this vague experience of awareness may be all that differentiates us from the machines, so we shouldn’t dismiss it.

osaariki · a year ago
We don’t know that LLMs generating tokens for scenarios involving simulations of consciousness don’t already involve such experience. Certainly such threads of consciousness would currently be much less coherent and fleeting than the human experience, but I see no reason to simply ignore the possibility. To whatever degree it is even coherent to talk about the conscious experience of anyone other than yourself (p-zombies and such), I expect that as AIs’ long-term coherency improves and AI minds become more tangible to us, people will settle into the same implicit assumption afforded to fellow humans: that there is consciousness behind the cognition.
osaariki commented on TinyStories: How Small Can Language Models Be and Still Speak Coherent English? (2023)   arxiv.org/abs/2305.07759... · Posted by u/tzury
osaariki · a year ago
For some interesting context: this paper was a precursor to all the work on synthetic data at Microsoft Research that led to the Phi series of SLMs. [1] It was an important demonstration of what carefully curated and clean data could do for language models.

1: https://arxiv.org/abs/2412.08905

osaariki commented on Swift Homomorphic Encryption   swift.org/blog/announcing... · Posted by u/yAak
bluedevilzn · 2 years ago
This must be the first real world use case of HE. It has generally been considered too slow to do anything useful but this is an excellent use case.
osaariki · 2 years ago
Edge's Password Monitor feature uses homomorphic encryption to match passwords against a database of leaks without revealing anything about those passwords: https://www.microsoft.com/en-us/research/blog/password-monit... So not the first, but definitely cool to see more adoption!
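To give a flavor of the "match without revealing" idea (Password Monitor's actual protocol is homomorphic-encryption based; this is NOT it), here's a toy Diffie-Hellman-style commutative-blinding membership check. The prime, exponents, and hash are illustrative only and nowhere near production-grade.

```python
# Toy private membership check via commutative blinding: both sides raise
# hashed passwords to secret exponents; since (h^a)^b == (h^b)^a, the client
# can test membership without either party seeing the other's raw values.
import hashlib

P = 2**127 - 1  # a Mersenne prime; toy-sized group, not safe for real use

def h(pw: str) -> int:
    return int.from_bytes(hashlib.sha256(pw.encode()).digest(), "big") % P

def in_breach(client_pw, breach_db, a=1234577, b=7654321):  # toy secrets
    client_blind = pow(h(client_pw), a, P)             # client -> server
    double_blind = pow(client_blind, b, P)             # server blinds again
    server_set = {pow(pow(p_hash, b, P), a, P)         # server's blinded set,
                  for p_hash in map(h, breach_db)}     # re-blinded by client
    return double_blind in server_set

print(in_breach("hunter2", ["hunter2", "letmein"]))    # True
print(in_breach("correct horse", ["hunter2"]))         # False
```

The HE-based scheme achieves a similar goal with different machinery (the server computes on encrypted data directly), but the privacy property being bought is the same.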
osaariki commented on TypeChat   microsoft.github.io/TypeC... · Posted by u/DanRosenwasser
33a · 3 years ago
Looks like it just runs the LLM in a loop until it spits out something that type checks, prompting with the error message.

This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.

osaariki · 3 years ago
I'm not familiar with how TypeChat works, but Guidance [1] is another similar project that can actually integrate into the token sampling to enforce formats.

[1]: https://github.com/microsoft/guidance
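For contrast, the loop-until-it-validates approach is simple to sketch. Here's a minimal version (the model call is a deterministic stand-in, and real TypeChat validates against TypeScript types rather than this ad-hoc check):

```python
# Sketch of a TypeChat-style repair loop: ask the model for JSON, validate,
# and on failure feed the error message back and retry.
import json

def repair_loop(llm, prompt, validate, max_tries=3):
    msg = prompt
    for _ in range(max_tries):
        reply = llm(msg)
        try:
            data = json.loads(reply)
            validate(data)       # raises ValueError on schema problems
            return data
        except (json.JSONDecodeError, ValueError) as e:
            msg = f"{prompt}\nYour last reply was invalid ({e}); try again."
    raise RuntimeError("model never produced valid output")

# Deterministic stand-in for a model: fails once, then complies.
replies = iter(['{"name": 42', '{"name": "Frodo"}'])
def fake_llm(_msg):
    return next(replies)

def validate(d):
    if not isinstance(d.get("name"), str):
        raise ValueError("name must be a string")

print(repair_loop(fake_llm, "Give me a character as JSON.", validate))
```

Each failed round costs a full model call, which is exactly the expense concern above; sampling-time enforcement pays the cost per token instead.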

osaariki commented on Show HN: Regex Derivatives (Brzozowski Derivatives)   github.com/c0stya/brzozow... · Posted by u/c0nstantine
burntsushi · 3 years ago
Wow, that's really cool. I can't wait to read the paper. Have y'all written anything else about it?

> For example, when we can prove that a regex R subsumes a regex T, then an alternation R|T can be rewritten to just R, since T is already included in it.

This doesn't compose though, right? For example, if you have `sam|samwise`, then you can do that transformation, but if you have `\b(?:sam|samwise)\b` then you can't.

> but a larger class of patterns get to stay in the DFA world with derivatives.

Can you say more about this?

osaariki · 3 years ago
We have an early tool paper [1] for a previous version of the engine, but that's short and with POSIX semantics, so doesn't include a lot of the interesting stuff. The most relevant bit there is the handling of Unicode.

>This doesn't compose though, right? For example, if you have `sam|samwise`, then you can do that transformation, but if you have `\b(?:sam|samwise)\b` then you can't.

You'd get subsumption when you have something like '(?:sam)?wise|wise' and in fact this kind of subsumption due to a "nullable prefix" is the main one currently detected (because we encountered patterns that motivated it). And you're right that all these rewrites should compose regardless of context so that they can be eagerly applied in the AST constructors.
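That particular rewrite is easy to sanity-check by brute force with Python's re module (just to illustrate the subsumption claim; this is obviously not how the engine proves it):

```python
# Check that '(?:sam)?wise|wise' accepts exactly the same strings as
# '(?:sam)?wise': the 'wise' branch is subsumed by the nullable-prefix branch.
import re
from itertools import product

full = re.compile(r'(?:sam)?wise|wise')
reduced = re.compile(r'(?:sam)?wise')

alphabet = 'samwie'  # the distinct letters of "samwise"
for n in range(8):
    for s in map(''.join, product(alphabet, repeat=n)):
        assert bool(full.fullmatch(s)) == bool(reduced.fullmatch(s))
print("equivalent on all strings up to length 7")
```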

>> but a larger class of patterns get to stay in the DFA world with derivatives.

> Can you say more about this?

Yeah, the easiest example I can point at is from that tool paper [1], where a subsumption-based rewrite for counted repetitions can turn an exponential blow-up into a linear one. Off the top of my head I think a pattern like '.*a[ab]{0,1000}' would have a 2^1000 blow-up when determinized into a DFA but stays linear in size with derivatives. However, that loop subsumption rule isn't valid as-is under PCRE semantics, so it still needs some work to be ported to the .NET engine.

Before we get that PLDI paper out the best resource is probably just the code under [2]. It's fairly well commented, but of course that's no substitute for a good write-up.

[1]: https://www.microsoft.com/en-us/research/uploads/prod/2019/0... [2]: https://github.com/dotnet/runtime/tree/main/src/libraries/Sy...

osaariki commented on Show HN: Regex Derivatives (Brzozowski Derivatives)   github.com/c0stya/brzozow... · Posted by u/c0nstantine
burntsushi · 3 years ago
As the sibling commenter said, indeed, the regex crate does not use derivatives. If you wouldn't mind, could you share what led you to that conclusion? I'd love to fix it!

I am at least currently not aware of any "production" and "general purpose" regex engine that is built on derivatives. And I'm not really sure how you'd build one. The biggest hurdle you'd have to overcome, as far as I can tell, is that derivatives are usually used to build a DFA. In this case, the OP does matching while taking the derivative simultaneously. My guess is that you'll run into problems doing that with huge character classes, which are easy to get when Unicode is enabled.

Whether "production" and "general purpose" are the same as "practical" is unclear. To put away the vague words, my understanding is that with derivatives, you'll either get slow match times or slow compilation times. (To the point where "slow" becomes enough to notice and be a problem.)

With that said, the world is full of experts saying you can't do something. What I'm trying to say here is that there are some challenges I've faced in the course of building a regex engine that I simply don't know how I'd address with derivatives.

Another thing worth considering here is the match semantics of the regex. I haven't had time yet to try this particular matcher, but when I do, I'd check for how alternations are matched. For example, what does 'samwise|sam' match in the haystack 'samwise'? Either answer is correct, but one is typically found in POSIX engines and the other found in Perl-like engines. Can derivatives implement either?
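For what it's worth, the Perl-style answer is easy to demo with Python's re module, where leftmost-first semantics make branch order decide the winner:

```python
# Python's re follows Perl-style leftmost-first alternation: the first
# branch that matches wins. A POSIX engine would prefer the longest match
# ('samwise') for both patterns regardless of branch order.
import re

print(re.match(r'samwise|sam', 'samwise').group())  # samwise
print(re.match(r'sam|samwise', 'samwise').group())  # sam
```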

It's also worth noting that I am not an expert on regex derivatives. I've never actually built a derivative-oriented matcher. If I had, I'm sure I could be a lot more specific with my criticisms. :-)

osaariki · 3 years ago
We built the new engine behind .NET's RegexOptions.NonBacktracking with derivatives. We will have a paper at PLDI this year on the work that went into that.

PCRE semantics was indeed the big thing that required new techniques. Basically, you have to modify the derivation function such that the derivatives model what paths through the pattern a backtracking engine would consider before stopping at the current position.

The big thing derivatives buy you is the ability to apply rewrite rules lazily during construction. For example, when we can prove that a regex R subsumes a regex T, then an alternation R|T can be rewritten to just R, since T is already included in it. These kinds of rewrites often result in the constructed DFA being minimal or close to it. Of course, you do pay somewhat for the machinery to do this, so best-case construction times suffer compared to traditional NFA+lazy-DFA engines like RE2 or Rust's, but a larger class of patterns get to stay in the DFA world with derivatives.

I hope our work ignites a new interest in regex matching with derivatives. I believe the ability to apply these syntactic rewrites on-the-fly is really powerful and I'd love to see how far folks like you with extensive experience in optimizing regex matchers can take this.

osaariki commented on Ask HN: ML Papers to Implement    · Posted by u/Heidaradar
osaariki · 3 years ago
I'd love for someone to do a good quality PyTorch enabled implementation of Sampled AlphaZero/MuZero [1]. RLLib has an AlphaZero, but it doesn't have the parallelized MCTS you really want to have and the "Sampled" part is another twist to it. It does implement a single player variant though, which I needed. This would be amazing for applying MCTS based RL to various hard combinatorial optimization problems. Case in point, AlphaTensor uses their internal implementation of Sampled AlphaZero.

An initial implementation might be doable in 5 hours for someone competent and familiar with RLLib's APIs, but could take much longer to really polish.

[1]: https://arxiv.org/abs/2104.06303
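For anyone picking this up: the PUCT selection rule at the core of AlphaZero-style MCTS is small. A sketch follows (the constant and node layout are illustrative, not taken from the paper's pseudocode; the hard part the comment asks for is parallelizing the surrounding search):

```python
# PUCT selection as used in AlphaZero-style MCTS: each simulation descends
# the tree by maximizing mean value Q plus a prior-weighted exploration bonus.
import math

def puct_select(children, c_puct=1.5):
    """children: list of dicts with prior p, visit count n, total value w."""
    total_n = sum(ch["n"] for ch in children)
    def score(ch):
        q = ch["w"] / ch["n"] if ch["n"] else 0.0          # mean value
        u = c_puct * ch["p"] * math.sqrt(total_n) / (1 + ch["n"])
        return q + u
    return max(range(len(children)), key=lambda i: score(children[i]))

children = [
    {"p": 0.6, "n": 10, "w": 4.0},   # heavily explored, mediocre value
    {"p": 0.3, "n": 1,  "w": 0.9},   # barely explored, high value
    {"p": 0.1, "n": 0,  "w": 0.0},   # unvisited
]
print(puct_select(children))  # 1
```

The "Sampled" variant applies this same rule over a sampled subset of actions, which is what makes huge or continuous action spaces tractable.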

osaariki commented on Chris Lattner on garbage collection vs. Automatic Reference Counting (2017)   atp.fm/205-chris-lattner-... · Posted by u/Austin_Conlon
verdagon · 4 years ago
I develop languages full time, and it's clear to me that RC will make massive strides forward in the next decade, for a few reasons:

1. First-class regions allow us to skip a surprising amount of RC overhead. [0]

2. There are entire new fields of static analysis coming out, such as in Lobster which does something like an automatic borrow checker. [1]

3. For immutable data, a lot of options open up, such as the static lazy cloning done by HVM [2] and the in-place updates in Perceus/Koka [3]

Buckle up yall, it's gonna be a wild ride!

[0] https://verdagon.dev/blog/zero-cost-refs-regions

[1] https://aardappel.github.io/lobster/memory_management.html

[2] https://github.com/Kindelia/HVM

[3] https://www.microsoft.com/en-us/research/publication/perceus...

osaariki · 4 years ago
The in-place update work in Koka [1] is super impressive. One of my co-workers, Daan Leijen, leads the Koka project, and hearing his talks about it has been such a privilege. The work around Koka is really convincing me that functional languages will eventually lead the pack in the effort-to-performance trade-off.

Something that came out of the Koka project that everyone should know about is mimalloc [2]: if your built-in malloc is not doing it for you, this is the alternative allocator you should try first. Mimalloc is tiny and it has consistently great performance in concurrent environments.

[1]: https://koka-lang.github.io/koka/doc/index.html

[2]: https://github.com/microsoft/mimalloc
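As a loose illustration of the Perceus idea (functional updates that reuse memory when a value is uniquely referenced), here's a Python analogy using CPython refcounts. Koka establishes uniqueness statically through its compiler analysis; this runtime peeking is just to convey the intuition and is CPython-specific.

```python
# Rough analogy to Perceus-style reuse: if we hold the only reference to a
# structure, a "functional" update can safely mutate it instead of copying.
import sys

def map_reusing_if_unique(xs, f):
    # getrefcount sees: the caller's reference, the parameter binding, and
    # its own argument. <= 3 means nobody else can observe a mutation.
    if sys.getrefcount(xs) <= 3:
        for i, x in enumerate(xs):
            xs[i] = f(x)         # update in place, no allocation
        return xs
    return [f(x) for x in xs]    # shared: fall back to a fresh copy

data = [1, 2, 3]
out = map_reusing_if_unique(data, lambda x: x * 2)
print(out, out is data)
```

In Koka/Perceus this decision costs a single refcount check (or none at all, when ownership is proven at compile time), which is how purely functional code ends up with imperative-style memory behavior.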

u/osaariki

Karma: 159 · Cake day: October 31, 2017

About: Senior Researcher at Microsoft Research