mattalex commented on Microsoft Favors Anthropic over OpenAI for Visual Studio Code   theverge.com/report/77864... · Posted by u/corvad
cwyers · 3 months ago
This is Microsoft subsidizing Claude inference costs -- if you look at how they charge models against your allotment, Gemini, GPT-5 and Claude 4 Sonnet all cost the same, despite Claude 4 Sonnet being more expensive than the other two. Not really sure I understand the economics here, especially since there's not really a clear winner between GPT-5 and Claude 4 Sonnet for coding (if anything I think GPT-5 puts up a better showing).
mattalex · 3 months ago
It might be that they pay less for Anthropic depending on how many tokens each model generates: total cost is price per token times number of tokens. I haven't checked GPT-5, but it's quite possible that, once you account for reasoning tokens, the two come out very comparable price-wise.
mattalex commented on Into CPS, Never to Return   bernsteinbear.com/blog/cp... · Posted by u/g0xA52A2A
upghost · a year ago
In case anyone is wondering, "when would I EVER use this (in hand-written code)?", it's a trick that makes DSL (domain specific language) and small language implementation much easier. A great reference for this is Peter Norvig's Paradigms of Artificial Intelligence Programming, when he implements a subset of Prolog and bolsters the functionality using CPS[1].

The second, although more obscure, is that you can use it in languages that do not have non-local exits to terminate a deeply nested computation early or return to an earlier point in the call stack. For example, Clojure does not have non-local exits, as only the final form of a function is returned. Using CPS, however, you can terminate the expression early and return to the original caller without executing the rest of the function. You probably only want to use this in specialized cases, though, or it may upset your team: continuation-heavy code is tricky to debug.

Lastly, and probably most controversially, you can build an extensible "if" statement using CPS if you are in a dynamic language and have no other tools to do so. Admittedly I do sometimes use this in ClojureScript; it lets you write "append only" code without continually growing a "cond". Again, most teams don't like this, so it depends on the circumstances, but it might be nice to have in your back pocket if you need it.
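A rough Python sketch of the non-local-exit trick mentioned above (my own toy example, names made up): the search hands its result straight to a continuation and never runs the rest of the traversal.

```
# A minimal CPS "non-local exit": the search hands its answer to `found` and
# never executes the rest of the traversal. (Python has no tail calls, so this
# is only illustrative; the stack still unwinds normally.)
def find_cps(tree, pred, found, not_found):
    def walk(node, fail):
        # `fail` is the continuation to call if this subtree has no match
        if isinstance(node, list):
            if not node:
                return fail()
            return walk(node[0], lambda: walk(node[1:], fail))
        return found(node) if pred(node) else fail()
    return walk(tree, not_found)

print(find_cps([1, [2, [7, 3]], 4], lambda x: x > 5,
               found=lambda x: f"hit {x}",
               not_found=lambda: "no match"))   # -> hit 7
```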

[1]: https://github.com/norvig/paip-lisp/blob/main/docs/chapter12...

mattalex · a year ago
This is essentially the principle behind algebraic effects (which, in practice, do get implemented as delimited continuations):

When you have an impure effect (e.g. checking a database, generating a random number, writing to a file, making a nondeterministic choice, ...), instead of directly implementing the impure action, you represent it with a symbol, e.g. "read", "generate number", ...

When executing the function, you also provide a context of "interpreters" that map each symbol to whatever action you want. This is very useful, since the actual business logic can be analyzed in isolation. For instance, if you want to test your application, you can use a dummy interpreter for "check database" that returns whatever values you need for testing, without ever touching an actual SQL database. It also lets you switch backends rather easily: if your database access uses the symbols "read", "write", and "delete", then you just need to implement those calls in the new backend. If you want to formally prove properties of your code, you can do that too, by stating the laws your symbols satisfy, e.g. `∀ key. read (delete key) = None`.
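A tiny Python sketch of that idea (my own made-up names, and a plain dict standing in for a real effect system):

```
# The business logic only emits symbolic effects; `perform` looks them up in
# whatever interpreter context it was given.
def fetch_user(user_id, perform):
    name = perform("read", ("users", user_id))   # no real I/O here
    perform("log", f"loaded {name}")
    return name.upper()

def run(program, interpreters, *args):
    def perform(effect, payload):
        return interpreters[effect](payload)
    return program(*args, perform)

# A test context: "read" serves canned data, "log" just records the message.
# Production would map "read" to a real database call instead.
logged = []
test_interpreters = {
    "read": lambda payload: {"users": {42: "ada"}}[payload[0]][payload[1]],
    "log": logged.append,
}
print(run(fetch_user, test_interpreters, 42))    # -> ADA
print(logged)                                    # -> ['loaded ada']
```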

Since you always capture the symbol using an interpreter, you can also do fancy things like dynamically overriding the interpreter: To implement a seeded random number generator, you can have an interpreter that always overrides itself using the new seed. The interpreter would look something like this

```
Pseudorandom_interpreter(seed)(argument, continuation):
    rnd, new_seed <- generate_pseudorandom(seed, argument)
    with Pseudorandom_interpreter(new_seed):
        continuation(rnd)
```

You can clearly see the continuation-passing style and the power of an interpreter overriding itself. In fact, this is a nice way of handling state in a pure way: just put something other than new_seed into the new interpreter.
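Here is the same interpreter in rough Python (my own sketch, using a toy LCG and a plain dict as the handler context):

```
def prng_handler(seed):
    def handle(bound, continuation, handlers):
        new_seed = (1103515245 * seed + 12345) % 2**31   # toy LCG step
        handlers["rand"] = prng_handler(new_seed)        # override itself
        return continuation(new_seed % bound)
    return handle

def perform(effect, payload, handlers, continuation):
    return handlers[effect](payload, continuation, handlers)

# Three reproducible draws; the seed is threaded only through the handler,
# never through the calling code.
handlers = {"rand": prng_handler(42)}
perform("rand", 10, handlers, lambda a:
    perform("rand", 10, handlers, lambda b:
        perform("rand", 10, handlers, lambda c: print(a, b, c))))
```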

If you want to debug a state machine, you can use an interpreter like this

```
replace_state_interpreter(state)(new_state, continuation):
    with replace_state_interpreter(new_state ++ state):
        continuation(head state)
```

This traces the state: the "state" value always holds the entire history of state changes, which can be very nice for debugging. For deployment, you can then swap in a different interpreter

```
replace_state_interpreter(state)(new_state, continuation):
    with replace_state_interpreter(new_state):
        continuation(state)
```

which just holds the current state.
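A rough Python version of the two interpreters (same caveats as above: my own sketch, a plain dict standing in for the handler context). The tracing handler keeps the whole history, the plain one keeps only the latest value, and the calling code cannot tell them apart.

```
def perform(effect, payload, handlers, continuation):
    return handlers[effect](payload, continuation, handlers)

def tracing_state_handler(history):                  # debug: keep the full history
    def handle(new_state, continuation, handlers):
        handlers["state"] = tracing_state_handler([new_state] + history)
        return continuation(history[0] if history else None)   # previous state
    return handle

def plain_state_handler(state):                      # deploy: keep only the latest
    def handle(new_state, continuation, handlers):
        handlers["state"] = plain_state_handler(new_state)
        return continuation(state)                   # previous state
    return handle

handlers = {"state": tracing_state_handler([])}
perform("state", "init", handlers, lambda _:
    perform("state", "running", handlers, lambda prev: print(prev)))   # -> init
# handlers["state"] now carries the full history ["running", "init"]
```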

mattalex commented on A DSL for peephole transformation rules of integer operations in the PyPy JIT   pypy.org/posts/2024/10/ji... · Posted by u/todsacerdoti
JonChesterfield · a year ago
A cost model would give a proof of termination - require the output of each transform to be cheaper than the input. Confluence is less obvious.
mattalex · a year ago
Once you have strong normalization you can just check local confluence and use Newman's lemma to get global confluence. That should be pretty easy: just build all n^2 critical pairs and run them to a normal form (termination has already been proven). If those pairs are joinable, so is the full rewriting scheme.
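As a toy illustration of that check (my own sketch over string-rewriting rules; a real implementation would enumerate critical pairs of terms properly): normalize both sides of every overlap of every pair of rules and compare.

```
from itertools import product

def rewrite_once(s, rules):
    for i in range(len(s)):
        for lhs, rhs in rules:
            if s.startswith(lhs, i):
                return s[:i] + rhs + s[i + len(lhs):]
    return None

def normalize(s, rules):
    # Runs to the normal form; assumes termination (proven separately).
    while (t := rewrite_once(s, rules)) is not None:
        s = t
    return s

def overlaps(l1, l2):
    # Superpositions: l1 matched at position 0, l2 matched at the returned position.
    out = []
    for k in range(1, min(len(l1), len(l2)) + 1):
        if l1[-k:] == l2[:k]:
            out.append((l1 + l2[k:], len(l1) - k))
    for i in range(len(l1) - len(l2) + 1):          # l2 fully inside l1
        if l1[i:i + len(l2)] == l2:
            out.append((l1, i))
    return out

def locally_confluent(rules):
    for (l1, r1), (l2, r2) in product(rules, repeat=2):   # all n^2 pairs of rules
        for s, j in overlaps(l1, l2):
            a = normalize(r1 + s[len(l1):], rules)                # rewrite with rule 1
            b = normalize(s[:j] + r2 + s[j + len(l2):], rules)    # rewrite with rule 2
            if a != b:
                return False, (s, a, b)
    return True, None

# "ab" -> "", "ba" -> "" is terminating and locally confluent,
# hence confluent by Newman's lemma.
print(locally_confluent([("ab", ""), ("ba", "")]))    # -> (True, None)
```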
mattalex commented on AI engineers claim new algorithm reduces AI power consumption by 95%   tomshardware.com/tech-ind... · Posted by u/ferriswil
nelup20 · a year ago
AMD's ROCm just isn't there yet compared to Nvidia's CUDA. I tried it on Linux with my AMD GPU and couldn't get things working. AFAIK on Windows it's even worse.
mattalex · a year ago
That entirely depends on which AMD device you look at: gaming GPUs are not well supported, but their Instinct line of accelerators works just as well as CUDA. Keep in mind that, in contrast to Nvidia, AMD uses different architectures for compute and gaming (though they are changing that in the next generation).
mattalex commented on Sony shutting down Concord, refunds after 2 week launch. 8 year dev, 25k sales   arstechnica.com/gaming/20... · Posted by u/IronWolve
gigaflop · a year ago
People can only spend so much time on a Live Service style game. They aim to be "The game you log into daily", but usually only kids have the kind of free time to grind these out week after week, let alone keep up with multiple.

Then, each has its own $10-ish battle pass, and you need to grind to get to the end of it. Aside from a new map or character, these are the bulk of the 'new stuff' that gets added.

Gaming as a Service doesn't scale well on most people who can afford to whale out, once they've already found their slot machine.

mattalex · a year ago
To expand on that: there's also the issue that these games have to be (somewhat) competitive multiplayer games: multiplayer because otherwise there's no way to create enough content, and competitive since otherwise there's less of a reason to play the game for long periods of time.

If you've ever played a dead or dying competitive game as a newcomer, you will know the problem this creates: since the people who stick around are either brand new or very dedicated, the skill gap becomes gigantic, which turns off most new players.

If your game wins the live-service race, you draw other players in. If your game dies, the very same structure that keeps players around will prevent new players from joining.

mattalex commented on Iron as an inexpensive storage medium for hydrogen   ethz.ch/en/news-and-event... · Posted by u/bornelsewhere
credit_guy · a year ago
This is a pretty elegant idea. It takes 826 kJ to split a mole of iron oxide (Fe2O3) and it takes 855 kJ to split 3 moles of water (H2O). So if you take H2 and blow it over one mole of Fe2O3, you can strip the O3 for the cost of 826 kJ, but then by burning the hydrogen in oxygen you get 855 kJ, for a net exothermic effect of 29 kJ, which is a rounding error. The opposite reaction requires 29 kJ, again negligible; there are probably bigger energy losses in bringing the reactant mass to the required temperature (400 degrees C).
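Spelling that arithmetic out (just restating the round numbers above):

```
\mathrm{Fe_2O_3} + 3\,\mathrm{H_2} \;\longrightarrow\; 2\,\mathrm{Fe} + 3\,\mathrm{H_2O},
\qquad
\Delta H \approx \underbrace{826}_{\text{split }\mathrm{Fe_2O_3}}
               - \underbrace{855}_{\text{form } 3\,\mathrm{H_2O}}
               = -29\ \mathrm{kJ\ per\ mol\ Fe_2O_3}
```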

Unfortunately, I don't see this making any sense for large scale energy storage. Storage tanks for compressed hydrogen enjoy the square-cube law. The larger they are the less expensive they are proportional to the mass of hydrogen they hold.

With this iron oxide method, you need 27 tons of iron oxide for one ton of hydrogen. You can procure right now tanks that hold 2.7 tons of hydrogen and weigh 77 tons empty [1]; that ratio is 28 to 1. But the round-trip efficiency of the tank is virtually 100%, while the efficiency of the iron-based storage is only 50%. And the tanks are not very expensive.

I can't see the niche that this idea can apply to.

[1] https://www.iberdrola.com/press-room/news/detail/storage-tan...

mattalex · a year ago
There are alternatives to iron that have higher efficiency and lower prices. For instance, https://hydrogenious.net/ does exactly that but with benzene-like structures. The advantage is that you can reuse existing infrastructure for transport and you get higher transport efficiency: while the square-cube law exists, the same holds for the forces on the chamber walls, which have to grow in thickness. Hydrogen tanks are also very expensive, as they have to be manufactured to tight tolerances (and they need to be replaced rather often due to hydrogen embrittlement weakening the chamber walls).
mattalex commented on Encyclopedia of Optimization   link.springer.com/referen... · Posted by u/egorpv
wakawaka28 · a year ago
Can you send me some of these results? I am pretty skeptical of such dramatic algorithmic improvements.

I don't think the point of an encyclopedia is to cover every single topic, as nice as that would be. If you're in the market for an encyclopedia, you are probably looking for a starting point, survey, or summary of stuff that's good to know. The algorithms you're thinking of are probably in very dry papers and monographs, accessible only to experts. If you were writing a commercial-grade generic MINLP solver, you would surely be looking at the latest papers for ideas, or you simply won't be competitive with existing solvers.

mattalex · a year ago
The paper I mentioned can be found here: https://arxiv.org/pdf/2206.09787

There are so many things that have only been invented in the last couple of years like RINS, MCF cuts, conflict analysis, symmetry detection, dynamic search,... (see e.g. Tobias Achterberg's line of work).

On the other hand, hardware improvements were not as relevant for LP and MILP solvers as one would expect: for instance, as of now there is still no solver that really uses GPU compute (though people are working on that). The reason is that parallelizing simplex solvers is quite tough, since the algorithm is inherently sequential (it's a descent over vertices) and the actual linear algebra is very sparse (if not entirely matrix-free). You can do some things like lookahead for better pricing or row/column generation approaches, but you have to be very careful about it (interior-point methods are arguably nicer to parallelize, but in many cases pay a performance penalty compared to simplex).

MILP/MINLP solvers look much nicer to parallelize at first glance, since you can parallelize across branches of the branch-and-bound tree, but in practice that is also pretty hard: modern solvers are so efficient that you can easily spend a lot of compute exploring a branch that a different branch quickly proves unnecessary to explore (e.g. SCIP, the fastest open-source MINLP solver, is completely single-threaded and still _somewhat_ competitive). This means that a lot of the algorithmic improvements are hidden inside the parallelization improvements, i.e. a lot of time has been spent on the question "What do we have to do to parallelize the solver without just wasting the additional threads?"

mattalex commented on Encyclopedia of Optimization   link.springer.com/referen... · Posted by u/egorpv
wakawaka28 · a year ago
That's not such a long time for math. There have not been so many innovations in the field since then IMO. Mainly the benchmarks might not be as meaningful, and GPU techniques won't be a big part of that book due to its age.
mattalex · a year ago
2008 is ancient for optimization!

People have tested old, year-2000 LP and MILP solvers against recent ones while correcting for hardware. Hardware improvements accounted for roughly a 20x improvement, while LP solvers overall sped up 180x and MILP solvers sped up a full 1000x ("Progress in mathematical programming solvers from 2001 to 2020").
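Factoring out the hardware, the algorithm-only speedups implied by those numbers are roughly:

```
\text{LP: } \frac{180\times\ \text{(total)}}{20\times\ \text{(hardware)}} = 9\times\ \text{(algorithms)},
\qquad
\text{MILP: } \frac{1000\times}{20\times} = 50\times
```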

Solvers from 2008 are on an entirely different level of performance: there are many problems that are unsolvable by them but are solved to zero duality gap in less than a second by modern solvers.

For MINLPs the difference is even more striking. This doesn't mean those books are useless (they are quite good), but do not expect a solver based on those techniques to even play in the same league as modern solvers.

mattalex commented on Well-known paradox of R-squared is still buggin me   statmodeling.stat.columbi... · Posted by u/luu
carlob · a year ago
I don't think it's about not knowing the abs function, more about the fact that the first derivative would be discontinuous and the second doesn't exist at 0. Variance has much nicer mathematical properties than absolute deviation.
mattalex · a year ago
You can solve L1 regression using linear programming at fantastically large scales. In fact, in many applications you do the opposite: go from squared to absolute error, because the latter fits into an LP.
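A minimal sketch of that reformulation (my own code, using scipy's HiGHS backend as an example LP solver): minimize the sum of t_i with t_i >= |y_i - x_i·beta|, which is linear in (beta, t).

```
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """Least-absolute-deviation fit: min_beta sum_i |y_i - X_i @ beta| as an LP."""
    n, p = X.shape
    # Variables: [beta (p), t (n)] with t_i >= |residual_i|.
    c = np.concatenate([np.zeros(p), np.ones(n)])   # minimize sum of t
    A_ub = np.block([[ X, -np.eye(n)],              #  X beta - t <= y
                     [-X, -np.eye(n)]])             # -X beta - t <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n   # beta free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]

# A line with one gross outlier: the L1 fit barely moves, unlike least squares.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(50)
y[10] += 10.0
print(l1_regression(X, y))   # roughly [1.0, 2.0]
```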
mattalex commented on German state moving 30k PCs to LibreOffice   blog.documentfoundation.o... · Posted by u/buovjaga
sgift · 2 years ago
The link is just about a move from one part of Munich to another (MS's German HQ has always been in Munich). From what I remember, it was more that we had a coalition with the CSU for four years and, unsurprisingly, the moment conservatives were part of the city's government everything went back to "help the companies".

(MS had tried to pressure against the move from the start, but wasn't really successful in the first years)

mattalex · 2 years ago
The problem was mostly that the only guy who was really backing the project (Christian Ude, SPD) was replaced by his successor (Dieter Reiter, SPD), who just didn't have the drive necessary to maintain the project.

The entire design of "LiMux" was doomed from the start: it was a highly customized version of Ubuntu that was only used in Munich (not even throughout the entire state). That made everything ridiculously expensive, since the actual advantages of building on an open-source solution were never realized. Combine that with the fact that "open source" and "cost savings" were used interchangeably, when in reality the budget for Windows should have been reallocated into development rather than cut.

The entire project was half-assed to begin with, which basically meant that Windows and Linux had to coexist since many crucial tools were never ported to Linux.

The "Microsoft killed it" story sounds realistic, but the truth is the much more boring incompetence in execution.

u/mattalex

Karma: 153 · Cake day: July 29, 2020