Would a 1000x compute cluster provide a meaningful performance boost (in the generated binaries)?
One thing is that you can think of static analysis as building up facts about the program. You can, for example, start by assuming nothing and then add facts, iteratively propagating them from one line of code to the next. But you can also go the other way: start by assuming the universe of all facts and iteratively remove facts from it.
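To make the "start from nothing and add facts" direction concrete, here is a minimal sketch of a forward reaching-definitions pass over a hand-built CFG (the node names, fact labels, and CFG are all invented for illustration):

```python
# Minimal worklist dataflow sketch: reaching definitions over a toy CFG.
# Each node: (definitions generated, definitions killed, successors).
cfg = {
    "entry": {"gen": {"x@entry"}, "kill": {"x@loop"},  "succ": ["loop"]},
    "loop":  {"gen": {"x@loop"},  "kill": {"x@entry"}, "succ": ["loop", "exit"]},
    "exit":  {"gen": set(),       "kill": set(),       "succ": []},
}

# Start by assuming nothing (empty fact sets) and grow to a fixpoint.
facts_in  = {n: set() for n in cfg}
facts_out = {n: set() for n in cfg}

worklist = list(cfg)
while worklist:
    node = worklist.pop()
    info = cfg[node]
    # Standard transfer function: out = gen ∪ (in − kill)
    new_out = info["gen"] | (facts_in[node] - info["kill"])
    if new_out != facts_out[node]:
        facts_out[node] = new_out
        # Propagate the new facts to successors and revisit them.
        for succ in info["succ"]:
            facts_in[succ] |= new_out
            worklist.append(succ)

for node in cfg:
    print(node, "reaches-in:", sorted(facts_in[node]))
```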
Some classes of program analysis are safe to stop early. For example, a static analysis that tries to find the targets of virtual calls (known as devirtualization) can simply stop after a timeout: not finding the target just means a missed optimization.
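As a toy sketch of why stopping early is safe here (the hierarchy, the names, and the budget mechanism are all made up): if the walk times out, we keep the correct but slower virtual call.

```python
# Toy class-hierarchy walk for devirtualization (all names invented).
# Timing out is harmless: we just keep the (correct) virtual call.
import time

def resolve_target(hierarchy, receiver, method, deadline):
    targets, stack = set(), [receiver]
    while stack:
        if time.monotonic() > deadline:
            return None                  # timed out: missed optimization only
        cls = stack.pop()
        if method in hierarchy[cls]["methods"]:
            targets.add(cls)
        stack.extend(hierarchy[cls]["subclasses"])
    # Devirtualize only if exactly one implementation is reachable.
    return targets.pop() if len(targets) == 1 else None

hierarchy = {
    "Shape":  {"methods": set(),     "subclasses": ["Circle"]},
    "Circle": {"methods": {"area"},  "subclasses": []},
}

target = resolve_target(hierarchy, "Shape", "area", time.monotonic() + 0.01)
print(f"direct call to {target}.area" if target else "keep the virtual call")
```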
There are other classes of program analysis whose results are not safe to use until the algorithm finishes. For example, if you have to prove that two variables do not alias each other, you cannot stop until you have computed all possible points-to sets and verified that the points-to sets of those two variables do not overlap.
So, given the above restriction, the first class (early termination) is perhaps more desirable, and throwing more compute at it would yield a better approximation. For the second one, it wouldn't.
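Here is what that non-aliasing check looks like once the points-to sets are complete (the sets below are invented): the check itself is trivial, but it is only sound on the *full* result, since any missing element could be exactly the shared target that makes the two variables alias.

```python
# Non-aliasing via points-to sets (hypothetical analysis results).
points_to = {
    "p": {"obj1", "obj2"},   # p may point to obj1 or obj2
    "q": {"obj3"},           # q may point only to obj3
}

def may_alias(a, b):
    # Only sound once points_to is fully computed: stopping early could
    # hide the one shared target that makes a and b alias.
    return bool(points_to[a] & points_to[b])

print(may_alias("p", "q"))   # False: p and q provably do not alias
```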
Another thing to keep in mind is that most of these dataflow frameworks are not easily parallelized. The only paper I've read (though I haven't kept up with this avenue of research) that implemented a control-flow analysis on the GPU is the following:
* Prabhu, Tarun, et al. "EigenCFA: Accelerating flow analysis with GPUs." ACM SIGPLAN Notices 46.1 (2011): 511-522.
I'm sure people are working on it. (I should mention that some program analyses are written in Datalog, and Datalog can be parallelized, but I believe that is CPU-based parallelization rather than GPU-based.)
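To illustrate the Datalog style (a made-up minimal example, not any real engine): an Andersen-style points-to rule evaluated naively to a fixpoint. Each iteration re-applies the rules over the current facts, and the individual rule applications are independent of one another, which is what makes this style a natural candidate for parallelization.

```python
# Naive Datalog-style evaluation (illustrative; not a real engine).
# Rules for a tiny Andersen-style points-to analysis:
#   pts(V, O) :- new(V, O).                # v = new O
#   pts(V, O) :- assign(V, W), pts(W, O).  # v = w

new_facts    = {("p", "obj1"), ("q", "obj2")}
assign_facts = {("r", "p"), ("s", "r")}

pts = set(new_facts)
changed = True
while changed:
    # Each pass re-derives facts from the current set; the individual
    # rule applications are independent, hence easy to run in parallel.
    derived = {(v, o) for (v, w) in assign_facts
                      for (w2, o) in pts if w == w2}
    changed = not derived <= pts
    pts |= derived

print(sorted(pts))
# [('p', 'obj1'), ('q', 'obj2'), ('r', 'obj1'), ('s', 'obj1')]
```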
The third thing is that when you ask whether we are limited by algorithms or by compute, it is important to note that it is impossible to find all possible facts about a program *precisely* without running it. There is a deep relationship between static program analysis and the halting problem: we want to guarantee that our static analysis terminates, and some facts are simply unobtainable without running the program.

However, there are not just static program analyses but also dynamic program analyses, which analyze a program as it runs. An example is value profiling. Imagine you have a conditional that is false 99% of the time. With a virtual machine, you can add instrumentation to measure the probability distribution of that conditional, generate optimized code that assumes the condition is false, and only when the condition turns out to be true fall back to a less optimized version of the code, paying an additional penalty. Some virtual machines already do this for types and values (type profiling and value profiling).
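A toy sketch of the value-profiling idea (all names invented; a real VM would instrument and specialize JIT-compiled code, not Python source):

```python
# Toy value profiling (illustrative only; real VMs instrument the
# compiled code, not the source). Profile a branch, then specialize
# for the common case behind a guard.
from collections import Counter

profile = Counter()

def fast_path(x):  return x * 2        # the hot, common case
def slow_path(x):  return abs(x) * 2   # the rare case

def instrumented(x):
    cond = x < 0
    profile[cond] += 1                 # record each observed outcome
    return slow_path(x) if cond else fast_path(x)

def specialized(x):
    # Generated after profiling showed `x < 0` is almost always False.
    if x < 0:                          # guard for the rare case
        return instrumented(x)         # "deoptimize": take the penalty
    return x * 2                       # condition assumed away

for x in [5, 7, 9, -1]:
    instrumented(x)
print(profile)                          # Counter({False: 3, True: 1})
print(specialized(8), specialized(-2))  # 16 4
```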
One last thing: when you say a meaningful performance boost, it depends on your code. If your code can be folded away completely at compile time, then yes, we could just generate the solution at compile time and that's it. But if it can't, or if parts of it cannot be folded away / the facts cannot be used to optimize the code, then no matter how much you search, you cannot optimize it statically.
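A toy illustration of the difference (an invented mini-AST, not any real compiler's IR): the first expression folds away entirely at compile time, while the second leaves a residual computation that no amount of searching can remove.

```python
# Toy constant folder over a tiny expression AST (invented for
# illustration). ("add", lhs, rhs) folds to a number only if both sides do.

def fold(node):
    if isinstance(node, (int, float)):
        return node                        # already a constant
    if isinstance(node, str):
        return node                        # unknown until runtime (e.g. "input")
    op, lhs, rhs = node
    l, r = fold(lhs), fold(rhs)
    if op == "add" and isinstance(l, (int, float)) and isinstance(r, (int, float)):
        return l + r                       # fully foldable: compute now
    return (op, l, r)                      # residual: must wait for runtime

print(fold(("add", ("add", 1, 2), 3)))   # 6 — the whole program folds away
print(fold(("add", "input", 3)))         # ('add', 'input', 3) — cannot fold
```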
Compilers are awesome :)
As an addendum, it might be desirable in the future to have a repository of analyzed code. Compilers right now re-analyze code on every single compile and don't share their results across the web. It is a fantasy of mine to have a repository that maps code to equivalent representations, so that every time someone does a local compile, it explores a new area of the search space and adds the result to the repository. Essentially, each time you compile your code, it explores new potential optimizations and all of them get stored online.
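A very rough sketch of that fantasy (everything here, from the hashing scheme to the cost model, is hypothetical; a real system would also need machine-checkable proofs that the stored forms are equivalent):

```python
# Hypothetical shared optimization repository: code is content-addressed
# by hash, and every compile may contribute a better equivalent form.
import hashlib

repository = {}   # source hash -> (best known equivalent, estimated cost)

def key(source: str) -> str:
    return hashlib.sha256(source.encode()).hexdigest()

def lookup(source: str):
    return repository.get(key(source))

def contribute(source: str, optimized: str, cost: float):
    best = repository.get(key(source))
    if best is None or cost < best[1]:
        repository[key(source)] = (optimized, cost)  # new best form wins

contribute("x * 2", "x << 1", cost=1.0)   # one compile's discovery
print(lookup("x * 2"))                     # ('x << 1', 1.0)
```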
For example, clang's compile times have regressed something like 5-6x faster than the code it generates has improved.
As a developer, I can arrive at a better structure for my code much faster if I can try more things out in a day.
Now we want to add LLMs, of all things, into the mix? I'm not looking forward to code taking one or two orders of magnitude longer to compile.
* Nandi, Chandrakana, et al. "Rewrite rule inference using equality saturation." Proceedings of the ACM on Programming Languages 5.OOPSLA (2021): 1-28.
* Pal, Anjali, et al. "Equality Saturation Theory Exploration à la Carte." Proceedings of the ACM on Programming Languages 7.OOPSLA2 (2023): 1034-1062.
I will need to read more about both of these techniques, along with Synthesizing Abstract Transformers. Thanks for sharing! Really exciting stuff!
What papers would you recommend for learning to implement dataflow analysis? For example, foundational or tutorial papers.
* Kam, John B., and Jeffrey D. Ullman. "Monotone data flow analysis frameworks." Acta Informatica 7.3 (1977): 305-317.
* Reps, Thomas, Susan Horwitz, and Mooly Sagiv. "Precise interprocedural dataflow analysis via graph reachability." Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 1995.
* Sagiv, Mooly, Thomas Reps, and Susan Horwitz. "Precise interprocedural dataflow analysis with applications to constant propagation." TAPSOFT'95: Theory and Practice of Software Development. Springer Berlin Heidelberg, 1995.
* Reps, Thomas, et al. "Weighted pushdown systems and their application to interprocedural dataflow analysis." Science of Computer Programming 58.1-2 (2005): 206-263.
* Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on Programming Languages 3.POPL (2019): 1-29.
Other areas that may be interesting to look at:
* Points-to Analysis
* Abstract Interpretation
* On-demand dataflow analyses
Check out his other research. Some of it is highly accessible via YouTube videos. I recommend watching/reading:
* Stabilizer
* Mesh
* Scalene
Native integration with Datalog.
Many times, I find myself working on a program and realize that what I need is a database. But having a database, even sqlite3 or Berkeley DB, would be overkill. If I could just express my data and the relationships between them, I would be able to query what I need in an efficient way.
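A tiny sketch of what I mean (the facts and the rule are invented): relations live in the program itself, and a recursive Datalog-style rule answers queries without any database engine:

```python
# Illustrative sketch: in-program relations plus one Datalog-ish rule,
# no database engine involved. The facts are made up.
parent = {("alice", "bob"), ("bob", "carol")}   # parent(X, Y): X is parent of Y

def ancestors(person):
    # ancestor(X, Y) :- parent(X, Y).
    # ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    found, frontier = set(), {person}
    while frontier:
        step = {x for (x, y) in parent if y in frontier}
        frontier = step - found
        found |= step
    return found

print(ancestors("carol"))   # {'bob', 'alice'}
```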
I perhaps have not had a long professional life working with compilers (5+ years), but to me the definition of "compiles to binary" is too restrictive. The main things I care about in my work are:
1. To be able to perform some sort of static analysis on the program
2. To be able to transform the program representation
To other commenters: in Python, we have two program representations: the human-readable string representation and the bytecode representation. Syntax errors are a kind of static analysis. To me, the maps between the Python string representation and the bytecode representation, and the classes of errors we can catch without running the program, are far more interesting than pigeonholing Python into the "compiled" or "interpreted" hole.
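Both representations are visible from the standard library; for example, `compile` maps the string form to a code object, `dis` displays its bytecode, and a syntax error is caught without executing anything:

```python
# Both representations via the standard library: `compile` maps the
# string form to a code object, `dis` shows its bytecode, and a
# SyntaxError is caught without ever executing the program.
import dis

source = "x = 1 + 2"
code = compile(source, "<example>", "exec")   # string -> bytecode
dis.dis(code)                                 # inspect the bytecode form

try:
    compile("x = = 1", "<example>", "exec")   # static check, no execution
except SyntaxError as err:
    print("caught without running:", err.msg)
```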
More philosophically, the motto is "write programs as morphisms directly". Rather than writing a term in some type theory to which you then (maybe) give a categorical semantics, why not just work directly in a category?
Long term, the goal is to have a compiler that is a stack of categories, with functors as compiler passes. The idea is that, in contrast to typical compilers, where you are "stuck" at a given abstraction level, this would let you view your code at various levels of abstraction. So, for example, you could write a program, then write an x86-specific optimization for one function, which you could then prove correct with respect to the more abstract program specification.
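To sketch the shape of "functors as passes" (entirely illustrative, not catgrad's actual design): each level is a category whose morphisms are programs, and a lowering pass maps morphisms while preserving composition:

```python
# Entirely illustrative (not catgrad's actual API): two IR levels as
# categories of composable programs, and a lowering pass as a functor
# that must preserve composition (and identities, i.e. empty programs).

def compose(f, g):
    return f + g                  # run f, then g (same shape at both levels)

LOWERING = {"double": ["shl 1"], "incr": ["add 1"]}   # hypothetical table

def lower(program):               # the functor's action on morphisms
    return [instr for op in program for instr in LOWERING[op]]

f, g = ["double"], ["incr"]
# Functor law: lowering a composite equals composing the lowerings.
assert lower(compose(f, g)) == compose(lower(f), lower(g))
print(lower(compose(f, g)))       # ['shl 1', 'add 1']
```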
Hello, as a compiler engineer I am interested in this area. Can you expand a little bit more? How would I be able to plug in my own language, for example?
> So for example, you could write a program, then write an x86-specific optimization for one function which you can then prove correct with respect to the more abstract program specification.
So, what you are saying is that catgrad allows me to write a program and then also plug in a compiler pass? I.e., the application author can also be the compiler developer?