There are some things I've been wanting to ask for in a high-level scientific computing library. If you're planning on continuing your visualization library, can you please come up with some solution for layout specification? Whenever I'm plotting something and I spend 30 minutes getting all of the data in order, the last thing I want to do is fight with the plotting library's label positions because they overlap. And if I say "let me take this plot, add some more stacked subplots, and show different categories," I don't want the labels to stay perfect while my scatter plots get squeezed into a 10x10-pixel box to draw into.
On the HPC/numerical-computing side of things, have you looked into implicit GPU operation types? Something that would let you queue up operations to be run on a parallel computing system. Basically, you describe complex operations with the high-level objects' normal operators; the objects aren't actually calculating anything, they just organize a GPU kernel in the background. As the final stage, you turn the queue into a compiled operation:
    gpumat a(3, 5);                             // symbolic placeholders; no data yet
    gpumat b(5, 3);
    auto gpu_op_queue = (a * b) + (a * b) * 5;  // builds an expression graph, computes nothing
    auto operation = gpu_op_queue.compile();    // lowers the graph to a GPU kernel
    mat output = operation(some_3x5, some_5x3); // run the kernel on concrete inputs
In the backend you'd hopefully be able to create your own types like 'cpumat', 'computerclustermat', or 'gpuclustermat'. If you had some easy way to generically express extremely parallel numerical operations, an abstract way of implementing high-performance back-ends that compile those operations to GPU kernels, and a visualization engine that doesn't feel like it's from the '80s, then your library will really take off.
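Something like this toy Scala sketch is what I'm imagining (every name here is invented for illustration, not a real API): arithmetic on the high-level objects only builds an expression tree, and each backend is just an interpreter for that tree.

    sealed trait Expr {
      def *(that: Expr): Expr = MatMul(this, that)
      def +(that: Expr): Expr = Add(this, that)
      def *(s: Double): Expr  = Scale(this, s)
    }
    case class Input(rows: Int, cols: Int)  extends Expr
    case class MatMul(a: Expr, b: Expr)     extends Expr
    case class Add(a: Expr, b: Expr)        extends Expr
    case class Scale(a: Expr, s: Double)    extends Expr

    // A backend ('cpumat', 'gpumat', 'gpuclustermat', ...) is just an
    // evaluator for the tree; swapping backends never touches user code.
    trait Backend[Mat] {
      def run(graph: Expr, inputs: Map[Input, Mat]): Mat
    }

    val a = Input(3, 5)
    val b = Input(5, 3)
    val graph = (a * b) + (a * b) * 5.0   // nothing is computed here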
Personally I feel GPU-optimization and fighting with visualization libraries are the two biggest pain points in scientific computing.
I am very unlikely to take on visualization. I don't acutely need it for what I do, and I am some-but-not-nearly-enough interested in visualization for its own sake. I started to read about the grammar of graphics stuff at one point and decided it was too far down the rabbit hole.
I have looked more into gpu stuff, and agree specifying a compute graph (and then implicitly optimizing it) is more likely to be the future. FWIW, this is basically what XLA (from TensorFlow) and whatever it was FB announced on Friday are doing.
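To make the "implicitly optimizing" part concrete: given an expression tree like the toy one sketched above, even a one-function pass buys you something. Common-subexpression elimination via hash-consing, for example, collapses the repeated (a * b) so a backend computes it once (again a toy sketch, not real library code):

    import scala.collection.mutable

    // Rebuild the tree bottom-up; structurally equal subtrees map to one
    // shared node, so `(a * b) + (a * b) * 5.0` multiplies only once.
    def dedup(e: Expr, seen: mutable.Map[Expr, Expr] = mutable.Map.empty): Expr = {
      val rebuilt = e match {
        case MatMul(x, y) => MatMul(dedup(x, seen), dedup(y, seen))
        case Add(x, y)    => Add(dedup(x, seen), dedup(y, seen))
        case Scale(x, s)  => Scale(dedup(x, seen), s)
        case in: Input    => in
      }
      seen.getOrElseUpdate(rebuilt, rebuilt)
    }

XLA does vastly more than this (fusion, layout assignment, memory planning), but the shape of the idea is the same.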
I wrote my thoughts up recently on the Breeze mailing list here: https://groups.google.com/forum/#!topic/scala-breeze/_hEFpnI...
I'm starting to think it through, but I'm not sure I have time for that either :(. A 4-month-old and a startup take up a lot of time.
It happens to depend on Breeze. I would point out that Breeze does not support n-dimensional arrays (i.e., most tensors), which is necessary for deep learning.
We wrote ND4S and ScalNet to solve that:
https://github.com/deeplearning4j/nd4s
https://github.com/deeplearning4j/scalnet
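For a rough flavor of what the n-dimensional API looks like through ND4J (which ND4S wraps with Scala sugar) — exact signatures shift between versions, so treat this as approximate:

    import org.nd4j.linalg.factory.Nd4j

    // A rank-3 tensor: 2 x 3 x 4. Breeze's DenseMatrix/DenseVector top out
    // at two dimensions, which is the gap being described.
    val t = Nd4j.create(Array.fill(2 * 3 * 4)(1.0f), Array(2, 3, 4))
    val u = t.mul(2.0)   // elementwise scale
    val v = u.sum(2)     // reduce over the last axis -> shape (2, 3)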
Moving computation out of Spark's MLlib and into lower level code like C++, as we do with JavaCPP and libnd4j, also improves speed.
Breeze does a large chunk of its (dense) compute via netlib-java, which calls out to "real" LAPACK if you set it up. Are things really faster than that? Or are you referring to the non-BLAS/non-LAPACK things?
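For reference, netlib-java's documented way to see which implementation actually got loaded:

    import com.github.fommil.netlib.BLAS

    // Prints e.g. "com.github.fommil.netlib.NativeSystemBLAS" when a system
    // BLAS (OpenBLAS, MKL, ...) was found, or "...F2jBLAS" for the pure-Java
    // fallback.
    println(BLAS.getInstance().getClass.getName)

You can pin it with -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.NativeSystemBLAS on the JVM command line.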
What benefits does this provide over existing python or matlab code?
> Scientific Computing, Machine Learning, and Natural Language Processing
These seem to be three very different problems. Is there a reason why the group of them is called "ScalaNLP"? If the libraries are generic enough then shouldn't other uses be possible/supported?
Is there a reason this doesn't have a generic name similar to the SciPy stack?
Breeze has breeze-viz, which is very basic but at the time there wasn't anything else. I highly endorse using something else. I personally like http://sameersingh.org/scalaplot/
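For the curious, breeze-viz usage looks roughly like this (close to the wiki quickstart; `:^` is the elementwise-power operator from Breeze versions of this era):

    import breeze.linalg._
    import breeze.plot._

    val x = linspace(0.0, 1.0)      // 100 points by default
    val f = Figure()
    val p = f.subplot(0)
    p += plot(x, x :^ 2.0)          // y = x^2
    p += plot(x, x :^ 3.0, '.')     // y = x^3, drawn with dots
    p.xlabel = "x axis"
    p.ylabel = "y axis"
    f.saveas("lines.png")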
They're under the same aegis basically because they're all mine. ScalaNLP started out as really being just NLP, but it scope-crept. That said, Epic is a library for structured prediction first and foremost, and one of the main applications of structured prediction is NLP.
Breeze is basically like SciPy and large chunks of it power Epic. It's really the only thing that doesn't belong in the namespace.
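To give a flavor of the SciPy comparison (approximate; the API shifts a bit between versions):

    import breeze.linalg._

    val m  = DenseMatrix.rand(5, 5)
    val x  = DenseVector.rand(5)
    val y  = m * x          // dense mat-vec product, BLAS-backed via netlib-java
    val c0 = sum(m(::, 0))  // slice a column and reduce, NumPy-style
    val mi = inv(m)         // LAPACK-backed inverse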
Not sure why this is on the front page of HN, but I'm happy to answer any questions.
I'm not really giving these libraries the love they need these days. I mostly started them in grad school, before the deep learning revolution really hit my subfield (NLP), and I haven't had time to modernize them. They still have their uses, especially Breeze, which is used in Spark's MLlib and directly by a number of companies.
Some of the more popular functional languages have advanced type systems, but not all functional languages are statically typed. Lisp, the granddaddy of all FP languages, is dynamically typed. So is Elixir.
Can we stop conflating FP with static typing?
Semantic Machines is developing technology to power the next generation of conversational artificial intelligence: AIs that you can actually have a conversation with. Think Google Assistant or Alexa or Siri, but without having to carefully craft commands like you're talking to a Bash shell.
Our team has built much of the core technology underlying Siri and Google Now, and our founders (including both the former Chief Speech Scientist for Siri and the head of UC Berkeley's Natural Language Processing group) have multiple >$100 million exits under their belts.
We're looking to hire a few talented software engineers and machine learning engineers to help build out our technology by expanding our core NLP infrastructure, data processing pipelines, neural net clusters, and backend services.
Experience with natural language processing systems is a plus, as is experience with the JVM (especially Scala), but we're mainly interested in passionate engineers who can learn quickly and work effectively in complex systems.
Please reach out to me directly or email info@semanticmachines.com. Thanks!
http://psych.nyu.edu/clash/dp_papers/Ding_nn2015.pdf
Also, for more background on the idea of the "Universal Grammar":
This doesn't really speak to UG.
First, you can believe in the structures they purport to show without accepting the existence of UG, by appealing to the existence of general mechanisms in the brain for assembling hierarchical structures, which is equally validated by this experiment.
Second, they looked at two languages with sentences of up to ~7 syllables each, with at most two constituents (Noun Phrase + Verb Phrase). You can't show evidence for any hierarchy of interest in 7 syllables. They demonstrated that phrases and phrase boundaries exist, but it's entirely possible to have "flat" grammars without interesting hierarchy, especially in simple sentences. If they want to show interesting hierarchy, they should conduct experiments with more interesting structure (say, some internal PPs and some limited center embedding) and show something that correlates with multiple levels of the "stack" getting popped, or something.
It's still interesting work, but as usual oversold by the university press office.
Foundation models like ChatGPT, PaLM, and Stable Diffusion are transforming the world around us. The Stanford Center for Research on Foundation Models (CRFM; https://crfm.stanford.edu/), which is part of Stanford HAI, is an interdisciplinary initiative that aims to make foundation models more reliable, transparent and accessible to the world. We take on ambitious projects that seek to rigorously evaluate existing foundation models and to build new ones.
We are currently seeking a research engineer to join our engineering team. This is a unique opportunity to work with seasoned engineers who have spent many years in industry, as well as with PhD students, post-docs, and faculty at CRFM. You will contribute to cutting-edge research, publish papers, gain access to the latest foundation models, and be immersed in the vibrant CRFM community.
You will work on our open source software projects.
For more information or to apply, please go to https://careersearch.stanford.edu/jobs/research-engineer-213...