Every two weeks or so I peruse GitHub looking for something like this, and I have to say this looks really promising. In statistical genetics we make really big scatterplots called Manhattan plots (https://en.wikipedia.org/wiki/Manhattan_plot), and we have to use highly specialized software to visualize them at different scales (for a sense of what this looks like: https://my.locuszoom.org/gwas/236887/). Excited to try this out.
Hey! This sounds like a really interesting use case. If you run into any issues or need help with the visualization, please don't hesitate to post an issue on the repo. We can also think about adding an example demo of a Manhattan plot to help, too!
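A rough sketch of what that demo might look like (hypothetical: random values standing in for real GWAS results, and the API names are from recent fastplotlib docs, so check your version):

    # Hypothetical Manhattan-style scatter: random -log10(p) values standing
    # in for real GWAS data; alternate colors by (fake) chromosome.
    import numpy as np
    import fastplotlib as fpl

    n = 1_000_000
    x = np.arange(n, dtype=np.float32)                   # genomic position
    y = -np.log10(np.random.uniform(size=n)).astype(np.float32)
    chrom = (x // (n / 22)).astype(int)                  # fake chromosome ids
    colors = np.where(chrom % 2 == 0, "steelblue", "orange")

    fig = fpl.Figure()
    fig[0, 0].add_scatter(np.column_stack([x, y]), sizes=1, colors=colors)
    fig.show()
    fpl.loop.run()  # start the event loop (name varies across fpl versions)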
This looks very promising. I'll have to think through my visualization use cases against the new possibilities this enables.
I have been intermittently following Rerun, a "robotics-style data visualization" app [1]. Their architectures bear certain similarities [2]: wgpu in both, egui and imgui, Rust with Python. Rerun's stack does compile to WASM and works in the browser. The use cases seem different, but somewhat the same. I don't do scientific or robotics work at all, so no opinions on the feasibility of either...
[1] https://rerun.io [2] https://github.com/rerun-io/rerun/blob/main/ARCHITECTURE.md
I always thought it was interesting that my modern CPU takes ages to plot 100,000 or so points in R or Python (ggplot2, seaborn, plotnine, etc.), and yet somehow my 486DX at 50 MHz could pump out all those pixels to play Doom interactively and smoothly.
This SO thread [1] analyses how much time ggplot spends on various tasks. Not sure whether better GPU integration for producing the visual output would speed it up significantly.
[1] https://stackoverflow.com/questions/73470828/ggplot2-is-slow...
Nobody seems to care about optimizing for relatively big datasets like a million points; maybe it's not a very popular use case. Even the libraries that are able to render such datasets often do it incorrectly, e.g. skipping peaks, or showing black rectangles instead of the internal distribution of noisy data.
I ended up writing my own tool that can show millions of points and never looked back.
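The trick that fixes the skipped-peaks problem is min/max decimation: for each pixel column keep both extremes instead of naively striding the data. A minimal numpy sketch:

    # Min/max decimation: for each horizontal pixel bin, keep both the minimum
    # and maximum sample so peaks survive downsampling (naive y[::step] drops them).
    import numpy as np

    def minmax_decimate(y: np.ndarray, n_bins: int) -> np.ndarray:
        n = len(y) // n_bins * n_bins           # trim so bins divide evenly
        bins = y[:n].reshape(n_bins, -1)
        out = np.empty(n_bins * 2, dtype=y.dtype)
        out[0::2] = bins.min(axis=1)            # interleave min/max so a
        out[1::2] = bins.max(axis=1)            # zig-zag line draws the envelope
        return out

    y = np.random.randn(10_000_000).cumsum()    # 10M-point noisy signal
    envelope = minmax_decimate(y, n_bins=1920)  # one min/max pair per pixel column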
> powered by WGPU, a cross-platform graphics API that targets Vulkan (Linux), Metal (Mac), and DX12 (Windows).
The fact that they are using wgpu, which appears to be a Python-native implementation of WebGPU, suggests an interesting possible extended use case. As a few other comments suggest, if one knows that the data is available on a machine in a cluster rather than on the user's local machine, it might make sense to start up a server, expose a port, and pass the data over HTTP to be rendered in a browser. That would make it shareable across the lab. The limit would be the data bandwidth over HTTP (e.g. for the 3-million-point case), but for simpler cases it seems like it would be very useful.
That leads to an interesting exercise: defining a protocol for transferring plot points over HTTP in such a way that they can be handed over to the browser's WebGPU interface efficiently. Perhaps an even more efficient representation is possible with some pre-processing on the server side?
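As a sketch of what that could look like (a hypothetical format, not an existing protocol): a small header plus raw little-endian float32 pairs, which the browser could hand to WebGPU as a vertex buffer without any JSON parsing:

    # Hypothetical wire format: 12-byte header (magic, version, point count)
    # followed by interleaved little-endian float32 (x, y) pairs. The payload
    # can be fed straight into a GPUBuffer on the browser side.
    import struct
    import numpy as np

    MAGIC = b"PLT0"

    def encode_points(xy: np.ndarray) -> bytes:
        pts = np.ascontiguousarray(xy, dtype="<f4")   # force little-endian float32
        header = struct.pack("<4sII", MAGIC, 1, len(pts))
        return header + pts.tobytes()

    def decode_points(buf: bytes) -> np.ndarray:
        magic, version, count = struct.unpack_from("<4sII", buf)
        assert magic == MAGIC and version == 1
        return np.frombuffer(buf, dtype="<f4", offset=12).reshape(count, 2)

    xy = np.random.rand(3_000_000, 2).astype(np.float32)
    payload = encode_points(xy)    # ~24 MB, vs several times that as JSON text
    assert np.array_equal(decode_points(payload), xy)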
What is GSP in this context? Searching "Python GSP" brings up the Generalized Sequence Pattern (GSP) algorithm [1] and Graph Signal Processing [2], neither of which seems to be a protocol. I also found "Generic Signaling Protocol" and "Global Sequence Protocol", which also don't seem relevant. Forgive me if GSP is some well-known thing I am just not familiar with.
[1] https://github.com/jacksonpradolima/gsp-py
[2] https://pygsp.readthedocs.io/en/stable/
To clarify this a bit: wgpu [2] is a Rust implementation of WebGPU, just like Dawn is a C++ implementation of WebGPU (by Google). Both projects expose a C API following webgpu.h. wgpu-py [1] should eventually be able to work with both. (Disclaimer: I'm the author of wgpu-py.)
[1] https://github.com/pygfx/wgpu-py
[2] https://github.com/gfx-rs/wgpu-native
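For the curious, querying which backend wgpu-py picked looks roughly like this (method names have moved around between wgpu-py releases, so treat this as a sketch):

    # Sketch: ask wgpu-py for an adapter; it reports which native backend
    # (Vulkan/Metal/DX12) and GPU are in use. Names may differ by version.
    import wgpu

    adapter = wgpu.gpu.request_adapter_sync(power_preference="high-performance")
    print(adapter.info)                  # backend and GPU description
    device = adapter.request_device_sync()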
I watched recordings of your recent presentation and decided to finally give it a try last week. My goal is to create some interactive network visualizations - letting you click/box-select nodes and edges to highlight subgraphs - which sounds possible with the callbacks and selectors.
Haven't had the time to get very far yet, but I will gladly contribute an example once I figure something out. Some of the ideas I want to eventually get to: render shadertoys (interactively?) into an fpl subplot (I haven't looked at the code at all, but it might be doable), eventually run those interactively in the browser, and do the network layout on the GPU with compute shaders (out of scope for fpl).
Hi! I've seen some of your work on wgpu-py! Definitely let us know if you need help or have ideas. If you're on the main branch, we recently merged a PR that allows events to be bidirectional.
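The callback style looks roughly like this (a sketch; the event names and decorator form are from the docs and may differ by version):

    # Sketch: a click handler on a scatter graphic; fastplotlib forwards
    # pointer events to plain Python callbacks.
    import numpy as np
    import fastplotlib as fpl

    fig = fpl.Figure()
    scatter = fig[0, 0].add_scatter(np.random.rand(10_000, 2), sizes=3)

    @scatter.add_event_handler("click")
    def on_click(ev):
        print("clicked:", ev)   # pointer event carrying pixel coordinates

    fig.show()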
Do you have any rough numbers for how many data points can be handled? I'm curious whether this enables plotting many millions of points in a scatterplot, for example.
Yes! The number of data points can range into the millions. Quite honestly, the quality of your GPU would be the limiting factor here. I will say, however, that for most use cases an integrated GPU is sufficient. For reference, we have plotted upwards of 3 million points on a mid-range integrated GPU from 2017.
I will work on adding some metrics for this kind of thing somewhere in our docs (I think it could be helpful for many).
I followed one of their online workshops [1], and it feels really powerful, although it is a bit confusing which part of it does what (it's basically 6 or 7 projects put together under an umbrella).
[1] https://holoviz.org/
Fastplotlib is very different from bokeh and holoviz, and has different use cases.
Bokeh and HoloViz send data to a JS front end that draws (to the best of my knowledge), whereas fastplotlib does everything on the Python side and uses jupyter_rfb to send a compressed frame buffer when used in Jupyter. Fastplotlib also works as a native desktop application with Qt and glfw, which is very different from Bokeh/HoloViz.
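The jupyter_rfb pattern itself is tiny; a sketch (random noise standing in for a real renderer):

    # Sketch of the mechanism fastplotlib builds on: the widget asks for
    # frames, you return numpy arrays, and jupyter_rfb ships them to the
    # notebook front end as compressed images.
    import numpy as np
    from jupyter_rfb import RemoteFrameBuffer

    class NoiseView(RemoteFrameBuffer):
        def get_frame(self):
            # any HxWx3 uint8 array; a real renderer would rasterize here
            return np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

    view = NoiseView()   # display in a notebook cell; view.request_draw()
    view                 # pushes a new frame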
Fastplotlib also has higher raw render speed: you can scroll through a 4K video at 60 Hz with thousands of extra objects on your desktop, which I haven't ever been able to accomplish with Bokeh (I haven't tried it in years, so not sure if things have changed).
The events system is also very different, we try to keep the API to simple function callbacks in fastplotlib.
At the end of the day use the best tool for your use case :)
https://github.com/3b1b/manim/releases
Super awesome, and you can make it into an MCP for Cursor.
Base R graphics would plot 100,000 points in about 100 milliseconds, going by a quick benchmark writing to a file.
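For comparison, roughly the same measurement from Python with matplotlib's off-screen Agg backend (a sketch; numbers will vary by machine):

    # Rough timing sketch: render 100,000 points to a PNG with matplotlib's
    # off-screen Agg backend, for comparison with the base R figure above.
    import time
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    n = 100_000
    x, y = np.random.rand(2, n)

    t0 = time.perf_counter()
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=1)
    fig.savefig("scatter.png")
    plt.close(fig)
    print(f"{n} points in {time.perf_counter() - t0:.2f} s")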
jupyter-rfb lets you do remote rendering for exactly this kind of setup: render to a remote frame buffer and send over a JPEG byte stream. We and a number of our scientific users use it like this. https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...
> defining a protocol for transferring plot points
This sounds more like GSP (Graphics Server Protocol), which Cyrille Rossant (who's made some posts here) works on; it has a slightly different kind of use case.
https://pygraphistry.readthedocs.io/en/latest/performance.ht...
But it doesn't seem to answer how it works in Jupyter notebooks, or whether it does at all. Is the GPU acceleration done "client-side" (JavaScript?) or "server-side" (in the kernel?), or is there an option for both?
Because I've used supposedly fast visualization libraries in Google Colab before, and instead of updating at 30 fps it took 2 seconds to update after a click, because after the new image is rendered it has to be transmitted via the Jupyter connector and the network, and that can turn out to be really slow.
I believe the performance is pretty decent, especially if you run the kernel locally.
Their docs also cover this as mentioned by @clewis7 below: https://www.fastplotlib.org/ver/dev/user_guide/faq.html#what...
Just to add on: Colab is weird and not performant. This PR outlines our attempts to get jupyter-rfb working on Colab: https://github.com/vispy/jupyter_rfb/pull/77
Certainly! A comparison of performance with specialized tools for large point clouds (like CloudCompare and Potree) would be very interesting.