Readit News
CreRecombinase · 9 months ago
Every two weeks or so I peruse github looking for something like this and I have to say this looks really promising. In statistical genetics we make really big scatterplots called Manhattan plots https://en.wikipedia.org/wiki/Manhattan_plot and we have to use all this highly specialized software to visualize at different scales (for a sense of what this looks like: https://my.locuszoom.org/gwas/236887/). Excited to try this out
clewis7 · 9 months ago
Hey! This sounds like a really interesting use case. If you run into any issues or need help with the visualization, please don't hesitate to post an issue on the repo. We can also think about adding an example demo of a manhattan plot to help too!

j_bum · 9 months ago
If you’re working in R with ggplot2, you could also consider the `ggrastr` package, specifically `ggrastr::geom_point_rast`, which rasterizes point layers so large scatterplots render and export much faster.
swalsh · 9 months ago
These really large scatterplots are also useful for visualizing claims, and finding fraud.
samstave · 9 months ago
Have you tried ManimGL?

https://github.com/3b1b/manim/releases

Super awesome, and you can make it into an MCP for Cursor.

jarpineh · 9 months ago
This looks very promising. I'll have to think through my visualization cases against the new possibilities this enables.

I have been intermittently following Rerun, a "robotics-style data visualization" app [1]. Their architecture bears certain similarities [2]: wgpu in both, egui vs. imgui, and Rust with Python. Rerun's stack does compile to WASM and works in the browser. The use cases seem different, but somewhat overlapping. I don't do scientific or robotics work at all, so no opinion on the feasibility of either...

[1] https://rerun.io [2] https://github.com/rerun-io/rerun/blob/main/ARCHITECTURE.md

dcl · 9 months ago
I always thought it was interesting that my modern CPU takes ages to plot 100,000 or so points in R or Python (ggplot2, seaborn, plotnine, etc.), and yet somehow my 486DX at 50 MHz could pump out all those pixels to play Doom interactively and smoothly.
sieste · 9 months ago
This SO thread [1] analyses how much time ggplot spends on various tasks. Not sure if a better GPU integration to produce the visual output would help speed it up significantly.

[1] https://stackoverflow.com/questions/73470828/ggplot2-is-slow...

kkoncevicius · 9 months ago
From the R side, I think this is mainly because ggplot2 is really, really slow.

Base R graphics would plot 100,000 points in about 100 milliseconds.

    x <- rnorm(100000)
    plot(x)
A quick benchmark with writing to a file:

    x <- rnorm(100000)
    system.time({
      png("file.png")
      plot(x)
      dev.off()
    })

     user  system elapsed
    0.179   0.002   0.180
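For comparison, a similar timing can be done in Python with matplotlib's off-screen Agg backend (a sketch; absolute numbers depend heavily on backend, versions, and hardware):

```python
import time
import matplotlib
matplotlib.use("Agg")  # render off-screen straight to a file
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(size=100_000)

t0 = time.perf_counter()
fig, ax = plt.subplots()
ax.plot(x, ".", markersize=1)  # roughly equivalent to base R's plot(x)
fig.savefig("file.png")
plt.close(fig)
print(f"elapsed: {time.perf_counter() - t0:.3f} s")
```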

stackedinserter · 9 months ago
Nobody cares about optimizing for relatively big datasets like a million points; maybe it's not a very popular use case. Even the libraries that are able to render such datasets do it incorrectly, e.g. they skip peaks, or show black rectangles instead of the internal distribution of noisy data.

I ended up writing my own tool that can show millions of points, and I've never looked back.
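For what it's worth, the peak-skipping problem has a standard fix: min/max decimation, where each bucket keeps its extremes so spikes survive downsampling. A minimal sketch (the function name is mine):

```python
def minmax_decimate(y, bucket):
    """Downsample y by keeping each bucket's min and max,
    so narrow peaks are preserved instead of being skipped."""
    out = []
    for i in range(0, len(y), bucket):
        chunk = y[i:i + bucket]
        out.extend((min(chunk), max(chunk)))
    return out

# A narrow spike at index 1 survives a 3:1 reduction:
print(minmax_decimate([0, 9, 1, 2, 8, 3], bucket=3))  # [0, 9, 2, 8]
```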

zoogeny · 9 months ago
> powered by WGPU, a cross-platform graphics API that targets Vulkan (Linux), Metal (Mac), and DX12 (Windows).

The fact that they are using WGPU, which appears to be a Python native implementation of WebGPU, suggests there is an interesting possible extended case. As a few other comments suggest, if one knows that the data is available on a machine in a cluster rather than on the local machine of a user, it might make sense to start up a server, expose a port and pass along the data over http to be rendered in a browser. That would make it shareable across the lab. The limit would be the data bandwidth over http (e.g. for the 3 million point case) but it seems like for simpler cases it would be very useful.

That would lead to an interesting exercise of defining a protocol for transferring plot points over http in such a way that they could be handed over to the browser's WebGPU interface efficiently. Perhaps an even more efficient representation is possible with some pre-processing on the server side?
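One plausible sketch of such a wire format: pack the points as little-endian float32 pairs, so the browser can view the payload directly as a Float32Array and hand it to a WebGPU vertex buffer without per-point parsing (the framing and names here are hypothetical):

```python
import struct

def encode_points(points):
    """Serialize (x, y) pairs: a u32 count, then little-endian float32 pairs."""
    header = struct.pack("<I", len(points))
    body = b"".join(struct.pack("<ff", x, y) for x, y in points)
    return header + body

def decode_points(buf):
    """Inverse of encode_points; a browser client would instead wrap the
    body in a Float32Array and upload it to the GPU directly."""
    (n,) = struct.unpack_from("<I", buf, 0)
    return [struct.unpack_from("<ff", buf, 4 + 8 * i) for i in range(n)]

pts = [(0.0, 1.5), (2.25, -3.0)]  # float32-exact values round-trip losslessly
assert decode_points(encode_points(pts)) == pts
```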

kushalkolar · 9 months ago
> the data is available on a machine in a cluster rather than on the local machine of a user

jupyter-rfb lets you do remote rendering for this, render to a remote frame buffer and send over a jpeg byte stream. We and a number of our scientific users use it like this. https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...

> defining a protocol for transferring plot points

This sounds more like GSP, which Cyrille Rossant (who's made some posts here) works on, it has a slightly different kind of use case.

zoogeny · 9 months ago
What is GSP in this context? Searching "Python GSP" brings up the Generalized Sequence Pattern (GSP) algorithm [1] and Graph Signal Processing [2], neither of which seems to be a protocol. I also found "Generic Signaling Protocol" and "Global Sequence Protocol", which also don't seem relevant. Forgive me if GSP is some well-known thing that I'm just not familiar with.

1. https://github.com/jacksonpradolima/gsp-py

2. https://pygsp.readthedocs.io/en/stable/

mkl · 9 months ago
WGPU is a Rust thing more than a Python thing.
almarklein · 9 months ago
To clarify this a bit, wgpu is a Rust implementation of WebGPU, just like Dawn is a C++ implementation of WebGPU (by Google). Both projects expose a C API following webgpu.h. wgpu-py should eventually be able to work with both. (Disclaimer: I'm the author of wgpu-py)
zoogeny · 9 months ago
Fair, I was looking at the wgpu-py [1] page but only skimmed it. It does indeed look like a wrapper over wgpu-native [2] which is written in Rust.

1. https://github.com/pygfx/wgpu-py

2. https://github.com/gfx-rs/wgpu-native

Swannie · 9 months ago
What you describe sounds a bit like Graphistry:

https://pygraphistry.readthedocs.io/en/latest/performance.ht...

Vipitis · 9 months ago
I have watched recordings of your recent presentation and decided to finally give it a try last week. My goal is to create some interactive network visualizations, e.g. letting you click/box-select nodes and edges to highlight subgraphs, which sounds possible with the callbacks and selectors.

Haven't had the time to get very far yet, but I will gladly contribute an example once I figure something out. Some of the ideas I want to eventually get to are rendering shadertoys (interactively?) into an fpl subplot (I haven't looked at the code at all, but it might be doable), eventually running those interactively in the browser, and doing the network layout on the GPU with compute shaders (out of scope for fpl).

kushalkolar · 9 months ago
Hi! I've seen some of your work on wgpu-py! Definitely let us know if you need help or have ideas, if you're on the main branch we recently merged a PR that allows events to be bidirectional.
crazygringo · 9 months ago
Sounds really compelling.

But it doesn't seem to answer how it works in Jupyter notebooks, or if it does at all. Is the GPU acceleration done "client-side" (JavaScript?) or "server-side" (in the kernel?) or is there an option for both?

Because I've used supposedly fast visualization libraries in Google Colab before, but instead of updating at 30 fps, it takes 2 seconds to update after a click, because after the new image is rendered it has to be transmitted via the Jupyter connector and network and that can turn out to be really slow.

ivoflipse · 9 months ago
Fastplotlib definitely works in Jupyterlab through jupyter-rfb https://github.com/vispy/jupyter_rfb

I believe the performance is pretty decent, especially if you run the kernel locally

Their docs also cover this as mentioned by @clewis7 below: https://www.fastplotlib.org/ver/dev/user_guide/faq.html#what...

kushalkolar · 9 months ago
Thanks Ivo!

Just to add on, colab is weird and not performant, this PR outlines our attempts to get jupyter-rfb working on colab: https://github.com/vispy/jupyter_rfb/pull/77

clewis7 · 9 months ago
Thanks Ivo!
theLiminator · 9 months ago
Do you have any numbers for the rough number of datapoints that can be handled? I'm curious if this enables plotting many millions of datapoints in a scatterplot for example.
clewis7 · 9 months ago
Yes! The number of data points can range in the millions. Quite honestly, the quality of your GPU would be the limiting factor here. I will say, however, that for most use cases, an integrated GPU is sufficient. For reference, we have plotted upwards of 3 million points on a mid-range integrated GPU from 2017.

I will work on adding somewhere in our docs some metrics for this kind of thing (I think it could be helpful for many).

enriquto · 9 months ago
>I will work on adding somewhere in our docs some metrics for this kind of thing (I think it could be helpful for many).

Certainly! A comparison of performance with specialized tools for large point clouds would be very interesting (like cloudcompare and potree).

wodenokoto · 9 months ago
How does it compare to HoloViz? [1]

I followed one of their online workshops, and it feels really powerful, although it is a bit confusing which part of it does what (it's basically 6 or 7 projects put together under an umbrella)

[1] https://holoviz.org/

kushalkolar · 9 months ago
Fastplotlib is very different from bokeh and holoviz, and has different use cases.

Bokeh and holoviz send data to a JS front end that draws (to the best of my knowledge), whereas fastplotlib does everything on the Python side and uses jupyter_rfb to send a compressed frame buffer when used in Jupyter. Fastplotlib also works as a native desktop application with Qt and glfw, which is very different from bokeh/holoviz. Fastplotlib also has higher raw render speed: you can scroll through a 4K video at 60 Hz with thousands of extra objects on your desktop, which I haven't ever been able to accomplish with bokeh (I haven't tried it in years, so things may have changed).
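To put that frame-buffer trade-off in rough numbers (the 20:1 JPEG ratio is purely an illustrative assumption):

```python
# Sending raw data to a JS front end vs. streaming rendered frames.
n_points = 3_000_000
data_payload = n_points * 2 * 4    # x, y as float32: sent once, grows with the data
raw_frame = 1920 * 1080 * 3        # one uncompressed RGB frame
jpeg_frame = raw_frame // 20       # assumed 20:1 JPEG compression
per_second = jpeg_frame * 60       # 60 Hz remote frame buffer stream
# The frame stream's bandwidth is constant no matter how many points are drawn.
print(data_payload, jpeg_frame, per_second)
```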

The events system is also very different, we try to keep the API to simple function callbacks in fastplotlib.

At the end of the day use the best tool for your use case :)

almarklein · 9 months ago
One big difference is that Fastplotlib is based on GPU tech, so it's capable of rendering much larger datasets interactively.
unnah · 9 months ago
How much larger? Holoviz includes the datashader library for GPU-based rendering, and here is an example with 10 million points: https://examples.holoviz.org/gallery/nyc_taxi/nyc_taxi.html