loeg · 2 years ago
Interesting testimonials section. Two out of five are "this looks interesting" and "this might be useful." Another two seem unclear on if they've actually used it. The last one does seem to be from an actual user (and is very positive, for what it's worth).
mostthingsweb · 2 years ago
You just gave them another one:

"Interesting testimonials section.

loeg: HN influencer"

metadat · 2 years ago
Where do you see the testimonials section?
sevagh · 2 years ago
There are 5 images of testimonials in the top part of the README
ezequiel-garzon · 2 years ago
For those Python programmers out there, if you don't mind sharing your experiences: do you spend any time at all in the REPL? If so, roughly what fraction of the time? Using IPython, JupyterLab, or something else? Or do you just run it directly from VS Code or PyCharm? Anything you'd like to add about your routine would be appreciated.

Oh, if you (experienced programmer or not) happen to know about a good site or YouTube channel to see Python programmers in action (as opposed to tutorials), please share.

Thanks in advance, and apologies for the digression.

morkalork · 2 years ago
I use Jupyter and vscode together. I'll write new snippets of code in Jupyter, then move them into stand-alone .py files once I'm happy with them. I'll use vscode to work on already-established code. The autoreload extension in Jupyter is super helpful.

Notebooks are just plain awesome. Whenever I use a new API or service, I'll make a notebook with cells showing how to call/run each operation and commit it as a sort of executable documentation.
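
The autoreload workflow can be approximated in plain Python too. A minimal sketch (the module name and temp-dir setup are invented for illustration; in Jupyter, `%load_ext autoreload` plus `%autoreload 2` does this automatically on each cell execution):

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # skip .pyc files so reload always reads source

# Create a throwaway module on disk (stand-in for a real project file).
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "snippet.py").write_text("def answer():\n    return 1\n")
sys.path.insert(0, str(tmp))

import snippet
first = snippet.answer()  # 1

# "Edit" the module, then reload it without restarting the interpreter.
(tmp / "snippet.py").write_text("def answer():\n    return 2\n")
importlib.invalidate_caches()
importlib.reload(snippet)
second = snippet.answer()  # 2
```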

SushiHippie · 2 years ago
I'm probably not the average python programmer.

But I normally just create two terminals (I have a tiling window manager); in one I open a Python file under /tmp/ and write my code, and I execute it in the other terminal.

I would probably use a REPL if one were integrated into my favorite editor ( https://helix-editor.com ), but everything else I tried was too "clunky" for me.

Though I work with data scientists, and they love to do everything inside jupyterlab.

benrutter · 2 years ago
It's spooky reading about someone with basically the exact same workflow as me!

I use helix in the terminal, regularly opening up a split pane in tmux to either drop into a breakpoint or test out bits of code interactively. I'm not quite as organized as having two regular panes; I'll close and open them pretty quickly, often just to try some toy example of reorganising a dict or something before writing it out into code.

rand_r · 2 years ago
I use “breakpoint()” a lot for debugging. It’s by far my #1 tool for figuring out why something isn’t working. Recommend you learn how to use it.
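
For anyone who hasn't tried it: `breakpoint()` is built in since Python 3.7 and drops you into pdb. A minimal sketch (the function here is invented for illustration):

```python
def parse_price(raw):
    # Execution pauses here under pdb: at the (Pdb) prompt you can print
    # `raw`, step with `n`, and continue with `c`.
    breakpoint()
    return float(raw.strip("$ "))
```

Setting the environment variable PYTHONBREAKPOINT=0 disables every breakpoint() call without editing the code.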
Antitoxic6185 · 2 years ago
Thank you so much. Your comment helped me a lot. I wish a lot more people knew about this.
memco · 2 years ago
I usually develop locally with the vscode debugger. On staging servers I often do remote vscode sessions also. On production I often use the REPL since I don’t want to install additional tools, but still need to inspect the state of a pipeline in a more step-by-step fashion.
shmoogy · 2 years ago
Work with a ton of ETL and web APIs, and build internal tools. Almost everything new starts off in a notebook and then either moves to a standalone .py, ends up in aws lambda (typically Zappa flask projects), or in an airflow dag.

Almost never ever use the REPL.

bravetraveler · 2 years ago
I'm nowhere near a programmer, but I use 'ipython' quite a bit for prototyping.

Particularly in cases where I'm trying to figure out how I want to modify some object.

I dabble largely due to Ansible and system administration purposes. IDEs and the like aren't a thing for me; neovim/LSP instead.

akasakahakada · 2 years ago
REPL 0%, JupyterLab 80%, PyCharm 20%

The reason is that the Jupyter environment is lightyears more powerful than the REPL. Feels like only those who don't code would use the REPL. I haven't even used it since the first day.

EasyMark · 2 years ago
For Python: 90/100 times I just run the code/tests/debugger; 8/100 times I'll pull out the code and step through it with my own inputs; 2/100 I use a REPL and break on areas of interest because the debugger just isn't cooperating. It's just too easy to use the debugger in something like vscode to run a module, especially in Python: a lot of the time you can just right-click-run any old module, make a __main__, feed some parameters, and step through it, unlike with a static language.
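
The right-click-run pattern looks roughly like this (the function and inputs are invented for illustration):

```python
def normalize(values):
    """Scale a list of numbers so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

if __name__ == "__main__":
    # Throwaway parameters fed in by hand: set a debugger breakpoint on the
    # next line and step into normalize() with your own inputs.
    print(normalize([2, 3, 5]))  # [0.2, 0.3, 0.5]
```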
dang · 2 years ago
Discussed recently here:

Memray: Python memory profiler - https://news.ycombinator.com/item?id=38561682 - Dec 2023 (21 comments)

(Reposts are fine after a year or so. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.)

Also related:

Memray - https://news.ycombinator.com/item?id=31102918 - April 2022 (2 comments)

Memray: a memory profiler for Python - https://news.ycombinator.com/item?id=31102089 - April 2022 (48 comments)

breatheoften · 2 years ago
I used this tool recently to debug some issues where one of our batch tasks was using more memory than it should and sometimes OOMing in an environment where it shouldn't come close to hitting a memory ceiling.

Found the issue almost immediately with the high watermark analysis which provides visibility into the part of the codebase/call stack that allocated every bit of allocated memory that was still live at the high watermark time. Made diagnosing the issue (an unexpectedly large amount of memory required when using a method from a third party library -- xarray.merge) and remedying the issue incredibly easy.

I was very impressed by the quality and utility of this tool and am now a huge fan!

horaborza · 2 years ago
Ah xarray, both incredibly useful and an extremely obnoxious nightmare
codedokode · 2 years ago
I have used memory profilers in PHP. In my experience, it is difficult to analyze why an application uses a lot of RAM just by looking at stack traces. It is better when you have a graph of references: which variable references which memory block. That way you can see, for example, that `cache.users[23].comments[56].attachments[2].binary_data` uses 10 MB of RAM.
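
A toy approximation of that reference-graph view in Python (the names and structure are invented; real tools like Memray attribute memory by allocation site instead):

```python
import sys

def sizes_by_path(obj, path="root", out=None):
    """Record the shallow size of every object reachable through dicts and
    lists, keyed by an attribute-style path."""
    if out is None:
        out = {}
    out[path] = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            sizes_by_path(value, f"{path}.{key}", out)
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            sizes_by_path(value, f"{path}[{i}]", out)
    return out

cache = {"users": [{"comments": [b"x" * 10_000]}]}
report = sizes_by_path(cache)
# report["root.users[0].comments[0]"] is ~10 kB: the binary data dominates.
```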
colechristensen · 2 years ago
And flame graphs excel at this kind of thing:

https://www.brendangregg.com/flamegraphs.html

kgm · 2 years ago
When it comes to profiling in Python, never underestimate the power of the standard library's profiler. You can supply it with a custom timing function when instantiating the Profile type [1], and as far as the module is concerned, this can be any function which returns a monotonically-increasing counter.

This means that you can turn the standard profiler into a memory profiler by providing a timing function which reports either total memory allocation or a meaningful proxy for allocation. I've had good results in the past using a timing function which returns the number of minor page faults (via resource.getrusage).

[1] https://docs.python.org/3/library/profile.html#profile.Profi...
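
A sketch of that trick, using minor page faults via resource.getrusage as the "clock" (Unix-only; the allocation workload is invented for illustration):

```python
import cProfile
import pstats
import resource  # Unix-only

def mem_timer():
    # Minor page faults only ever increase, so they satisfy the profiler's
    # requirement of a monotonically increasing counter and serve as a
    # rough proxy for memory allocation.
    return float(resource.getrusage(resource.RUSAGE_SELF).ru_minflt)

def allocate():
    return [bytes(4096) for _ in range(10_000)]  # touch ~40 MB of pages

profiler = cProfile.Profile(mem_timer)
profiler.enable()
data = allocate()
profiler.disable()

# The "time" columns in this report now count page faults, not seconds.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```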

cozos · 2 years ago
I've used Memray and it works great locally. But when I deployed it against long-running processes (i.e. in production), because I want to see memory usage over a long period of time, the profiler outputs got really large, like hundreds of GB. They cause disk outages and also take forever to download and visualize with the flamegraphs. What do people use to understand memory usage of long-running workloads in production?
pablogsal · 2 years ago
Have you tried aggregated capture files? https://bloomberg.github.io/memray/run.html#aggregated-captu...

With that option the files are much, much smaller and much easier to analyse.

intalentive · 2 years ago
I appreciate the trend of giant companies open sourcing their tools. Thanks Bloomberg!