Decorators are one of those language features that I want to use more of, but every time I attempt to, I realize that what I'm doing could be achieved more easily another way.
I felt the same way until I used one for FastAPI. You can literally implement a production-grade* machine learning pipeline in about 15 lines of code, using transformers, in a single Python file:
    from fastapi import FastAPI
    from pydantic import BaseModel, constr, conlist
    from typing import List
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification",
        model="models/distilbert-base-uncased-mnli",
    )
    app = FastAPI()

    class UserRequestIn(BaseModel):
        text: constr(min_length=1)
        labels: conlist(str, min_items=1)

    class ScoredLabelsOut(BaseModel):
        labels: List[str]
        scores: List[float]

    @app.post("/classification", response_model=ScoredLabelsOut)
    def read_classification(user_request_in: UserRequestIn):
        return classifier(user_request_in.text, user_request_in.labels)
*: Production grade if used in combination with workers; a Python quirk that I felt is not relevant to the topic of decorators.
@production looks risky, compared to checking an environment variable. Hopefully that host_name function is guaranteed to return the same result every time it is called, and can never fail or raise an exception.
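For contrast, a minimal sketch of the env-var approach the comment prefers. The variable name ENV and the "production" value are hypothetical placeholders, not from the original post:

```python
import functools
import os

def production(func):
    """Run the wrapped function only when ENV=production.

    ENV is a hypothetical variable name; adjust to your deployment's
    convention. It is read at call time, so tests can override it.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if os.environ.get("ENV") == "production":
            return func(*args, **kwargs)
        return None  # no-op outside production
    return wrapper

@production
def send_alert(msg):
    return f"sent: {msg}"

os.environ["ENV"] = "dev"
print(send_alert("hi"))   # None outside production
os.environ["ENV"] = "production"
print(send_alert("hi"))   # "sent: hi"
```

Reading a plain environment variable avoids the hostname lookup the comment worries about: it cannot raise mid-request, and its value is explicit in the deployment config.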
That looks amazing, thank you! I wouldn't want it in my production code but it seems like it would be great for something that I'm working on in the REPL.
The decorator pattern is a well known one, where one "decorates" a function by passing it into another function. GP expresses that they would avoid the pattern with these decorators.
The decorator operator is essentially prefix notation of the form `f1 = f2(f1) = @ f2 f1`, which is what the GP alluded to, i.e. that f2 is a higher-order function since it takes a function and produces another function. In fact, the @ operator is a higher-order function as well, since it takes two functions.
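To make the prefix-notation point concrete, here is a minimal sketch (the names are invented for illustration):

```python
def shout(func):
    """A higher-order function: takes a function, returns a new one."""
    def wrapper():
        return func().upper()
    return wrapper

@shout
def greet():
    return "hi"

# The @shout line above is just sugar for:
#   greet = shout(greet)
print(greet())  # "HI"
```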
I am struggling a bit with the @production decorator. I have never had the issue of only wanting to run a function on prod. Often I want slightly different behaviour, but then I'd use env variables (say, using a prod and a dev API address or DB address).
Wouldn't it also be better design to keep dev and prod as close as possible?
There are more environments than just dev and prod. Even if the environments differ, having some extra telemetry in a test environment often makes sense. So a @telemetry or @production decorator that does something in a non-production environment is an easy way to add that capability to functions.
I've found I like a decorator better than some envar testing custom logging/telemetry. If I have a custom_log function I use everywhere, I have to use it everywhere I might want it. With a decorator I can add it to only a few functions and get far less noise.
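A minimal sketch of that opt-in telemetry pattern; the decorator name, logger name, and message format are placeholders, not from the original post:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("telemetry")

def telemetry(func):
    """Log entry, exit, and duration, but only for functions that opt in."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        log.info("enter %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            log.info("exit %s after %.3fs",
                     func.__name__, time.perf_counter() - start)
    return wrapper

@telemetry
def important_step(x):
    # Only this function is instrumented; everything else stays quiet.
    return x * 2

important_step(21)  # logs enter/exit, returns 42
```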
@redirect could be handy when the print statements are embedded in someone else's code.
We use a library that is very...chatty (some function calls send a screenful of info/progress to the screen), and I think I'm going to steal this to make it quieter.
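One hedged way to sketch that "quieting" decorator with only the stdlib (the decorator name is invented; the original @redirect may differ):

```python
import contextlib
import functools
import io

def quiet(func):
    """Swallow anything the wrapped call prints to stdout.

    The captured text is discarded here, but it could just as
    easily be sent to a logger instead of thrown away.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with contextlib.redirect_stdout(io.StringIO()):
            return func(*args, **kwargs)
    return wrapper

@quiet
def chatty_call():
    print("screenful of progress output...")
    return "result"

print(chatty_call())  # prints only "result"
```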
> Not everything has to be designed for a super critical prod environment with >10 coders working non stop on it.
You don't need a super critical prod environment to have decent code. Half of these are hardcoding environment configuration, others have hidden side effects that the caller of the function cannot control at all, and others are badly reimplementing things that already exist (@redirect -> you want logging for this, @stacktrace -> use a debugger)
> Not everything has to be designed for a super critical prod environment with >10 coders working non stop on it.
And even when it does, cargo-culting rules-of-thumb is generally the wrong way to do that. Best practices are better treated as the Pirate Code than the Divine Writ.
I find them creative too - and I didn't say they are terrible. Actually, I'm also from data background and that's exactly the type of stuff I would come up with too.
But as I'm recently trying to improve my software skills, I notice that while those are indeed useful in the short term, in the long term they are not worth the price. The @production one, seems like a disaster waiting to happen.
> Those decorators are exactly what data scientists would do, while software engineers would be terrified.
Actually, when I read the post I'd guessed this is what an ex Software Engineer who is now a Data Scientist would do. And looking at the author's LinkedIn confirmed it.
You have to have a software engineering background to come up with this stuff in the first place.
I often use my custom timer decorator to time execution of functions/methods. This is not the only way I can do that, but I think it's a very convenient option.
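The exact decorator from the comment isn't shown, so this is a generic sketch of the idea:

```python
import functools
import time

def timer(func):
    """Print the wall-clock time each call to the wrapped function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

@timer
def slow_sum(n):
    return sum(range(n))

slow_sum(1_000_000)  # prints a timing line, e.g. "slow_sum took 0.0xxx s"
```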
1. @dataclass
https://docs.python.org/3/library/dataclasses.html
2. @cuda.jit
https://numba.pydata.org/numba-doc/latest/cuda/kernels.html
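As a quick illustration of the first item (@cuda.jit needs a GPU, so it's omitted), a minimal @dataclass sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Point:
    # __init__, __repr__, and __eq__ are generated automatically
    x: float
    y: float
    tags: list = field(default_factory=list)  # safe mutable default

p = Point(1.0, 2.0)
print(p)                       # Point(x=1.0, y=2.0, tags=[])
print(p == Point(1.0, 2.0))    # True: structural equality for free
```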
Which means your dev and production environments are quite different, increasing risks of letting stupid mistakes slip into prod.
Besides switching between something like production and test, you can catch a case where an envar isn't set or has an unexpected value.
@reloading to hot-reload a function from source before every invocation (https://github.com/julvo/reloading)
    %load_ext autoreload
    %autoreload 2
Now all your imports will be automatically reloaded.
It's so good that I have it in my IPython startup scripts.
https://reloadium.io
Great for prod if an error is unacceptable
(or more realistically, web scraping where you just don't care about one off errors)
I just wouldn't use them as decorators in proper code.
Applying these functions as decorators, i.e. with @, means you can't also run the non-parallel version, or run the function outside production, etc.
In the end, decorators, though nice on the first day of usage, reduce composability by restricting usage to whatever you wanted on that same day.
(this is not a general remark, it doesn't apply to DSLs that use decorators, e.g. flask)
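The composability point can be sketched like this: applying the wrapper by hand keeps both versions callable, whereas the @ form replaces the original. The names here are invented for illustration, and the "parallel" wrapper is a stand-in that merely tags the call:

```python
import functools

def parallelize(func):
    """Stand-in for a decorator that would run work in parallel;
    here it just tags the result so the two versions are distinguishable."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return ("parallel", func(*args, **kwargs))
    return wrapper

# The plain function stays available...
def process(x):
    return x + 1

# ...and the wrapped variant is opt-in at the call site,
# instead of being baked in with @parallelize.
process_parallel = parallelize(process)

print(process(1))           # 2
print(process_parallel(1))  # ('parallel', 2)
```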
Those decorators are exactly what data scientists would do, while software engineers would be terrified.
At least surprised. There are solutions to these problems already.
The author seems to have just learned decorators and is enthusiastic about abusing them, instead of learning the stdlib.
Take the capturing of print statements: why not use the logging machinery instead?
Or @stacktrace: it seems what they really want/need is a debugger.
But anyway, if these solutions fit their programming style better, so be it.
It would be more interesting to point out the parts you feel are so terrible.
Not everything has to be designed for a super critical prod environment with >10 coders working non stop on it.
I don't believe I've ever encountered this. Can you elaborate on what you mean?