Readit News
craigching commented on Lossless Log Aggregation – Reduce Log Volume by 99% Without Dropping Data   bit.kevinslin.com/p/lossl... · Posted by u/benshumaker
iampims · a year ago
Or sampling :)
craigching · a year ago
Sampling is lossy, though.
craigching commented on Towards Idempotent Rebuilds?   blog.josefsson.org/2024/0... · Posted by u/JNRowe
algernonramone · a year ago
I feel that "deterministic" is probably a better word here than "idempotent".
craigching · a year ago
Maybe even hermetic builds? https://bazel.build/basics/hermeticity
craigching commented on An open-source implementation of Apple code signing and notarization (2022)   gregoryszorc.com/blog/202... · Posted by u/tosh
craigching · 2 years ago
We have an enhancement request open with Apple for a way to delete .cstemp files if the tool runs into them. You'd think we could just add a `find . -name '*.cstemp' -exec rm {} \;` to our build toolchains before building, but we're in a large mono-repo and that would add a lot of time to our builds. Having something like a `--force` flag to delete the .cstemp files, instead of quitting and reporting an error, would make us switch to this tool pretty quickly, I'd think.
craigching commented on After 14 years in the industry, I still find programming difficult   piglei.com/articles/en-pr... · Posted by u/piglei
regus · 2 years ago
Are "TPS Reports" a real thing?!

I thought they were just a gag name created for the movie Office Space to represent pointless busywork. This is like all the times as an adult I finally understood a joke I heard in The Simpsons back when I was a kid!

craigching · 2 years ago
At one company, I was responsible for putting together all the third-party software and their licenses. I called it the TPS report :)
craigching commented on Building a high performance JSON parser   dave.cheney.net/paste/gop... · Posted by u/davecheney
zlg_codes · 2 years ago
What on Earth are you storing in JSON that performance becomes an issue?

How big is 'large' here?

I built a simple CRUD inventory program to keep track of one's gaming backlog and progress, and the dumped JSON of my entire 500+ game statuses is under 60kB and can be imported in under a second on decade-old hardware.

I'm having difficulty picturing a JSON dataset big enough to slow down modern hardware. Maybe Gentoo's portage tree if it were JSON encoded?

craigching · 2 years ago
In my case, Sentry events that represent crash logs for Adobe Digital Video applications. I'm trying to remember off the top of my head, but I think it was in the gigabytes for a single event.
craigching commented on Building a high performance JSON parser   dave.cheney.net/paste/gop... · Posted by u/davecheney
coldtea · 2 years ago
What line of work are you in that you've "written far too many JSON parsers already" in your career?!!!
craigching · 2 years ago
Probably anywhere that requires parsing large JSON documents. Off-the-shelf JSON parsers are notoriously slow on them.
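To illustrate the point, one common trick for very large documents is streaming: pull items out of a top-level JSON array one at a time instead of materializing the whole tree. A minimal sketch using only the standard library (the `iter_array_items` helper is hypothetical; production code would reach for a streaming parser such as ijson):

```python
import json

def iter_array_items(text):
    """Yield items of a top-level JSON array one at a time, so a huge
    document never becomes one giant Python object (illustrative sketch;
    assumes the document's top level is an array)."""
    decoder = json.JSONDecoder()
    idx = text.index('[') + 1
    while True:
        # Skip whitespace and commas between items.
        while idx < len(text) and text[idx] in ' \t\r\n,':
            idx += 1
        if idx >= len(text) or text[idx] == ']':
            return
        # raw_decode parses one value and reports where it ended.
        item, idx = decoder.raw_decode(text, idx)
        yield item

doc = '[{"id": 1}, {"id": 2}, {"id": 3}]'
print([item["id"] for item in iter_array_items(doc)])  # [1, 2, 3]
```

This still reads the text into memory, but it avoids building the full object tree at once, which is usually where the time and memory go.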
craigching commented on Understanding Automatic Differentiation in 30 lines of Python   vmartin.fr/understanding-... · Posted by u/sebg
eachro · 2 years ago
I really enjoy small, elegant code demonstrations like this that let you get your hands dirty while trying to understand a concept. Other examples are Sasha Rush's GPU Puzzles and Tensor Puzzles: https://github.com/srush/GPU-Puzzles and https://github.com/srush/Tensor-Puzzles
craigching · 2 years ago
Also micrograd from Andrej Karpathy: https://github.com/karpathy/micrograd
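For context, the core of such a library really does fit in a few dozen lines. A minimal scalar reverse-mode autodiff sketch in the spirit of micrograd (this `Value` class is illustrative, not Karpathy's actual code):

```python
class Value:
    """Minimal scalar reverse-mode autodiff node (illustrative sketch)."""
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each input's gradient scales by the other input
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then propagate from the output.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0)
y = Value(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```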
craigching commented on Mpire: A Python package for easier and faster multiprocessing   github.com/sybrenjansen/m... · Posted by u/lnyan
milliams · 2 years ago
Why does everyone compare against `multiprocessing` when `concurrent.futures` (https://docs.python.org/3/library/concurrent.futures.html) has been part of the standard library for 11 years? It's a much improved API and there are _almost_ no reasons to use `multiprocessing` any more.
craigching · 2 years ago
Someone downvoted you; I upvoted because I think you have a good point, but it would be nice to back it up. I think I agree with you, though I have only used concurrent.futures with threads.
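To back the point up with a concrete example, `ProcessPoolExecutor` gives multiprocessing-style parallelism behind the same `Executor` interface used for threads (the `square` worker below is just a placeholder):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Workers must be importable top-level functions so they can be
    # pickled and sent to child processes.
    return n * n

if __name__ == "__main__":
    # Swap in ThreadPoolExecutor and the rest of the code is unchanged;
    # that uniformity is the main win over raw multiprocessing.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(square, range(5)))
    print(results)  # [0, 1, 4, 9, 16]
```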
craigching commented on When did Postgres become cool?   crunchydata.com/blog/when... · Posted by u/fforflo
convolvatron · 2 years ago
Postgres started in 1986. It was never less featureful than MySQL... in fact, MySQL tried to get by without _transactions_ for the longest time. The fact that MySQL had more market/mindshare at any point is more a testament to crowd mentality than anything about either of the two databases.
craigching · 2 years ago
I remember, maybe circa 2004, debating Postgres and MySQL with a colleague. I told him to unplug the machine that was hosting his MySQL instance. He did and corrupted his database. He said it didn't matter, he had backups, speed was more important :p This was before MySQL had the InnoDB storage engine; after that it wasn't so bad. I have always stood by Postgres though, it's a fantastic piece of open source software.
craigching commented on An Introduction to Statistical Learning with Applications in Python   statlearning.com... · Posted by u/alexmolas
thumbuddy · 2 years ago
I don't think most people realize this but the "old" stuff often works better, has less churn, and has far lower overhead costs for deployment than the "new" stuff. Depends on the domain and the goal.
craigching · 2 years ago
To your point, I replaced an LSTM that required ~$100k of infrastructure with XGBoost, which required no additional infrastructure (we created and used the model at query time on infrastructure we already had for query loads) and only lost about 2% accuracy (LSTM: 98%, XGBoost: 96%). This was two years ago and it's still in use.

u/craigching

Karma: 881 · Cake day: October 21, 2012