Readit News
craigching commented on Lossless Log Aggregation – Reduce Log Volume by 99% Without Dropping Data   bit.kevinslin.com/p/lossl... · Posted by u/benshumaker
iampims · a year ago
Or sampling :)
craigching · a year ago
Sampling is lossy, though.
craigching commented on Towards Idempotent Rebuilds?   blog.josefsson.org/2024/0... · Posted by u/JNRowe
algernonramone · a year ago
I feel that "deterministic" is probably a better word here than "idempotent".
craigching · a year ago
Maybe even hermetic builds? https://bazel.build/basics/hermeticity
craigching commented on An open-source implementation of Apple code signing and notarization (2022)   gregoryszorc.com/blog/202... · Posted by u/tosh
craigching · 2 years ago
We have an enhancement request open with Apple for a way to delete .cstemp files if the tool runs into them. You'd think we could just add a `find . -name '*.cstemp' -exec rm {} \;` to our build toolchains before building, but we're in a large mono-repo and that would add a lot of time to our builds. Having something like a `--force` flag to delete the .cstemp files, instead of quitting and reporting an error, would make us switch to this tool pretty quickly, I'd think.
craigching commented on After 14 years in the industry, I still find programming difficult   piglei.com/articles/en-pr... · Posted by u/piglei
regus · 2 years ago
Are "TPS Reports" a real thing?!

I thought they were just a gag name created for the movie Office Space to represent pointless busywork. This is like all the times as an adult I finally understood a joke I heard in The Simpsons back when I was a kid!

craigching · 2 years ago
At one company, I was responsible for putting together all the third-party software and their licenses. I called it the TPS report :)
craigching commented on Building a high performance JSON parser   dave.cheney.net/paste/gop... · Posted by u/davecheney
zlg_codes · 2 years ago
What on Earth are you storing in JSON that performance becomes an issue?

How big is 'large' here?

I built a simple CRUD inventory program to keep track of one's gaming backlog and progress, and the dumped JSON of my entire 500+ game statuses is under 60kB and can be imported in under a second on decade-old hardware.

I'm having difficulty picturing a JSON dataset big enough to slow down modern hardware. Maybe Gentoo's portage tree if it were JSON encoded?

craigching · 2 years ago
In my case, Sentry events that represent crash logs for Adobe Digital Video applications. I'm trying to remember off the top of my head, but I think it was in the gigabytes for a single event.
craigching commented on Building a high performance JSON parser   dave.cheney.net/paste/gop... · Posted by u/davecheney
coldtea · 2 years ago
What line of work are you in that you've "written far too many JSON parsers already" in your career?!!!
craigching · 2 years ago
Probably anywhere that requires parsing large JSON documents. Off-the-shelf JSON parsers are notoriously slow on them.
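To illustrate the point, one common trick for very large documents is streaming: pull items out of a top-level JSON array one at a time instead of materializing the whole tree. A minimal sketch using only the standard library (the `iter_array_items` helper is hypothetical; production code would reach for a streaming parser such as ijson):

```python
import json

def iter_array_items(text):
    """Yield items of a top-level JSON array one at a time, so a huge
    document never becomes one giant Python object (illustrative sketch;
    assumes the document's top level is an array)."""
    decoder = json.JSONDecoder()
    idx = text.index('[') + 1
    while True:
        # Skip whitespace and commas between items.
        while idx < len(text) and text[idx] in ' \t\r\n,':
            idx += 1
        if idx >= len(text) or text[idx] == ']':
            return
        # raw_decode parses one value and reports where it ended.
        item, idx = decoder.raw_decode(text, idx)
        yield item

doc = '[{"id": 1}, {"id": 2}, {"id": 3}]'
print([item["id"] for item in iter_array_items(doc)])  # [1, 2, 3]
```

This still reads the text into memory, but it avoids building the full object tree at once, which is usually where the time and memory go.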
craigching commented on Understanding Automatic Differentiation in 30 lines of Python   vmartin.fr/understanding-... · Posted by u/sebg
eachro · 2 years ago
I really enjoy small, elegant code demonstrations like this that let you get your hands dirty while trying to understand a concept. Other examples are Sasha Rush's GPU Puzzles and Tensor Puzzles: https://github.com/srush/GPU-Puzzles and https://github.com/srush/Tensor-Puzzles
craigching · 2 years ago
Also micrograd from Andrej Karpathy: https://github.com/karpathy/micrograd
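For context, the core of such a library really does fit in a few dozen lines. A minimal scalar reverse-mode autodiff sketch in the spirit of micrograd (this `Value` class is illustrative, not Karpathy's actual code):

```python
class Value:
    """Minimal scalar reverse-mode autodiff node (illustrative sketch)."""
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each input's gradient scales by the other input
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then propagate from the output.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0)
y = Value(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```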
craigching commented on Mpire: A Python package for easier and faster multiprocessing   github.com/sybrenjansen/m... · Posted by u/lnyan
milliams · 2 years ago
Why does everyone compare against `multiprocessing` when `concurrent.futures` (https://docs.python.org/3/library/concurrent.futures.html) has been part of the standard library for 11 years? It's a much improved API and there are _almost_ no reasons to use `multiprocessing` any more.
craigching · 2 years ago
Someone downvoted you; I upvoted because I think you have a good point, but it would be nice to back it up. I think I agree with you, though I have only used concurrent.futures with threads.
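To back the point up with a concrete example, `ProcessPoolExecutor` gives multiprocessing-style parallelism behind the same `Executor` interface used for threads (the `square` worker below is just a placeholder):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Workers must be importable top-level functions so they can be
    # pickled and sent to child processes.
    return n * n

if __name__ == "__main__":
    # Swap in ThreadPoolExecutor and the rest of the code is unchanged;
    # that uniformity is the main win over raw multiprocessing.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(square, range(5)))
    print(results)  # [0, 1, 4, 9, 16]
```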
craigching commented on When did Postgres become cool?   crunchydata.com/blog/when... · Posted by u/fforflo
convolvatron · 2 years ago
Postgres started in 1986. It was never less featureful than MySQL... in fact, MySQL tried to get by without _transactions_ for the longest time. The fact that MySQL had more market/mindshare at any point is more a testament to crowd mentality than anything about either of the two databases.
craigching · 2 years ago
I remember, maybe circa 2004, debating Postgres and MySQL with a colleague. I told him to unplug the machine that was hosting his MySQL instance. He did and corrupted his database. He said it didn't matter, he had backups, speed was more important :p This was before MySQL had the InnoDB storage engine; after that it wasn't so bad. I have always stood by Postgres though, it's a fantastic piece of open source software.
craigching commented on An Introduction to Statistical Learning with Applications in Python   statlearning.com... · Posted by u/alexmolas
thumbuddy · 2 years ago
I don't think most people realize this but the "old" stuff often works better, has less churn, and has far lower overhead costs for deployment than the "new" stuff. Depends on the domain and the goal.
craigching · 2 years ago
To your point, I replaced an LSTM that required ~$100k of infrastructure with XGBoost, which required no additional infrastructure (we created and used the model at query time on infrastructure we already had for query loads) and only lost about 2% accuracy (LSTM: 98%, XGBoost: 96%). This was two years ago and it's still in use.

u/craigching

Karma: 881 · Cake day: October 21, 2012