heipei (u/heipei) - Readit News

heipei commented on How we replaced Elasticsearch and MongoDB with Rust and RocksDB radar.com/blog/high-perfo... · Posted by u/j_kao

dewey · 23 days ago

To add another data point: After working with ES for the past 10 years in production I have to say that ES has never given us any headaches. We've had issues with ScyllaDB, Redis etc. but ES is just chugging along and just works.

The one issue I remember is: On ES 5 we once had an issue early on where it regularly went down, turns out that some _very long_ input was being passed into the search by some scraper and killed the cluster.

heipei · 23 days ago

I agree, and I don't get where the claims that ES is hard to operate originate from. Yeah, if you allow arbitrary aggregations that exceed the heap space, or if you allow expensive queries that effectively iterate over everything you're gonna have a bad time. But apart from those, as long as you understand your data model, your searches and how data is indexed, ES is absolutely rock-solid, scales and performs like a beast. We run a 35-node cluster with ~ 240TB of disk, 4.5TB of RAM, and about 100TB of documents and are able to serve hundreds of queries. The whole thing does not require any maintenance apart from replacing nodes that failed from unrelated causes (hardware, hosting). Version upgrades are smooth as well.

The only bigger issue we had was when we initially added 10 nodes to double the initial capacity of the cluster. Performance tanked as a result, and it took us about half a day until we finally figured out that the new nodes were using dmraid (Linux RAID0) and as a result the block devices had a really high default read-ahead value (8192) compared to the existing nodes, which resulted in heavy read amplification. The ES manual specifically documents this, but since we hadn't run into this issue ourselves it took us a while to realise what was at fault.
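A minimal sketch of how one might spot this kind of mismatch across nodes, assuming a Linux host that exposes `/sys/block/<device>/queue/read_ahead_kb`. The function names and the `expected_kb` baseline are hypothetical, and the sysfs file reports KiB, which is not necessarily the unit of the figure quoted above (the comment does not say which tool reported it).

```python
from pathlib import Path


def read_ahead_kb(sys_block: str = "/sys/block") -> dict[str, int]:
    """Return {device_name: read_ahead_kb} for every block device found."""
    root = Path(sys_block)
    if not root.is_dir():
        # Not a Linux host, or sysfs is not mounted: nothing to report.
        return {}
    settings = {}
    for dev in sorted(root.iterdir()):
        ra_file = dev / "queue" / "read_ahead_kb"
        if ra_file.is_file():
            settings[dev.name] = int(ra_file.read_text().strip())
    return settings


def outliers(settings: dict[str, int], expected_kb: int) -> dict[str, int]:
    """Devices whose read-ahead differs from the value the fleet expects."""
    return {dev: kb for dev, kb in settings.items() if kb != expected_kb}


# expected_kb=128 is an assumed baseline, not a value taken from the thread.
for dev, kb in outliers(read_ahead_kb(), expected_kb=128).items():
    print(f"{dev}: read_ahead_kb={kb}")
```

Running a check like this on an old node and a new node side by side would surface a differing default before it shows up as read amplification.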

heipei commented on FoundationDB: From idea to Apple acquisition [video] youtube.com/watch?v=C1nZz... · Posted by u/zdw

Nican · a month ago

FoundationDB has been growing as my favorite database lately. Even though it is only a key-value store.

Out of curiosity: what are the scale limits of FoundationDB? What kind of issues would it start to have? For example, being able to store all of Discord messages on it?

I see blog posts of Discord moving to Scylla and ElasticSearch, but I wonder if there would be any difficulties here.

heipei · a month ago

ScyllaDB discontinued its free and open source version, so I personally wouldn't build anything new on it.

heipei commented on I wasted weeks hand optimizing assembly because I benchmarked on random data vidarholen.net/contents/b... · Posted by u/thunderbong

heipei · a month ago

Counterpoint: I once wrote a paper on accelerating block ciphers (AES et al) using CUDA and while doing so I realised that most (if not all) previous academic work which had claimed incredible speedups had done so by benchmarking exclusively on zero-bytes. Since these block ciphers are implemented using lookup tables this meant perfect cache hits on every block to be encrypted. Benchmarking on random data painted a very different, and in my opinion more realistic picture.
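A toy illustration of the effect, not the benchmark from the paper: it assumes a simplified first round in which each input byte is XORed with a key byte and the result indexes a 256-entry table, then counts how many distinct table entries the input touches. The function name and the block count are made up for the example.

```python
import random

BLOCK_SIZE = 16  # AES block size in bytes


def distinct_table_indices(data: bytes, key: bytes) -> int:
    """Count distinct (byte position, table index) pairs touched by the input.

    Toy model: each byte of a block is XORed with the key byte at the same
    position and the result is used as an index into a 256-entry table.
    """
    touched = set()
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        for pos, byte in enumerate(block):
            touched.add((pos, byte ^ key[pos]))
    return len(touched)


rng = random.Random(0)  # fixed seed so the run is repeatable
key = bytes(rng.randrange(256) for _ in range(BLOCK_SIZE))
n_blocks = 4096

zeros = bytes(BLOCK_SIZE * n_blocks)
noise = bytes(rng.randrange(256) for _ in range(BLOCK_SIZE * n_blocks))

print("all-zero input:", distinct_table_indices(zeros, key))
print("random input:  ", distinct_table_indices(noise, key))
```

In this model the all-zero input touches only 16 entries no matter how many blocks are processed, while random input spreads across nearly the whole table, which is the access pattern a cache actually has to cope with.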

heipei commented on VisionOS 26 keeps pushing Apple's newest platform toward the future sixcolors.com/post/2025/0... · Posted by u/tosh

rurp · 3 months ago

My experience is exactly the opposite, most non-techie users hate pointless UI changes. They couldn't care less about some new design paradigm but they care a great deal that some action they've been using for years has now been changed out from under them. For most people a computer is a tool and they care far more about what they can do with the tool than seeing the tool make itself the center of attention for a time.

Redesigns are often self-indulgent. Designers like that they get to do something new, employees who stare at the same software every day get to change things up, and managers get a highly visible change they can point to as evidence of their "impact". What's best for the users is often not a top concern.

heipei · 3 months ago

In my experience this affects techie users just as much. Especially when there is a UI that has been crafted and slowly perfected over the years, and where any remaining idiosyncrasy has long been learned by the user, changing that UI has profound negative impact on the productivity of anyone using the platform.

I have rarely seen UI changes where users were genuinely excited to have a new UI with the understanding that they'd have to learn new paradigms. Most web apps should still be Bootstrap apps, but of course then you can't put that on a giant dashboard wall at a conference ;)

heipei commented on Four years of running a SaaS in a competitive market maxrozen.com/on-four-year... · Posted by u/mtlynch

openplatypus · 4 months ago

We are also listing prices in EURO only.

Our server bills are in EURO.

Our salaries are in EURO.

Our subscriptions for business operations are in EURO.

Our accountancy costs are in EURO.

It helps that our focus is on European customers, but that said, it is hard to justify going with USD pricing.

heipei · 4 months ago

No it's not, not if you want to win customers from the US. Their annual budgets are in USD, so they don't have the flexibility to pay more next year just because the foreign exchange rate has shifted. You take the foreign exchange risk by listing prices in USD, but it could just as well be a windfall, and your customers pay stable prices in return.

heipei commented on IBM completes acquisition of HashiCorp newsroom.ibm.com/2025-02-... · Posted by u/ahurmazda

schmichael · 6 months ago

I joined HashiCorp in 2016 to work on Nomad and have been on the product ever since. Definitely a lot of feelings today. When I joined, HashiCorp was maybe 50 people. Armon Dadgar personally onboarded us one at a time, and showed me how to use the coffee maker (remember to wash your own dishes!). There have been a lot of ups (IPO) and downs (BUSL), but the Nomad team and users have been the best I've ever gotten to work with.

I've only ever worked at startups before, but HashiCorp itself left that category when it IPO'd. Each phase is definitely different, but then again I don't want to go back to roadmapping on a ridiculously small whiteboard in a terrible sub-leased office and building release binaries on my laptop. That was fun once, but I'm ready for a new phase in my own life. I've heard the horror stories of being acquired by IBM, but I've also heard from people who have reveled in the resources and opportunities. I'm hoping for the best for Nomad, our users, and our team. I'd like to think there's room in the world for multiple schedulers, and if not, it won't be for lack of trying.

heipei · 6 months ago

I just wanted to say thank you for your work on Nomad. It's one of the most pleasant and useful pieces of software I have ever worked with. Nomad allowed us to build out a large fleet of servers with a small team while still enjoying the process.

heipei commented on RabbitMQ 4.0 github.com/rabbitmq/rabbi... · Posted by u/rhodin

wejick · a year ago

People will be surprised at how far you can get with NSQ. It doesn't come with any fancy guarantees like exactly-once or even ordered delivery, which forces developers to think about how to design better on the application side. Not saying it's ideal tho.

heipei · a year ago

I don't know why / how messages should be ordered. NSQ is a message queue and not a log. Some messages take longer to process than others, and some messages need to be re-queued and re-tried out of order, and that is a very common use-case.

I would love to be able to use a distributed log like Kafka/Redpanda since it's HA out of the box, but it simply does not fit that use-case.
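The re-queue behaviour described above can be sketched with a plain in-memory queue. This is only an illustration of the semantics and does not use NSQ or its client libraries; `drain` and `fails_first_attempt` are hypothetical names.

```python
from collections import deque


def drain(messages, fails_first_attempt):
    """Process messages from a queue, re-queueing failures at the back.

    Returns the order in which messages completed successfully.
    """
    queue = deque((msg, 1) for msg in messages)  # (message, attempt number)
    completed = []
    while queue:
        msg, attempt = queue.popleft()
        if msg in fails_first_attempt and attempt == 1:
            # Retry later, behind messages that were published after this one.
            queue.append((msg, attempt + 1))
            continue
        completed.append(msg)
    return completed


print(drain(["a", "b", "c", "d"], fails_first_attempt={"b"}))
# → ['a', 'c', 'd', 'b']
```

Message `b` completes last even though it was published second. A queue accepts that reordering, whereas a consumer reading a log at a single offset would have to stall on `b` or track the retry elsewhere.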

heipei commented on RabbitMQ 4.0 github.com/rabbitmq/rabbi... · Posted by u/rhodin

fidotron · a year ago

I've regarded RabbitMQ as a secret weapon in plain sight for years. The killer reason people don't use it more is it "doesn't scale" to truly enormous sizes, but for anyone with less than a million users it's great.

Too many people end up with their own half rolled pubsub via things like grpc, and they'd be far better off using this, particularly in the early stages of development.

heipei · a year ago

I could say the same thing about NSQ which is a distributed message queue with very simple semantics and a great HTTP API for message publishing and control actions. What it doesn't offer natively is HA though.
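For a sense of how small the publishing side is, here is a sketch that posts a message to nsqd's `/pub` HTTP endpoint using only the standard library. The address `127.0.0.1:4151`, the topic name, and the helper names are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_pub_request(nsqd_http: str, topic: str, body: bytes) -> Request:
    """Build a POST to nsqd's /pub endpoint; the raw body is the message."""
    url = f"{nsqd_http}/pub?{urlencode({'topic': topic})}"
    return Request(url, data=body, method="POST")


def publish(nsqd_http: str, topic: str, body: bytes, timeout: float = 5.0) -> int:
    """Send the message and return the HTTP status code."""
    with urlopen(build_pub_request(nsqd_http, topic, body), timeout=timeout) as resp:
        return resp.status


# Example (needs a running nsqd; address and topic name are assumptions):
# publish("http://127.0.0.1:4151", "events", b'{"id": 1}')
```

Because publishing is a single HTTP request, any service that can make one can produce messages without linking a client library.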

heipei commented on Phishing Campaigns Targeting USPS See as Much Web Traffic as the USPS Itself akamai.com/blog/security-... · Posted by u/rexbee

MattGaiser · a year ago

Is detecting phishing all that straightforward? Banks, travel agents, and even governments are all terrible at avoiding the signals of phishing.

Equifax had its entire response to its breach on a different domain, the kind of thing we tell people to watch out for.

https://www.equifaxsecurity2017.com/

This looks like phishing. But it is legitimate.

heipei · a year ago

It is not straightforward, and it is complicated by a number of factors. The first would be bad "brand hygiene": If a company has dozens of legitimate domains across different TLDs, different providers and different geographical locations then it's already more complicated than just one canonical .com domain. If teams within the company are permitted to spin up their own domains (e.g. marketing campaigns, branch offices) then it gets 10x worse. Lastly if a legitimate brand frequently changes its appearance, it will be harder to pin down the true brand identity.

But even if you follow all of these best practices there are still powerful attack vectors. A threat actor could host their phishing page on an unrelated (compromised) domain with good domain reputation; in that case you wouldn't even know about that site until the first email or SMS hits your customers. Or the threat actor could use one of the many file-hosting or website services to create their site and host it on a shared third-party domain with perfect domain reputation (e.g. amazonaws.com).

And then there's incentive: It's not the companies that suffer financial losses, it is their customers. If you were talking about their employees being phished that would be a different story. Same thing for Google Safe Browsing: Their incentive is to protect against most of the obvious phishing, without any false positives, ever. If they are slow to detect something they won't suffer any losses. If they generate a false positive their Chrome browser might suffer significant reputational damage if a popular legitimate domain is blocked.
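The "brand hygiene" point can be made concrete with a small sketch: when a brand has one canonical domain, checking whether a hostname belongs to it is a simple suffix match. The function name and the `example.com` brand are hypothetical, and this covers only the first factor above; it does nothing against phishing hosted on compromised or shared third-party domains.

```python
def is_brand_domain(hostname: str, canonical_domains: set[str]) -> bool:
    """True if hostname equals, or is a subdomain of, a canonical brand domain."""
    host = hostname.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in canonical_domains)


canonical = {"example.com"}  # hypothetical brand with one canonical domain

print(is_brand_domain("login.example.com", canonical))         # True
print(is_brand_domain("example-security2017.com", canonical))  # False
print(is_brand_domain("example.com.evil.test", canonical))     # False
```

Every additional legitimate domain the brand registers has to be added to `canonical`, which is why a look-alike domain that is in fact legitimate is indistinguishable from phishing under a check like this until someone updates the list.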

heipei commented on So you want to scrape like the big boys (2021) incolumitas.com/2021/11/0... · Posted by u/aragonite

gwittel · a year ago

I’m really mixed on this. Anti bot stuff is increasingly a pain point for security research. Working in this space, I have to work against these systems.

Threat actors use Cloudflare and other services to gate their payloads. That’s a problem for our customers who are trying to find/detect things like brand impersonation and credential phish. Cloudflare has been completely unhelpful. They just don’t care.

heipei · a year ago

Seconding this. Evading detection has become a real cake-walk since threat actors are able to sign up for a free Cloudflare account and then put their phishing site on their 2-hour-old domain behind a level of protection backed by a $20B company. Funny that you almost never see phishing on Akamai ;)

Disclaimer: We operate in this space so we obviously have an interest in being able to detect these threats going forward.