diek commented on How GM Made the $35,000 Chevrolet Equinox EV into a 319-Mile Range Champ   insideevs.com/news/720117... · Posted by u/peutetre
NewJazz · a year ago
They've literally never made a budget sedan EV. Looks like they are happy to let Hyundai and Tesla take that market, while they churn out SUVs?

Also, the $35k trim won't be available until later this year.

diek · a year ago
They made the Bolt EV from 2016-2023, and they're revamping it to use the new Ultium platform.

You could buy a 2023 Bolt starting at $26,500, and they're great cars.

diek commented on Postgres as queue   leontrolski.github.io/pos... · Posted by u/leontrolski
jakjak123 · 2 years ago
Using the INSERT/UPDATEs as events is kind of limiting. Usually you will want richer events (higher-level information) than the raw structure of a single table, so use this feature very sparingly. Keep in mind that LISTEN should also ONLY be used to reduce active polling; it is not a failsafe delivery system, and you will not get notified of things that happened while you were gone.
diek · 2 years ago
For my use cases the aim is really not to deal with events, but to deal with the rows in the tables themselves.

Say you have a `thing` table, and backend workers that know how to process a `thing` in status 'new', put it in status 'pending' while it's being worked on, and when it's done put it in status 'active'.

The only thing the backend needs to know is "thing id:7 is now in status:'new'", and it knows what to do from there.

The way I generally build the backends, the first thing they do is LISTEN to the relevant channels they care about; then they can query/build whatever understanding they need of the current state. If the connection drops for whatever reason, you have to start from scratch on the new connection (LISTEN, rebuild state, etc.).
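
For illustration, a rough sketch of that wiring, with a `thing` table like the one above (the column layout, trigger, and channel name here are made up, not from any particular implementation):

  -- A unit of work and its current status.
  CREATE TABLE thing (
      id      bigserial PRIMARY KEY,
      status  text NOT NULL DEFAULT 'new',
      payload jsonb
  );

  -- Broadcast "id:status" whenever a row is inserted or its status
  -- changes, so listeners find out without polling.
  CREATE FUNCTION thing_notify() RETURNS trigger AS $$
  BEGIN
      PERFORM pg_notify('thing_events', NEW.id::text || ':' || NEW.status);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER thing_notify_trg
      AFTER INSERT OR UPDATE OF status ON thing
      FOR EACH ROW EXECUTE FUNCTION thing_notify();

  -- The first thing a worker does on a fresh connection:
  LISTEN thing_events;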

diek commented on Postgres as queue   leontrolski.github.io/pos... · Posted by u/leontrolski
Deukhoofd · 2 years ago
> * use LISTEN to be notified of rows that have changed that the backend needs to take action on (so you're not actively polling for new work)

> * use NOTIFY from a trigger so all you need to do is INSERT/UPDATE a table to send an event to listeners

Could you explain how that is better than just setting up Event Notifications inside a trigger in SQL Server? Or, for that matter, just using the Event Notifications system as a queue?

https://learn.microsoft.com/en-us/sql/relational-databases/s...

> * you can select using SKIP LOCKED (as the article points out)

SQL Server can do that as well, using the READPAST table hint.

https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...

> * you can use partial indexes to efficiently select rows in a particular state

SQL Server has filtered indexes; are those not the same?

https://learn.microsoft.com/en-us/sql/relational-databases/i...

diek · 2 years ago
Admittedly I used SQL Server pretty heavily in the mid-to-late 2000s but haven't kept up with it in recent years, so my dig may have been a little unfair.

Agree on READPAST being similar to SKIP LOCKED, and filtered indexes are equivalent to partial indexes (I remember filtered indexes being in SQL Server 2008 when I used it).

Reading through the docs on Event Notifications, they seem to be a little heavier and have different delivery semantics. Correct me if I'm wrong, but Event Notifications seem closer to a consumable queue (where a consumer calling RECEIVE removes events from the queue), whereas LISTEN/NOTIFY is more pub/sub, where every client LISTENing on a channel gets every NOTIFY message.
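
A quick way to see that broadcast behavior from three psql sessions (channel name and payload are made up):

  -- Session A:
  LISTEN thing_events;

  -- Session B:
  LISTEN thing_events;

  -- Session C:
  NOTIFY thing_events, 'id:7 status:new';

  -- Sessions A and B each receive the message; neither one
  -- "consumes" it for the other. A session that only starts
  -- LISTENing afterwards never sees it, since notifications
  -- are not queued for absent listeners.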

diek commented on Postgres as queue   leontrolski.github.io/pos... · Posted by u/leontrolski
leontrolski · 2 years ago
Thanks for the comprehensive reply. Does the following argument stand up at all? (Going on the assumption that LISTEN is one more concept, and one fewer concept is a good thing.)

If I have, say, 50 workers polling the db, either it's quiet and there are no tasks to do, in which case I don't particularly care about the polling load; or it's busy and there's always a task ready whenever they query for work, in which case the LISTEN is constantly pinging, which is equivalent to constantly polling and finding work.

Regardless, is there a resource (blog or otherwise) you'd recommend for integrating LISTEN with the backend?

diek · 2 years ago
In a large application you may have dozens of tables that different backends operate on. Each worker pool polling every table it's interested in every couple of seconds can add up, and it's really not necessary.

Another factor is polling frequency and processing latency. All things being equal, the delay from when a new task lands in a table to when a backend is working on it should be as small as possible: single-digit milliseconds, ideally.

A NOTIFY event is sent from the server side as the transaction commits, and on the worker side you can have a thread blocked waiting on that message, ready to process it as soon as it arrives.

So with NOTIFY you reduce polling load and also reduce latency. The only time you actually need to query for tasks is to take over any expired leases, and since there is a 'lease_expire' column you know when that's going to happen, so you don't have to continually check in.
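
As a sketch, that expired-lease takeover can be as simple as this (the `thing` table and status names are illustrative):

  -- Reclaim work whose lease has lapsed: putting it back in 'new'
  -- fires the NOTIFY trigger again, so an idle worker picks it up.
  UPDATE thing
     SET status = 'new', lease_expire = NULL
   WHERE status = 'processing'
     AND lease_expire < now();

  -- And rather than checking in on a fixed interval, a worker can
  -- sleep until the soonest outstanding lease could expire:
  SELECT min(lease_expire) FROM thing WHERE status = 'processing';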

As far as documentation goes, I got a simple Java LISTEN/NOTIFY implementation working initially (2013-ish?) just from the excellent Postgres docs: https://www.postgresql.org/docs/current/sql-notify.html

diek commented on Postgres as queue   leontrolski.github.io/pos... · Posted by u/leontrolski
diek · 2 years ago
Postgres is great as a queue, but this post doesn't really get into the features that differentiate it from just polling, say, SQL Server for tasks.

For me, the best features are:

  * use LISTEN to be notified of rows that have changed that the backend needs to take action on (so you're not actively polling for new work)
  * use NOTIFY from a trigger so all you need to do is INSERT/UPDATE a table to send an event to listeners
  * you can select using SKIP LOCKED (as the article points out)
  * you can use partial indexes to efficiently select rows in a particular state
So when a backend worker wakes up, it can:

  * LISTEN for changes to the active working set it cares about
  * "select all things in status 'X'" (using a partial index predicate, so it's not churning through low cardinality 'active' statuses)
  * atomically update the status to 'processing' (using SKIP LOCKED to avoid contention/lock escalation)
  * do the work
  * update to a new status (which another worker may trigger on)
So you end up with a pretty decent state machine where each worker is responsible for transitioning units of work from status X to status Y, and it's getting that from the source of truth. You also usually want some sort of per-task 'lease_expire' column so that if a worker fails/goes away, other workers will pick up its task when they periodically scan for work.
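
As a concrete sketch of the claim step (table, statuses, and the lease interval are all illustrative):

  -- The partial index stays small: it only covers rows in 'new',
  -- no matter how many finished rows pile up in other statuses.
  CREATE INDEX thing_new_idx ON thing (id) WHERE status = 'new';

  -- Atomically claim one unit of work. SKIP LOCKED makes
  -- concurrent workers pass over rows another worker has already
  -- locked instead of blocking behind them.
  WITH next AS (
      SELECT id
        FROM thing
       WHERE status = 'new'
       ORDER BY id
       LIMIT 1
         FOR UPDATE SKIP LOCKED
  )
  UPDATE thing
     SET status = 'processing',
         lease_expire = now() + interval '5 minutes'
    FROM next
   WHERE thing.id = next.id
  RETURNING thing.id;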

This works for millions of units of work an hour with a moderately spec'd database server, and if the alternative is setting up SQS/SNS/ActiveMQ/etc. and then _still_ having to track status in the database, manage a dead-letter queue, etc. -- it's not a hard choice at all.

diek commented on Git rebase, what can go wrong   jvns.ca/blog/2023/11/06/r... · Posted by u/kens
mablopoule · 2 years ago
Because now, instead of a changed line sitting within a granular set of changes, it's lost among all the other changes from the same feature branch, which is a more macro level. So if a config change is needed for the feature, the point where that config change actually needs to be handled, or where it would impact the data flow, is harder to evaluate now that you mix it with template changes, style changes, new interactions needed for the users, etc.

EDIT: On top of that, there's usually a bit of 'related' work needed for a task: for example, when you find an edge case related to your feature and now also need to fix a bug, or do a bit of refactoring on a related service, or fix the data in a badly formatted JSON file.

Unbeknownst to you, you added a bug when refactoring the related service, a bug that is spotted a few months later, only in a very specific edge case. If the cause is not obvious, you might want to reach for git bisect, but that won't be very useful now that everything I've talked about is squashed into a single commit.

diek · 2 years ago
> EDIT: On top of that, there's usually a bit of 'related' work needed for a task: for example, when you find an edge case related to your feature and now also need to fix a bug, or do a bit of refactoring on a related service, or fix the data in a badly formatted JSON file.

I agree that's related work, but I'd argue that work doesn't belong in that branch. If you find a bug in the process of implementing a feature, create a bugfix branch that is merged separately. If you need to refactor a service, that's also a separate branch/PR.

That's actually the most common pushback I get from people when I talk about squashing. They say "but then a bunch of unrelated changes will be lumped together in the same commit", to which I respond, "why are a bunch of unrelated changes in the same branch/PR?"

diek commented on Git rebase, what can go wrong   jvns.ca/blog/2023/11/06/r... · Posted by u/kens
js2 · 2 years ago
Because unless it's the most trivial of features, you'll break it up into smaller commits that each explain what they're doing and make reviewing the change easier.

As a simple example, I recently needed to update a JSON document that was a list of objects, adding a new key/value to each object. The document had been hand-edited over the years and had never been auto-formatted. My PR ended up being three commits:

1. Reformat the document with jq. Commit title explains it's a simple reformat of the document and that the next commit will add `.git-blame-ignore-revs` so that the history of the document isn't lost in `git blame` view.

2. Add `.git-blame-ignore-revs` with the commit ID of (1).

3. Finally, add the new key/value to each object.

The PR then explains that a new key/value has been added, mentions that the document was reformatted with `jq` as part of the work, and recommends that the reviewer step through the commits to ignore the mechanical change made by (1).

A followup PR added a pre-commit CI step to keep the document properly linted in the future.

diek · 2 years ago
In general I agree with you; there are absolutely times when you want to retain commit history on a particular branch (although I try to keep the source tree from knowing about things like commit IDs).

I would argue that those are by far the minority of PRs that I see. As I mentioned in another comment, _most_ PRs that I see have a ton of intermediary commits that are only useful for that branch/PR/review process (fixing tests, whitespace, etc.). Generally the advice I give teams is "squash by default," and then figure out where the exceptions to that rule are. That's mainly because, in my opinion, the downsides of a noisy commit graph filled with "addressing review comments" (or whatever) commits are a much bigger and more frequent issue than the benefits you describe. It really depends on the team.

diek commented on Git rebase, what can go wrong   jvns.ca/blog/2023/11/06/r... · Posted by u/kens
nineplay · 2 years ago
I lost patience with the various git commit cleanup tools and now I just go nuclear: I use `git diff > output.file`, make a new branch, and `git apply output.file`.

Fresh clean branch, no commit history, create pull request.

I'm not convinced there's any value in incremental commit messages. This is simple, clean, and undoable as long as I keep my initial branch.

diek · 2 years ago
What you're describing is just doing a 'squash' merge.

diek commented on Git rebase, what can go wrong   jvns.ca/blog/2023/11/06/r... · Posted by u/kens
lelandfe · 2 years ago
Those commits would be the bathwater one casts out alongside the useful commits when using squash merges.
diek · 2 years ago
If the useful commits are the "baby" in your bathwater analogy, all the useful information in those commits is in the squashed commit.

This assumes a branch being merged in represents one logical change (a feature/bugfix/etc) that is "right sized" to be represented by one commit.

diek commented on Git rebase, what can go wrong   jvns.ca/blog/2023/11/06/r... · Posted by u/kens
btilly · 2 years ago
There is a giant benefit to it being messy, and that is that the mess is the actual history.

Every time you do a git rebase, you are literally asking your source control system to lie about history. If you mess up, and you eventually will, you're then forced to manually figure out what the history really was despite being lied to. If you mess that up, well, good luck.

I used to work at a company where someone (we never figured out who) in another group would rebase every few weeks. We didn't find out about it until their stuff was pushed and then released. The result was that features which we'd written, QAed, and released to production would simply disappear a few weeks later, with no history suggesting they ever existed.

Have you ever been pulled off of a project to go fix a project from a month ago which has disappeared from source control? You don't know what happened, you no longer have context, you've just got complaints because your stuff no longer works.

Is your desire for a "clean history" worth potentially creating THAT disaster for other developers on your team???

diek · 2 years ago
The golden rule is "do not rewrite history of a public branch". Rebase/squash your PR branches to your heart's content, but once it's merged that's it.

You get clean history by not merging branches with 50 intermediary "fiddling with X" commits in them.
