munk-a · 10 months ago
While postgres is indeed case sensitive usually writing queries with keywords in all caps is an effort to increase legibility for visual pattern matching. It absolutely isn't needed but if I'm debugging a query of yours I will send it through my prettifier so that I can breeze through your definitions without getting hung up on minor weird syntax things.

It's like prettification in any other language - visual structures that we can quickly recognize (like consistent indentation levels) make us waste less time on comprehension of the obvious so we can focus on what's important.

The only thing I really object to is "actuallyUsingCaseInIdentifiers": I never want to see columns that require double quotes for me to inspect on the CLI.

wiredfool · 10 months ago
I find all caps identifiers wind up just looking like interchangeable blocks, whereas lowercase words have distinctive shapes. So all caps just slows down my reading.
WorldMaker · 10 months ago
I feel similarly, and I have friends with Dyslexia with even stronger opinions on it. All caps, in addition to reading as "shouting" to my ancient internet-using brain (and thus rude in most cases), creates big, similar rectangular blocks as word shapes, which is a speed bump for every reader, whether or not they notice it. For some of my friends with Dyslexia, who have a hard time with word shapes at the best of times, all caps can be a hard "cannot read" blocker. They say it is like trying to read a redacted document where someone has made rectangular black marker cross-outs.

Personally, given SQL's intended similarity to English, I find that I like English "sentence case" for it: the opening keyword starts with a capital letter and nearly every remaining letter is lower case (except for Proper Nouns, the truly case-sensitive parts of SQL like table names). Sentence case has helped me spot things like missing semicolons in dialects like Postgres' that require them, and near keywords like `Merge` it helps visually confirm that the `select` under a `Merge` is intended as a clause rather than the start of a new "sentence".

conductr · 10 months ago
I prefer lower case for my personal legibility reasons, and it seems like a prettifier should be able to adjust to each user's preference. It's not a team sport for me, so I've never had a conflict of styles beyond converting public code samples to match my way.
yen223 · 10 months ago
I've always found it funny that SQL was designed to be as close to natural English as possible, but then they went ahead and made everything all-caps.
andrei_says_ · 10 months ago
Also, SQL editors like DataGrip color SQL syntax very well.
gwbas1c · 10 months ago
It's really useful to know this when working with SQL interactively.

Specifically, if I'm banging out an ad-hoc query that no one will ever see, and I'm going to throw away, I don't worry about casing.

Otherwise, for me, all SQL that's checked in gets the commands in ALL CAPS.

archsurface · 10 months ago
My understanding is that the caps were syntax highlighting on monochrome screens; no longer needed with colour. Can't provide a reference, it's an old memory.
_Wintermute · 10 months ago
Most of the SQL I write is within a string of another programming language, so it's essentially monochrome unless there's some really fancy syntax highlighting going on.
CoastalCoder · 10 months ago
I think the "color is all we need" idea makes sense in proportion to how many of our tools actually support colorization.

E.g., the last time I used the psql program, I don't think it had colorization of the SQL, despite running in a color-capable terminal emulator.

It probably doesn't help that terminal colors are a bit of a mess. E.g., piping colored output through 'less' can result in messy control-character rendering rather than the desired effect.

vundercind · 10 months ago
Like sigils in that regard. Perl-type sigils are extremely nice... if you're editing in Notepad or some ancient vi without syntax highlighting and the ability to ID references on request. Little point to them, if you've got more-capable tools.
o11c · 10 months ago
Note that case handling is a place where postgres (which folds to lowercase) violates the standard (which folds to uppercase).

This is mostly irrelevant since you really shouldn't be mixing quoted with unquoted identifiers, and introspection largely isn't standardized.
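
A minimal illustration of the folding difference, using a hypothetical table name:

    -- Unquoted identifiers are folded: to lowercase in Postgres,
    -- to uppercase per the SQL standard.
    CREATE TABLE FooBar (id int);

    SELECT * FROM foobar;    -- works in Postgres: the name was stored as foobar
    SELECT * FROM "FooBar";  -- fails in Postgres: quoting preserves exact case
    -- A standard-conforming database would have stored the name as FOOBAR.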

yen223 · 10 months ago
Given that other mainstream RDBMSes lets you configure how case handling should happen, Postgres is arguably the closest to the standard.

Usual caveat of how nobody sticks to the ANSI standard anyway applies.

emmanuel_1234 · 10 months ago
Any recommendation for a prettifier / SQL linter?

gnulinux · 10 months ago
I'm curious about this for DuckDB [1]. In the last couple months or so I've been using DuckDB as a one-step solution to all problems I solve. In fact my development environment rarely requires anything other than Python and DuckDB (and some Rust if native code is necessary). DuckDB is an insanely fast and featureful analytic db. It'd be nice to have a linter, formatter etc specifically for DuckDB.

There is sqlfluff etc but I'm curious what people use.

[1] The DuckDB SQL dialect is very close to Postgres: it's compatible in many ways, has some extra QOL features related to analytics, and lacks a few features like `vacuum full`.

homebrewer · 10 months ago
IDEA if you want to use it for other things (or any other JetBrains IDE). Nothing comes close feature-wise.

If you don't:

- https://www.depesz.com/2022/09/21/prettify-sql-queries-from-...

- https://gitlab.com/depesz/pg-sql-prettyprinter

Or https://paste.depesz.com for one-off use.

NegativeLatency · 10 months ago
The prettier SQL plugin, or pg_format.

mannyv · 10 months ago
The uppercase is usually there to show people what's SQL vs. what's custom to your database. In books it's usually set in Courier.

I thought he was talking about psql's case sensitivity with table names, which is incredibly aggravating.

avg_dev · 10 months ago
i agree; but i use caps in my codebase, and lowercase when testing things out manually, just for ease of typing.
munk-a · 10 months ago
Ditto - if I'm throwing out an inspection query just to get a sense of what kind of data is in a column (e.g. a `select distinct status from widgets`), I won't bother with proper readable syntax. I only really care about code I'll need to reread.
MetaWhirledPeas · 10 months ago
> writing queries with keywords in all caps is an effort to increase legibility for visual pattern matching

Considering most programming languages do just fine without ALL CAPS KEYWORDS, I'd say it's a strange effort. I wish SQL didn't insist on being different this way.

I agree with you on prettification though. As long as the repository chooses a prettifier, you can view it your way and commit it their way. So that's my advice: always demand prettification for pull requests.

sensanaty · 10 months ago
I don't write or read SQL too often, but I prefer the ALL_CAPS because it usually lets me know if something is part of the SQL syntax itself or a function or whatever, or if it's referencing a table/column/etc.

Obviously not very foolproof, but automated linters/prettifiers like the one in DataGrip do a good job with this for every query I've ever thrown at it.

grahamplace · 10 months ago
For checked-in SQL queries, we follow: https://www.sqlstyle.guide/

The combination of all caps keywords + following "the river" whitespace pattern dramatically improves readability in my opinion
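
Roughly, the "river" pattern right-aligns keywords so a band of whitespace flows down the left of the query; a small sketch in the spirit of that guide:

    SELECT f.species_name,
           AVG(f.height) AS average_height
      FROM flora AS f
     WHERE f.species_name = 'Banksia'
     GROUP BY f.species_name
    HAVING COUNT(*) > 5;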

zusammen · 10 months ago
The all caps syntax also helps queries stand out from the surrounding source language, which is often helpful since SQL tends to end up embedded in all sorts of applications.
neves · 10 months ago
You can convey more info with color. Any half decent editor can color your SQL.

All-caps letters are more similar to one another, and therefore harder to read.

marcosdumay · 10 months ago
Well, as long as you aren't imposing the noisy syntax on everybody by pushing the case change back into the code...

But getting some editor that highlights the SQL will completely solve your issue.

munk-a · 10 months ago
I think fighting over syntax preferences in PRs is pretty useless, so dev shops should generally have code style guidelines to keep things consistent. In my company we use all-caps keywords since we have momentum in that direction, but that decision can be reasonable in either direction as long as it's consistent. It's like tabs vs. spaces: I've worked in companies with both preferences, and I just configure my editor to auto-prettify code on the way out and auto-lint it on the way in, and never worry about it.
troyvit · 10 months ago
> no longer needed with colour.

I think the increased legibility for visual pattern matching also makes SQL easier to read for many of the 350 million color blind people in the world.

jillyboel · 10 months ago
what is your prettifier of choice for postgres?
christophilus · 10 months ago
I’d never stumbled across the “don’t do this” wiki entry[0] before. Very handy.

[0] https://wiki.postgresql.org/wiki/Don%27t_Do_This

gwbas1c · 10 months ago
Why don't they deprecate some of these features? If they're such easy stumbling blocks, seems like it makes sense to disable things like table inheritance in new schemas, and require some kind of arcane setting to re-enable them.
Macha · 10 months ago
e.g. the suggested replacement for timestamp is timestamptz, which has its own problems (notably, it eagerly converts to UTC, which means it cannot account for TZ rule changes between storing the date and reading it). If medium term scheduling across multiple countries is something that needs to work in your app, you're kind of stuck with a column with a timestamp and another column with a timezone.
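
A sketch of that two-column pattern (hypothetical schema), resolving the wall-clock time to an absolute instant at read time so later tz-rule changes are respected:

    CREATE TABLE meeting (
        id         bigint PRIMARY KEY,
        local_time timestamp NOT NULL,  -- wall-clock time, no zone attached
        time_zone  text NOT NULL        -- IANA zone name, e.g. 'Europe/Berlin'
    );

    -- AT TIME ZONE interprets local_time in the stored zone using the
    -- rules in force at query time, not the rules at insert time:
    SELECT local_time AT TIME ZONE time_zone AS absolute_time
    FROM meeting;
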
mixmastamyk · 10 months ago
Several of the "broken" features are part of the SQL standard.
__loam · 10 months ago
Changing defaults can screw over existing users.
SoftTalker · 10 months ago
Presumably there are rare exceptions where you DO want to do the thing.
Parodper · 10 months ago
The money one honestly sounds like a bug.
datadrivenangel · 10 months ago
This reminds me of SQL Antipatterns, a book that everyone who works with databases should read.
samarthr1 · 10 months ago
That was a fun read, thanks!

Made me reconsider a few habits I picked up from MySQL land

pavel_lishin · 10 months ago
A lot of these aren't postgres-specific. (null weirdness, index column order, etc.)

For example, how nulls work - especially how they interact with indexes and unique constraints - is also non-intuitive in MySQL.

If you have a user table with a non-nullable email column and a nullable username column, and a uniqueness constraint on something like (email, username), you'll be able to insert multiple identical emails with a null username into that table - because a null isn't equivalent to another null.
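
A small demonstration of that default behaviour (hypothetical table):

    CREATE TABLE users (
        email    text NOT NULL,
        username text,
        UNIQUE (email, username)
    );

    -- Both inserts succeed: NULL is never equal to NULL, so the
    -- constraint never sees these rows as duplicates.
    INSERT INTO users VALUES ('a@example.com', NULL);
    INSERT INTO users VALUES ('a@example.com', NULL);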

anarazel · 10 months ago
> If you have a user table with a non-nullable email column and a nullable username column, and a uniqueness constraint on something like (email, username), you'll be able to insert multiple identical emails with a null username into that table - because a null isn't equivalent to another null.

FWIW, since Postgres 15 you can influence that behaviour with NULLS [NOT] DISTINCT on constraints and unique indexes.

https://www.postgresql.org/docs/devel/sql-createtable.html#S...

EDIT: Added link
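
Applied to the example above, that looks something like:

    -- Postgres 15+: NULLs compare as equal for uniqueness purposes,
    -- so the second insert in the parent comment's example now fails.
    CREATE TABLE users (
        email    text NOT NULL,
        username text,
        UNIQUE NULLS NOT DISTINCT (email, username)
    );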

bpicolo · 10 months ago
I think this is a good pragmatic default. The use case for the alternative is much more rare.
pavel_lishin · 10 months ago
I totally agree - but it's not an intuitive default.
marcosdumay · 10 months ago
> Normalize your data unless you have a good reason not to

Ouch. You don't want to just say that and move on.

The author even linked to a page citing 10 different kinds of normalization (11 with "non-normalized"). Most people don't even know what those are, and have no use for seven of them. Do not send people on wild-goose chases after those higher normal forms.

pavel_lishin · 10 months ago
But the author did have a paragraph explaining, in general, what they mean.

And they're right! I've had to fix a few issues of this in a project I recently got moved to. There's almost never a reason to duplicate data.

Deleted Comment

hobs · 10 months ago
I guess this is targeted towards noobs, but the answer is pretty much always 3rd normal form if you are clicking this and are not sure.
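
For anyone to whom "3rd normal form" is itself opaque, a tiny hypothetical example: in the first table below, population depends on the city rather than on the employee, and 3NF removes that by splitting the tables.

    -- Denormalized: city_population repeats per employee and depends
    -- only on city (a transitive dependency).
    CREATE TABLE employees_denorm (
        id              bigint PRIMARY KEY,
        name            text,
        city            text,
        city_population int
    );

    -- 3NF: every non-key column depends on the key, the whole key,
    -- and nothing but the key.
    CREATE TABLE cities (
        id         bigint PRIMARY KEY,
        name       text UNIQUE,
        population int
    );
    CREATE TABLE employees (
        id      bigint PRIMARY KEY,
        name    text,
        city_id bigint REFERENCES cities(id)
    );
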
makr17 · 10 months ago
Interesting, I would have said Boyce-Codd unless you have a good reason to vary in either direction.
cryptonector · 10 months ago
The general rule is to normalize to the max, then denormalize till you get the performance that you need.
DanHulton · 10 months ago
The one exception I'll make from the very start is "add tenant identifier to every row, yes, even if it's linked to another table that has tenant identifiers."

Sure, this means you will have some "unnecessary" `tenant_id` columns in tables where you could get the tenant through a relation, but it saves you from _having_ to include that relation just to limit by tenant, which you will someday almost certainly want. (Especially if you want to enable row-level security¹ someday.)

¹ - https://www.postgresql.org/docs/current/ddl-rowsecurity.html
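
A sketch of that shape (hypothetical tables; the `app.tenant_id` session setting is one common, assumed way to feed a tenant to RLS):

    CREATE TABLE tenants  (id bigint PRIMARY KEY);
    CREATE TABLE projects (
        id        bigint PRIMARY KEY,
        tenant_id bigint NOT NULL REFERENCES tenants(id)
    );
    CREATE TABLE tasks (
        id         bigint PRIMARY KEY,
        project_id bigint NOT NULL REFERENCES projects(id),
        -- the "unnecessary" copy: derivable via projects, but lets you
        -- filter by tenant (and write RLS policies) without a join
        tenant_id  bigint NOT NULL REFERENCES tenants(id)
    );

    ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
    CREATE POLICY tenant_isolation ON tasks
        USING (tenant_id = current_setting('app.tenant_id')::bigint);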

marcosdumay · 10 months ago
Oh, no, it's absolutely not.

It's usually normalizing into 3rd normal form. But that's not enough in some cases, too much in others, and when it breaks down, the reason is performance about as often as it's not.

jayknight · 10 months ago
yen223 · 10 months ago
"normalise til it hurts, denormalise til it works"
jedberg · 10 months ago
My number one tip: Vacuum every day!

I didn't know this when I started, so I never vacuumed the reddit databases. Then one day I was forced to, and it took reddit down for almost a day while I waited for it to finish.
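
For readers wondering what such a daily job actually runs: something along these lines (a minimal sketch; scheduling, e.g. via cron invoking psql, is assumed to happen outside the database):

    -- Vacuum every table and refresh planner statistics:
    VACUUM (VERBOSE, ANALYZE);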

FreakLegion · 10 months ago
No autovacuum? At reddit's scale I'm surprised you didn't run out of transaction IDs.
jedberg · 10 months ago
I turned it off because it would run at inopportune times.

And we did run out of transaction IDs, which is why I was forced to do it.

I never turned on the auto-vacuumer but I did set up a daily vacuum.

Keep in mind I left Reddit 13 years ago and I’m sure they’ve made improvements since.

gtaylor · 10 months ago
Auto vacuuming is enabled now. We did have some near misses due to long running vacuums that barely completed before wraparounds, but got things tuned over time.

I did none of that work but was on the team where it happened.

ellisv · 10 months ago
I really wish developers cared more about normalization and stopped shoving everything into a JSON(b) column.
abraae · 10 months ago
Long before databases could even store structured JSON data, junior developers used to bikeshed viciously over the correct degree of normalization.

More experienced developers knew that the correct answer was to duplicate nothing (except for keys obviously) and then to denormalize only with extreme reluctance.

Then databases like mongo came along and encouraged those juniors by giving them something like a database, but where normalization was difficult/irrelevant. The result was a brief flowering of horrible database designs and unmaintainable crap towers.

Now the pendulum has swung back and people have rediscovered the virtues of a normalized database, but JSON columns provide an escape hatch where those bad practices can still flower.

christophilus · 10 months ago
Eh. JSON has its place. I have some stateful data that is fairly transient in nature and which doesn’t really matter all that much if it gets lost / corrupted. It’s the sort of thing I’d throw into Redis if we had Redis in our stack. But the only storage in my project is S3 and Postgres. Postgres allows me to trivially query, update, analyze the usage of the feature, etc. Normalization wouldn’t buy me much, if anything, for my use case, but it would make “save this stuff for later” more of a nuisance (a sync across a bunch of rows vs a single upsert).

That said, I’ve worked on projects that had almost no normalization, and it was pure hell. I’m certainly not arguing against normalizing; just saying that data blobs are useful sometimes.
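
A sketch of the single-upsert shape being described (hypothetical table and payload):

    CREATE TABLE feature_state (
        user_id    bigint PRIMARY KEY,
        state      jsonb NOT NULL,
        updated_at timestamptz NOT NULL DEFAULT now()
    );

    -- "Save this stuff for later" as one statement, rather than
    -- syncing a bunch of normalized rows:
    INSERT INTO feature_state (user_id, state)
    VALUES (42, '{"panel": "open", "filters": ["a", "b"]}')
    ON CONFLICT (user_id)
    DO UPDATE SET state = EXCLUDED.state, updated_at = now();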

codr7 · 10 months ago
Yeah, I'm def not taking any more mongodb jobs if I can avoid it.

I'm fine with using it for simple throwaway stuff, but deciphering someone else's ball of json is soul-killing.

yen223 · 10 months ago
There are two reasons to use a jsonb column:

1. To store JSON. There's a pattern where, when your webserver calls some third-party API, you store the raw API response in a JSONB column and then process the response from there. This gives you an auditable paper trail if you need to debug issues coming from that 3rd-party API.

2. To store sum types. SQL not supporting sum types is arguably the biggest deficiency when modelling data in SQL databases. There are several workarounds - one of them being "just chuck it in a JSONB column and validate it in the application" - but none of the workarounds is particularly great.
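
A sketch of workaround 2 (hypothetical table; `?` is Postgres's jsonb key-exists operator):

    -- A payment method is either a card or a bank account; the variant
    -- payload lives in jsonb, with a light CHECK guarding its shape.
    CREATE TABLE payment_methods (
        id      bigint PRIMARY KEY,
        kind    text  NOT NULL CHECK (kind IN ('card', 'bank_account')),
        details jsonb NOT NULL,
        CHECK (
            (kind = 'card'         AND details ? 'last4')
         OR (kind = 'bank_account' AND details ? 'iban')
        )
    );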

zie · 10 months ago
I would add:

3. End-user extra fields. Stuff you don't care about, but someone somewhere does.

devin · 10 months ago
Even if you care about it, you will still often wind up with a junk drawer of JSONB. I don't really see it as a problem unless people are writing bad queries against it instead of lifting values out of the JSONB into their own columns, etc.
mnahkies · 10 months ago
Yeah, exactly. And I'll take a JSON(B) column any day over a TEXT column holding maybe-valid serialized JSON, maybe RON, maybe something completely random.
randomdata · 10 months ago
Most developers using these kinds of tools these days are actually building their own database management systems, just outsourcing the persistence to another DBMS, so there isn't a strong imperative to think about good design so long as the persistence need is satisfied.

Whether we actually should be building DBMSes on top of DBMSes is questionable, but it is the current state of affairs regardless.

tightbookkeeper · 10 months ago
A previous employer thought that SQL databases didn't understand graphs, so they made their own system for serializing/deserializing graphs of objects into Postgres. They never used queries; instead they had their own in-memory operators for traversing the graph, and had to solve problems like deleting an entry and removing all references to it, and partial graph updates.

And I still don't think it works.

theamk · 10 months ago
This needs a working schema migration process, including the ability to undo a schema change if the new column tanks performance or breaks stuff.

If there are CLI tools involved, you also need to ensure you can handle some downtime, do a synchronized version update across the company, or support both old and new schemas for a while.

If the database is not part of the team's primary product, all of this could be missing.

tomcam · 10 months ago
I wrote this to help beginners: https://tomcam.github.io/postgres/
avg_dev · 10 months ago
this is really nice. i am glad the author put it together. i didn't know the pg docs were 3200 pages long! i have been using it for a while and try to learn as i go. i really do like the docs. and i also like to read articles on various particular subjects as i find a need to.

i think the author might find it helpful to readers to add a note to https://challahscript.com/what_i_wish_someone_told_me_about_... that if someone is selecting for b alone, then an index on columns (b, a) would work fine. i think this is kind of implied when they talk about selecting on a alone, but maybe it wouldn't hurt to be extra explicit.
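
something like this (hypothetical table):

    CREATE TABLE t (a int, b int);
    CREATE INDEX t_b_a_idx ON t (b, a);

    SELECT * FROM t WHERE b = 1;            -- can use t_b_a_idx: b leads it
    SELECT * FROM t WHERE b = 1 AND a = 2;  -- can use it fully
    SELECT * FROM t WHERE a = 2;            -- generally can't use it efficiently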

(i didn't spend much time on the json/jsonb part since i rarely use that stuff)