munk-a · a year ago
While Postgres is indeed case-insensitive about keywords, writing queries with keywords in all caps is an effort to increase legibility via visual pattern matching. It absolutely isn't needed, but if I'm debugging a query of yours I will send it through my prettifier so that I can breeze through your definitions without getting hung up on minor weird syntax things.

It's like prettification in any other language - visual structures that we can quickly recognize (like consistent indentation levels) make us waste less time on comprehension of the obvious so we can focus on what's important.

The only thing I really object to is "actuallyUsingCaseInIdentifiers": I never want to see columns that require double quotes for me to inspect on the CLI.
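A rough sketch of the pain (table and column names invented): once an identifier is created with quotes, every later reference has to quote it too, because unquoted names are folded to lowercase.

```sql
-- Unquoted identifiers fold to lowercase, so these are equivalent:
CREATE TABLE users (UserName text);
SELECT username FROM users;

-- A quoted identifier preserves its case and must be quoted forever after:
CREATE TABLE accounts ("createdAt" timestamptz);
SELECT "createdAt" FROM accounts;   -- works
-- SELECT createdAt FROM accounts;  -- ERROR: column "createdat" does not exist
```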

wiredfool · a year ago
I find all-caps identifiers wind up just looking like interchangeable blocks, whereas lowercase words have distinctive shapes. So all caps just slows down my reading.
WorldMaker · a year ago
I feel similarly, and I also have friends with dyslexia who have even stronger opinions on it. All caps, in addition to being "shouting" to my ancient internet-using brain (and thus rude in most cases), creates big, similar rectangular blocks as word shapes and is a big speed bump to reading speed for everyone (whether or not they notice it). For some of my friends with dyslexia, who have a hugely tough time with word shapes at the best of times, all caps can be a hard-stop "cannot read" blocker. They say it is like trying to read a redacted document where someone has made rectangular black marker cross-outs.

Personally, given SQL's intended similarity to English, I find that I like English "sentence case" for it, with the opening keyword starting with a capital letter and nearly every remaining letter lowercase (except for Proper Nouns, the truly case-sensitive parts of SQL like table names). Sentence case has helped me in the past to spot things like missing semicolons in dialects that require them, such as Postgres' near keywords like `Merge`, and to visually confirm that the `select` under a `Merge` is intended as a clause rather than starting a new "sentence".

conductr · a year ago
I prefer lowercase for my personal legibility reasons, and it seems like a prettifier should be able to adjust to each user's preference. It's not a team sport for me, so I've never had a conflict of styles other than converting public code samples to match my way.
yen223 · a year ago
I've always found it funny that SQL was designed the way it is to be as close to natural English as possible, but then they went ahead and made everything all-caps
andrei_says_ · a year ago
Also, SQL editors like DataGrip color SQL syntax very well.
gwbas1c · a year ago
It's really useful to know this when working with SQL interactively.

Specifically, if I'm banging out an ad-hoc query that no one will ever see, and I'm going to throw away, I don't worry about casing.

Otherwise, for me, all SQL that's checked in gets the commands in ALL CAPS.

archsurface · a year ago
My understanding is that the caps were syntax highlighting on monochrome screens; no longer needed with colour. Can't provide a reference, it's an old memory.
_Wintermute · a year ago
Most of the SQL I write is within a string of another programming language, so it's essentially monochrome unless there's some really fancy syntax highlighting going on.
CoastalCoder · a year ago
I think the "color is all we need" idea makes sense in proportion to how many of our tools actually support colorization.

E.g., the last time I used the psql program, I don't think it had colorization of the SQL, despite running in a color-capable terminal emulator.

It probably doesn't help that terminal colors are a bit of mess. E.g., piping colored output through 'less' can result in some messy control-character rendering rather than having the desired effect.

vundercind · a year ago
Like sigils in that regard. Perl-type sigils are extremely nice... if you're editing in Notepad or some ancient vi without syntax highlighting and the ability to ID references on request. Little point to them, if you've got more-capable tools.
o11c · a year ago
Note that case handling is a place where postgres (which folds to lowercase) violates the standard (which folds to uppercase).

This is mostly irrelevant since you really shouldn't be mixing quoted with unquoted identifiers, and introspection largely isn't standardized.
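A minimal sketch of the folding difference (table name invented); the divergence only bites when quoted and unquoted names are mixed:

```sql
-- Postgres folds unquoted identifiers to lowercase:
CREATE TABLE Foo (id int);   -- stored as "foo"
SELECT * FROM FOO;           -- also folds to "foo", so this works

-- The SQL standard folds to UPPERCASE instead, so a strictly
-- standard-compliant system would store the same table as "FOO".
-- Quoting bypasses folding on both, which is where they disagree:
SELECT * FROM "Foo";         -- ERROR in Postgres: relation "Foo" does not exist
```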

yen223 · a year ago
Given that other mainstream RDBMSes let you configure how case handling should happen, Postgres is arguably the closest to the standard.

Usual caveat of how nobody sticks to the ANSI standard anyway applies.

emmanuel_1234 · a year ago
Any recommendation for a prettifier / SQL linter?

gnulinux · a year ago
I'm curious about this for DuckDB [1]. In the last couple of months I've been using DuckDB as a one-stop solution to the problems I work on. In fact my development environment rarely requires anything other than Python and DuckDB (and some Rust if native code is necessary). DuckDB is an insanely fast and featureful analytic DB. It'd be nice to have a linter, formatter, etc. specifically for DuckDB.

There is sqlfluff etc but I'm curious what people use.

[1] The DuckDB SQL dialect is very close to Postgres: it's compatible in many ways, but has some extra QOL features related to analytics, and lacks a few features like `vacuum full`.

homebrewer · a year ago
IDEA if you want to use it for other things (or any other JetBrains IDE). Nothing comes close feature-wise.

If you don't:

- https://www.depesz.com/2022/09/21/prettify-sql-queries-from-...

- https://gitlab.com/depesz/pg-sql-prettyprinter

Or https://paste.depesz.com for one-off use.

NegativeLatency · a year ago
prettier plugin sql, or pg_format

mannyv · a year ago
The uppercase is usually there to show people what's SQL vs. what's custom to your database. In books it's usually set in Courier.

I thought he was talking about Postgres's case sensitivity with table names, which is incredibly aggravating.

avg_dev · a year ago
i agree; but i use caps in my codebase, and lowercase when testing things out manually, just for ease of typing.
munk-a · a year ago
Ditto - if I'm throwing out an inspection query just to get a sense of what kind of data is in a column I won't bother with proper readable syntax (i.e. a `select distinct status from widgets`). I only really care about code I'll need to reread.
MetaWhirledPeas · a year ago
> writing queries with keywords in all caps is an effort to increase legibility for visual pattern matching

Considering most programming languages do just fine without ALL CAPS KEYWORDS I'd say it's a strange effort. I wish SQL didn't insist on being different this way.

I agree with you on prettification though. As long as the repository chooses a prettifier, you can view it your way, then commit it their way. So that's my advice: always demand prettification for pull requests.

sensanaty · a year ago
I don't write or read SQL too often, but I prefer the ALL_CAPS because it usually lets me know if something is part of the SQL syntax itself or a function or whatever, or if it's referencing a table/column/etc.

Obviously not very foolproof, but automated linters/prettifiers like the one in DataGrip do a good job with this for every query I've ever thrown at it.

grahamplace · a year ago
For checked-in SQL queries, we follow: https://www.sqlstyle.guide/

The combination of all caps keywords + following "the river" whitespace pattern dramatically improves readability in my opinion
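A rough sketch of the "river" style described in that guide (table and column names invented): keywords are right-aligned against a shared gutter of whitespace running down the query.

```sql
SELECT w.id,
       w.status,
       count(*) AS order_count
  FROM widgets AS w
  JOIN orders  AS o
    ON o.widget_id = w.id
 WHERE w.status = 'active'
 GROUP BY w.id, w.status;
```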

neves · a year ago
You can convey more info with color. Any half decent editor can color your SQL.

All-caps letters are more similar to each other and harder to read.

marcosdumay · a year ago
Well, as long as you aren't imposing the noisy syntax on everybody by pushing the case change back into the code...

But getting some editor that highlights the SQL will completely solve your issue.

munk-a · a year ago
I think fighting in PRs over syntax preferences is pretty useless, so dev shops should generally have code style guidelines to help keep things consistent. In my company we use all-caps keywords since we have momentum in that direction, but I think that decision can be reasonable in either direction as long as it's consistent. It's like tabs vs. spaces: I've worked at companies with both preferences; I just configure my editor to auto-prettify code coming out and auto-lint code going in, and never worry about it.
troyvit · a year ago
> no longer needed with colour.

I think the increased legibility for visual pattern matching also makes SQL easier to read for many of the 350 million color blind people in the world.

jillyboel · a year ago
what is your prettifier of choice for postgres?

christophilus · a year ago
I’d never stumbled across the “don’t do this” wiki entry[0] before. Very handy.

[0] https://wiki.postgresql.org/wiki/Don%27t_Do_This

gwbas1c · a year ago
Why don't they deprecate some of these features? If they're such easy stumbling blocks, seems like it makes sense to disable things like table inheritance in new schemas, and require some kind of arcane setting to re-enable them.
Macha · a year ago
e.g. the suggested replacement for timestamp is timestamptz, which has its own problems (notably, it eagerly converts to UTC, which means it cannot account for TZ rule changes between storing the date and reading it). If medium-term scheduling across multiple countries needs to work in your app, you're kind of stuck with one column for the timestamp and another column for the timezone.
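A minimal sketch of that two-column workaround (schema invented): store the wall-clock time with no zone plus the IANA zone name, and convert to an absolute instant only at read time, so future tzdata rule changes are picked up.

```sql
CREATE TABLE meetings (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    starts_at timestamp NOT NULL,  -- local wall-clock time, no zone
    time_zone text      NOT NULL   -- e.g. 'Europe/Lisbon'
);

-- AT TIME ZONE interprets the wall-clock time in the stored zone,
-- using whatever tzdata rules are current when the query runs:
SELECT starts_at AT TIME ZONE time_zone AS starts_at_instant
  FROM meetings;
```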
mixmastamyk · a year ago
Several of the broken features are SQL standard.
__loam · a year ago
Changing defaults can screw over existing users.
SoftTalker · a year ago
Presumably there are rare exceptions where you DO want to do the thing.
Parodper · a year ago
The money one honestly sounds like a bug.
datadrivenangel · a year ago
This reminds me of SQL Antipatterns, a book that everyone who works with databases should read.
samarthr1 · a year ago
That was a fun read, thanks!

Made me reconsider a few habits I picked up from MySQL land

pavel_lishin · a year ago
A lot of these aren't postgres-specific. (null weirdness, index column order, etc.)

For example, how nulls work - especially how they interact with indexes and unique constraints - is also non-intuitive in MySQL.

If you have a user table with a non-nullable email column and a nullable username column, and a uniqueness constraint on something like (email, username), you'll be able to insert multiple identical emails with a null username into that table - because a null isn't equivalent to another null.
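A minimal sketch of that surprise (schema invented), runnable against a default-configured Postgres:

```sql
CREATE TABLE users (
    email    text NOT NULL,
    username text,
    UNIQUE (email, username)
);

-- Both inserts succeed: NULL is not equal to NULL, so the
-- two rows do not count as duplicates under the constraint.
INSERT INTO users VALUES ('a@example.com', NULL);
INSERT INTO users VALUES ('a@example.com', NULL);
```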

anarazel · a year ago
> If you have a user table with a non-nullable email column and a nullable username column, and a uniqueness constraint on something like (email, username), you'll be able to insert multiple identical emails with a null username into that table - because a null isn't equivalent to another null.

FWIW, since Postgres 15 you can influence that behaviour with NULLS [NOT] DISTINCT for constraints and unique indexes.

https://www.postgresql.org/docs/devel/sql-createtable.html#S...

EDIT: Added link
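A sketch of the Postgres 15+ syntax (same invented users table as above): with NULLS NOT DISTINCT, two NULL usernames count as equal for uniqueness.

```sql
CREATE TABLE users (
    email    text NOT NULL,
    username text,
    UNIQUE NULLS NOT DISTINCT (email, username)
);

INSERT INTO users VALUES ('a@example.com', NULL);  -- ok
INSERT INTO users VALUES ('a@example.com', NULL);  -- ERROR: duplicate key
```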

bpicolo · a year ago
I think this is a good pragmatic default. The use case for the alternative is much more rare.
pavel_lishin · a year ago
I totally agree - but it's not an intuitive default.
marcosdumay · a year ago
> Normalize your data unless you have a good reason not to

Ouch. You don't want to just say that and move on.

The author even linked to a page citing 10 different kinds of normalization (11 with the "non-normalized"). Most people don't even know what those are, and have no use for 7 of those. Do not send people on wild-goose chases after those higher normal forms.

pavel_lishin · a year ago
But the author did have a paragraph explaining, in general, what they mean.

And they're right! I've had to fix a few issues of this in a project I recently got moved to. There's almost never a reason to duplicate data.

hobs · a year ago
I guess this is targeted towards noobs, but the answer is pretty much always 3rd normal form if you are clicking this and are not sure.
makr17 · a year ago
Interesting, I would have said Boyce-Codd unless you have a good reason to vary in either direction.
cryptonector · a year ago
The general rule is to normalize to the max, then denormalize till you get the performance that you need.
DanHulton · a year ago
The one exception I'll make from the very start is "add tenant identifier to every row, yes, even if it's linked to another table that has tenant identifiers."

Sure, this means you will have some "unnecessary" `tenant_id` columns in some tables that you could get through a relation, but it saves you from _having_ to include that relation just to limit by tenant, which you will some day almost be guaranteed to want. (Especially if you want to enable row-level security¹ someday.)

¹ - https://www.postgresql.org/docs/current/ddl-rowsecurity.html
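A rough sketch of the idea (schema and setting name invented): the redundant `tenant_id` lets you filter, and attach a row-level security policy, without joining up to the parent table.

```sql
CREATE TABLE projects (
    id        bigint PRIMARY KEY,
    tenant_id bigint NOT NULL
);

CREATE TABLE tasks (
    id         bigint PRIMARY KEY,
    project_id bigint NOT NULL REFERENCES projects (id),
    tenant_id  bigint NOT NULL  -- derivable via projects, but kept local
);

-- RLS can then be applied directly, with no join through projects:
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON tasks
    USING (tenant_id = current_setting('app.tenant_id')::bigint);
```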

marcosdumay · a year ago
Oh, no, it's absolutely not.

It's usually to normalize into the 3rd form. But that's not enough in some cases, too much in other cases, and the reason it breaks is performance about as often as it's not.

jayknight · a year ago
yen223 · a year ago
"normalise til it hurts, denormalise til it works"
jedberg · a year ago
My number one tip: Vacuum every day!

I didn't know this when I started, so I never vacuumed the reddit databases. Then one day I was forced to, and it took reddit down for almost a day while I waited for it to finish.
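For anyone in the same spot, a sketch of the manual version (table name invented): vacuum can be run per-table off-peak, and a catalog query shows how close each table is to transaction-ID wraparound.

```sql
-- One-off manual vacuum of a single table:
VACUUM (VERBOSE, ANALYZE) comments;

-- Tables closest to transaction-ID wraparound:
SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY age(relfrozenxid) DESC
 LIMIT 10;
```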

FreakLegion · a year ago
No autovacuum? At reddit's scale I'm surprised you didn't run out of transaction IDs.
jedberg · a year ago
I turned it off because it would run at inopportune times.

And we did run out of transaction IDs, which is why I was forced to do it.

I never turned on the auto-vacuumer but I did set up a daily vacuum.

Keep in mind I left Reddit 13 years ago and I’m sure they’ve made improvements since.

gtaylor · a year ago
Auto vacuuming is enabled now. We did have some near misses due to long running vacuums that barely completed before wraparounds, but got things tuned over time.

I did none of that work but was on the team where it happened.

ellisv · a year ago
I really wish developers cared more about normalization and stopped shoving everything into a JSON(b) column.
beachy · a year ago
Long before databases could even store structured JSON data, junior developers used to bikeshed viciously over the correct degree of normalization.

More experienced developers knew that the correct answer was to duplicate nothing (except for keys obviously) and then to denormalize only with extreme reluctance.

Then databases like mongo came along and encouraged those juniors by giving them something like a database, but where normalization was difficult/irrelevant. The result was a brief flowering of horrible database designs and unmaintainable crap towers.

Now the pendulum has swung back and people have rediscovered the virtues of a normalized database, but JSON columns provide an escape hatch where those bad practices can still flower.

christophilus · a year ago
Eh. JSON has its place. I have some stateful data that is fairly transient in nature and which doesn’t really matter all that much if it gets lost / corrupted. It’s the sort of thing I’d throw into Redis if we had Redis in our stack. But the only storage in my project is S3 and Postgres. Postgres allows me to trivially query, update, analyze the usage of the feature, etc. Normalization wouldn’t buy me much, if anything, for my use case, but it would make “save this stuff for later” more of a nuisance (a sync across a bunch of rows vs a single upsert).

That said, I’ve worked on projects that had almost no normalization, and it was pure hell. I’m certainly not arguing against normalizing; just saying that data blobs are useful sometimes.

codr7 · a year ago
Yeah, I'm definitely not taking any more MongoDB jobs if I can avoid it.

I'm fine with using it for simple throw away stuff, but deciphering someone else's ball of json is soul killing.

yen223 · a year ago
There are two reasons to use a jsonb column:

1. To store JSON. There's a pattern where when your webserver calls into some third-party API, you store the raw API response in a JSONB column, and then process the response from there. This gives you an auditable paper trail if you need to debug issues coming from that 3rd-party API.

2. To store sum types. SQL not supporting sum types is arguably the biggest deficiency when modelling data in SQL databases. There are several workarounds - one of them being "just chuck it in a JSONB column and validate it in the application" - but none of the workarounds is particularly great.
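A minimal sketch of the sum-type workaround (schema invented): a tag column plus a JSONB payload, with a CHECK constraint doing per-variant validation in the database rather than only in the application.

```sql
CREATE TABLE payment_methods (
    id      bigint PRIMARY KEY,
    kind    text  NOT NULL CHECK (kind IN ('card', 'bank_transfer')),
    details jsonb NOT NULL,
    -- jsonb's ? operator tests for key existence:
    CHECK (
        (kind = 'card'          AND details ? 'last4')
     OR (kind = 'bank_transfer' AND details ? 'iban')
    )
);

INSERT INTO payment_methods VALUES (1, 'card', '{"last4": "4242"}');
```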

zie · a year ago
I would add:

3. End-user extra fields. Stuff you don't care about, but someone somewhere does.

devin · a year ago
Even if you care about it, you will still often wind up with a junk drawer of JSONB. I don't really see it as a problem unless people are writing bad queries against it instead of lifting values out of the JSONB into their own columns, etc.
mnahkies · a year ago
Yeah exactly, and I'll take a JSON(B) column over TEXT with maybe valid serialised JSON, maybe RON, maybe something completely random any day
randomdata · a year ago
Most developers using these kinds of tools these days are actually building their own database management systems, just outsourcing the persistence to another DBMS, so there isn't a strong imperative to think about good design so long as it successfully satisfies the persistence need.

Whether we actually should be building DBMSes on top of DBMSes is questionable, but that is the current state of affairs regardless.

tightbookkeeper · a year ago
A previous employer thought that SQL databases didn't understand graphs, so they made their own system for serializing/deserializing graphs of objects into Postgres. They never used queries; instead they had their own in-memory operators for traversing the graph, and had to solve problems like deleting an entry and removing all references, and partial graph updates.

And I still don’t think it works.

theamk · a year ago
This needs a working schema migration process, including the ability to undo a schema change if the new column tanks performance or breaks stuff.

If there are CLI tools involved, you also need to ensure you can handle some downtime, do a synchronized version update across the company, or support both old and new schemas for a while.

If a database is not part of a team's primary product, all of this could be missing.

tomcam · a year ago
I wrote this to help beginners: https://tomcam.github.io/postgres/
avg_dev · a year ago
this is really nice. i am glad the author put it together. i didn't know the pg docs were 3200 pages long! i have been using it for a while and try to learn as i go. i really do like the docs. and i also like to read articles on various particular subjects as i find a need to.

i think the author might find it helpful to readers to add a note to https://challahscript.com/what_i_wish_someone_told_me_about_... that if someone is selecting for b alone, then an index on columns (b, a) would work fine. i think this is kind of implied when they talk about selecting on a alone, but maybe it wouldn't hurt to be extra explicit.
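a sketch of the point (table and column names invented): with a composite index, queries constraining the leading column are served by it, while queries on the trailing column alone generally are not.

```sql
CREATE INDEX idx_t_b_a ON t (b, a);

-- Served by the index (leading column b is constrained):
SELECT * FROM t WHERE b = 1;
SELECT * FROM t WHERE b = 1 AND a = 2;

-- Generally NOT served by it (a alone skips the leading column):
SELECT * FROM t WHERE a = 2;
```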

(i didn't spend much time on the json/jsonb part since i rarely use that stuff)