ajkjk · 3 years ago
> The key point is that SQLite is very forgiving of the type of data that you put into the database. For example, if a column has a datatype of "INTEGER" and the application inserts a text string into that column, SQLite will first try to convert the text string into an integer, just like every other SQL database engine. Thus, if one inserts '1234' into an INTEGER column, that value is converted into an integer 1234 and stored. But, if you insert a non-numeric string like 'wxyz' into an INTEGER column, unlike other SQL databases, SQLite does not throw an error. Instead, SQLite stores the actual string value in the column.

wtf. who would ever want that?
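The quoted behavior is easy to see for yourself with Python's built-in sqlite3 module (a quick sketch):

```python
import sqlite3

# In-memory database; the column has INTEGER *affinity*, not a strict type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# A numeric-looking string is coerced to an integer...
conn.execute("INSERT INTO t VALUES ('1234')")
# ...but a non-numeric string is stored as-is, with no error raised.
conn.execute("INSERT INTO t VALUES ('wxyz')")

rows = conn.execute("SELECT x, typeof(x) FROM t ORDER BY rowid").fetchall()
print(rows)  # [(1234, 'integer'), ('wxyz', 'text')]
```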

at_a_remove · 3 years ago
Me!

I'm often exploring data where either there's no defined standard or the use of the data has drifted from the standard. Now, I could go over this line-by-line, but instead my go-to has been "Hey, let's throw this into SQLite, then run stats on it!" See what shakes out. SQLite kindly and obediently takes all of this mystery data, which ends up being nothing like what I was told, and just accepts it. Then I can begin prodding it and see what is actually going on.

This is something that has come up for me for at least a decade: chuck it in SQLite, then figure out what the real standard is.

thfuran · 3 years ago
It's reasonable to have an arbitrary/untyped data type but an integer data type should not accept non-integers.
muttled · 3 years ago
It's especially great for large, important legacy datasets that similarly did no enforcement. You can then interact with the data and clean it up with a few SQL statements, instead of trying to work with massive datasets in Excel, which sometimes just crashes the entire app.
rscho · 3 years ago
I do exactly the same. You import from CSV, everything ends up as a string, and then you copy the data to a STRICT table so you can understand what's really going on.
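That two-stage workflow can be sketched with Python's built-in sqlite3 module. STRICT tables need SQLite 3.37+, so the sketch falls back gracefully on older builds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stage 1: everything lands as text, exactly as a CSV import would leave it.
conn.execute("CREATE TABLE raw (id TEXT, score TEXT)")
conn.executemany("INSERT INTO raw VALUES (?, ?)",
                 [("1", "3.5"), ("2", "oops"), ("3", "7.25")])

# Stage 2: copy into a STRICT table to surface type problems.
try:
    conn.execute("CREATE TABLE clean (id INTEGER, score REAL) STRICT")
    strict_supported = True
except sqlite3.OperationalError:
    strict_supported = False  # older SQLite: no STRICT keyword

error = None
if strict_supported:
    try:
        # Numeric-looking text converts losslessly; 'oops' cannot, so
        # the copy fails and tells you where your data lies to you.
        conn.execute("INSERT INTO clean SELECT id, score FROM raw")
    except sqlite3.IntegrityError as e:
        error = str(e)
print(error)
```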
throw10920 · 3 years ago
Were SQLite not such a stellar piece of software, I would complain, and ask that this kind of flexibility be disabled by default, with an option to explicitly enable it if necessary.
ajkjk · 3 years ago
well, it seems like you should chuck it into an untyped table first. If there are types on a column I'd... really want them to do something.
rscho · 3 years ago
> wtf. who would ever want that?

Each and every biostatistician on this planet. Especially those touching clinical data. Personally, I was saddened to learn that DuckDB did not include dynamically typed, or at least untyped, columns. Happily my data loads are usually small enough for a row-oriented data store.

CHECK constraints, now in conjunction with STRICT tables, are the best invention since sliced bread! If I could improve on one thing, it would be to remove any type notion from non-STRICT tables.

masklinn · 3 years ago
Fwiw the ANY type is valid in strict tables, and does what it says.

Slightly more so than in non-strict tables, in fact: it will store exactly what you give it without trying to reinterpret it. E.g. in a non-strict table, a quoted literal composed of digits will be parsed and stored as an integer; in a strict table, as a string.

Strict mode also doesn’t influence the parser, so e.g. inserting a `true` in a strict table will work, because `true` is interpreted as an integer literal.
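A quick illustration of that difference in Python (the STRICT half is guarded, since it needs SQLite 3.37+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Non-strict: a column declared ANY gets NUMERIC affinity, so a quoted
# string of digits is reinterpreted and stored as an integer.
conn.execute("CREATE TABLE loose (v ANY)")
conn.execute("INSERT INTO loose VALUES ('1234')")
loose_type = conn.execute("SELECT typeof(v) FROM loose").fetchone()[0]

# STRICT: ANY stores exactly what you give it, no reinterpretation.
try:
    conn.execute("CREATE TABLE exact (v ANY) STRICT")
    conn.execute("INSERT INTO exact VALUES ('1234')")
    strict_type = conn.execute("SELECT typeof(v) FROM exact").fetchone()[0]
except sqlite3.OperationalError:
    strict_type = None  # STRICT not supported on this SQLite build

print(loose_type, strict_type)  # integer text (on a modern build)
```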

mytherin · 3 years ago
We have a PR in the works that is adding support for sum types to DuckDB [1]. Not quite fully dynamic, but perhaps still useful :)

[1] https://github.com/duckdb/duckdb/pull/4966

dspillett · 3 years ago
This always put me off a bit too. I have memories of mySQL supporters back in the early days defending it storing dates like 2022-02-31 and other such type/range/other issues with the standard refrain of “but SELECT * is blazing fast!”…

As of last year there is an option to make things more strict (https://www.sqlite.org/stricttables.html), though as SQLite doesn't have real date types, code could still insert invalid dates like mySQL used to allow, unless you are using one of the integer storage options.

--

EDIT: having actually read the linked article, it explicitly mentions the date type issue also.

paulclinger · 3 years ago
Even in the absence of strict tables, you can also add a CHECK constraint whenever limits are needed on the value types. Here is one example from a recent discussion in SQLite forum threads: `check(datetime(x, '+0 seconds') IS x)`.
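A quick sketch of that constraint in action. Note the round-trip check expects full 'YYYY-MM-DD HH:MM:SS' strings; anything datetime() would normalize or fail to parse is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        at TEXT CHECK (datetime(at, '+0 seconds') IS at)
    )
""")

# A valid datetime round-trips unchanged, so the check passes.
conn.execute("INSERT INTO events VALUES ('2022-02-28 12:00:00')")

try:
    # Feb 31st: datetime() normalizes it to March 3rd, so the
    # round-trip no longer matches and the CHECK fails.
    conn.execute("INSERT INTO events VALUES ('2022-02-31 12:00:00')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```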
RedShift1 · 3 years ago
If you use an integer unix timestamp as the datatype for dates, you technically cannot insert an invalid date
jrockway · 3 years ago
It's even more fun in a boolean column, where some databases accept 't'/'f', 'true'/'false', etc. SQLite accepts some of those, but treats other ones as text.
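A quick sketch of what SQLite actually does here: the unquoted keywords TRUE/FALSE are just aliases for the integers 1 and 0, while the string forms stay text, because "BOOLEAN" only yields NUMERIC affinity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "BOOLEAN" is not a real SQLite type; the declaration just gives the
# column NUMERIC affinity, so only numeric-looking values get converted.
conn.execute("CREATE TABLE flags (b BOOLEAN)")
for v in (1, 0, "1", "t", "true"):
    conn.execute("INSERT INTO flags VALUES (?)", (v,))

rows = conn.execute("SELECT b, typeof(b) FROM flags ORDER BY rowid").fetchall()
print(rows)
# [(1, 'integer'), (0, 'integer'), (1, 'integer'), ('t', 'text'), ('true', 'text')]
```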
groue · 3 years ago
I do, for JSON columns. I store UTF8 strings in SQLite, so that it is easy to see JSON values with a plain `SELECT *`. And I load blobs, because I code in Swift and the standard JSON decoder eats raw UTF8 memory buffers, not strings.

This avoids two useless conversions:

- first from a C string loaded from SQLite to a Swift Unicode string (with UTF8 validation).

- next from this Swift Unicode string to a UTF8 memory buffer (so that JSONDecoder can do its job).

SQLite is smart enough to strip the trailing \0 when you load a blob from a string stored in the database :-)

systemvoltage · 3 years ago
SQLite is an excellent app database for local things. It doesn't compete with PostgreSQL or MySQL for a huge number of reasons. So, to make it fit the competition model, perhaps a bunch of things get glossed over.

Does anyone see a massive pro-SQLite movement going on, sort of like what happens in the JS ecosystem, with everyone bandwagoning on it? Criticism of SQLite is very welcome, specifically criticism that spells out what its role is and which use cases it serves really well.

SPBS · 3 years ago
SQLite needs this, because changing column types is such a pain in the ass. The answer to "I want to move from INT primary keys to TEXT/UUID, how do I change the column type?" is "just insert the data into the column because SQLite allows everything".

And the reason changing column types is so hard is because, uh, SQLite stores its schema as human readable plaintext (easier to keep compatibility between versions) and not normalized tables like other databases.

dekhn · 3 years ago
What does it mean for a database to have primary keys of different types? In particular, primary keys are frequently used for clustering and other performance enhancements and I would expect that they would need a total ordering that made sense.
tejtm · 3 years ago
> wtf. who would ever want that?

Have you met non-computer scientists?

Anything and everything is fair game and it is probably for the best.

Job security anyway

npilk · 3 years ago
Setting aside some of the technical choices, I wish we saw more product builders publish transparent perspectives on the 'shortcomings' of their product. SQLite also has a list of reasons why you would or wouldn't want to use SQLite that I've always enjoyed: https://www.sqlite.org/whentouse.html
donatj · 3 years ago
I'd say it's a lot easier to do when you've already got mass adoption.
herge · 3 years ago
I hit annoyances with SQLite in some corner cases. I wanted to implement a lock and share a small amount of data (who did what, when) between two different Linux users on the same machine.

I figured "sqlite is better than fopen, let's use that!", but between directory permissions, all the WAL files, probably the sqlite3 python lib and Diskcache (https://pypi.org/project/diskcache/) not helping things, it was a real pain, where regularly under different race conditions, we would get permission denied errors. I managed to paper it over with retries on each side, but I still wonder if there was a missing option or setting I should have used.

dspillett · 3 years ago
> I managed to paper it over with retries on each side

You'll hit that elsewhere as well. “Papering over with retries” is standard MS advice for Azure SQL and a number of other services: https://learn.microsoft.com/en-us/azure/architecture/best-pr...

m_st · 3 years ago
I feel you! I had an ASP.NET Core Web API using a SQLite file for data storage. It worked exceptionally well in development. When we released it, as soon as more than 10 users were throwing requests at it, it got deadly slow. Obviously caching and some tweaking helped a lot. But it was far from easy to find, understand and test (!) all the required options and optimizations.

Other than that I still love SQLite.

Scarbutt · 3 years ago
Was WAL enabled?
masklinn · 3 years ago
> Foreign Key Enforcement Is Off By Default

That is by far my biggest annoyance with sqlite.

Not only that, but the FK enforcement must be enabled on a per-connection basis. Unlike the WAL, you can’t just set it once and know that your FKs will be checked, every client must set the pragma on every connection. Such a pain in the ass.

The error reporting on FK constraint errors is also not great (at least when using implicit constraints via REFERENCES sqlite just reports that an FK constraint failed, no mention of which or why, good luck, have fun).

More generally, I find sqlite to have worse error messages than postgres when feeding it invalid SQL.

massysett · 3 years ago
There is a compile-time option to always enable foreign-key constraints.
jeffreyrogers · 3 years ago
Can't you just wrap the connection handling code with some logic that automatically executes the pragma?
masklinn · 3 years ago
Sure.

The problem is that you have to carefully remember to do that in every project, and if other applications need write access that they do so as well.
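A minimal sketch of such a wrapper (the helper name is hypothetical; the point is that every code path, in every application touching the file, must go through it):

```python
import sqlite3

def connect(path):
    # Hypothetical project-wide helper: all opens go through here so the
    # pragma is guaranteed to be set on every connection.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    return conn

conn = connect(":memory:")
print(conn.execute("PRAGMA foreign_keys").fetchone()[0])  # 1
```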

drej · 3 years ago
Anyone can share their experience with the somewhat new STRICT mode? Does it help? I tend to use Postgres when available, primarily for the added strictness, but I'd surely prefer SQLite in more scenarios as it's easier to operate.
masklinn · 3 years ago
I use strict tables. I'm now realising it did help when I migrated tables from one data type to another, because I'd missed updating some of the code.

I didn’t realise because it was just what I’d expected, but without strict tables I’d have had to debug strange errors on retrieval rather than the type error on insertion I got.

bob1029 · 3 years ago
We have extremely heavy SQLite usage throughout. Strict mode would only cause trouble for us. Most of our SQLite access is managed through a domain-specific C# library we developed. It handles all of the parsing & conversion internally. We don't touch the databases directly in most cases.
rch · 3 years ago
I've run into situations where I need a db, but all I have is Python, so the embedded SQLite is my best option. It would be ideal if the interface was fully compatible with postgres (or a practical subset), even if SQLite was handling storage under the hood.
torbica · 3 years ago
It does not really help in our scenario. I really hope that this feature will evolve more in future versions.
srcreigh · 3 years ago
What's missing for your scenario?
chasil · 3 years ago
Like Oracle, SQLite lacks boolean (bit) columns, but it does support bitwise operators, so multiple boolean values can be packed into an integer with powers of 2.

Setting a value with an update is a bitwise or, and checking a value is a bitwise and.

  $ sqlite3
  SQLite version 3.36.0 2021-06-18 18:36:39
  Enter ".help" for usage hints.
  Connected to a transient in-memory database.
  Use ".open FILENAME" to reopen on a persistent database.
  sqlite> select 2 | 1;
  3
  sqlite> select 3 & 1;
  1
Oracle only has a "bitand" function, but "bitor" has been posted on Oracle user sites:

  create or replace function bitor(p_dec1 number, p_dec2 number) return number is
  begin
    if p_dec1 is null then return p_dec2;
    else return p_dec1 - bitand(p_dec1, p_dec2) + p_dec2;
    end if;
  end;
  /
That isn't necessary in SQLite.

Searches on these will likely require full table scans, a definite performance disadvantage.
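A quick sketch of the set/check pattern described above, from Python (flag names are illustrative):

```python
import sqlite3

# Pack several booleans into one integer using powers of two.
ACTIVE, VERIFIED, ADMIN = 1, 2, 4

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, flags INTEGER DEFAULT 0)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 0)")

# Setting flags is a bitwise OR...
conn.execute("UPDATE users SET flags = flags | ? WHERE name = 'alice'",
             (ACTIVE | ADMIN,))
# ...and testing one is a bitwise AND. Note this predicate can't use an
# index, hence the full-table-scan caveat above.
admins = conn.execute(
    "SELECT name FROM users WHERE flags & ? != 0", (ADMIN,)).fetchall()
print(admins)  # [('alice',)]
```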

christophilus · 3 years ago
The group by behavior is useful. I wish Postgres did that.
dspillett · 3 years ago
Most DBs don't (and won't ever) support that because the results become undefined. You could get different values out for the same query in different versions of the DB (due to query planner changes) or even within the same version and the same database over time (as the relative sizes of your tables change so the planner chooses a different route, or index options are changed, so the planner is forced to go with a different route or has the option of a better one).

Though in a world where “eventual consistency” is accepted, maybe “eeny, meeny, miny, moe” will be more generally considered OK at some point :)

At least SQLite tries to be consistent where that is possible (which it often isn't), taking the value from the row where the aggregate found its value; mySQL, which also allows non-grouped non-aggregated columns in the projection list, makes no such attempt.

deathanatos · 3 years ago
Use of that group by behavior is a bug, nearly every time I've seen it. (Though usually I'm seeing it in MySQL.)

Columns that are not part of the group, an aggregate of the group, or a functional dependency of the group aren't stable — there are multiple rows that could supply the value, and they needn't have the same value.

kwoff · 3 years ago
Yeah, I've seen that bug in a monitoring graph that showed data for, say, every 5 minutes while the (MySQL) database table had data per minute; you'd see a spike in the graph, but refresh a minute later and voila, no spike anymore.
unterdenlinden · 3 years ago
You can do that with window functions in Postgres:

SELECT MAX(salary) OVER (), first_name, last_name FROM employee;
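For what it's worth, SQLite (3.25+) supports the same window-function form, so the explicit version is portable. Every row is kept, each annotated with the global maximum (sample data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (first_name TEXT, last_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Ada", "Lovelace", 120), ("Alan", "Turing", 100)])

# MAX(...) OVER () is a window over the whole result set: no grouping
# happens, so there is no ambiguity about which row a value came from.
rows = conn.execute(
    "SELECT MAX(salary) OVER (), first_name, last_name FROM employee"
).fetchall()
print(rows)
```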

systems · 3 years ago
Exactly. SQL being a declarative language, you should always prefer explicit over implicit.

SQL bugs are very hard to detect when the query returns a result that looks right, and because the language is declarative it's easy to make those mistakes.

tritiy · 3 years ago
Number 6 was really surprising:

SELECT max(salary), first_name, last_name FROM employee;

This returns one row! AFAIK all other databases would return one row per record in the table, where first_name and last_name would come from each row while max(salary) would be the value from the row with the max salary. Is this ANSI SQL compatible?
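It's not standard SQL (PostgreSQL, for instance, rejects the query outright), but SQLite documents it as "bare columns in an aggregate query": with a lone min() or max(), the other columns are taken from a row where that extreme occurs. A quick demonstration with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (first_name TEXT, last_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Ada", "Lovelace", 120), ("Alan", "Turing", 100)])

# One row comes back, and the bare columns are taken from the row that
# supplied the maximum salary.
row = conn.execute(
    "SELECT max(salary), first_name, last_name FROM employee").fetchone()
print(row)  # (120, 'Ada', 'Lovelace')
```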