As a non-dev intruder I have to say that I love SQLite. I do a lot of data analysis and it makes everything easy, from quick SQL wizardry to sharing the DB by just copying a file! How amazing is that?!
It must sound naive to some of you, but the first time I stumbled upon SQLite I was so excited!
I work in data analysis too, and SQLite is to me what MS Excel is to other people in the industry. I have a development background, so I never picked up MS Excel to the same degree as my colleagues from statistics backgrounds. In corporate organisations where DBAs monitor the use of their database with an eagle eye, SQLite is a fantastic option for analysing several million rows of data on your desktop. I find DB Browser for SQLite (DB4S)[0] invaluable when using SQLite.
[0] https://sqlitebrowser.org/
If you like SQLite for data analysis, you might want to check out DuckDB https://github.com/cwida/duckdb which is billed as "SQLite for analytics."
SQLite is a row store, which is best for OLTP (point queries, inserting/updating/deleting one or a few rows at a time), while DuckDB is a column store, which means the data layout has values from the same column stored contiguously, making aggregation queries (GROUP BY) perform much better.
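As a rough illustration (not from the original comment), here is a minimal sketch of running the same GROUP BY through both engines, assuming Python's sqlite3 standard library module and the duckdb Python package; the table and column names are made up:

import sqlite3
import duckdb

rows = [("us", 3.0), ("us", 5.0), ("eu", 7.0)]

# Row store: SQLite
lite = sqlite3.connect(":memory:")
lite.execute("CREATE TABLE sales (region TEXT, amount REAL)")
lite.executemany("INSERT INTO sales VALUES (?, ?)", rows)
print(lite.execute("SELECT region, sum(amount) FROM sales GROUP BY region").fetchall())

# Column store: DuckDB -- same SQL, different storage layout under the hood
duck = duckdb.connect()
duck.execute("CREATE TABLE sales (region VARCHAR, amount DOUBLE)")
duck.executemany("INSERT INTO sales VALUES (?, ?)", rows)
print(duck.execute("SELECT region, sum(amount) FROM sales GROUP BY region").fetchall())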
Excel is nice as a simple sandbox, and Power Query is amazing IMO; the problem is that it's basically impossible to work with large datasets. If you didn't know about it, I recommend playing with it a bit.
I used to use this, but if you have a JetBrains sub you can open these files directly. Even better, if it's a full subscription, DataGrip is fantastic, and they recently added MongoDB support, which is not perfect but good enough!
My tools of choice are similar: SQLite, DB Browser, Jupyter. I've found that DB Browser struggles with analytical queries on tables over a few GB, though. Still an invaluable exploratory tool.
Most of my web apps’ databases are a single SQLite file. It’s more than enough for the amount of traffic they serve, and the db files are easy to set up and back up.
I've also always wondered why SQLite isn't used more for websites. Especially if you split heavy write workloads into a separate database file, it scales quite far.
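A minimal sketch of that split, assuming Python's sqlite3 module; the file and table names are hypothetical:

import sqlite3

conn = sqlite3.connect("main.db")                   # users, content, config, etc.
conn.execute("ATTACH DATABASE 'events.db' AS ev")   # write-heavy data lives in its own file
conn.execute("CREATE TABLE IF NOT EXISTS ev.page_views (path TEXT, viewed_at TEXT)")
conn.execute("INSERT INTO ev.page_views VALUES (?, datetime('now'))", ("/index",))
conn.commit()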
I can't even tell you how many times a dev came to me with some weird question that was very simply answered by "just use sqlite."
I've never thrown more than 20GB of data at it, but it always handles things admirably, and it often (to me) makes the right calls about the level of complexity required (it accepts column type definitions but really just doesn't care if a value maps to a string).
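That flexible typing is easy to see for yourself; a tiny sketch using Python's sqlite3 module (typeof() is SQLite's built-in function):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, note TEXT)")
# SQLite accepts the declared types but happily stores whatever you give it
con.execute("INSERT INTO t VALUES (?, ?)", ("not-a-number", 42))
print(con.execute("SELECT typeof(id), typeof(note) FROM t").fetchall())
# the INTEGER column keeps the string as text; the TEXT column stores 42 as '42'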
I have done several multi-terabyte queries on it. One time I had a race with another dev to see who could get the right answer on a 3TB dataset. I was finished before his spark cluster had even spun up.
I'm a front-end dev that can wing backend from time to time, and I use SQLite as much as possible. On multiple projects now I've run into complications due to complexity or environments, and adding a simplified local development backend with sqlite kept down time to a minimum.
Same. The last time I was this excited about an RDBMS in your pocket was when BerkeleyDB released their Java database in a single JAR file. I'm not sure when they initially released it, but I remember doing some hobby projects with it in 2003.
I do occasionally use Apache Derby[0] in a similar manner to SQLite, mostly when data types are critical. It works well with DBeaver or any SQL editor that lets you add JDBC connections.
[0] https://db.apache.org/derby/
At my job we started using H2, which, much like SQLite, has shareable databases. We now wish more databases were this simple. It's just so efficient to share a db in its exact current state so that someone with a little more experience, or without tunnel vision, can probe and resolve issues. Plus you can always go back and forth between broken and working states.
For our B2B application, we've been using SQLite as the exclusive means for reading and writing important bytes to/from disk for over 3 years now.
We still have not encountered a scenario that has caused us to consider switching to a different solution. Every discussion that has come up regarding high availability or horizontal scaling ended at "build a business-level abstraction for coordination between nodes, with each node owning an independent SQLite datastore". We have yet to go down this path, but we have a really good picture of how it will work for our application now.
For the single-node-only case, there is literally zero reason to use anything but SQLite if you have full autonomy over your data and do not have near-term plans to move to a massive Netflix-scale architecture. Performance is absolutely not an argument, as properly implemented SQLite will make localhost calls to Postgres, SQL Server, Oracle, et al. look like a joke. You cannot get much faster than an in-process database engine without losing certain durability guarantees (and you can even turn these off with SQLite if you dare to go faster).
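As a rough sketch of the durability knobs that comment alludes to (these are standard SQLite PRAGMAs; the file name is made up), via Python's sqlite3 module:

import sqlite3

con = sqlite3.connect("app.db")
con.execute("PRAGMA journal_mode=WAL")     # write-ahead log: readers don't block the writer
con.execute("PRAGMA synchronous=NORMAL")   # fewer fsyncs than FULL; in WAL mode the db stays
                                           # consistent, but a power cut can lose recent commits
# PRAGMA synchronous=OFF goes faster still, but gives up crash durability entirely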
+1. We use SQLite in our BI tool to keep information about user accounts, data sources (cubes) / report configs, access control rules, etc. SQLite works like a charm, even in the cloud version where we have > 6,000 registered user accounts.
A successful business, but not a demanding database: 6000 users accessing the database hundreds of times per day each, for small reads and small writes, are unlikely to need more than a few GB of space (the parts of the database in use should fit in the operating system's disk cache) and unlikely to exceed 20-30 small I/O operations per second.
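A back-of-the-envelope check of that estimate (Python; the 400 accesses/day figure is an assumption standing in for "hundreds of times per day"):

users = 6000
accesses_per_day = 400                      # "hundreds of times per day", assumed
ops_per_second = users * accesses_per_day / 86400
print(round(ops_per_second, 1))             # ~27.8, in line with the 20-30 ops/s estimate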
With current volume, our largest transactional datastore is ~50GB.
That said, if we had a situation where we were pushing such volume that the transactional store was 50 TB, and we were still within the scope of the overall project, I see no reason why this would be a problem for us. As I mentioned prior, the context of our applications is single process, single server. Obviously, this is not suitable for all edge cases and you will potentially be better off with a hosted solution in such cases.
At the end of the day, there is no arbitrary number of bytes where B-Tree indices just stop working because you didn't shell out for a SQL Server license.
Out of interest, what circles are you moving in? Are people filling their databases with videos or something?
I'm sure I just lack the imagination, but I can't imagine how there can be that many companies that generate terabytes of data they then want to shove into a database. The complete sales history of even a medium-sized company should still easily fit in a GB or two.
I often connect to production databases as a read-only user to do various data analysis. Is this something you can do with SQLite (besides maybe SSHing into the machine)? If not, how do you get around it (if it ever even comes up)?
One is to simply zip up the entire database and send it across the wire. This is most applicable for local development and QA testing scenarios.
Another is to have something in the business application and relevant tooling that allows for programmatic querying of the data we need to look into.
We also have some techniques where we do ETL of the date range we care about from one SQLite db to another, then pull down the consolidated db for analysis.
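A rough sketch of the first and third approaches, assuming Python's sqlite3 module; the file, table, and column names are hypothetical:

import sqlite3

# 1. Take a consistent snapshot of a live database (Python 3.7+ backup API),
#    which can then be zipped and shipped for local analysis.
src = sqlite3.connect("production.db")
with sqlite3.connect("snapshot.db") as snap:
    src.backup(snap)

# 3. ETL just the date range of interest into a separate analysis db.
etl = sqlite3.connect("analysis.db")
etl.execute("ATTACH DATABASE 'snapshot.db' AS prod")
etl.execute("""
    CREATE TABLE orders AS
    SELECT * FROM prod.orders
    WHERE created_at >= '2020-01-01'
""")
etl.commit()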
While I love SQLite as much as the next person (and the performance and reliability is really quite remarkable), I can’t understand all the effusive praise when you can’t do basic things like dropping columns. How do people get around this? Do you just leave columns in forever? Or go through the dance of recreating tables every time you need to drop a column?
SQLite is for storing data in an environment where SQL is meaningful. Anyone wanting to do database admin tasks (like adjusting table schema) would be well advised to go with a Real Database.
SQLite really shines when you have a more-or-less final idea of what tables you want to build and don't have to worry about users inputting crazy data. Anything up to that and it is a model of simplicity. Anything beyond that and it is undercooked.
I just sucked the existing table into RAM and recreated it. I did it in a transaction so there was no risk of data loss.
In my case the data was always 10s of MBs.
Remember, the point of SQLite is to be a replacement for inventing your own file format. Although it's a database, it lets us (developers) re-use our knowledge of databases when doing basic file I/O.
When do you need to drop a column in a production DB? Maybe my anecdotal bubble is about to burst, but I've worked in the public sector for a while, and across our 200 different production DBs behind around 300 systems we've never dropped a column.
Depends on the maturity of your schema - if it's all figured out based on your business domain it won't happen much. If you're still finding product-market fit (or equivalent) splitting a table into two happens sometimes.
Pretty sure they must: with row-based storage on disk, dropping a column without rewriting the table would leave a gap in every single row, wasting all of the space you just gained by deleting the column.
I agree with you. SQLite drove me nuts when it came to changing your database. This is one of the reasons I just use DB Browser for SQLite (DB4S). It takes care of all the complexity.
The general strategy is to create a new table, insert data from the old table,
drop the old table, rename the new table, and re-create the indexes:
create table foo2 (
col1 int,
col2 text
);
insert into foo2 select col1, col2 from foo;
drop table foo;
alter table foo2 rename to foo;
create index foo_col1 on foo(col1);
As for the reason, see the next section on that link. It's not perfect, but it is what it is. SQLite wasn't designed to solve 100% of the use cases, and that's not a bad thing.
We use sqlite as a smaller cache of a much larger postgres db. The cache is generated by a job and yes is regenerated every time before being pushed to machines that need it.
Think of SQLite as a file format which happens to have a query interface, and not a database.
MySQL did DDL for years with full table rewrites behind the scenes. It's not particularly hard to emulate, though not entirely pleasant.
(Although I really raise an eyebrow when people talk about performance. Every time I write anything server-like on SQLite I end up regretting it, and get an enormous speed boost, not to mention actual concurrency, by switching to PostgreSQL.)
For data analysis workloads I just load in my raw source data and then develop a series of scripts that create new tables or views on top of those raw inputs.
For my use cases I've thus never had to drop/alter a column... but I understand it could be very annoying.
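A minimal sketch of that raw-plus-views layering, using Python's sqlite3 module; the table, view, and column names are made up:

import sqlite3

con = sqlite3.connect("analysis.db")
# raw input, loaded once and never altered
con.execute("CREATE TABLE IF NOT EXISTS raw_sales (sold_at TEXT, region TEXT, amount REAL)")
# derived layer: rebuildable views instead of schema changes
con.execute("""
    CREATE VIEW IF NOT EXISTS monthly_sales AS
    SELECT substr(sold_at, 1, 7) AS month, region, sum(amount) AS total
    FROM raw_sales
    GROUP BY month, region
""")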
One of the great things one can learn from SQLite is the degree to which they unit (and integration) test their source code. It's honestly the best unit test document I have read in my career to date: https://www.sqlite.org/testing.html.
IIRC, some company wanted to use SQLite on an airplane, so they paid the devs enough to bring the test suite up to FAA standards. IIRC, they have code coverage of every machine instruction.
That was my business plan: Do the intense testing required for avionics, then sell the test cases to aviation manufacturers. That plan didn't work out - I've never sold the tests to any aviation manufacturer; not one. But the TH3 test harness has had side benefits that I did not anticipate, not the least of which is that it allows us to maintain a complex code base that is run on billions of devices with just a few developers.
SQLite is great, but its decision not to have a standard datetime/timestamp datatype -- standard in all other relational databases -- always struck me as a surprising omission, though in retrospect I kind of understand why. Datetimes are undeniably difficult.
So SQLite leaves the datetime storage decision to the user: TEXT, REAL, or INTEGER [1]. This means certain datetime optimizations are not available, depending on what the user chooses. And if you need to ETL data containing datetimes, you need a priori knowledge of which encoding the file uses.
In that sense, sqlite really is a "file-format with a query language" rather than a "small database".
[1] https://stackoverflow.com/questions/17227110/how-do-datetime...
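For illustration (not from the comment), a quick sketch of two of those encodings side by side, using SQLite's built-in date functions from Python's sqlite3 module:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (created_text TEXT, created_unix INTEGER)")
# TEXT as ISO-8601, INTEGER as unix epoch seconds
con.execute("INSERT INTO events VALUES (datetime('now'), strftime('%s', 'now'))")
# both can be turned back into a readable timestamp, but only if you know the encoding
print(con.execute(
    "SELECT created_text, datetime(created_unix, 'unixepoch') FROM events"
).fetchall())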
Simple. Bulk insert with a 999-parameter limit is just painful; if each entry has 9 columns, you can’t even insert 112 rows at once. In practice, distros already compile with a higher default; e.g. Debian compiles with -DSQLITE_MAX_VARIABLE_NUMBER=250000, still way higher than this new default.
What's the point? Inserting batches of 1000 rows at once, or even 10k rows at once is hardly any faster overall than using batches of 100 rows, assuming there are no delays in presenting the batches to the DB.
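Either way, working under the limit is mostly a matter of chunking. A minimal sketch with Python's sqlite3 module, assuming a hypothetical three-column table t(a, b, c):

import sqlite3

def bulk_insert(con, rows, max_vars=999):
    """Insert (a, b, c) tuples with multi-row VALUES, staying under the variable limit."""
    cols = 3
    per_batch = max_vars // cols                      # 333 rows per statement at the old default
    for i in range(0, len(rows), per_batch):
        chunk = rows[i:i + per_batch]
        placeholders = ", ".join(["(?, ?, ?)"] * len(chunk))
        params = [v for row in chunk for v in row]
        con.execute(f"INSERT INTO t (a, b, c) VALUES {placeholders}", params)
    con.commit()

(executemany sidesteps the limit entirely, since it binds one row's parameters per execution; wrapped in a transaction it is usually just as fast.)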
Also check out SQLiteStudio for Windows; it's nice too.
Why naive? That's the experience of all of us with SQLite.
SQLite is awesome.
There are even existing solutions that handle this kind of coordination across independent SQLite nodes:
- rqlite (https://github.com/rqlite/rqlite)
- actordb (https://www.actordb.com/)
I regularly see 50TB total of databases on SQL Server, and scaling up to thousands of clients.
How big is each database? With SQLite, each database is its own file.
The theoretical limit for such a file is 140 TB, but the practical limit is probably much lower (https://www.sqlite.org/whentouse.html).
https://sqlite.org/inmemorydb.html
Do the FK relationships get destroyed when you drop and recreate the table like that??
I don't want to know the use case for this.
Keep rocking on, SQLite. It's the first tool I reach for when prototyping anything that needs a DB.