
Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app, really.

Why does everyone seem so hyped on this when it can't really work properly IRL? If you have large amounts of data that need to be stored, the app would die instantly, or leave all your users waiting for their changes to be saved.

What am I missing?



> Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app, really.

"Lots of readers few writers" is an extremely common application property tho. Your average HN has significantly more reads than it has writes, especially if you bump the "worthless" writes (e.g. vote counts) out of using the DB and only flush them to the durable DB one in a while, for instance.

And SQLite's WAL mode supports that workload even better: while there is still a single writer, the writer no longer blocks the readers, which is the real problem in the exclusive-locking case (as the single writer would bring read concurrency down to 0).
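For reference, enabling WAL is a one-line pragma; a minimal sketch with Python's built-in sqlite3 module (the file name is illustrative):

    import sqlite3

    conn = sqlite3.connect("app.db")          # illustrative file name
    conn.execute("PRAGMA journal_mode=WAL")   # persistent: applies to future connections too
    conn.close()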


Another pattern to avoid "worthless" writes is statistical sampling. I.e. if you have votes in the >100,000 range, generate a random number p in [0, 1], and only perform a write if p < 0.01; when reading votes, multiply by 100, etc. Of course, you have to assess individually whether it's feasible for your operation, hence the "worthless".
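A rough sketch of that sampling idea in Python (standard-library sqlite3; the table and column names are made up, and the below-threshold case is left out for brevity):

    import random
    import sqlite3

    SAMPLE_RATE = 0.01   # only ~1 in 100 votes actually touches the database

    def record_vote(conn, item_id):
        # Skip ~99% of writes; the stored value is a sample, not an exact total.
        if random.random() < SAMPLE_RATE:
            with conn:
                conn.execute(
                    "UPDATE items SET sampled_votes = sampled_votes + 1 WHERE id = ?",
                    (item_id,))

    def read_votes(conn, item_id):
        # Scale the sample back up for display.
        (sampled,) = conn.execute(
            "SELECT sampled_votes FROM items WHERE id = ?", (item_id,)).fetchone()
        return int(sampled / SAMPLE_RATE)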


Is there something you can point to that explains this "flush them to the durable DB once in a while" pattern in more detail?


Instead of running each small write query as the event occurs (a vote), collect them in memory and at some time interval (1m, 5m, or whatever your UX will tolerate) write them to the DB in batch with one (or a few) writes.
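Something like this, sketched in Python (the schema and the flush interval are just placeholders):

    import sqlite3
    import threading
    from collections import Counter

    pending = Counter()        # in-memory buffer: item_id -> votes since last flush
    lock = threading.Lock()

    def record_vote(item_id):
        with lock:
            pending[item_id] += 1   # no disk I/O on the request path

    def flush(db_path="app.db"):
        with lock:
            batch = list(pending.items())
            pending.clear()
        if not batch:
            return
        conn = sqlite3.connect(db_path)
        with conn:  # one transaction for the whole batch
            conn.executemany(
                "UPDATE items SET votes = votes + ? WHERE id = ?",
                [(n, item_id) for item_id, n in batch])
        conn.close()

    # Run flush() every minute or so from a background thread / scheduler.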


Thank you. This fits a current need of mine perfectly.

Do you have any other helpful tips you’d like to share? Any thoughts on Redis vs. Memcached?


Not the GP, but if the GP described a scenario that is useful to you, Redis can (1) act as the memory cache if the processes themselves don't last long enough for the optimization to be useful, and (2) act as a cheap message queue to funnel similar requests into a single (or small number of) processes for aggregation before writing to disk.
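A rough sketch of (2) with the redis-py client (key and table names are made up; a production version would also close the small race between reading and deleting the hash, e.g. with a pipeline or RENAME):

    import redis     # assumes the redis-py package
    import sqlite3

    r = redis.Redis()

    def record_vote(item_id):
        # Any web process can do this cheaply; HINCRBY is atomic in Redis.
        r.hincrby("pending_votes", item_id, 1)

    def flush_to_sqlite(conn):
        # A single worker drains the counters into one batched SQLite transaction.
        pending = r.hgetall("pending_votes")
        if not pending:
            return
        r.delete("pending_votes")   # votes arriving in between are lost in this sketch
        with conn:
            conn.executemany(
                "UPDATE items SET votes = votes + ? WHERE id = ?",
                [(int(n), item_id.decode()) for item_id, n in pending.items()])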


- Most transactions are read-only

- "Large" applications can usually be sharded by account. This means 1 file per account, and can easily be put on the most optimal geolocation of the account

- You can defer locking until commit, allowing multiple concurrent writers ( https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi... ). This is good enough for most applications anyway.

- SQLite is simple, fast enough for almost anything, supports a good set of features and datatypes, and is very easy to embed.


Why would I bake all of those assumptions and limitations into my system, though, just on the hope that it won't ever become a problem?


Because you can move faster, explore your problem domain more cheaply, and validate your solution earlier. Then, if you "struck gold" and happened to arrive at product-market fit, even if you needed to rewrite big chunks of your solution to swap out your persistence layer, you would have a solid specification to follow: your SQLite-based implementation!

Such a rewrite is a much more predictable endeavor than building the initial solution, and it's a great problem to have :)

Meanwhile, your UI doesn't have to change, and a lot of your other glue code or business-logic code doesn't have to change either, IF you haven't hardcoded direct calls to SQLite everywhere in your program :)

E.g. I used HoneySQL with great success! My queries are safely assembled from Clojure data structures, and I had a single function that formats them to the desired SQL dialect (H2DB/SQLite/MySQL/Postgres), executes them, parses the results back into Clojure data structures, and even takes care of lazily paginating through long result sets, without burdening the consumer of the data with such details.

https://github.com/seancorfield/honeysql


Depending on your expected lifecycle of the app, you should use an adapter between the DB and the rest of your app.

Have an interface made up of clear methods which you do all your business through. That way, if you want to change the DB, you only need to rewrite the adapter.

I believe SQLite is a great, low-installation-needs, portable DB you can start your project with. If it turns out your startup gets millions of users, you can afford to switch.
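For what it's worth, a minimal sketch of such an adapter in Python (the domain, method names, and schema are made up):

    import sqlite3
    from typing import Protocol

    class InvoiceStore(Protocol):          # the interface the rest of the app codes against
        def add_invoice(self, customer_id: int, total: float) -> int: ...
        def invoices_for(self, customer_id: int) -> list[tuple]: ...

    class SqliteInvoiceStore:
        def __init__(self, path: str):
            self.conn = sqlite3.connect(path)

        def add_invoice(self, customer_id: int, total: float) -> int:
            with self.conn:
                cur = self.conn.execute(
                    "INSERT INTO invoices (customer_id, total) VALUES (?, ?)",
                    (customer_id, total))
            return cur.lastrowid

        def invoices_for(self, customer_id: int) -> list[tuple]:
            return self.conn.execute(
                "SELECT id, total FROM invoices WHERE customer_id = ?",
                (customer_id,)).fetchall()

    # Later, a PostgresInvoiceStore with the same two methods can be swapped in
    # without touching business logic that only knows about InvoiceStore.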


> Depending on your expected lifecycle of the app, you should use an adapter between the DB and the rest of your app.

A sane design, but realise that limitations tend to leak through abstractions.


This question goes both ways: why put effort into something that might never even be a problem?


Simpler backups. Simpler restores. Easier multi-tenancy. Easier to have data+app near the customer. No firewalls. No passwords. No keys.

Why do you assume you'll run into problems? The moment you're running into problems, you better have a good plan with any RDBMS.


How hard is it to migrate, if/when you need to?


Quite right, it's not one-size-fits-all, but for any site that's mostly read-only it's a brilliant solution.

Simon Willison has written about it and coined the term “baked data”: https://simonwillison.net/2021/Jul/28/baked-data/

Mozilla.org uses this architecture: a Django app running off SQLite, with the DB rsync’ed to each application server.


The confusion is probably that a lot of us work at smaller companies that serve a broad solution to a niche customer, and that kind of app has a lot of reads and writes but doesn't need to scale. This app might be doing the invoicing/shipments/specialist parts of a business, for example.

Whereas there is another kind of engineering which I will probably never be a part of (simply due to the mathematics of available positions), where you are scaling something up for millions of users but the app is much simpler, like a Twitter or Reddit, and the challenge is in the scaling.


Even for those small niche apps for businesses, there are a huge number that are unlikely to be doing more than the 10-20 write transactions per second that SQLite can support.


Probably on average correct, but there can be bursts.


The default settings of SQLite are very conservative and essentially enforce serial writes. With tuning and loosening that enforcement, you can go from 50 writes per second to 50,000.

Edit: forgot to mention that, yes, a major part of that is batching writes into fewer, bigger transactions; AFAIK you can't really get around that.


https://www.sqlite.org/faq.html#q19

>INSERT is really slow - I can only do few dozen INSERTs per second

>Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second.

>By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction.

>Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
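To make the FAQ's point concrete, a rough sketch with Python's sqlite3 module (the schema is illustrative; WAL plus synchronous=NORMAL is a documented middle ground that avoids the corruption risk of synchronous=OFF):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")    # writer no longer blocks readers
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; in WAL mode no corruption on
                                               # power loss, only the last commits may roll back
    conn.execute("CREATE TABLE IF NOT EXISTS kv (id INTEGER PRIMARY KEY, val TEXT)")

    rows = [(i, f"value-{i}") for i in range(50_000)]
    with conn:  # one BEGIN...COMMIT around all 50,000 inserts
        conn.executemany("INSERT INTO kv (id, val) VALUES (?, ?)", rows)
    conn.close()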


> What am I missing?

Many sites are Read (almost) Only. For sites where users interactively query/view/explore the data, but (almost) never write their own, it works great.


Speaking of this, I really wish there was SQLite support in WordPress...


A blog is the perfect example of where SQLite should be used over a DB server.


If you chuck Varnish in front of it, does it matter what you use?

Edit: I was being serious: if your data is that static, you can statically generate it. But I get that a CMS is convenient, and with that, caching is where you get the performance win. A blog post either never updates or gets 1 or 2 edits max.


Use more than one SQLite file? We have one per day and per project, for example.


I don't know if you're joking or not, but this would just be reinventing the Postgres/SQL Server/Oracle/MySQL wheel using duct tape and wishes.

If you're doing something that multiple systems have had millions of hours of development to do, just use one of those.


You are wrong. Using multiple SQLite files solves quite a few problems. E.g. let's say your users can have multiple products, where each product has its own properties. This would be represented as 3 tables in an SQL database: users, products, properties. With one SQLite file per user you can drop one table (users). This simplifies queries, and depending on your situation you can use simpler indexes, etc.

Extra benefit: you can allow your users to download their entire SQLite DB, as per GDPR requirements.
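A rough sketch of the one-file-per-user layout in Python (paths and schema are made up):

    import sqlite3
    from pathlib import Path

    DATA_DIR = Path("tenants")   # one SQLite file per user
    DATA_DIR.mkdir(exist_ok=True)

    def open_user_db(user_id: int) -> sqlite3.Connection:
        conn = sqlite3.connect(DATA_DIR / f"user_{user_id}.db")
        # No users table needed: the file itself identifies the user.
        conn.execute("""CREATE TABLE IF NOT EXISTS products (
                            id INTEGER PRIMARY KEY,
                            name TEXT)""")
        conn.execute("""CREATE TABLE IF NOT EXISTS properties (
                            product_id INTEGER REFERENCES products(id),
                            key TEXT, value TEXT)""")
        return conn

    def gdpr_export_path(user_id: int) -> Path:
        # The GDPR export is simply the user's own database file.
        return DATA_DIR / f"user_{user_id}.db"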


How do you join data across hundreds of databases?

And how does the query planner handle indexes when doing so?


Using ATTACH. Works a treat. SQLite's query planner knows about attached databases.
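Roughly, with illustrative file names, alias, and schema:

    import sqlite3

    conn = sqlite3.connect("users.db")
    conn.execute("ATTACH DATABASE 'orders.db' AS orders")   # second file, aliased as "orders"

    rows = conn.execute("""
        SELECT u.name, o.total
        FROM   users u
        JOIN   orders.orders o ON o.user_id = u.id
    """).fetchall()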


I have seen it suggested to use one table per db to minimize lock contention and simplify backups/vacuums. Do you know if there really is nothing to lose with ATTACH spam? Any tradeoffs at all?


I wouldn't design that way. SQLite has a default limit of 10 attached databases and a hard limit of 125.

Having several areas of concern managed by distinct databases, with occasional joins across the domains, is the more natural use pattern. ATTACH loses foreign key constraints and triggers, among other things.


My understanding is that it just works. They're still all in the same process, and SQLite has all the information it needs to do joins. Transactions get more complex but are still atomic.


You don’t.


But why? That seems like such an unnecessary hack.


Until we know how they are used, we are just speculating: I for one am curious ;)



