Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Because an SQL database uses a schema or structure, this means changes are difficult. Say you’re running a production database full of a million records.

Articles like this one perpetuate the myths in the minds of young developers. First off, “millions of records” is nothing this days. More importantly, the scheme ends up living somewhere. If it’s not in your database, you’re likely managing it in the app. There’s no free lunch when it comes to scheme for a typical SaaS.



I personally stoped reading after this sentence :

> And while SQL statements are fun, it’s easy to drop all tables while futzing with a key or corrupting an entire repository with a malformed query.

This seem to indicate that the writer don't real get a grasp of very basic modern SQL databases features such as permissions, constraints and transactions.

Without such understanding how can one form an accurate comparison of NoSQL benefits vs SQL. Or maybe worse the author has a good understanding but prefer to make bold and false statements to push his point.


It isn’t even a permission thing. If you are just trying to query some data you don’t accidentally replace “select” with “delete”. The author is trying too hard to put sql in a bad light. Any syntax I have seen to query nosql databases looks clumsy to me as well. Sql I am familiar with, so I am probably biased, but they both have they pros and cons.


Totally agree, I thought this was a big red flag. What an odd thing to say.

Sure enough, the rest of the article it littered with similar strange, and inaccurate assertions. This is not a good article.


It reads like SEO spam: a plausible collection of statements scraped from other articles and then cobbled together with no understanding. It's a shame that stackoverflow has sunk to this.


It really does. Just kind of rambles, throws around a lot of tech words. Meanwhile none of the points really made any sense.

Never really read anything on their blog before, but this doesn’t make me want to.


> If it’s not in your database, you’re likely managing it in the app

“Managed in your app” sounds like a benign state of affairs.

Anything that doesn’t have a home ends up smeared across your entire codebase. It isn’t that it’s in the app, it’s that it’s everywhere in the app, meaning changing it becomes a huge investment of energy that people will try to avoid or put off.


You can have a SQL database and still end up with assumptions about the data smeared across your codebase. I've worked on multi-million line codebases on top of a SQL database where nobody dared change the schema of some (very non-optimal) tables because too much code directly depended on the structure of those tables. Having a clean and DRY data access layer is necessary regardless of the underlying database.

As soon as multiple independent codebases share the same database I would argue you need to put an API on top of that database and turn it into a microservice that owns its database. Otherwise the internal details of how the tables are structured will wrap itself into the codebases and make it very hard to evolve the database's schema.


> Having a clean and DRY data access layer is necessary regardless of the underlying database.

SQL databases (via views and even sprocs) allow you to abstract particular client’s view of the data from the base storage layer inside the database.

> As soon as multiple independent codebases share the same database I would argue you need to put an API on top of that database and turn it into a microservice that owns its database.

An RDBMS is is an integrated service that owns its own datastore with a very-well-defined, extremely battle-tested API designed to support multiple clients with completely different views of and access to the data, all as logically isolated as necessary from the design of the base storage layer.

If you aren't using an RDBMS, sure you may need to wrap something around the datastore that provides a tricky-to-get-right subset of what an RDBMS provides fairly trivial-to-use facilities for out of the box, just like like not using an RDBMS often forces you to do for another subset if you are concerned about integrity.


Corollary: do not let multiple applications to access the same database objects.


This is where views shine. "It's microservices for data!"


But if you 'have it in the database' it will still be smeared across your app too.

'Putting stuff in one place', regardless of that place, is hard. Necessary, but hard. And it requires tradeoffs.

If you need a 'sorry this username is taken' friendly error, your app needs to handle constraint errors from your DB. Even if only on the translation layer. At which point you'll have it duplicated on multiple layers, add tight coupling between layers, or need to forego that message and e.g. settle with a generic exception instead.


The difference is that the actual constraint lives in one place and the rest of the locations are UX benefits to help the user. The system doesn't get into a bad state just because you forgot to add the constraint in the 100th location.


> rest of the locations are UX benefits to help the user.

In my experience this is a pipe dream.

Maybe in my simple example, one could parse the Constraint Exception and map that to a field and user-friendly error. Maybe. No framework or ORM that I've ever seen that does this though.

But even when it does: it still requires you to do the parsing and mapping in the application: introducing a tight coupling (e.g. you cannot add a constraint without releasing new locales).

In practice, is my experience, you'll most likely have some constraints in the DB, some validations in your ORM, some of which overlap, some of which are unique to one of both.

Which is arguably worse than having each app that uses the database repeat that. It all depends on the use-case, obviously.


Definitely. You won't have the rich exception messages all over. However, if the rule is a _business rule_ then it must either live _in_ the DB, or (very frequently) live in a dedicated repository layer that all application access goes through. Otherwise its not a rule and you _will_ forget to enforce it at some point.


This is the distinction we make too: business validations go in the database, ux-validations go in the application.

In practice, however, this means the business validations are duplicated all over the place (but always enforced, as last line of defence, in the database).

It also means customers get more frequent 500 errors (exceptions): when a business rule is implemented in DB but not (yet) in an application.

It is messy.


> It isn’t that it’s in the app, it’s that it’s everywhere in the app

That's only if you don't know how to properly code a data access layer in your application. And if you have many apps using the DB, perhaps the data layer should be in a library.


I think you need to take a long hard look at the Venn diagram of the user base we are talking about.

People who haven’t even learned about Chesterton’s fence have no idea how much they don’t know about robust software.


It's also a lot easier to mess up evolving the logical schema and result with unexpected and incoherent database state if your store doesn't enforce the logical. schema. Sure, the more the logical schema is enforced, the more you are forced to do up-front when the logical schema changes, but that work prevents you from:

(1) apply a data migration that fails to result in a state that complies with the logical schema, or

(2) producing a state inconsistent with the logical schema because your application code doesn't correctly observe the schema, as defective code will fail for violating constraints instead.


The book Designing Data-Intensive Applications talks about “schema on write” vs “schema on read”. In order to interpret your data, you must apply a schema, so your choice is whether you do that explicitly when the data is written, or implicitly when it’s read.

Or as Yoda would say, Schema read or schema write, there is no “no schema”.


Adding a column to a table might be a single line change in the application. No matter if you are using SQL or NoSQL.

Say your users table looked like this:

    id name country
And now you want to add their city:

    id name country city
This might cause the whole table to be locked and copied in SQL. While it might need no action at all in a NoSQL DB.


> This might cause the whole table to be locked and copied in SQL.

The answer is always the same to these kinds of complaints: just use Postgres please.


If you don’t mind me asking, how does Postgres fix this? Do they have a more sophisticated locking mechanism, or maybe a copy offline until it’s ready kind of a system?


If the added column is nullable or if you provide a constant default value, the table is not copied, neither online nor offline.


And the code dealing with the missing "city" field across your whole codebase just appears out of nowhere now? :)

Dealing with schema changes is complex - NoSQL just moves it elsewhere where its easier to forget about it.


That’s what default field values and ORMs are for...


You still have to decide if you want to use the default value or something else at every place.


> This might cause the whole table to be locked and copied in SQL

That was indeed the case in the past, but not so much anymore except for certain situations. MySQL, for example, has had support for in-place table alterations for a while [1]. I've used it in production and it works very well IME.

[1] https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl.ht...


“might cause”, yes. But at least with Postgres that only happens if you add a default value to the new column. You can add the default value in your application instead, just like you would with your average NoSQL DB. Then, when you have low load on your system, you can migrate the rows in batches to have a default value, and eventually remove the application default.


Since Pg12, even adding a column with a default doesn't lock the table anymore - that only happens if you later change or remove the default


But if you don't control the client, you will now have to deal with client side migration and server database version management. If you create an additional v3, you will need to decide to either keep v1->v2->v3 code or v1->v2, v1->v3. Also, reporting.


Can’t you do this concurrently these days? Also, managing scheme in the database was the least of my issues when deleting/adding fields. You still need to make sure your clients are resilient against null or missing fields in the responses, and your scheduled jobs don’t query the data which isn’t there anymore. Point is, not having to alter the db scheme doesn’t make such a big difference. You still need to make sure your system overall is build in way that allows smooth data migrations.


> a single line change in the application

That's if you only have one application accessing your data.

Once you get multiple apps adding and retrieving data from the same data store, having the database do some work starts to make more sense.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: