> use case of yours and trying to make it a general rule
This is a fair critique. Our system is admittedly a bit unusual in what it works with and the amount of disparate data it needs to pull together.
> it's about all of the overhead involved in serializing, transferring, deserializing, and all the way back.
Serializing and deserializing are typically not a huge cost in DB communication. Most databases have binary wire protocols that minimize the work the server or client has to do to convert the data into native language datatypes; it's not as if the rows are shipped around as JSON.
Transfer can be a problem, though, if the dataset is large.
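To make that distinction concrete, here's a rough, stand-alone sketch (not any actual wire protocol) comparing a JSON-text encoding of some rows against a fixed-width binary packing of the same values. The row shape and sizes are invented purely for illustration:

```python
import json
import struct

# Hypothetical row shape: (id: int64, price: float64, qty: int32)
rows = [(i, i * 1.5, i % 100) for i in range(100_000)]

# Text encoding: every value becomes a digit string, plus delimiters,
# and the client has to parse it all back out.
json_payload = json.dumps(rows).encode("utf-8")

# Binary encoding, roughly in the spirit of a binary wire protocol:
# fixed-width native representations, no string parsing on either end.
packer = struct.Struct("<qdi")  # int64, float64, int32 per row
binary_payload = b"".join(packer.pack(*row) for row in rows)

print(f"JSON bytes:   {len(json_payload):>12,}")
print(f"binary bytes: {len(binary_payload):>12,}")
```

The encode/decode cost difference tracks the size difference: fixed-width fields can be copied straight into native types, while text has to be parsed.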
> I assume you're not retrieving 10 million Foo for the user, god forbid
In our most extreme cases, yeah, we actually are pulling 10 million Foo. Most of these big reads happen in our ETL backend, though, where the upstream data is being processed; that's primarily where I work rather than on the frontend service.
And I'll agree with you: if you're talking about requests that return on the order of 10 to 100 rows, then yes, it's faster to do it all within the database. It depends (which is what I've been trying to communicate throughout this thread).
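For reads that big, the main thing (client-side, at least) is not materializing everything at once. A minimal sketch of batched fetching with the standard DB-API `fetchmany`, using sqlite3 purely as a stand-in for whatever driver you actually use, with an invented `foo` table:

```python
import sqlite3  # stand-in for any DB-API 2.0 driver (psycopg, pyodbc, ...)

def stream_rows(conn, query, params=(), batch_size=10_000):
    """Yield rows in fixed-size batches instead of one giant fetchall()."""
    cur = conn.cursor()
    cur.execute(query, params)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

# Tiny in-memory table so the sketch runs as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO foo (payload) VALUES (?)",
                 [(f"row-{i}",) for i in range(50_000)])

total = 0
for row in stream_rows(conn, "SELECT id, payload FROM foo", batch_size=5_000):
    total += 1  # the transform/load step would go here
print(total)
```

Whether the server also avoids buffering the whole result depends on the driver and cursor type, which is a separate knob from the query itself.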
> you are joining 10 million rows of Foo to Bar in a subquery without a WHERE clause
No, the SQL is properly formed. The issue is the sheer mass of data being transferred and, as I mentioned earlier, the temporary result set the DB has to hold while it waits to transfer everything to the application.
Splitting things into multiple smaller queries ends up being faster for us because the DB doesn't have to hold as much temporary data, nor does it serialize a bunch of `null` values that would otherwise make up a significant chunk of the transfer.
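To be concrete about what "splitting" means here: a single Foo-to-Bar LEFT JOIN repeats every Foo column once per matching Bar row (or pads the Bar columns with `null` when there's no match), while two narrow queries send each Foo and each Bar exactly once and stitch them together in the application. A hedged sketch of that pattern, assuming a DB-API style connection with `?` placeholders and invented `foo`/`bar` tables:

```python
from collections import defaultdict

def fetch_foo_with_bars(conn, foo_ids, chunk_size=1_000):
    """Two narrow queries instead of one wide join full of repeated/NULL columns."""
    foos, bars_by_foo = [], defaultdict(list)
    for i in range(0, len(foo_ids), chunk_size):
        chunk = foo_ids[i:i + chunk_size]
        marks = ",".join("?" * len(chunk))
        # Each Foo row is transferred exactly once.
        foos += conn.execute(
            f"SELECT id, name FROM foo WHERE id IN ({marks})", chunk).fetchall()
        # Each Bar row is transferred exactly once, keyed by its Foo.
        for foo_id, bar_value in conn.execute(
                f"SELECT foo_id, value FROM bar WHERE foo_id IN ({marks})", chunk):
            bars_by_foo[foo_id].append(bar_value)
    # Stitch the one-to-many relationship together client-side.
    return [(foo_id, name, bars_by_foo[foo_id]) for foo_id, name in foos]
```

Whether this beats the single join depends on the fan-out and the row widths, which is exactly the "it depends" point.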
Also, now that you're talking about query structure, recognize that there's no universal answer for the best/fastest way to structure a query. What's good for PostgreSQL might be bad for MSSQL.