git diff REL_16_RC1..REL_16_0 doesn't show any changes that would require a pg_upgrade (at least from my read), so you should be able to upgrade without it.
Blog author. I've done some separate testing on storing ~500GB of embeddings (~1B embeddings) in a partitioned table. The partition key was built using IVFFLAT as a "coarse quantizer" (in this case, sampling the entire dataset and finding K means), storing the mean vectors in a separate table, and then loading each vector into the partition with the closest center. After that, I built an IVFFLAT index on each partition. With the indexes, this added up to ~1TB of storage. This was primarily an "is it possible?" test rather than thorough benchmarking.
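To make the partitioning idea concrete, here is a rough sketch in plain Python of the coarse-quantizer step: run k-means on a sample, then route each vector to the partition whose center is closest. This is only an illustration under my own assumptions, not the code used in the test above; the function names (`kmeans`, `nearest_center`, `assign_partitions`) are hypothetical, and in the real setup the centers live in a table and the partitions are PostgreSQL partitions, not Python lists.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(vec, centers):
    """Index of the center closest to vec; this acts as the partition key."""
    return min(range(len(centers)), key=lambda i: squared_l2(vec, centers[i]))


def kmeans(sample, k, iterations=10, seed=0):
    """Plain Lloyd's k-means over a sample of the dataset (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(sample, k)  # start from k sampled vectors
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for vec in sample:
            buckets[nearest_center(vec, centers)].append(vec)
        # Recompute each center as the mean of its bucket; keep the old
        # center if the bucket ended up empty.
        centers = [
            [sum(col) / len(bucket) for col in zip(*bucket)] if bucket else centers[i]
            for i, bucket in enumerate(buckets)
        ]
    return centers


def assign_partitions(vectors, centers):
    """Group vectors by the index of their closest center."""
    partitions = {i: [] for i in range(len(centers))}
    for vec in vectors:
        partitions[nearest_center(vec, centers)].append(vec)
    return partitions


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
    centers = kmeans(data, k=2)
    for key, members in assign_partitions(data, centers).items():
        print(key, len(members))
```

At query time the same `nearest_center` lookup would pick which partition (or few partitions) to search, which is what keeps each per-partition index small.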
Blog author. You can choose to use any distance metric. One reason cosine similarity is popular (and used) is that for many of these higher dimensional datasets, it gives a better representation of "nearness" across all the data based on the nature of "angular" distance. But depending on how your data is distributed, something like L2 distance (Euclidean) could make more sense.
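A small sketch of the difference, in plain Python with hypothetical helper names (`l2_distance`, `cosine_distance`): cosine distance only looks at the angle between two vectors, while L2 distance also reacts to their magnitudes. The example vectors are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between a and b.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    short = [1.0, 0.0]
    long_same_direction = [10.0, 0.0]
    orthogonal = [0.0, 1.0]

    # Same direction, different length: far apart in L2, identical by angle.
    print(l2_distance(short, long_same_direction))
    print(cosine_distance(short, long_same_direction))

    # Different direction: cosine distance is large even though L2 is small.
    print(l2_distance(short, orthogonal))
    print(cosine_distance(short, orthogonal))
```

If your embeddings are normalized to unit length, the two metrics rank neighbors the same way, so the choice matters most when vector magnitudes vary.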
Blog author. Thanks for the analysis -- I agree that the ANN Benchmark does provide a nice framework for helping with apples-to-apples comparisons. In this case, being able to use the "--local" flag made it easier to run using the native environment, vs. putting it into a container. I'm looking forward to ANN Benchmark having more datasets!
Hey, I helped to write the original article, and I also work at AWS.
I'm happy to let you know that PostgreSQL 16 Beta 1 is available in the Amazon RDS Database Preview Environment (https://aws.amazon.com/about-aws/whats-new/2023/05/postgresq...). I hope you get a chance to test it; I'm very eager to hear any and all feedback about PostgreSQL 16.
Multiranges are one of the lead items in the news release :) I do agree that they are incredibly helpful and will help to reduce the complexity of working with ranges.
I'd be curious to see if the concurrency improvements in PostgreSQL 14 help with increasing the threshold for when you need to introduce a connection pooler.