jkatz05 · on Sept 14, 2023

git diff REL_16_RC1..REL_16_0 doesn't show any changes that would require a pg_upgrade (at least from my read), so you should be able to upgrade without it.

olavgg · on Sept 14, 2023

I just tested, and I had to use pg_upgrade.

Error message: The database cluster was initialized with CATALOG_VERSION_NO 202306141, but the server was compiled with CATALOG_VERSION_NO 202307071.

yobert · on Sept 14, 2023

You might just need to run:

    alter database mystuff refresh collation version;

jkatz05 · on Aug 12, 2023

Blog author. I've done some separate testing on storing ~500GB of embeddings (~1B embeddings) in a partitioned table. The partition key was built using IVFFLAT as a "coarse quantizer" (in this case, sampling the entire dataset and finding K means), storing the mean vectors in a separate table, and then loading each vector into the partition with the closest center. After that, I built an IVFFLAT index on each partition. With the indexes, this added up to ~1TB of storage. This was primarily an "is it possible?" test rather than thorough benchmarking.
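
A minimal sketch of the assignment step described above, in plain Python rather than SQL: given centers that have already been computed, each vector is routed to the partition whose center is closest by squared L2 distance. The function names and the toy data are hypothetical, and the sampling and k-means steps are assumed to have happened elsewhere; this is not the code used in the test.

```python
from collections import defaultdict


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(vector, centers):
    """Index of the center closest to `vector` (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: squared_l2(vector, centers[i]))


def assign_partitions(vectors, centers):
    """Group vectors by the index of their nearest center, one group per partition."""
    partitions = defaultdict(list)
    for v in vectors:
        partitions[nearest_center(v, centers)].append(v)
    return dict(partitions)


# Toy data (hypothetical): two centers and four 2-D vectors.
centers = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(1.0, 1.0), (9.0, 8.0), (-2.0, 0.5), (11.0, 12.0)]
partitions = assign_partitions(vectors, centers)
```

In a real setup the partition index would become the partition key of the table, and each partition would then get its own index.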

jkatz05 · on Aug 10, 2023

Blog author. You can choose to use any distance metric. One reason cosine similarity is popular (and used) is that for many of these higher dimensional datasets, it gives a better representation of "nearness" across all the data based on the nature of "angular" distance. But depending on how your data is distributed, something like L2 distance (Euclidean) could make more sense.
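
To illustrate how the choice of metric can change which neighbor counts as "nearest", here is a small sketch with made-up 2-D points. Cosine distance is taken here as 1 minus cosine similarity, and the point names are hypothetical; real embeddings have far more dimensions, so treat this only as an illustration of the idea.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = (1.0, 0.0)
candidates = {
    "same_direction": (10.0, 0.0),  # same angle as the query, much larger magnitude
    "nearby_point": (1.0, 1.0),     # close in space, but 45 degrees away
}

closest_by_l2 = min(candidates, key=lambda k: l2_distance(query, candidates[k]))
closest_by_cosine = min(candidates, key=lambda k: cosine_distance(query, candidates[k]))
```

With these points, L2 distance picks `nearby_point` while cosine distance picks `same_direction`, because cosine distance ignores vector magnitude.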

jkatz05 · on Aug 10, 2023

Blog author. Thanks for the analysis -- I agree that the ANN Benchmark does provide a nice framework for helping with apples-to-apples comparisons. In this case, being able to use the "--local" flag made it easier to run using the native environment, vs. putting it into a container. I'm looking forward to ANN Benchmark having more datasets!

jkatz05 · on May 26, 2023

Hey, I helped to write the original article, and I also work at AWS.

I'm happy to let you know that PostgreSQL 16 Beta 1 is available in the Amazon RDS Database Preview Environment (https://aws.amazon.com/about-aws/whats-new/2023/05/postgresq...). I hope you get a chance to test it. I'm very eager to hear any and all feedback about PostgreSQL 16.

LunaSea · on May 28, 2023

Thanks for the info!

Would you also know by any chance when RDS will get Graviton3 instances in non-Dublin EU regions?

jkatz05 · on Sept 30, 2021

Multiranges are one of the lead items in the news release :) I do agree that they are incredibly helpful and will help to reduce the complexity of working with ranges.

jkatz05 · on Sept 30, 2021

I'd be curious to see if the concurrency improvements in PostgreSQL 14 help with increasing the threshold for when you need to introduce a connection pooler.

darksaints · on Oct 3, 2021

Check out the Azure team's benchmarks. Pretty damn impressive.

https://techcommunity.microsoft.com/t5/azure-database-for-po...

jkatz05 · on June 1, 2021

PostgreSQL 14 made improvements to this area around indexing:

https://www.postgresql.org/docs/14/btree-implementation.html...

mattashii · on June 1, 2021

Yes, but that doesn't cover the problem of bloat in the TOAST tables due to excessive churn of large attributes that was mentioned in the parent post.

jkatz05 · on June 1, 2021

"@>" is one of the operators for the GIN index, which is what Postgres uses for the JSONB type.

Ref: https://www.postgresql.org/docs/current/gin-builtin-opclasse...

atonse · on June 1, 2021

Just feels like it would be nice to have some syntactic sugar to convert these expressions.

jkatz05 · on May 20, 2021

TDE won't be in 14, but there is ongoing work to try to have it ready for 15.