Hacker News | jkatz05's comments

git diff REL_16_RC1..REL_16_0 doesn't show any changes that would require a pg_upgrade (at least from my read), so you should be able to upgrade without it.


I just tested, and I had to use pg_upgrade.

Error message: The database cluster was initialized with CATALOG_VERSION_NO 202306141, but the server was compiled with CATALOG_VERSION_NO 202307071.


You might just need to run:

    alter database mystuff refresh collation version;


Blog author. I've done some separate testing on storing ~500GB of embeddings (~1B embeddings) in a partitioned table. The partition key was built using IVFFLAT as a "coarse quantizer" (in this case, sampling the entire dataset and finding K means), storing the mean vectors in a separate table, and then loading each vector into the partition with the closest center. After that, I built an IVFFLAT index on each partition. With the indexes, this added up to ~1TB of storage. This was primarily an "is it possible" test rather than thorough benchmarking.
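For anyone who wants to picture the routing step, here is a minimal sketch in plain Python. It is not the code used in the test described above: the function names (`l2_distance`, `nearest_partition`, `route`) are hypothetical, the centers are hard-coded rather than computed by k-means over a sample, and L2 distance is assumed as the metric.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_partition(vector, centers):
    """Return the index of the center closest to `vector` (hypothetical helper)."""
    return min(range(len(centers)), key=lambda i: l2_distance(vector, centers[i]))

def route(vectors, centers):
    """Group vectors by their closest center, one bucket per partition."""
    partitions = {i: [] for i in range(len(centers))}
    for v in vectors:
        partitions[nearest_partition(v, centers)].append(v)
    return partitions

# Assumption: in practice the centers would come from k-means on a sample
# of the dataset; here they are made up for illustration.
centers = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(1.0, 0.5), (9.0, 11.0), (0.2, -0.3)]
partitions = route(vectors, centers)
```

In a database, the bucket index would play the role of the partition key, and each partition would then get its own index.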


Blog author. You can choose to use any distance metric. One reason cosine similarity is popular (and used) is that for many of these higher dimensional datasets, it gives a better representation of "nearness" across all the data based on the nature of "angular" distance. But depending on how your data is distributed, something like L2 distance (Euclidean) could make more sense.
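A small sketch of how the two metrics can disagree, in plain Python. The vectors are made up for illustration, and the function names are hypothetical. Cosine distance here is taken as 1 minus cosine similarity.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = (1.0, 1.0)
b = (10.0, 10.0)  # same direction as a, much larger magnitude
c = (1.0, 0.0)    # close to a in space, but pointing in a different direction

# By cosine distance, b is the nearer neighbor of a (the angle is zero).
cosine_prefers_b = cosine_distance(a, b) < cosine_distance(a, c)
# By L2 distance, c is the nearer neighbor of a (magnitude matters).
l2_prefers_c = l2_distance(a, c) < l2_distance(a, b)
```

Which answer is "right" depends on whether magnitude carries meaning in your data.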


Blog author. Thanks for the analysis -- I agree that the ANN Benchmark does provide a nice framework for helping with apples-to-apples comparisons. In this case, being able to use the "--local" flag made it easier to run using the native environment, vs. putting it into a container. I'm looking forward to ANN Benchmark having more datasets!


Hey, I helped to write the original article, and I also work at AWS.

I'm happy to let you know that PostgreSQL 16 Beta 1 is available in the Amazon RDS Database Preview Environment (https://aws.amazon.com/about-aws/whats-new/2023/05/postgresq...). I hope you get a chance to test it; I'm very eager to hear any and all feedback about PostgreSQL 16.


Thanks for the info!

Would you also know by any chance when RDS will get Graviton3 instances in non-Dublin EU regions?


Multiranges are one of the lead items in the news release :) I do agree that they are incredibly helpful and will help to reduce the complexity of working with ranges.


I'd be curious to see if the concurrency improvements in PostgreSQL 14 help with increasing the threshold for when you need to introduce a connection pooler.


Check out the Azure team's benchmarks. Pretty damn impressive.

https://techcommunity.microsoft.com/t5/azure-database-for-po...


PostgreSQL 14 made indexing improvements in this area:

https://www.postgresql.org/docs/14/btree-implementation.html...


Yes, but that doesn't cover the problem of bloat in the TOAST tables due to excessive churn of large attributes that was mentioned in the parent post.


"@>" is one of the operators supported by the GIN index, which is what Postgres uses for indexing the JSONB type.

Ref: https://www.postgresql.org/docs/current/gin-builtin-opclasse...


Just feels like it would be nice to have some syntactic sugar to convert these expressions.


TDE won't be in 14, but there is ongoing work to try to have it ready for 15.

