When studying the internals of Postgres years ago, I learned that the page size could greatly affect the disk space required to hold a table. If the page size was 8192 bytes (the default) and the schema defined each row as holding 4097 bytes, then two rows would not fit within a single page. Every row would end up in its own page, wasting almost half of the space.
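A quick way to see this for yourself (a minimal sketch; the table name and sizes are made up, and forcing PLAIN storage keeps the payload from being compressed or moved out of the page):

create table wasteful (payload text);
alter table wasteful alter column payload set storage plain; -- no TOAST, no compression
insert into wasteful select repeat('x', 4200) from generate_series(1, 1000);
select pg_relation_size('wasteful') / 8192 as pages; -- ~1000, i.e. one row per page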
Only slightly related, but the biggest space savings I ever saw came from transitioning my 1.8 TB Postgres database to ZFS, which has compression turned on by default. Afterwards, it needed only 310 GB, with no noticeable loss in speed.
Yes. This is likely not the typical workload. Our PG database is only used in research, in "burst" situations: e.g. big batch jobs written to the DB (400 million tuples over 2 days) and read in big chunks (e.g. 400 million tuples exported in 2 hours to CSV/Postgres FDW). ZFS is on spinning rust (SATA 6 Gb/s), 6x8 TB drives in a RAIDZ2 pool, and the ZFS dataset is both compressed and encrypted. In Proxmox, I do not see I/O limited in any way during these burst writes/reads; the bottleneck is the CPU. However, the CPU was the bottleneck before ZFS too, so I cannot say how much impact the compression/encryption has. Other ZFS parameters are default (e.g. the filesystem recordsize is 128 KB; lower values yield better read speed but less compression, and we were aiming for a lot of compression).
Likely no impact, but we use Postgres in Docker on an unprivileged LXC, with the `/data` directory mounted into the LXC from the host ZFS pool. Since LXC runs all processes directly on the host, the performance impact is negligible (unlike, e.g., running this in a full VM).
I have the _feeling_ (not actually tested) that the Postgres database is faster with ZFS, since less data needs to be read, especially since we have a lot of sequential scans.
> LZ4 is lossless compression algorithm, providing compression speed > 500 MB/s per core (>0.15 Bytes/cycle). It features an extremely fast decoder, with speed in multiple GB/s per core (~1 Byte/cycle).
What you are describing is called "internal fragmentation" and it's always a problem at some level in any system.
There are tons of ways to mitigate the problem including variable-length data, out-of-line storage, and compression. Postgres does those things, but I suppose there's always room to improve.
Best to just see how much storage a given table uses, and see if it's a problem.
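For example, with the standard size functions (`my_table` is a placeholder):

select pg_size_pretty(pg_table_size('my_table')) as heap_plus_toast,
       pg_size_pretty(pg_indexes_size('my_table')) as indexes,
       pg_size_pretty(pg_total_relation_size('my_table')) as total;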
I think it is true for every row-oriented database that stores all the values for a single row together within a fixed-sized structure. Columnar stores on the other hand may or may not do this.
I have built a database engine that is a columnar store. All the values for each column are stored together. This means just about every query has to fetch the values of a row separately, one column at a time, but it has proven to be incredibly fast. Queries on big tables run several times faster than on Postgres, and about 5x faster than on SQLite. https://www.youtube.com/watch?v=Va5ZqfwQXWI
Mine is not. If a table has 4 columns (e.g. name, address, phone, email), then all the names are stored separately from the addresses. Likewise, all the phone numbers are stored separately from the emails. The data is de-duped, so it is incredibly easy to find out how many rows hold a given value in each column (e.g. there are 1,234,567 rows in the table where name = 'John').
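A rough sketch of that dedup idea in plain SQL, purely illustrative (this is not how the engine actually stores things):

-- each distinct name is stored once, with a count of the rows that use it
create table name_dict (value text primary key, row_count bigint);
-- "how many rows have name = 'John'" becomes a single-row lookup, no scan:
select row_count from name_dict where value = 'John';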
The downside is that projecting a row requires random I/O across a larger number of pages, which also means more evictions from the in-memory buffer and worse cache efficiency. That's why Apache Arrow, Parquet, Redshift, Bigtable/Spanner, and Snowflake are all hybrid columnar, for example: you get good row data locality while still being able to exploit SIMD/vectorized ops.
Yeah, it's probably still true. It's also true of other DBs; there is usually a certain amount of per-row overhead on the page, plus per-page overhead.
So yeah, you have to either shrink rows to a size that fits within the page (padding included), pick a more appropriate page size, or play with some TOAST settings in the case of PG.
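Those TOAST knobs look roughly like this (table/column names hypothetical; `toast_tuple_target` needs PG 11+):

alter table t alter column payload set storage external; -- out-of-line, uncompressed
alter table t alter column payload set storage extended; -- out-of-line and compressed (the default)
alter table t set (toast_tuple_target = 256); -- push values out of line sooner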
I was confused by the author's definition of a relation. They said that a single tuple is a "unary relation". They probably meant a single-tuple (a 1-tuple).
Removing the hyphen changes the meaning. https://en.wikipedia.org/wiki/Finitary_relation
The author seems to have misunderstood Codd’s definition (which is just the usual definition for mathematical relations). An N-ary relation is an N-column table (in the idealized database model where a table is a mathematical set of rows, i.e. no duplicate rows).
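For reference, the standard definition: an n-ary relation over sets A1, ..., An is a subset R ⊆ A1 × ... × An, i.e. an n-column table. A unary relation is thus a one-column table (a subset of a single set), not a single row.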
One term that doesn't seem "insider" but kinda is: `timestamptz`, aka the thing you should always use instead of just `timestamp`. The full name is "timestamp with time zone," but contrary to what most people would assume, it doesn't store a time zone.
Per the docs[0], TIMESTAMP WITH TIME ZONE is preferred because it will automatically convert to/from the stored UTC to whatever your time zone is set to, in the config (by default) or within a query.
I think you also need it for converting input given in a specified time zone (using AT TIME ZONE), which is more complicated than one might hope.
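Something like this (a sketch; outputs assume the session time zone is UTC):

set time zone 'UTC';
-- interpret a zone-less timestamp as Los Angeles local time, yielding a timestamptz:
select timestamp '2022-11-09 14:41:12' at time zone 'America/Los_Angeles';
-- => 2022-11-09 22:41:12+00
-- and the reverse: render a timestamptz as wall-clock time in a given zone:
select timestamptz '2022-11-09 22:41:12+00' at time zone 'America/Los_Angeles';
-- => 2022-11-09 14:41:12 (a plain timestamp, zone stripped)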
So the point is about conversion, which is important, but I agree that it’s non-obvious that you need to store the source time zone separately if you ever want to retrieve it.
`timestamptz` stores the same data, I believe. It's just that with `timestamptz`, the server converts the returned value based on the server's time zone configuration. Both data types are 8 bytes.
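Easy to check with the standard pg_column_size function:

select pg_column_size(now()::timestamp) as ts_bytes, -- 8
       pg_column_size(now()::timestamptz) as tstz_bytes; -- 8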
But normally you have a server between the user and the DB, where you also prefer to keep things as UTC while in memory. Conversion to local time then happens as the very last step, in the presentation layer.
So unless you are using the SQL result verbatim as the presentation layer, is there any big benefit to this? On the contrary, it seems like yet another moving part one could mess up, accidentally converting time where it shouldn't be.
It's not what it looks like. Selecting either timestamp or timestamptz gives you a string representation in the database's configured local time zone, which your backend's Postgres driver will parse into whatever date representation it uses (like `Date` in Node.js). But timestamp leaves the time zone out of the string. Your Postgres driver probably assumes UTC in that case, which is a problem if your database is set to PST.
First off, it's regrettable that Postgres drivers are parsing strings at all. That opens this can of worms.
Secondly, the DB time zone setting is weird. It's advisable to always set your Postgres config to UTC to eliminate this nonsense, but that's not guaranteed by default. So if your database is on PST, at least timestamptz will output strings that tell your code the correct time zone.
PST database:
select now()::timestamptz; -- 2022-11-09T14:41:12.110-08
select now()::timestamp; -- 2022-11-09T14:41:12.110
UTC database:
select now()::timestamptz; -- 2022-11-09T22:41:12.110Z, "Z" means UTC
select now()::timestamp; -- 2022-11-09T22:41:12.110
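The relevant setting is `TimeZone`, which you can pin down at several levels (a sketch; `mydb` is a placeholder):

set time zone 'UTC'; -- current session only
alter database mydb set timezone to 'UTC'; -- new connections to this database
alter system set timezone to 'UTC'; -- server-wide, takes effect after pg_reload_conf()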
This isn't clearly explained in the official docs, btw. I think the only thing saving a lot of users is how AWS, GCP, etc all set their Postgres instances to UTC by default. Which makes this even more of a landmine if you ever use a DB not configured this way.
Another moral of the story: every string representation of a datetime implies a time zone, so it's better to be explicit about it. Something like "2022-11-09T14:41:12.110" is ambiguous. Put the "+00" or the "Z" if you mean UTC. Unix timestamps, on the other hand, are simply defined as the duration of time that has passed since the epoch, and have no concept of time zone (don't even call them UTC).
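Round-tripping makes the point (epoch value matches the 22:41:12 UTC examples above; display of the result still follows your TimeZone setting):

select to_timestamp(1668033672.110); -- timestamptz 2022-11-09 22:41:12.11+00
select extract(epoch from timestamptz '2022-11-09 22:41:12.110+00'); -- 1668033672.110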
Anyone know if this is still true?