When studying the internals of Postgres years ago, I learned that the page size could greatly affect the disk space required to hold a table. If the page size was 8192 bytes (the default) and the schema defined each row as holding 4097 bytes, then two rows would not fit within a single page. Every row would end up in its own page, wasting almost half of the space.
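A quick way to see this for yourself (a minimal sketch; the table name and sizes are made up, and forcing PLAIN storage keeps the payload from being compressed or moved out of the page):

create table wasteful (payload text);
alter table wasteful alter column payload set storage plain; -- no TOAST, no compression
insert into wasteful select repeat('x', 4200) from generate_series(1, 1000);
select pg_relation_size('wasteful') / 8192 as pages; -- ~1000, i.e. one row per page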
Only slightly related, but the biggest space savings I ever saw came from transitioning my 1.8 TB Postgres database to ZFS, which has compression turned on by default. Afterwards, it needed only 310 GB, with no noticeable loss in speed.
Yes. This is likely not the typical workload. Our PG database is only used in research, in "burst" situations: e.g. big batch jobs written to the DB (400 million tuples over 2 days) and read in big chunks (e.g. 400 million tuples exported in 2 hours to CSV/Postgres FDW). ZFS is on spinning rust (SATA 6 Gb/s), 6x8 TB drives in a RAIDZ2 pool, and the ZFS dataset is both compressed and encrypted. In Proxmox, I do not see I/O limited in any way during these burst writes/reads; the bottleneck is the CPU. However, the CPU was the bottleneck before ZFS too, so I cannot say how much impact the compression/encryption has. Other ZFS parameters are default (e.g. the filesystem recordsize is 128 KB; lower values yield better read speed but less compression, and we were aiming for a lot of compression).
Likely no impact, but we use Postgres in Docker on an unprivileged LXC, with the `/data` directory mounted into the LXC from the host ZFS pool. Since LXC runs all processes directly on the host, the performance impact is negligible (unlike, e.g., running this in a full VM).
I have the _feeling_ (not actually tested) that the Postgres database is faster with ZFS, since less data needs to be read, especially since we have a lot of sequential scans.
> LZ4 is lossless compression algorithm, providing compression speed > 500 MB/s per core (>0.15 Bytes/cycle). It features an extremely fast decoder, with speed in multiple GB/s per core (~1 Byte/cycle).
What you are describing is called "internal fragmentation" and it's always a problem at some level in any system.
There are tons of ways to mitigate the problem including variable-length data, out-of-line storage, and compression. Postgres does those things, but I suppose there's always room to improve.
Best to just see how much storage a given table uses, and see if it's a problem.
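For example, with the standard size functions (`my_table` is a placeholder):

select pg_size_pretty(pg_table_size('my_table')) as heap_plus_toast,
       pg_size_pretty(pg_indexes_size('my_table')) as indexes,
       pg_size_pretty(pg_total_relation_size('my_table')) as total;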
I think it is true for every row-oriented database that stores all the values for a single row together within a fixed-sized structure. Columnar stores on the other hand may or may not do this.
I have built a database engine that is a columnar store. All the values for each column are stored together. This means just about every query has to fetch the values of a row separately, one column at a time, but it has proven to be incredibly fast. Queries on big tables run several times faster than on Postgres, and about 5x faster than on SQLite. https://www.youtube.com/watch?v=Va5ZqfwQXWI
Mine is not. If a table has 4 columns (e.g. name, address, phone, email), then all the names are stored separately from the addresses. Likewise, all the phone numbers are stored separately from the emails. The data is de-duped, so it is incredibly easy to find out how many rows hold a given value in each column (e.g. there are 1,234,567 rows in the table where name = 'John').
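A rough sketch of that dedup idea in plain SQL, purely illustrative (this is not how the engine actually stores things):

-- each distinct name is stored once, with a count of the rows that use it
create table name_dict (value text primary key, row_count bigint);
-- "how many rows have name = 'John'" becomes a single-row lookup, no scan:
select row_count from name_dict where value = 'John';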
The downside is that projecting a row requires random I/O across a larger number of pages, which also means more evictions from the in-memory buffer and worse cache efficiency. That's why Apache Arrow, Parquet, Redshift, Bigtable/Spanner, and Snowflake are all hybrid columnar, for example: you get good row data locality while still being able to exploit SIMD/vectorized ops.
Yeah, it's probably still true. It's also true of other DBs; there is usually a certain amount of per-row overhead on the page, plus per-page overhead.
So yeah, you have to either shrink rows to a size that fits within the page (padding included), pick a more appropriate page size, or play with some TOAST settings in the case of PG.
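Those TOAST knobs look roughly like this (table/column names hypothetical; `toast_tuple_target` needs PG 11+):

alter table t alter column payload set storage external; -- out-of-line, uncompressed
alter table t alter column payload set storage extended; -- out-of-line and compressed (the default)
alter table t set (toast_tuple_target = 256); -- push values out of line sooner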
I was confused by the author's definition of a relation. They said that a single tuple is a "unary relation". They probably meant a single-tuple (a 1-tuple).
Removing the hyphen changes the meaning. https://en.wikipedia.org/wiki/Finitary_relation
The author seems to have misunderstood Codd’s definition (which is just the usual definition for mathematical relations). An N-ary relation is an N-column table (in the idealized database model where a table is a mathematical set of rows, i.e. no duplicate rows).
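For reference, the standard definition: an n-ary relation over sets A1, ..., An is a subset R ⊆ A1 × ... × An, i.e. an n-column table. A unary relation is thus a one-column table (a subset of a single set), not a single row.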
One term that doesn't seem "insider" but kinda is: `timestamptz`, aka the thing you should always use instead of just `timestamp`. The full name is "timestamp with time zone," but contrary to what most people would assume, it doesn't store a time zone.
Per the docs[0], TIMESTAMP WITH TIME ZONE is preferred because it will automatically convert to/from the stored UTC to whatever your time zone is set to, in the config (by default) or within a query.
I think you also need it for converting input given in a specified time zone (using AT TIME ZONE), which is more complicated than one might hope.
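Something like this (a sketch; outputs assume the session time zone is UTC):

set time zone 'UTC';
-- interpret a zone-less timestamp as Los Angeles local time, yielding a timestamptz:
select timestamp '2022-11-09 14:41:12' at time zone 'America/Los_Angeles';
-- => 2022-11-09 22:41:12+00
-- and the reverse: render a timestamptz as wall-clock time in a given zone:
select timestamptz '2022-11-09 22:41:12+00' at time zone 'America/Los_Angeles';
-- => 2022-11-09 14:41:12 (a plain timestamp, zone stripped)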
So the point is about conversion, which is important, but I agree that it’s non-obvious that you need to store the source time zone separately if you ever want to retrieve it.
`timestamptz` stores the same data, I believe. It's just that with `timestamptz`, the server converts the returned value based on the server's time zone configuration. Both data types are 8 bytes.
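Easy to check with the standard pg_column_size function:

select pg_column_size(now()::timestamp) as ts_bytes, -- 8
       pg_column_size(now()::timestamptz) as tstz_bytes; -- 8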
But normally you have a server between the user and the DB, where you also prefer to keep things as UTC while in memory. Conversion to local time then happens as the very last step, in the presentation layer.
So unless you are using the SQL result verbatim as the presentation layer, is there any big benefit to this? On the contrary, it seems like yet another moving part one could mess up, accidentally converting time where it shouldn't be.
It's not what it looks like. Selecting either timestamp or timestamptz gives you a string representation in the database's configured local time zone, which your backend's Postgres driver will parse into whatever date representation it uses (like `Date` in Node.js). But timestamp leaves the time zone out of the string. Your Postgres driver probably assumes UTC in that case, which is a problem if your database is set to PST.
First off, it's regrettable that Postgres drivers are parsing strings at all. That opens this can of worms.
Secondly, the DB time zone setting is weird. It's advisable to always set your Postgres config to UTC to eliminate this nonsense, but that's not guaranteed by default. So if your database is on PST, at least timestamptz will output strings that tell your code the correct time zone.
PST database:
select now()::timestamptz; -- 2022-11-09T14:41:12.110-08
select now()::timestamp; -- 2022-11-09T14:41:12.110
UTC database:
select now()::timestamptz; -- 2022-11-09T22:41:12.110Z, "Z" means UTC
select now()::timestamp; -- 2022-11-09T22:41:12.110
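The relevant setting is `TimeZone`, which you can pin down at several levels (a sketch; `mydb` is a placeholder):

set time zone 'UTC'; -- current session only
alter database mydb set timezone to 'UTC'; -- new connections to this database
alter system set timezone to 'UTC'; -- server-wide, takes effect after pg_reload_conf()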
This isn't clearly explained in the official docs, btw. I think the only thing saving a lot of users is how AWS, GCP, etc all set their Postgres instances to UTC by default. Which makes this even more of a landmine if you ever use a DB not configured this way.
Another moral of the story: every string representation of a datetime implies a time zone, so it's better to be explicit about it. Something like "2022-11-09T14:41:12.110" is ambiguous. Put the "+00" or the "Z" if you mean UTC. Unix timestamps, on the other hand, are simply defined as the duration of time that has passed since the epoch, and have no concept of time zone (don't even call them UTC).
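Round-tripping makes the point (epoch value matches the 22:41:12 UTC examples above; display of the result still follows your TimeZone setting):

select to_timestamp(1668033672.110); -- timestamptz 2022-11-09 22:41:12.11+00
select extract(epoch from timestamptz '2022-11-09 22:41:12.110+00'); -- 1668033672.110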
Anyone know if this is still true?