Most columnar stores I'm aware of are hybrid, so all columns of a row are still ...

didgetmaster · on Nov 7, 2022

Mine is not. If a table has 4 columns (e.g. name, address, phone, email), each column is stored on its own: all the names are stored separately from the addresses, and all the phone numbers separately from the emails. The data is de-duped, so it is incredibly easy to find out how many times each value occurs in a column (e.g. there are 1,234,567 rows in the table where name = 'John').
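
A minimal sketch of what such a layout could look like, assuming a simple dictionary-style scheme where each distinct value is stored once with a running count. The class `DedupedColumn` and its methods are hypothetical names for illustration, not the actual implementation of the system described above:

```python
class DedupedColumn:
    """Hypothetical de-duplicated column: each distinct value is stored once."""

    def __init__(self):
        self.values = []   # distinct values; list index is the value id
        self.ids = {}      # value -> value id
        self.counts = []   # value id -> number of rows holding that value
        self.row_ids = []  # row number -> value id

    def append(self, value):
        vid = self.ids.get(value)
        if vid is None:
            # First time this value is seen: store it exactly once.
            vid = len(self.values)
            self.ids[value] = vid
            self.values.append(value)
            self.counts.append(0)
        self.row_ids.append(vid)
        self.counts[vid] += 1

    def count(self, value):
        # Counter lookup; no scan over the rows is needed.
        vid = self.ids.get(value)
        return 0 if vid is None else self.counts[vid]

    def get(self, row):
        return self.values[self.row_ids[row]]


# Each column of the table lives in its own, separate structure.
table = {col: DedupedColumn() for col in ("name", "address", "phone", "email")}

rows = [
    ("John", "1 Main St", "555-0100", "john@example.com"),
    ("Mary", "2 Oak Ave", "555-0101", "mary@example.com"),
    ("John", "3 Elm Rd", "555-0102", "john2@example.com"),
]
for row in rows:
    for column, value in zip(table.values(), row):
        column.append(value)

john_rows = table["name"].count("John")  # rows where name = 'John'
```

Because the count is kept next to the single stored copy of each value, answering "how many rows have name = 'John'" does not touch the other columns at all.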

tomnipotent · on Nov 8, 2022

The downside is that projecting a row requires random I/O across a larger number of pages, which also means more evictions from the in-memory buffer and worse cache efficiency. Apache Arrow, Parquet, Redshift, Bigtable/Spanner, and Snowflake are all hybrid columnar, for example. With a hybrid layout you get good row data locality while still being able to exploit SIMD/vectorized ops.
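
To make the page-count argument concrete, here is a rough sketch under simplifying assumptions: fixed-width values, a fully separate layout where each column has its own pages, and a PAX-style hybrid where one page holds the values of all columns for a block of rows. Both function names and the page model are hypothetical and do not describe how any of the systems named above lay out data:

```python
def pages_touched_separate(row, num_columns, values_per_page):
    """Fully separate columns: one page per column must be read for a row."""
    page_no = row // values_per_page
    return {(column, page_no) for column in range(num_columns)}


def pages_touched_hybrid(row, num_columns, values_per_page):
    """PAX-style hybrid: a page stores every column for a block of rows."""
    rows_per_page = values_per_page // num_columns
    return {row // rows_per_page}


# Projecting one row of a 4-column table, 1024 values per page (assumed sizes).
separate = pages_touched_separate(row=5000, num_columns=4, values_per_page=1024)
hybrid = pages_touched_hybrid(row=5000, num_columns=4, values_per_page=1024)
```

In this model the separate layout reads one page per column for a single row, while the hybrid layout reads one page in total, which is the locality difference described above. Within that one hybrid page the values are still grouped by column, so vectorized scans remain possible.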