I have actually, alongside my work of compressing moves. My compression scheme a...

infogulch · on March 15, 2024

Using the extra available piece indexes for "rook that can castle" and "pawn that moved two squares" is a great idea. So e.g. moving the king would not only change the squares the king moved from and to, but would also convert both of that player's rooks from "rook that can castle" to "rook that can't castle". The same goes for a pawn that just moved two squares and can be captured en passant: it turns back into an ordinary pawn after the next move.
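To make that concrete, here is a minimal sketch of how the spare codes could fit in 4 bits per square. The numbering, the names (`ROOK_CAN_CASTLE`, `EN_PASSANT_PAWN`, `move_king`) and the a1 = 0 ... h8 = 63 square order are my own assumptions, not the parent's actual scheme, and it only handles a plain king move (no castling itself, no captures logic).

```python
# Hypothetical 4-bit square codes (illustrative numbering only).
# 0 = empty, 1..7 = white pieces, 9..15 = black pieces, 8 = spare code.
EMPTY = 0
PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, ROOK_CAN_CASTLE = range(1, 8)
WHITE, BLACK = 0, 8
# Code 8 would otherwise be "black empty", so it is free to mean
# "pawn that just moved two squares". Its colour follows from its rank.
EN_PASSANT_PAWN = 8


def move_king(board, src, dst):
    """Return a new 64-entry board after a plain king move from src to dst."""
    piece = board[src]
    if piece == EN_PASSANT_PAWN or (piece & 7) != KING:
        raise ValueError("no king on source square")
    color = piece & 8
    # Moving the king forfeits castling: demote that side's castle-capable rooks.
    new_board = [
        (color | ROOK) if sq == (color | ROOK_CAN_CASTLE) else sq
        for sq in board
    ]
    new_board[src] = EMPTY
    new_board[dst] = piece
    return new_board


if __name__ == "__main__":
    board = [EMPTY] * 64
    board[0] = board[7] = WHITE | ROOK_CAN_CASTLE  # a1, h1
    board[4] = WHITE | KING                        # e1
    board[56] = BLACK | ROOK_CAN_CASTLE            # a8
    after = move_king(board, 4, 12)                # Ke1-e2
    print(after[0], after[7], after[56])
```

The nice property is that castling rights and en passant state live in the squares themselves, so there are no separate flag bits to store.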

I guess you also need some way to encode which player's turn it is. Though maybe you could eliminate that by flipping the board on black's move and always encoding from the perspective of the player who moved last?
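The flip itself is cheap. A rough sketch, assuming the board is held as 8 FEN-style rank strings (rank 8 first, uppercase = white, `.` = empty); `flip_perspective` is a made-up name, and which side you normalise to is an arbitrary choice:

```python
def flip_perspective(ranks):
    """Mirror the board top-to-bottom and swap piece colours.

    ranks: list of 8 strings of length 8, rank 8 first.
    Files are left alone, so kings and queens stay on their own files.
    """
    return [rank.swapcase() for rank in reversed(ranks)]


if __name__ == "__main__":
    after_e4 = [
        "rnbqkbnr",
        "pppppppp",
        "........",
        "........",
        "....P...",
        "........",
        "PPPP.PPP",
        "RNBQKBNR",
    ]
    for rank in flip_perspective(after_e4):
        print(rank)
```

Applying it twice gives back the original board, so a decoder only needs to know the ply number (odd or even) to undo it.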

I'm curious about whether a naive columnar encoding scheme could beat a more complex encoding scheme, after compression. Not columnar in the sense of ranks and files, but columnar in the sense of data science storage formats (e.g. Parquet), where each 'column' is the state of a specific square across different board states. Given 64 such columns, a game state is a single row. The hypothesis is that if you encode all the game states of a large number of games (say 1000), then after RLE, compression, etc. you would see better net compression than with more complex encoding schemes. Given a big block of game states like this, a single 'game' would be a list of offsets into the game states. This would probably also compress very nicely in columnar format.
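Roughly what I have in mind, as an untested-on-real-data sketch: each board state is 64 bytes (one per square), zlib stands in for "RLE and compression etc", and all the function names are made up. It only sets up the comparison; it says nothing about which layout actually wins.

```python
import zlib


def to_columns(states):
    """Transpose row-major states (64 bytes each) into 64 per-square columns."""
    if any(len(s) != 64 for s in states):
        raise ValueError("every state must be exactly 64 bytes")
    return [bytes(s[sq] for s in states) for sq in range(64)]


def from_columns(columns):
    """Inverse of to_columns: rebuild the list of 64-byte states."""
    count = len(columns[0])
    return [bytes(col[i] for col in columns) for i in range(count)]


def compressed_sizes(states):
    """Return (row_major_size, column_major_size) in bytes after zlib."""
    row_major = b"".join(states)
    col_major = b"".join(to_columns(states))
    return len(zlib.compress(row_major, 9)), len(zlib.compress(col_major, 9))


def index_games(games):
    """Deduplicate states across games.

    Returns (table, encoded): table is the list of unique states and each
    game in encoded is a list of offsets into that table.
    """
    offset_of = {}
    encoded = [
        [offset_of.setdefault(state, len(offset_of)) for state in game]
        for game in games
    ]
    return list(offset_of), encoded
```

Usage would be something like `table, encoded = index_games(games)` followed by `compressed_sizes(table)`, then comparing both numbers against whatever the cleverer encoding produces for the same games.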

Now I want to try it...