I have, actually, alongside my work on compressing moves. My compression scheme averages about 150 bits per position, or about 35% of the size of the standard text notation in EPD format.
What I optimized for is that there are very often repeated blank squares or repeated pawns on the same rank.
Also, instead of storing castling status separately, it’s stored as a separate piece code on the appropriate rook.
These take advantage of the fact that there are 6 piece types, so using 3 bits to encode them leaves two codes spare. In my scheme, one is PawnRepeating and the other is RookCastleAvailable.
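As a rough illustration of the "six pieces plus two spare codes" idea, here is a minimal Python sketch. Only the names PawnRepeating and RookCastleAvailable come from the description above. The numeric values, the `pack`/`unpack` helpers, and the bit order are assumptions made for the example, and it deliberately leaves out how empty squares, colours, and repeat counts are stored, since that isn't specified.

```python
from enum import IntEnum

class Code(IntEnum):
    # Six ordinary piece types; the numbering is an assumption for this sketch.
    PAWN = 0
    KNIGHT = 1
    BISHOP = 2
    ROOK = 3
    QUEEN = 4
    KING = 5
    # The two codes left over in a 3-bit field (names from the scheme above).
    PAWN_REPEATING = 6
    ROOK_CASTLE_AVAILABLE = 7

def pack(codes):
    """Pack 3-bit codes into one int, first code in the most significant bits."""
    value = 0
    for code in codes:
        value = (value << 3) | int(code)
    return value, 3 * len(codes)

def unpack(value, nbits):
    """Inverse of pack: read 3 bits at a time from the most significant end."""
    return [Code((value >> shift) & 0b111) for shift in range(nbits - 3, -1, -3)]
```

For example, `pack([Code.ROOK_CASTLE_AVAILABLE, Code.KNIGHT, Code.KING])` uses 9 bits, and `unpack` recovers the same three codes.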
There are probably improvements to be made. I’ll write a post on it when it’s finalized.
Using the extra available piece indexes for "rook that can castle" and "pawn that moved two squares" is a great idea. So e.g. moving the king would not only change the squares where the king moved from and to, but would also convert both of the rooks from "rook that can castle" to "rook that can't castle". Same for a pawn that moved two squares and can be captured by en passant.
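To make the king-move case concrete, here is a small Python sketch. The board representation (a dict from square name to a piece-code string), the `move_king` helper, and the `rook_squares` parameter are all hypothetical. It ignores colour, legality, and the castling move itself, and only shows the "downgrade both rooks" side effect.

```python
def move_king(board, src, dst, rook_squares):
    """Return a new board with the king moved and castling rights dropped.

    board: dict mapping a square name (e.g. "e1") to a piece-code string.
    rook_squares: the squares where this side's rooks might still carry
    the "can castle" code (an assumption of this sketch).
    """
    new_board = dict(board)            # leave the caller's board untouched
    new_board[dst] = new_board.pop(src)
    for square in rook_squares:
        # Any rook still marked as able to castle becomes a plain rook.
        if new_board.get(square) == "ROOK_CASTLE_AVAILABLE":
            new_board[square] = "ROOK"
    return new_board

before = {"e1": "KING", "a1": "ROOK_CASTLE_AVAILABLE", "h1": "ROOK_CASTLE_AVAILABLE"}
after = move_king(before, "e1", "e2", ("a1", "h1"))
```

So a single king move touches up to four squares in the encoded position: the two king squares plus both rooks.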
I guess you also need some way to encode which player's turn it is. Though maybe you could eliminate that by flipping the board on black's move and always encoding from the perspective of the last player's move?
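A sketch of what "flipping the board" could look like, in Python. The representation is an assumption: eight strings listed from rank 8 down to rank 1, with FEN-style letters (uppercase for White, lowercase for Black) and "." for an empty square. The `flip` helper is hypothetical. Whether this really removes the need for a side-to-move bit depends on whether the decoder can recover whose turn it is some other way.

```python
def flip(board):
    """Mirror the board top to bottom and swap the colours of all pieces.

    board: list of 8 strings, rank 8 first, FEN-style piece letters,
    "." for empty squares (a representation assumed for this sketch).
    """
    # reversed() mirrors the ranks; swapcase() swaps White and Black,
    # and leaves "." unchanged.
    return [rank.swapcase() for rank in reversed(board)]

after_e4 = [
    "rnbqkbnr",
    "pppppppp",
    "........",
    "........",
    "....P...",
    "........",
    "PPPP.PPP",
    "RNBQKBNR",
]
black_view = flip(after_e4)
```

Flipping twice gives back the original board, so the transform is lossless as far as the piece placement goes.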
I'm curious about whether a naive columnar encoding scheme could beat a more complex encoding scheme, after compression. Not columnar in the sense of ranks and files, but columnar in the sense of data science storage formats (e.g. parquet), where each 'column' is the state of a specific square across different board states. Given 64 such columns, a game state is a single row. The hypothesis being that if you're encoding all the game states of a large number of games (say 1000), after RLE and compression etc you would see a net compression better than more complex encoding schemes. Given a big block of game states like this, then a single 'game' would be a list of offsets into the game states. This would probably also compress very nicely in columnar format.
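Here is a minimal Python sketch of that layout, to make the idea testable. Everything in it is an assumption for illustration: positions are plain strings with one character per square, and `to_columns`, `rle`, `unrle`, `from_columns`, and `intern_states` are hypothetical helper names. It shows the transposition, per-column RLE, and games as offsets into a shared table of states. It makes no claim about the resulting compression ratio.

```python
from itertools import groupby

def to_columns(states):
    """Transpose rows (one string per position) into one string per square."""
    return ["".join(column) for column in zip(*states)]

def from_columns(columns):
    """Inverse of to_columns."""
    return ["".join(row) for row in zip(*columns)]

def rle(column):
    """Run-length encode one column as (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(column)]

def unrle(pairs):
    """Inverse of rle."""
    return "".join(symbol * count for symbol, count in pairs)

def intern_states(games):
    """Store each distinct state once; a game becomes a list of offsets."""
    table, index, encoded = [], {}, []
    for game in games:
        offsets = []
        for state in game:
            if state not in index:
                index[state] = len(table)
                table.append(state)
            offsets.append(index[state])
        encoded.append(offsets)
    return table, encoded
```

With real data, each of the 64 columns would then go through a general-purpose compressor, and the total could be compared against the size of a hand-tuned per-position encoding.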