I’m exploring the idea of building a website that involves analysis of chess positions. Chess positions themselves are described using a format called FEN, e.g. the starting position for a chess board can be described as:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
The part I’m particularly interested in ideas around how to model is the everything up to the first space. This describes the layout of each rank of the chessboard from the 1st rank to the 8th. Pieces are described using a combination of letters and upper/lower case to signify the piece kind and if it belongs to the black player (lowercase) or white (uppercase). The integer 8 indicates the number of empty spaces. After a move like 1. e4
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
You can see now how the 5th rank portion describes 4 spaces, a white pawn then 3 spaces.
So how can we go about modelling this in a relational database like Postgres? My naive approach is to start with storing the entire FEN string as a
varchar, however, in the future, I’d like to migrate this to a structure that would enable me to easily search for structures within a chess position. An example could be, find me all the positions in the database where white pawns occupy the e4 and e5 positions (the rank segment for the 5th position:
A regex search of the FEN string could find the positions I’m interested in but would it be enough to simply index the FEN column? Should I think about storing each rank of the board as an individual column? Or could the entire board be represented as a matrix somehow.
If you want to hold the positions as text that’s fine. But there are better text editors than Postgres. Holding each game as a file and each position as a row within that file, in FEN notation, would be a good approach. The files can be read and searched using the tools of your choice (grep; Python). Expanding the spaces as Daniel Vérité suggests would make life easier.
If you really want to use a DBMS Elasticsearch has a good reputation for finding patterns within textual data.
If you really want to use a relational database, however, I’d suggest designing a data structure using relations (i.e. tables) rather than relying on SQL’s poorer text parsing capabilities.
One must make the mental step from the rules of chess as a game to those of relational schema design. In the latter we define entity types, and facts that are known about them. The entity types can be actual real-world items, concepts or events. There are many possible ontologies for any given problem. Specifically we must separate the database internal representation of information and how that information is formatted for human consumption. I do not present my solution as “correct” in an absolutist sense, only that it comes from a relational perspective on data modelling. Other equally relational approaches are possible.
What tables are required? Most obviously there’s Game:
Game GameId int primary key WhitePlayer varchar(50) BlackPlayer varchar(50) .. others
I’d model each piece as a distinct row. Pawn 1 will be distinct from pawn 2 and so forth. I still need to record colour and type to reproduce a position. When a pawn is promoted the new piece must have an PieceId distinct from those pieces present at the start of a game. This is required by how I re-create a position below.
Piece PieceId smallint primary key Type char(1) -- (P)awn, (R)ook, (K)ing etc. Colour char(1) -- (W)hite, (B)lack;
Finally there will be a table to store moves. The choice here is between storing a move (like PGN) or storing a whole position (like FEN). Storing the positions introduces much redundant data – all the blank squares and the pieces which did not move on this turn. So I would store the moves individually.
Move MoveId int GameId int foreign key referencing Game.GameId PieceId int foreign key referencing Piece.PieceId ToRow char(1) ToColumn smallint IsRemoved boolean -- set when a piece is taken from the board primary key (MoveId, PieceId)
The pair (ToRow, ToColumn) is where the piece ends up after a move. I’m thinking of the a1-h8 format. I do not need FromRow/ FromColumn. If we need this it can be recreated from PieceId and the history of moves. Neither will I check that the rules of chess are obeyed. That is a job for a different tier of the application.
Some chess moves will result in two rows in this table. For example during capture there will be one row for the taking piece and another for the taken. The latter will have IsRemoved set.
Each game starts with the same 32 rows in Move for the initial positions of the pieces. This may seem redundant but is required to reproduce any position. If not explicit it would have to be implied in each query as some pieces may not move during a game. This also allows for puzzle setup.
To reproduce a position we observe that each piece will be placed wherever it was most recently moved before that position. Take the following as pseudo-code; I’ve not tested it.
select p.GameId, p.MoveId, pc.Colour, pc.Type, q.ToRow, Q.ToColumn from Move as p inner join Move as q on q.GameId = p.GameId and q.MoveId <= p.MoveId inner join Piece as pc on pc.PieceId = q.PieceId where not exists ( select 1 from Move as c where c.GameId = q.GameId and c.PieceId = q.PieceId and c.MoveId <= p.MoveId and c.MoveId >= q.MoveId ) and not exists ( select 1 from Move as d where d.GameId = q.GameId and d.PieceId = q.PieceId and d.MoveId <= p.MoveId and d.IsRemoved = True )
Read this as for every move of every game (aliased as p) i.e. every position, find all moves in the same game which happened before this move (aliased as q). Omit those moves which have a subsequent move for the same piece in the current game, or which have been captured.
Once a position can be re-created as above, it is a simple extension to find positions where certain pieces (or types of pieces) sit in given configurations. It would be possible to structure the query to look for certain pieces in certain positions, then ensure they’re in place at the same time, then retrieve the remaining pieces at that point. I would note that white pawns may occupy e4 and e5 for a large portion of a great many games so a lot of positions would satisfy this criteria.
Whether this is faster than the simple-but-dumb text approach only testing can tell.
This question sparked my interest. I wondered what a “relation-y” answer would look like, since an RDBMS was specified. I approached it somewhat as a Gedankenexperiment. Having worked through some queries, however, I’m not sure I have the ontology quite right. If the objective is to identify positions, I think positions are the objects that should be stored. This leads back to the OP’s original suggestion. Yes, there is redundancy in the storage as most pieces do not change between positions. Practically speaking, though, at 64 bytes per position, say 50 moves per game, then a database of 10,000 games needs about 30MB. This is trivial. Although SQL is not great at text parsing it is certainly up to deciding if given offsets in a string contain specific values e.g. e4 and e5 hold white pawns. On balance I think the most practical implementation will be FEN with explicit spaces.