Building JebBot: A Chess Clone

I've been playing chess since I was a little kid, and I would wake up early in the morning to play with my dad before he went to work. While it never became a competitive pursuit of mine, I would say I have a love and passion for the game and have continued to play regularly over my life. Playing chess might be the most consistent activity I partake in, and since I play online, I have found it rewarding to be able to look back over my progress as a player.

Chess Elo over time

Beyond progress, my chess rating helps me look back and track moments in life. There is a big, flat section in 2017, while I was a camp counselor at Overland and looking for a full-time role. There were big gains as I worked at Google and played bughouse regularly with co-workers, and steep drops at the end of 2018 (moving cities) and in early 2020 (COVID).

This is all to say, in my chess there is a faint sprite of my life.

Everyone in tech right now seems obsessed with a) living forever, b) AI, and maybe c) chess.

I thought that instead of swapping my blood with a younger person or uploading my consciousness to live forever as a machine (so that I can interface with Ross Geller), a chessbot would suffice. I like the idea of training a chessbot on my past game library so that it plays like me and is therefore some reflection of me. And if I get hit by a truck tomorrow, I think it would give my friends great solace to play against a poorly planned Danish Gambit a few more times.

So for this project, I have a few goals:

Here is how it went:

The Approach

I had to figure out how exactly I was going to get a chess model to play like me. Building a chess model from scratch (a la AlphaZero) felt complex and redundant. Extracting insights from my games and hard-coding rules for certain positions felt too rudimentary, and it wouldn't involve training a model.

So I went with what is, in my humble opinion, a pretty clever, simple design. Instead of teaching a full chessbot how to play chess and instilling it with my style, I figured I could train a simple binary classifier. This classifier gets trained on a dataset that is half my real moves and half moves I did not play. As it trains, it should get better at telling the two apart, and in that sense "learn" my style. After training, the classifier can rate any position and move in terms of "does that look like a Jeb move?".

How does that translate to a chessbot?

Using an existing chess engine set at my level (I'm using Stockfish at 1500), I can have it serve up 5 candidate moves in every position. The style classifier then looks at each of the 5 moves and rates how likely I would be to play it. Higher confidence means the move is more like my style. Then we select the move with the highest "jebness" score. Repeat the process in each position and we have a chessbot that always picks the most Jeb move. (This is the theory, at least.)
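The selection step above can be sketched in a few lines. This is a minimal illustration, not the actual JebBot code: `pick_most_jeb_move` and `toy_score` are hypothetical names, positions are assumed to be FEN strings, moves are assumed to be UCI strings, and the scores are made-up numbers standing in for the trained classifier.

```python
from typing import Callable, Sequence


def pick_most_jeb_move(
    position: str,
    candidates: Sequence[str],
    jeb_score: Callable[[str, str], float],
) -> str:
    """Return the candidate move with the highest style score."""
    if not candidates:
        raise ValueError("the engine returned no candidate moves")
    return max(candidates, key=lambda move: jeb_score(position, move))


# Toy stand-in for the trained classifier (scores are invented for the demo).
def toy_score(position: str, move: str) -> float:
    scores = {"e2e4": 0.81, "d2d4": 0.42, "g1f3": 0.37, "c2c4": 0.30, "b1c3": 0.55}
    return scores.get(move, 0.0)


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
best = pick_most_jeb_move(start, ["e2e4", "d2d4", "g1f3", "c2c4", "b1c3"], toy_score)
print(best)
```

In a real engine loop the candidate list would come from the chess engine and `jeb_score` would wrap the classifier's forward pass.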

The Data Pipeline

I started by downloading ~8,000 of my games from Chess.com. Using Claude, I wrote a simple client that pulls the games and stores them as JSON. I filtered out abandoned games, time-outs, and (this was a huge mistake) some of the opening moves. I then converted the remaining games into individual positions and moves, so each data example is 1) what the board looks like and 2) what move was made. That left 219,663 position-move pairs. (Filtering out my opponents' moves was also maybe a mistake.)

Play analysis

I needed to convert these positions into a 12×8×8 array: 12 channels (one for each piece type: p, k, q, n, b, r × 2 colors) over an 8×8 board. Each channel is binary: 1 where that piece sits, 0 elsewhere. This is how we represent each "position". For each move, there are 64 squares a piece can move from and 64 squares it can move to, so every move maps to one of 64 × 64 = 4,096 indices (0 to 4095).
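Here is a rough sketch of that encoding in plain Python. It assumes positions arrive as FEN strings and moves as UCI strings like `e2e4`; the channel order in `PIECE_CHANNELS`, the square numbering (a1 = 0), and the function names are illustrative choices, not necessarily what JebBot uses.

```python
PIECE_CHANNELS = "PNBRQKpnbrqk"  # assumed channel order: white pieces, then black


def encode_board(fen: str) -> list[list[list[int]]]:
    """Turn the piece-placement field of a FEN string into a 12x8x8 binary array."""
    planes = [[[0] * 8 for _ in range(8)] for _ in range(12)]
    placement = fen.split()[0]
    for row, rank in enumerate(placement.split("/")):  # row 0 is rank 8
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # a digit means that many empty squares
            else:
                planes[PIECE_CHANNELS.index(ch)][row][col] = 1
                col += 1
    return planes


def square_index(square: str) -> int:
    """Map 'a1'..'h8' to 0..63 (a1 = 0, h8 = 63)."""
    return (int(square[1]) - 1) * 8 + (ord(square[0]) - ord("a"))


def encode_move(uci: str) -> int:
    """Map a UCI move like 'e2e4' to from_square * 64 + to_square."""
    return square_index(uci[:2]) * 64 + square_index(uci[2:4])


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
planes = encode_board(start)
print(encode_move("e2e4"))  # 12 * 64 + 28
```

One consequence of a pure from/to index is that promotion choices collapse together: promoting to a queen and promoting to a knight on the same square get the same index.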

Now that we have a numerical representation of every position and move, we can start training.

Parsing positions

The Model

The model is a binary classifier. Given a position and a candidate move, it outputs a probability: "Is this something Jeb would play?"

INPUT:
├── Position: 12×8×8 tensor (one channel per piece type)
└── Move: index 0-4095 → embedded to 64-dim vector

ARCHITECTURE:
├── 3 convolutional layers (12→64→128→128 channels)
├── Flatten to 8,192 features
├── Concatenate with 64-dim move embedding
├── 3 fully connected layers (8,256→256→64→1)
└── Sigmoid output (probability 0-1)

OUTPUT: "Is this a Jeb move?" (0.0 to 1.0)
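For readers who think in code, the diagram above could look roughly like this in PyTorch. Treat it as a sketch under assumptions: the 3×3 kernels with padding 1 are inferred from the 8,192-feature flatten, the ReLU activations are a guess, and dropout is left out because the rate is not stated. The class name borrows `StyleSelector` from the repository layout, but the body is not the real implementation.

```python
import torch
import torch.nn as nn


class StyleSelector(nn.Module):
    """Scores a (position, move) pair as a probability between 0 and 1."""

    def __init__(self, num_moves: int = 4096, embed_dim: int = 64):
        super().__init__()
        # Padding of 1 keeps the 8x8 board size through every conv layer.
        self.conv = nn.Sequential(
            nn.Conv2d(12, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.move_embedding = nn.Embedding(num_moves, embed_dim)
        self.head = nn.Sequential(
            nn.Linear(128 * 8 * 8 + embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, board: torch.Tensor, move_index: torch.Tensor) -> torch.Tensor:
        features = self.conv(board).flatten(start_dim=1)  # (batch, 8192)
        move_vec = self.move_embedding(move_index)        # (batch, 64)
        logits = self.head(torch.cat([features, move_vec], dim=1))
        return torch.sigmoid(logits).squeeze(1)           # (batch,)


model = StyleSelector()
boards = torch.zeros(2, 12, 8, 8)  # two empty boards, just to check shapes
moves = torch.tensor([796, 0])     # two arbitrary move indices
probs = model(boards, moves)
print(probs.shape)
```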

Once I had this architecture set, I made a pretty dumb mistake. I first tried to train the model on my own moves alone, so with nothing but positive examples it immediately started spamming 1 as the answer (aka these are all Jeb moves). I remedied this by generating alternative moves for each position with a 1500-rated Stockfish and using them as the "not Jeb" examples.
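A sketch of how the balanced dataset could be assembled: every move I played becomes a positive example, and one engine-suggested alternative from the same position becomes a negative. The function name and the `engine_moves` callable are hypothetical, and the toy engine below just returns a fixed list rather than calling Stockfish.

```python
import random
from typing import Callable, Sequence


def build_balanced_examples(
    my_moves: Sequence[tuple[str, str]],
    engine_moves: Callable[[str], Sequence[str]],
    seed: int = 0,
) -> list[tuple[str, str, int]]:
    """Return (position, move, label) rows: label 1 = my move, 0 = engine alternative."""
    rng = random.Random(seed)
    examples = []
    for position, played in my_moves:
        examples.append((position, played, 1))
        # Never use the move I actually played as a negative example.
        alternatives = [m for m in engine_moves(position) if m != played]
        if alternatives:
            examples.append((position, rng.choice(alternatives), 0))
    return examples


# Toy data: placeholder position names and a fake engine with fixed suggestions.
games = [("pos1", "e2e4"), ("pos2", "d2d4")]
rows = build_balanced_examples(games, lambda position: ["e2e4", "g1f3"])
print(rows)
```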

Back to training:

The Training Loop

I trained on a laptop using MPS acceleration.

Training visualization

I made this because I thought it would be fun to see the model actually make guesses. Every 100 batches, the trainer sends 6 random positions to a local web server that displays the board, the move, the model's confidence, and whether it was actually my move.

Results

After training, the model achieves ~60.5% accuracy at distinguishing my moves from reasonable alternatives. I found this to be fine for the purposes of my own education and for being slightly more "me" than a neutral chess engine, so I kept going.

Some immediate improvements I could have made to training: more aggressive data cleanup, weighting recent games higher, experimenting with different dropout rates and weight decay, and adding a DPO (direct preference optimization) step where I give direct feedback on whether moves felt like me.

Brief aside – I pulled interesting examples out of the training run:

High-Confidence Correct (>90%): The model recognizes my signature moves. Aggressive queen. Danish Gambit continuation.

High-Confidence Wrong (>85%): Moves the model thinks I'd love, but I actually didn't play. The model has learned my patterns but overgeneralizes.

Missed Jeb Moves (<15%): Usually defensive moves.

Overall, it's a hard task for the model, as there is often a "correct" best move in a position, and even at 1500-1600 I should be finding it often. In those positions, nothing differentiates me from a computer.

Playing Against JebBot

Finished bot move picker

The trained model doesn't play chess by itself; it only scores moves. To actually play, I used Claude to build an engine that combines a Stockfish 1500 model that serves candidate moves with my classifier that ranks them. In my initial playthroughs I found it passable, with three major problems:

  1. Terrible at openings (remember I removed some opening moves from the dataset)
  2. Blundered too often (the stockfish data had no blunders, so I think it "learned" if I see a blunder it must be Jeb)
  3. Terrible at endgames (I think maybe just not enough data for those positions)

I made some adjustments to make it passable:

1. Opening Book (First ~5 Moves)

A weighted random book based on my actual opening preferences:
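A weighted random book can be sketched with `random.choices`. Everything in `OPENING_BOOK` below is a placeholder: the move sequences and weights are invented for illustration and are not my real opening statistics.

```python
import random

# Hypothetical book: maps the moves played so far (UCI) to {reply: weight}.
# Entries and weights are placeholders, not the real opening preferences.
OPENING_BOOK = {
    (): {"e2e4": 8, "d2d4": 2},
    ("e2e4", "e7e5"): {"d2d4": 7, "g1f3": 3},
}


def book_move(history, rng=random):
    """Pick a weighted random book move, or None once the game is out of book."""
    options = OPENING_BOOK.get(tuple(history))
    if not options:
        return None
    moves = list(options)
    weights = [options[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


print(book_move([]))
print(book_move(["a2a3"]))  # not in the book, so the engine would take over
```

Returning `None` when out of book gives the caller a clean signal to fall through to the normal move picker.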

2. Endgame Detection

When either side has ≤12 material points (roughly a rook or less), JebBot hands off to pure Stockfish.
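A minimal sketch of the material check, assuming the standard piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) and assuming pawns count toward the total. Neither assumption is confirmed above, and the function names are illustrative.

```python
# Standard piece values; the king is not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material(fen: str) -> tuple[int, int]:
    """Return (white, black) material totals from a FEN string."""
    white = black = 0
    for ch in fen.split()[0]:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # skip digits and '/' separators
        if ch.isupper():
            white += value
        else:
            black += value
    return white, black


def is_endgame(fen: str, threshold: int = 12) -> bool:
    """True when either side is at or below the material threshold."""
    white, black = material(fen)
    return white <= threshold or black <= threshold


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
rook_ending = "8/5k2/8/8/8/8/4PP2/4K2R w - - 0 1"
print(material(start), is_endgame(start))
print(material(rook_ending), is_endgame(rook_ending))
```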

3. Style-Quality Balancing

To fix the blunders, I just had to adjust how it picks moves:

  1. Find the "safe pick": first Stockfish move with >50% Jeb score
  2. If another move scores 15%+ higher, play that instead
  3. Fallback: if nothing is >50%, play highest Jeb score anyway
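The three rules above could be sketched like this. It assumes the candidates arrive in Stockfish's preference order (best first), that "15%+ higher" means 0.15 in absolute score, and that the challenger in step 2 is the highest-scoring move overall. The function name and those readings are my assumptions.

```python
def pick_balanced_move(candidates, threshold=0.5, margin=0.15):
    """candidates: list of (move, jeb_score) pairs in engine order, best first."""
    if not candidates:
        raise ValueError("no candidate moves")
    most_jeb = max(candidates, key=lambda c: c[1])
    # Rule 1: the safe pick is the first engine move the classifier likes.
    safe = next((c for c in candidates if c[1] > threshold), None)
    if safe is None:
        return most_jeb[0]  # Rule 3: nothing clears the bar, take the top score
    if most_jeb[1] >= safe[1] + margin:
        return most_jeb[0]  # Rule 2: a much more "Jeb" move overrides the safe pick
    return safe[0]


print(pick_balanced_move([("a", 0.60), ("b", 0.70), ("c", 0.30)]))  # safe pick
print(pick_balanced_move([("a", 0.55), ("b", 0.90)]))              # override
print(pick_balanced_move([("a", 0.20), ("b", 0.40)]))              # fallback
```

Taking the first move above the threshold, rather than the best-scoring one, leans on the engine's ordering for quality and only lets style win when the gap is large.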

This balances two competing goals:

The Tech Stack

Everything runs locally.

What's Next

JebBot is functional, but I'd like to host it on my website and possibly make my friends play the bot or me blind to see if I can beat them in a chessbot Turing test.

To do so, I will likely have to continue to improve the selector model with DPO and the other training improvements discussed above.

Appendix

Model Size Breakdown

Component          Parameters
Conv1 (12→64)      6,976
Conv2 (64→128)     73,856
Conv3 (128→128)    147,584
Move Embedding     262,144
FC1 (8256→256)     2,113,792
FC2 (256→64)       16,448
FC3 (64→1)         65
Total              ~2.6M
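The numbers in the table can be reproduced with a little arithmetic, assuming 3×3 convolution kernels and a bias term on every conv and linear layer (those assumptions are what make the counts match). The helper names are just for illustration.

```python
def conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * kernel * kernel + c_out


def linear_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


counts = {
    "conv1": conv_params(12, 64),
    "conv2": conv_params(64, 128),
    "conv3": conv_params(128, 128),
    "move_embedding": 4096 * 64,  # one 64-dim vector per move index, no bias
    "fc1": linear_params(128 * 8 * 8 + 64, 256),
    "fc2": linear_params(256, 64),
    "fc3": linear_params(64, 1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:15s} {n:>10,}")
print(f"{'total':15s} {total:>10,}")
```

The first fully connected layer accounts for roughly 80% of the parameters, so that is where any size reduction would pay off most.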

Data Statistics

Training Configuration

Repository Structure

jebbot/
├── jebbot/
│   ├── data/          # Download, parse, encode
│   ├── model/         # StyleSelector network
│   ├── play/          # Engine, openings, server
│   ├── training/      # Training utilities
│   └── visualization/ # Real-time training display
├── scripts/           # CLI entry points
└── data/
    ├── raw/           # Downloaded games
    ├── processed/     # Training positions
    └── models/        # Trained checkpoints

Partly inspired by Peter Whidden's "AI Learns Pokemon" video, but using supervised learning instead of reinforcement learning. It is much simpler, faster, and better suited to "behavioral cloning."