The game of tic-tac-toe, also called naughts and crosses in many parts of the world, can be an interesting application of neural networks trained by genetic algorithms. Tic-tac-toe has very simple rules. Most human players quickly grow bored with the game, since it is so easy to learn. Once two people have mastered tic-tac-toe, games almost always result in a tie.

    The simple rules for tic-tac-toe are as follows: One player plays X and another plays O. They take turns placing these characters on a 3x3 grid until one player gets three in a row. If the grid is filled before one of them gets three in a row, then the game is a draw. Figure 6.3 shows a game of tic-tac-toe in progress.

Figure 6.3: The game of tic-tac-toe.

The game of tic-tac-toe.

    There are many implementations of tic-tac-toe in Java. This book makes use of one by Thomas David Baker, which was released as open source software. It provides several players that can be matched for games:

  • Boring – Just picks the next open spot.
  • Human – Allows a human to play.
  • Logical – Uses logic to play a near perfect game.
  • MinMax – Uses the min-max algorithm to play a perfect game.
  • Random – Moves to random locations.

    Each player can play any other player. A class simply has to implement the Player interface and it can play against the others. Some of the players are more advanced. This gives the neural network several levels of players to play against.

    The most advanced player is the min-max player. This player uses a min-max algorithm. A min-max algorithm uses a tree to plot out every possible move. This technique can be used in a game as simple as tic-tac-toe; however, it would not be effective for a game with an extremely large number of combinations, such as chess or go.

    The example for this chapter creates a class named PlayerNeural. This class uses a neural network to play against the other players. The neural player will not play a perfect game. Yet, it will play reasonably well against some of the provided players.

    The goal is not to produce a perfect player using only a neural network, but to demonstrate the use of neural networks. Many games use a hybrid approach for optimal results. The min-max algorithm, together with a neural network to prune entire branches of the search tree can be a very effective combination. We will, however, implement a pure neural-network approach.

    The tic-tac-toe example illustrates a very important concept. Thus far, we have only trained neural networks using training sets. The tic-tac-toe example will not use training sets. Rather, the neural network will be matched against any of the other players and a genetic algorithm will be used to train it. Since a genetic algorithm uses “natural selection” it is particularly well suited to learning to play a game.

Using the Sample Tic-Tac-Toe Implementation

     There are many options available with this version of tic-tac-toe. There are several command line options that allow you to specify which two players will play against each other. The general format of a command is as follows:

NeuralTicTacToe [Command] [Player 1] [Player 2]

    The command also specifies the mode in which the program should run. The following modes are available:

  • game – Play a single game.
  • match – Play 100 games.
  • train – Train (and save) a neural network.

    You must choose the two players. The following options are available:

  • Boring
  • Human
  • Logic
  • MinMax
  • NeuralBlank
  • NeuralLoad
  • Random

    For example, to play a match between the MinMax algorithm and the Random player, use the following command:

NeuralTicTacToe Match MinMax Random

     If you would like to train a new blank neural network against the random player, you would use the following command:

NeuralTicTacToe Train NeuralBlank Random

    This will train a blank neural network; but be aware it can take a considerable amount of time. The genetic algorithm will use a thread pool, so a multicore computer will help. Once the training is complete, the neural network will be saved to disk. It will be saved with the name “tictactoe.net”. The download for this book contains an example “tictactoe.net” file that is already trained for tic-tac-toe. This neural network took nearly 20 hours to train on my computer. Appendix A explains how to download the examples for this book.

    To play a trained neural network, use the following command:

NeuralTicTacToe Play NeuralLoad Human

    The sample neural network will always load and save a neural network named “tictactoe.net”.

Saving and Loading Neural Networks

    You may be wondering how to load and save the neural networks in this book. These networks can be loaded and saved using regular Java serialization techniques. For example, the following command saves a neural network:

SerializeObject.save("filename.net", neuralNetwork);

    Loading a neural network is almost as easy. The following command loads a neural network from disk.

FeedforwardNetwork result = 
(FeedforwardNetwork)SerializeObject.load("neuralnet.net");

    Some of the neural networks in this book take a considerable amount of time to train. Thus, it is valuable to save the neural network so it can be quickly reloaded later.

Structure of the Tic-Tac-Toe Neural Network

    One of the most important questions that a neural network programmer must always consider is how to structure the neural network. Perhaps the most obvious way to structure a neural network for tic-tac-toe is with nine inputs and nine outputs. The nine inputs will specify the current board configuration. The nine outputs will allow the neural network to specify where it wants to move.

    The first version of the example used this configuration. It did not work particularly well. One problem is that the neural network had to spend considerable time just learning the valid and invalid moves. Furthermore, this configuration did not play very well against the other players.

    The final version of the neural network has a configuration with nine inputs and a single output. Rather than asking the neural network where to move, this structure asks the neural network if a proposed board position is favorable. Whenever it is the neural player's turn, every possible move is determined. There can be at most nine possible moves. A temporary board is constructed that indicates what the game board would look like after each of these possible moves. Each board is then presented to the neural network. The board position that receives the highest value from the output neuron will be the next move.

Neural Player

    The Player interface requires that the NeuralPlayer class include a getMove method. This method determines the next move the neural player will make. The signature for the getMove method is shown here:

public Move getMove(final byte[][] board, final Move prev, final byte color) 

    First, two local variables are established. The bestMove variable will hold the best move found so far. The bestScore variable will hold the score of the bestMove variable.

Move bestMove = null;
double bestScore = Double.MIN_VALUE;

    Next, all of the potential moves on the board's grid are examined.

for (int x = 0; x < board.length; x++) {
  for (int y = 0; y < board.length; y++) {

    A sample board is constructed to represent this potential move.

    final Move move = new Move((byte) x, (byte) y, color);

    The potential move is examined to determine if it is valid.

    if (Board.isEmpty(board, move)) {

    If the potential move is valid, then the tryMove method is called to see what score its board position would provide.

      final double d = tryMove(board, move);

    If the score for that board position beats the current best score, then this move is saved as the best move encountered thus far.

      if ((d > bestScore) || (bestMove == null)) {
        bestScore = d;
        bestMove = move;
      }
    }
  }
}

    Finally, the best move is returned.

return bestMove;

    The getMove method makes use of the tryMove method to build a temporary board for each possible move. The signature for the tryMove method is shown here:

private double tryMove(final byte[][] board, final Move move) {

    First, an input array is constructed for the neural network. A local variable index is kept that will remember the current position within the input array.

final double input[] = new double[9];
int index = 0;

    Next, every position of the board grid is considered.

for (int x = 0; x < board.length; x++) {
  for (int y = 0; y < board.length; y++) {

    Each square on the grid is checked and the input array is set to reflect the status of that square.

    if (board[x][y] == TicTacToe.NOUGHTS) {
      input[index] = -1;
    } else if (board[x][y] == TicTacToe.CROSSES) {
      input[index] = 1;
    } else if (board[x][y] == TicTacToe.EMPTY) {
      input[index] = 0;
    }

    If the square contains an “X” (cross), then a value of –1 is inserted into the input array. If the square contains an “O” (nought), then a value of 1 is placed in the array.

    If the square contains the current move, then the input is set to –1, or a value of “X,” which is what the neural player is playing.

    if ((x == move.x) && (y == move.y)) {
      input[index] = -1;
    }

    The next element of the input array is then examined.

  index++;
  }
}

    Finally, the output for this input array is computed.

final double output[] = this.network.computeOutputs(input);
return output[0];

    The output is returned to the caller. The higher this value, the more favorable the board position.

Chromosomes

    The tic-tac-toe neural network is trained using a genetic training algorithm. Each chromosome is tested by playing 100 games against its opponent. This causes a score to be generated that allows each chromosome's effectiveness to be evaluated. Each chromosome has a calculateCost method that performs this operation. The signature for calculateCost is shown here:

public void calculateCost() {

    First, the neural network is updated using the gene array.

try {
  this.updateNetwork();

    Next, a PlayerNeural player is constructed to play against the chosen component. To keep this example simple, the neural network player is always player one, and thus always moves first.

  final PlayerNeural player1 = new PlayerNeural(getNetwork());

    The second player is created, and a match is played.

  Player player2;

  player2 = (Player) this.getGeneticAlgorithm().getOpponent()
					.newInstance();
  final ScorePlayer score = new ScorePlayer(player1, player2, false);

    The match is managed by the ScorePlayer class. This class allows two players to play 100 games and a score is generated.

  setCost(score.score());

    The score then becomes the cost for this chromosome.


Copyright 2005 - 2010 by Heaton Research, Inc.. Heaton Research™ and Encog™ are trademarks of Heaton Research. Click here for copyright and trademark information.