This subsection describes how to train Proof Cost Networks (PCNs).
Before launching the solver, a PCN model must be trained for it. Training a PCN model requires three processes:
- zero server
- actor
- learner
Before training starts, a config file is needed.
- The config file can be generated with the following command:
./build/killallgo/killallgo_solver -gen {config file name}
- The following settings must be changed:
# Program
program_auto_seed=true
# Actor
actor_num_simulation=400
actor_mcts_value_rescale=true
actor_dirichlet_noise_alpha=0.2
actor_resign_threshold=-2
actor_use_random_op=false
# Zero
zero_num_games_per_iteration=2000
zero_actor_ignored_command=reset_actors
# Learner
learner_num_process=6
# Network
nn_num_input_channels=18
nn_input_channel_height=7
nn_input_channel_width=7
nn_num_hidden_channels=256
nn_hidden_channel_height=7
nn_hidden_channel_width=7
nn_num_blocks=3
nn_num_action_channels=2
nn_action_size=50
nn_discrete_value_size=200
nn_board_evaluation_scalar=0
# Environment
env_killallgo_ko_rule=situational
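Editing this many keys by hand is error-prone, so the overrides above could be scripted. The following is a minimal sketch, not part of the repository: `set_cfg` is a hypothetical helper, and it assumes the generated config file uses plain `key=value` lines. Anything after the value on a matched line, such as a trailing comment, would be overwritten.

```shell
#!/bin/sh
# set_cfg FILE KEY VALUE (hypothetical helper, not shipped with the project)
# Replaces the value of KEY in FILE, or appends KEY=VALUE if KEY is absent.
# Assumes plain "key=value" lines; the rest of a matched line is overwritten.
set_cfg() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if grep -q "^${key}=" "$file"; then
        # Rewrite only lines that start with "KEY="; copy all other lines as-is.
        awk -v k="$key" -v v="$value" '
            index($0, k "=") == 1 { print k "=" v; next }
            { print }
        ' "$file" > "$tmp"
    else
        # KEY not present: keep the file and append the new setting.
        cat "$file" > "$tmp"
        printf '%s=%s\n' "$key" "$value" >> "$tmp"
    fi
    # Copy back instead of moving so the original file permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `set_cfg pcn.cfg actor_num_simulation 400` followed by `set_cfg pcn.cfg env_killallgo_ko_rule situational` would apply two of the settings listed above to a config file named `pcn.cfg`.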
Now we can run the three processes mentioned above.
- Zero Server:
./scripts/zero-server.sh killallgo {config file} {model storage path} {iteration}
# example
./scripts/zero-server.sh killallgo pcn.cfg /7x7Killallgo/Training/pcn 200
- Actor:
./scripts/zero-worker.sh killallgo {zero server host} {zero server port} sp
# example
./scripts/zero-worker.sh killallgo NV27 9999 sp
- Learner:
./scripts/zero-worker.sh killallgo {zero server host} {zero server port} op
# example
./scripts/zero-worker.sh killallgo NV27 9999 op
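Because the three commands share the same host, port, and game name, it can help to fill in the placeholders once. The sketch below only prints the command lines and runs nothing. `print_pcn_commands` is a hypothetical helper, and the idea of pasting each printed line into its own terminal session is an assumption, not something the scripts require.

```shell
#!/bin/sh
# print_pcn_commands CFG MODEL_DIR ITERATIONS HOST PORT (hypothetical helper)
# Prints the zero server, actor, and learner command lines in the order
# they are listed above. Nothing is executed.
print_pcn_commands() {
    cfg=$1
    model_dir=$2
    iterations=$3
    host=$4
    port=$5
    # Zero server
    echo "./scripts/zero-server.sh killallgo $cfg $model_dir $iterations"
    # Actor (sp worker)
    echo "./scripts/zero-worker.sh killallgo $host $port sp"
    # Learner (op worker)
    echo "./scripts/zero-worker.sh killallgo $host $port op"
}
```

Calling `print_pcn_commands pcn.cfg /7x7Killallgo/Training/pcn 200 NV27 9999` reproduces the three example commands shown above.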
To train a Gumbel PCN, change the following settings in the config file.
- Turn on Gumbel:
actor_use_dirichlet_noise=false
actor_use_gumbel=true
actor_use_gumbel_noise=true
- Gumbel parameters:
actor_num_simulation=32
actor_gumbel_sample_size=16
- actor_num_simulation: MCTS simulation count
- actor_gumbel_sample_size: Sequential halving threshold
To train a PCN with random openings, change the following settings in the config file.
- Turn on random opening:
actor_use_random_op=true
- Random opening parameters:
actor_random_op_max_length=25
actor_random_op_use_softmax=true
actor_random_op_softmax_temperature=2
actor_random_op_softmax_sum_limit=0.7
- actor_random_op_max_length: Maximum length of the random opening
- actor_random_op_softmax_temperature: Softmax temperature for random moves
- actor_random_op_softmax_sum_limit: Threshold on the probability sum of the random-move softmax distribution