
Commit

formatting applied
izhigal committed Jan 26, 2024
1 parent 7fc56ed commit d48a2d4
Showing 143 changed files with 1,554 additions and 1,656 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -245,7 +245,7 @@ You can use the following interface to make an environment. You may optional
* `allow_step_back`: Default `False`. `True` if allowing `step_back` function to traverse backward in the tree.
* Game specific configurations: These fields start with `game_`. Currently, we only support `game_num_players` in Blackjack, .

Once the environemnt is made, we can access some information of the game.
Once the environment is made, we can access some information of the game.
* **env.num_actions**: The number of actions.
* **env.num_players**: The number of players.
* **env.state_shape**: The shape of the state space of the observations.
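As an illustration of the interface described in this README excerpt, a minimal sketch (assuming the `blackjack` environment id; the configuration keys are the ones listed above) could look like this:

```python
import rlcard

# Minimal sketch: make an environment with optional configuration.
# 'game_num_players' is the Blackjack-specific field mentioned above;
# 'allow_step_back' keeps its default of False.
env = rlcard.make(
    'blackjack',
    config={
        'allow_step_back': False,
        'game_num_players': 2,
    },
)

# Access the environment information listed above.
print(env.num_actions)   # the number of actions
print(env.num_players)   # the number of players
print(env.state_shape)   # the shape of the observation state space
```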
4 changes: 2 additions & 2 deletions docs/games.md
@@ -90,7 +90,7 @@ At each decision point of the game, the corresponding player will be able to obs
| ------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| seen\_cards | Three face-down cards distributed to the landlord after bidding. Then these cards will be made public to all players. | TQA |
| landlord | An integer of landlord's id | 0 |
| self | An integer of current player's id | 2 |
| cls | An integer of current player's id | 2 |
| trace | A list of tuples which records every actions in one game. The first entry of the tuple is player's id, the second is corresponding player's action. | \[(0, '8222'), (1, 'pass'), (2, 'pass'), (0 '6KKK'), (1, 'pass'), (2, 'pass'), (0, '8'), (1, 'Q')\] |
| played\_cards | As the game progresses, the cards which have been played by the three players and sorted from low to high. | \['6', '8', '8', 'Q', 'K', 'K', 'K', '2', '2', '2'\] |
| others\_hand | The union of the other two player's current hand | 333444555678899TTTJJJQQAA2R |
@@ -134,7 +134,7 @@ If the landlord first get rid of all the cards in his hand, he will win and rece
## Mahjong
Mahjong is a tile-based game developed in China, and has spread throughout the world since 20th century. It is commonly played
by 4 players. The game is played with a set of 136 tiles. In turn players draw and discard tiles until
The goal of the game is to complete the leagal hand using the 14th drawn tile to form 4 sets and a pair.
The goal of the game is to complete the legal hand using the 14th drawn tile to form 4 sets and a pair.
We revised the game into a simple version that all of the winning set are equal, and player will win as long as she complete
forming 4 sets and a pair. Please refer the detail on [Wikipedia](https://en.wikipedia.org/wiki/Mahjong) or [Baike](https://baike.baidu.com/item/麻将/215).

2 changes: 1 addition & 1 deletion docs/high-level-design.md
@@ -25,4 +25,4 @@ Card games usually have similar structures. We abstract some concepts in card ga
To summarize, in one `Game`, a `Dealer` deals the cards for each `Player`. In each `Round` of the game, a `Judger` will make major decisions about the next round and the payoffs in the end of the game.

## Agents
We provide examples of several representative algorithms and wrap them as `Agent` to show how a learning algorithm can be connected to the toolkit. The first example is DQN which is a representative of the Reinforcement Learning (RL) algorithms category. The second example is NFSP which is a representative of the Reinforcement Learning (RL) with self-play. We also provide CFR (chance sampling) and DeepCFR which belong to Conterfactual Regret Minimization (CFR) category. Other algorithms from these three categories can be connected in similar ways.
We provide examples of several representative algorithms and wrap them as `Agent` to show how a learning algorithm can be connected to the toolkit. The first example is DQN which is a representative of the Reinforcement Learning (RL) algorithms category. The second example is NFSP which is a representative of the Reinforcement Learning (RL) with self-play. We also provide CFR (chance sampling) and DeepCFR which belong to Counterfactual Regret Minimization (CFR) category. Other algorithms from these three categories can be connected in similar ways.
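As a quick sketch of how such an `Agent` is attached to an environment (assuming the `leduc-holdem` environment id and the `RandomAgent` used elsewhere in this commit; any agent exposing the same interface can be substituted):

```python
import rlcard
from rlcard.agents import RandomAgent

# Make an environment and attach one agent per player.
env = rlcard.make('leduc-holdem')
agents = [RandomAgent(num_actions=env.num_actions) for _ in range(env.num_players)]
env.set_agents(agents)

# Play one full game: trajectories hold the transitions, payoffs the final rewards.
trajectories, payoffs = env.run(is_training=False)
print(payoffs)
```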
2 changes: 1 addition & 1 deletion docs/toy-examples.md
@@ -339,7 +339,7 @@ def train(args):
# Seed numpy, torch, random
set_seed(args.seed)

# Initilize CFR Agent
# Initialize CFR Agent
agent = CFRAgent(
env,
os.path.join(
17 changes: 7 additions & 10 deletions examples/evaluate.py
@@ -1,19 +1,16 @@
''' An example of evluating the trained models in RLCard
'''
"""An example of evaluating the trained models in RLCard"""
import os
import argparse

import rlcard
from rlcard.agents import (
DQNAgent,
RandomAgent,
)

from rlcard.utils import (
get_device,
set_seed,
tournament,
)


def load_model(model_path, env=None, position=None, device=None):
if os.path.isfile(model_path): # Torch model
import torch
@@ -29,14 +26,14 @@ def load_model(model_path, env=None, position=None, device=None):
else: # A model in the model zoo
from rlcard import models
agent = models.load(model_path).agents[position]

return agent

def evaluate(args):

def evaluate(args):
# Check whether gpu is available
device = get_device()

# Seed numpy, torch, random
set_seed(args.seed)

@@ -54,6 +51,7 @@ def evaluate(args):
for position, reward in enumerate(rewards):
print(position, args.models[position], reward)


if __name__ == '__main__':
parser = argparse.ArgumentParser("Evaluation example in RLCard")
parser.add_argument(
@@ -99,4 +97,3 @@ def evaluate(args):

os.environ["CUDA_VISIBLE_DEVICES"] = args.cuda
evaluate(args)

5 changes: 2 additions & 3 deletions examples/human/blackjack_human.py
@@ -1,5 +1,4 @@
''' A toy example of self playing for Blackjack
'''
"""A toy example of self playing for Blackjack """

import rlcard
from rlcard.agents import RandomAgent as RandomAgent
@@ -23,7 +22,7 @@

print(">> Blackjack human agent")

while (True):
while True:
print(">> Start a new game")

trajectories, payoffs = env.run(is_training=False)
4 changes: 2 additions & 2 deletions examples/human/gin_rummy_human.py
@@ -1,9 +1,9 @@
'''
"""
Project: Gui Gin Rummy
File name: gin_rummy_human.py
Author: William Hale
Date created: 3/14/2020
'''
"""

# You need to install tkinter if it is not already installed.
# Tkinter is Python's defacto standard GUI (Graphical User Interface) package.
5 changes: 2 additions & 3 deletions examples/human/leduc_holdem_human.py
@@ -1,5 +1,4 @@
''' A toy example of playing against pretrianed AI on Leduc Hold'em
'''
"""A toy example of playing against pretrianed AI on Leduc Hold'em"""

import rlcard
from rlcard import models
@@ -17,7 +16,7 @@

print(">> Leduc Hold'em pre-trained model")

while (True):
while True:
print(">> Start a new game")

trajectories, payoffs = env.run(is_training=False)
5 changes: 2 additions & 3 deletions examples/human/limit_holdem_human.py
@@ -1,5 +1,4 @@
''' A toy example of playing against a random agent on Limit Hold'em
'''
"""A toy example of playing against a random agent on Limit Hold'em"""

import rlcard
from rlcard.agents import LimitholdemHumanAgent as HumanAgent
@@ -17,7 +16,7 @@

print(">> Limit Hold'em random agent")

while (True):
while True:
print(">> Start a new game")

trajectories, payoffs = env.run(is_training=False)
5 changes: 2 additions & 3 deletions examples/human/nolimit_holdem_human.py
@@ -1,5 +1,4 @@
''' A toy example of playing against pretrianed AI on Leduc Hold'em
'''
"""A toy example of playing against pretrained AI on Leduc Hold'em"""
from rlcard.agents import RandomAgent

import rlcard
@@ -17,7 +16,7 @@
env.set_agents([human_agent, human_agent2])


while (True):
while True:
print(">> Start a new game")

trajectories, payoffs = env.run(is_training=False)
5 changes: 2 additions & 3 deletions examples/human/uno_human.py
@@ -1,5 +1,4 @@
''' A toy example of playing against rule-based bot on UNO
'''
"""A toy example of playing against rule-based bot on UNO"""

import rlcard
from rlcard import models
@@ -16,7 +15,7 @@

print(">> UNO rule model V1")

while (True):
while True:
print(">> Start a new game")

trajectories, payoffs = env.run(is_training=False)
4 changes: 1 addition & 3 deletions examples/pettingzoo/run_dmc.py
@@ -1,6 +1,4 @@
''' An example of training a Deep Monte-Carlo (DMC) Agent on PettingZoo environments
wrapping RLCard
'''
"""An example of training a Deep Monte-Carlo (DMC) Agent on PettingZoo environments wrapping RLCard"""
import os
import argparse

4 changes: 1 addition & 3 deletions examples/pettingzoo/run_rl.py
@@ -1,6 +1,4 @@
''' An example of training a reinforcement learning agent on the PettingZoo
environments that wrap RLCard
'''
"""An example of training a reinforcement learning agent on the PettingZoo environments that wrap RLCard"""
import os
import argparse

7 changes: 4 additions & 3 deletions examples/run_cfr.py
@@ -1,5 +1,4 @@
''' An example of solve Leduc Hold'em with CFR (chance sampling)
'''
"""An example of solve Leduc Hold'em with CFR (chance sampling)"""
import os
import argparse

@@ -15,6 +14,7 @@
plot_curve,
)


def train(args):
# Make environments, CFR only supports Leduc Holdem
env = rlcard.make(
@@ -34,7 +34,7 @@ def train(args):
# Seed numpy, torch, random
set_seed(args.seed)

# Initilize CFR Agent
# Initialize CFR Agent
agent = CFRAgent(
env,
os.path.join(
@@ -71,6 +71,7 @@ def train(args):
# Plot the learning curve
plot_curve(csv_path, fig_path, 'cfr')


if __name__ == '__main__':
parser = argparse.ArgumentParser("CFR example in RLCard")
parser.add_argument(
7 changes: 3 additions & 4 deletions examples/run_dmc.py
@@ -1,5 +1,4 @@
''' An example of training a Deep Monte-Carlo (DMC) Agent on the environments in RLCard
'''
"""An example of training a Deep Monte-Carlo (DMC) Agent on the environments in RLCard"""
import os
import argparse

@@ -8,8 +7,8 @@
import rlcard
from rlcard.agents.dmc_agent import DMCTrainer

def train(args):

def train(args):
# Make the environment
env = rlcard.make(args.env)

@@ -29,6 +28,7 @@ def train(args):
# Train DMC Agents
trainer.start()


if __name__ == '__main__':
parser = argparse.ArgumentParser("DMC example in RLCard")
parser.add_argument(
@@ -95,4 +95,3 @@ def train(args):

os.environ["CUDA_VISIBLE_DEVICES"] = args.cuda
train(args)

3 changes: 1 addition & 2 deletions examples/run_random.py
@@ -1,5 +1,4 @@
''' An example of playing randomly in RLCard
'''
"""An example of playing randomly in RLCard"""
import argparse
import pprint

19 changes: 9 additions & 10 deletions examples/run_rl.py
@@ -1,5 +1,4 @@
''' An example of training a reinforcement learning agent on the environments in RLCard
'''
"""An example of training a reinforcement learning agent on the environments in RLCard"""
import os
import argparse

@@ -16,11 +15,11 @@
plot_curve,
)

def train(args):

def train(args):
# Check whether gpu is available
device = get_device()

# Seed numpy, torch, random
set_seed(args.seed)

@@ -41,7 +40,7 @@ def train(args):
agent = DQNAgent(
num_actions=env.num_actions,
state_shape=env.state_shape[0],
mlp_layers=[64,64],
mlp_layers=[64, 64],
device=device,
save_path=args.log_dir,
save_every=args.save_every
@@ -55,8 +54,8 @@
agent = NFSPAgent(
num_actions=env.num_actions,
state_shape=env.state_shape[0],
hidden_layers_sizes=[64,64],
q_mlp_layers=[64,64],
hidden_layers_sizes=[64, 64],
q_mlp_layers=[64, 64],
device=device,
save_path=args.log_dir,
save_every=args.save_every
@@ -106,6 +105,7 @@ def train(args):
torch.save(agent, save_path)
print('Model saved in', save_path)


if __name__ == '__main__':
parser = argparse.ArgumentParser("DQN/NFSP example in RLCard")
parser.add_argument(
@@ -163,13 +163,13 @@ def train(args):
type=str,
default='experiments/leduc_holdem_dqn_result/',
)

parser.add_argument(
"--load_checkpoint_path",
type=str,
default="",
)

parser.add_argument(
"--save_every",
type=int,
Expand All @@ -179,4 +179,3 @@ def train(args):

os.environ["CUDA_VISIBLE_DEVICES"] = args.cuda
train(args)
