Quick Start Benchmarking

Execution 

To run an experiment, simply execute the following command, passing the path to a configuration file:

python3 light_malib/main_pb.py --config expr_configs/xxx/.../xxx.yaml

All configuration files can be found in ./expr_configs/, including those for the academy scenarios, the full-game scenarios, and a PSRO trial.
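To run several experiments in a row, the command above can be wrapped in a small launcher. The sketch below is a hypothetical helper, not part of the framework: it assumes it is run from the repository root, collects every `.yaml` file under `expr_configs/`, and calls `light_malib/main_pb.py --config <path>` for each one. It uses `sys.executable` in place of `python3` so the current interpreter is reused.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Entry point taken from the command above; assumes the working directory is the repo root.
ENTRY_POINT = "light_malib/main_pb.py"


def build_command(config_path, python: str = sys.executable) -> List[str]:
    """Return the argv list for a single experiment run."""
    return [python, ENTRY_POINT, "--config", str(config_path)]


def find_configs(root: str = "expr_configs") -> List[Path]:
    """Recursively collect every .yaml file under the config directory, sorted by path."""
    return sorted(Path(root).rglob("*.yaml"))


def run_all(root: str = "expr_configs", dry_run: bool = True) -> List[List[str]]:
    """Launch (or, with dry_run=True, only print) one run per configuration file."""
    commands = [build_command(p) for p in find_configs(root)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sweep at the first experiment that exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(dry_run=True)` first is a cheap way to check which configurations would be picked up before starting any long-running jobs.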

Each configuration file defines detailed settings for rollout, training, data storage, logging, model setup, the population, and more.

The framework section defines the type of learning:

`max_round`	maximum number of generations (PBT only)
`meta_solver`	type of meta-solver (PBT only)
`sync_training`	synchronous training if True, asynchronous otherwise
`stopper.max_steps`	maximum number of rollout iterations
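A dotted name such as `stopper.max_steps` refers to a key nested one level down (`max_steps` inside `stopper`). The sketch below shows one way to read such keys from the parsed configuration; the dictionary is a hypothetical stand-in for the framework section after YAML parsing, and its values are placeholders rather than recommended settings.

```python
from typing import Any, Mapping


def get_by_path(cfg: Mapping[str, Any], dotted: str, default: Any = None) -> Any:
    """Follow a dotted key such as 'stopper.max_steps' through nested mappings."""
    node: Any = cfg
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical framework section as a plain dict; values are placeholders.
framework = {
    "max_round": 10,
    "sync_training": True,
    "stopper": {"max_steps": 1000},
}

mode = "synchronous" if get_by_path(framework, "sync_training") else "asynchronous"
max_steps = get_by_path(framework, "stopper.max_steps")
```

Returning a default instead of raising keeps PBT-only keys such as `max_round` optional for configurations that do not use them.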

The rollout manager section defines the rollout settings:

`num_worker`	number of parallel rollout workers
`batch_size`	size of each collected data batch
`rollout_length`	maximum length of a rollout episode
`sample_length`	truncation length (no truncation if 0)
`envs`	environment configuration
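The effect of `sample_length` can be pictured as cutting each episode (at most `rollout_length` steps long) into consecutive fragments. The sketch below is an illustration under assumptions, not the framework's code: the function name is hypothetical, and keeping a shorter final fragment (instead of padding or dropping it) is an assumed choice.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def truncate_episode(episode: Sequence[T], sample_length: int) -> List[List[T]]:
    """Split one episode into consecutive fragments of at most sample_length steps.

    sample_length == 0 means no truncation: the whole episode is a single fragment.
    The last fragment may be shorter than sample_length (assumed behaviour).
    """
    if sample_length < 0:
        raise ValueError("sample_length must be >= 0")
    if sample_length == 0:
        return [list(episode)]
    return [
        list(episode[i:i + sample_length])
        for i in range(0, len(episode), sample_length)
    ]
```

Shorter fragments make batches more uniform in size, at the cost of cutting long-horizon context at fragment boundaries.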

The training manager section defines the training settings:

`batch_size`	training batch size
`num_trainers`	number of trainers (equivalently, the number of GPUs)
`rollout_length`	maximum length of a rollout episode
`optimizer`	optimizer type
`actor_lr`	learning rate of the actor
`critic_lr`	learning rate of the critic
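Having both `actor_lr` and `critic_lr` means the two networks are stepped at different rates. The dependency-free sketch below shows the idea with plain SGD on toy parameter lists; the real update uses whatever `optimizer` the configuration selects, and the numbers here are placeholders.

```python
from typing import Dict, List


def sgd_update(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    lrs: Dict[str, float],
) -> Dict[str, List[float]]:
    """One plain SGD step in which each network uses its own learning rate."""
    return {
        name: [p - lrs[name] * g for p, g in zip(params[name], grads[name])]
        for name in params
    }


# Toy parameters and gradients; the learning rates stand in for actor_lr / critic_lr.
params = {"actor": [0.5, -0.2], "critic": [1.0]}
grads = {"actor": [0.1, 0.4], "critic": [2.0]}
lrs = {"actor": 5e-4, "critic": 1e-3}
updated = sgd_update(params, grads, lrs)
```

In PyTorch the same effect is usually obtained either with two optimizers or with one optimizer given two parameter groups, each carrying its own `lr`.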

The data server section defines data storage:

`capacity`	capacity of the data table
`sampler_type`	data sampling scheme
`sample_max_usage`	maximum number of times each sample can be reused
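The three data server settings interact: the table holds at most `capacity` samples, and each sample is retired once it has been drawn `sample_max_usage` times. The class below is a hypothetical sketch of that behaviour, assuming FIFO eviction when the table is full and uniform sampling without replacement; the framework's own table and its other `sampler_type` options may work differently.

```python
import random
from collections import deque
from typing import Any, List, Optional


class DataTable:
    """Bounded FIFO table whose samples are retired after sample_max_usage reads."""

    def __init__(self, capacity: int, sample_max_usage: int, seed: Optional[int] = None):
        self.capacity = capacity
        self.sample_max_usage = sample_max_usage
        # Each entry is [sample, times_used]; maxlen drops the oldest entry when full.
        self._items: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, sample: Any) -> None:
        self._items.append([sample, 0])

    def sample(self, batch_size: int) -> List[Any]:
        """Draw a uniform batch without replacement, then drop exhausted entries."""
        if batch_size > len(self._items):
            raise ValueError("not enough data in the table")
        picked = self._rng.sample(range(len(self._items)), batch_size)
        batch = []
        for idx in picked:
            entry = self._items[idx]
            entry[1] += 1
            batch.append(entry[0])
        # Keep only entries that may still be reused.
        self._items = deque(
            (e for e in self._items if e[1] < self.sample_max_usage),
            maxlen=self.capacity,
        )
        return batch
```

A small `sample_max_usage` keeps training data fresh (closer to on-policy), while a larger value squeezes more updates out of each collected sample.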

The population section defines the whole population, including the trainable policies. For each algorithm:

Model config:

`model`	model type (actor-critic type, feature encoder type)
`initialization`	model initialization
`actor`	actor network setting
`critic`	critic network setting
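The `actor` and `critic` settings determine the size of each network. As a rough illustration, assuming both are plain fully connected networks described by a list of hidden layer sizes (the `hidden_sizes` key and all dimensions below are hypothetical), the parameter count follows directly from the layer widths.

```python
from typing import Sequence


def mlp_param_count(in_dim: int, hidden_sizes: Sequence[int], out_dim: int) -> int:
    """Weights plus biases of a fully connected network in -> hidden... -> out."""
    sizes = [in_dim, *hidden_sizes, out_dim]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


# Hypothetical network settings; real key names and sizes come from the YAML file.
obs_dim, num_actions = 8, 3
actor_cfg = {"hidden_sizes": [16, 16]}
critic_cfg = {"hidden_sizes": [16, 16]}

actor_params = mlp_param_count(obs_dim, actor_cfg["hidden_sizes"], num_actions)
critic_params = mlp_param_count(obs_dim, critic_cfg["hidden_sizes"], 1)  # scalar value head
```

The actor ends in one output per action while the critic ends in a single value, which is why the two counts differ only in the last layer.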

Custom config:

Policy initialization config:

`agent_0`	population pool for agent 0
`init_cfg`	how agent 0's policy is initialized in each condition (random, pretrained, or inherit)
`agent_1`	population pool for agent 1
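The three initialization modes in `init_cfg` can be read as three ways of producing a new policy for an agent's pool. The function below is a hypothetical sketch: policies are toy dicts, "random" draws small random weights, "pretrained" copies externally supplied weights, and "inherit" is assumed to copy the most recent policy in the pool (the framework may choose a different source policy).

```python
import copy
import random
from typing import Any, Dict, List, Optional

Policy = Dict[str, Any]  # toy stand-in for a policy's parameters


def init_policy(
    mode: str,
    pool: List[Policy],
    pretrained: Optional[Policy] = None,
    seed: Optional[int] = None,
) -> Policy:
    """Create a new policy according to an initialization mode."""
    if mode == "random":
        rng = random.Random(seed)
        return {"weights": [rng.uniform(-0.1, 0.1) for _ in range(4)]}
    if mode == "pretrained":
        if pretrained is None:
            raise ValueError("pretrained mode needs pretrained weights")
        return copy.deepcopy(pretrained)
    if mode == "inherit":
        if not pool:
            raise ValueError("inherit mode needs a non-empty population pool")
        # Assumption: inherit from the newest policy in this agent's pool.
        return copy.deepcopy(pool[-1])
    raise ValueError(f"unknown init mode: {mode!r}")
```

Deep copies keep the new policy independent, so training it does not modify the frozen policies already stored in the pool.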

Learning statistics are recorded under ./logs, including the TensorBoard files, the saved config file, the saved policies, and more.
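To locate runs for inspection, it can help to list the directories under ./logs that contain TensorBoard event files. The sketch below only assumes TensorBoard's standard `events.out.tfevents.*` file naming; how the framework lays out subdirectories inside ./logs is not assumed.

```python
from pathlib import Path
from typing import List


def tensorboard_run_dirs(log_root: str = "./logs") -> List[Path]:
    """Return every directory under log_root that holds TensorBoard event files."""
    root = Path(log_root)
    if not root.is_dir():
        return []
    # TensorBoard writers name their files "events.out.tfevents.<timestamp>.<host>...".
    return sorted({p.parent for p in root.rglob("events.out.tfevents.*")})
```

Any of the returned directories, or ./logs itself, can then be passed to `tensorboard --logdir` to view the curves.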