Understand Trainer
How Existing (Single-Cost) Trainers Work
Most neural network training tasks are single-cost optimization. Tensorpack provides some trainer implementations for such tasks. These trainers help you define the graph, given the following arguments:
Some tf.TensorSpec, the signature of the input.
An InputSource, where the input comes from. See Input Pipeline.
A function which takes input tensors and returns the cost.
A function which returns an optimizer.
These are documented in SingleCostTrainer.setup_graph. In practice you will not call this method directly; you will use the high-level interface instead.
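To make the division of labor between the four arguments concrete, here is a framework-free toy sketch in plain Python. It is not tensorpack code: `toy_setup_graph`, `get_cost`, `get_optimizer`, and the one-parameter model are hypothetical stand-ins, and the real trainers build a TensorFlow graph rather than running a Python loop.

```python
import itertools


def toy_setup_graph(signature, input_source, get_cost, get_optimizer):
    """Toy stand-in: combine the four arguments into a single-step function."""
    params = {"w": 0.0}               # the only "variable" of this toy model
    apply_gradient = get_optimizer()  # called once, like an optimizer factory
    batches = iter(input_source)

    def run_step():
        batch = next(batches)
        assert len(batch) == len(signature), "batch must match the signature"
        cost, grad = get_cost(params["w"], *batch)
        params["w"] = apply_gradient(params["w"], grad)
        return cost

    return params, run_step


def get_cost(w, x, y):
    """Squared error of the model w * x, with its hand-derived gradient."""
    err = w * x - y
    return err * err, 2.0 * err * x


def get_optimizer():
    """Return a plain gradient-descent update with a fixed step size."""
    return lambda w, grad: w - 0.25 * grad


signature = ["x", "y"]                       # names only; no shapes in this toy
input_source = itertools.repeat((1.0, 2.0))  # endless stream of one sample
params, run_step = toy_setup_graph(signature, input_source, get_cost, get_optimizer)
costs = [run_step() for _ in range(20)]
```

In this toy, the cost function never touches the data source or the update rule, which is the separation the four arguments are meant to express.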
Tower Trainer
TowerTrainer is a trainer that uses a user-provided “tower function” to build models. All existing trainers in tensorpack are subclasses of TowerTrainer, because this concept is able to cover most types of neural-network training tasks.
What is Tower Function
Following the terminology in TensorFlow, a tower function is a callable that takes input tensors and adds one replica of the model to the graph. In short, the tower function builds your model. If you can write a function that builds your model, then you can use TowerTrainer.
The concept of “tower” is used mainly to support:
Data-parallel multi-GPU training, where a replica is built on each GPU.
Graph construction for inference, where a replica is built under inference mode.
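As a rough picture of the data-parallel case, the sketch below uses plain Python rather than TensorFlow: the same function is called once per simulated device on a shard of the batch, and the per-replica gradients are averaged. `tower_fn`, `data_parallel_gradient`, and the linear model are hypothetical; tensorpack's real trainers build graph replicas on GPUs instead.

```python
def tower_fn(w, xs, ys):
    """One replica: mean squared error of w * x on its shard, and d(cost)/dw."""
    n = len(xs)
    cost = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return cost, grad


def data_parallel_gradient(w, xs, ys, num_towers):
    """Call tower_fn once per shard and average the resulting gradients.

    Assumes len(xs) is divisible by num_towers, so all shards are equal in size.
    """
    shard = len(xs) // num_towers
    grads = []
    for i in range(num_towers):
        lo, hi = i * shard, (i + 1) * shard
        _, grad = tower_fn(w, xs[lo:hi], ys[lo:hi])
        grads.append(grad)
    return sum(grads) / num_towers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
_, full_grad = tower_fn(0.5, xs, ys)
parallel_grad = data_parallel_gradient(0.5, xs, ys, num_towers=2)
```

With equally sized shards, the averaged gradient matches the full-batch gradient, which is why every replica can be built by calling the same function.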
A user needs to provide a tower function to use TowerTrainer. In particular, when working with the commonly used ModelDesc interface, the build_graph method will be part of the tower function.
Rules of Tower Function
The tower function needs to follow some rules:
It may get called multiple times for data-parallel training or inference. As a result:
You’ll need to be careful when modifying global states, e.g. adding ops to collections, setting attributes of a model instance.
To use a tensorflow-hub module, you need to initialize the module outside the tower function, and call the module inside the tower function.
It must respect variable collections:
(Required) Only put variables trainable by gradient descent into TRAINABLE_VARIABLES.
(Recommended) Put non-trainable variables that need to be used in inference into MODEL_VARIABLES.
It must respect variable scope names:
The name of any trainable variable created in the function must look like “variable_scope_name/other/scopes/and/name”. Strictly speaking, the name of any trainable variable must:
Start with the name of the enclosing variable_scope when the tower function is called.
Not use the same variable_scope’s name twice in its name.
Not depend on name_scope’s name.
Not depend on any tensor’s name (because the tensor’s name may depend on name_scope’s name).
Tensorpack layers create variables based on the name given to the layer: e.g., Conv2D('test', x) will open a variable scope named “test”. In order to respect the above rules, the name of the layer must not depend on name_scope’s name or any tensor’s name.
It must respect variable scope reuse:
The creation of any trainable variable must respect the reuse setting of the variable scope. To respect variable reuse (i.e. sharing), use tf.get_variable instead of tf.Variable in the function.
On the other hand, for a non-trainable variable, it may be desirable to not reuse it between towers. In this case, tf.Variable can be used to ensure creation of new variables in each tower even when reuse=True.
Do not modify the reuse option (e.g., by scope.reuse_variables()) of a variable scope that is not created by you. This affects other people’s code. You can always open new scopes if you need the reuse option.
It must not create scopes or variables containing the name ‘tower’, as it is reserved for special use.
These conventions are easy to follow, and most layer wrappers (e.g., tf.layers/slim/tensorlayer) do follow them. Note that certain Keras layers do not follow these conventions and will need some workarounds if used within tensorpack.
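The reuse rule can be illustrated without TensorFlow. The `ToyVariableStore` class below is hypothetical and only loosely imitates name-based sharing: its `get_variable` method stands in for tf.get_variable and `new_variable` for tf.Variable. It is a sketch of the idea, not a description of how TensorFlow implements variable scopes.

```python
class ToyVariableStore:
    """Minimal imitation of name-based variable sharing (hypothetical)."""

    def __init__(self):
        self.variables = {}  # name -> value holder, shared across towers
        self.anonymous = []  # variables created without sharing

    def get_variable(self, name, reuse, initial=0.0):
        """Create `name` on first use; return the existing one when reusing."""
        if name in self.variables:
            if not reuse:
                raise ValueError("variable %r already exists" % name)
            return self.variables[name]
        if reuse:
            raise ValueError("variable %r does not exist" % name)
        self.variables[name] = [initial]
        return self.variables[name]

    def new_variable(self, initial=0.0):
        """Always create a fresh variable, regardless of any reuse setting."""
        var = [initial]
        self.anonymous.append(var)
        return var


def tower_fn(store, reuse):
    """Toy tower function: one shared weight and one per-tower counter."""
    weight = store.get_variable("fc/W", reuse, initial=1.0)  # shared, trainable
    counter = store.new_variable()                           # per-tower state
    return weight, counter


store = ToyVariableStore()
w0, c0 = tower_fn(store, reuse=False)  # first call creates the variables
w1, c1 = tower_fn(store, reuse=True)   # second call shares the weight
```

Both calls see the same weight object, while each call gets its own counter. This is the behavior the reuse rule asks a tower function to preserve when it is called more than once.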
What You Can Do Inside a Tower Function
Call any symbolic functions as long as they follow the above rules.
The tower function will be called under a TowerContext, which can be accessed by get_current_tower_context(). The context contains information about training/inference mode, scope name, etc. You can use the context to build a different graph in each mode.
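The pattern of branching on the current context can be sketched in plain Python. Everything below is a hypothetical stand-in: `ToyTowerContext`, `toy_tower_context`, and `get_current_toy_context` only imitate the idea of a context that the caller sets up and the tower function reads, and the `is_training` attribute is an assumption modeled on the mode information described above.

```python
from contextlib import contextmanager

_context_stack = []  # the innermost active context is the last element


class ToyTowerContext:
    """Hypothetical stand-in holding a tower name and the mode."""

    def __init__(self, name, is_training):
        self.name = name
        self.is_training = is_training


@contextmanager
def toy_tower_context(name, is_training):
    """Make a context current for the duration of the with-block."""
    ctx = ToyTowerContext(name, is_training)
    _context_stack.append(ctx)
    try:
        yield ctx
    finally:
        _context_stack.pop()


def get_current_toy_context():
    """Return the active context, or None outside any with-block."""
    return _context_stack[-1] if _context_stack else None


def tower_fn():
    """Return the layer names this replica would build; dropout only in training."""
    ctx = get_current_toy_context()
    layers = ["conv"]
    if ctx.is_training:
        layers.append("dropout")
    layers.append("fc")
    return layers


with toy_tower_context("training", is_training=True):
    train_layers = tower_fn()
with toy_tower_context("inference", is_training=False):
    infer_layers = tower_fn()
```

The tower function itself takes no mode argument; it learns the mode from the context that the caller has set up, so the same function serves both training and inference.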
Write a Trainer
The existing trainers should be enough for single-dataloader, data-parallel, single-cost optimization tasks. If you just want to do some extra work during training, first consider writing it as a callback, or open an issue to see if there is a better solution than creating a new trainer. If your task is fundamentally different from single-cost optimization, you will need to write a trainer.
You can customize the trainer by either using or inheriting the Trainer/TowerTrainer class.
You will need to do two things for a new Trainer:
Define the graph. There are 2 ways you can do this:
Create any tensors and ops you need, before creating the trainer.
Create them inside Trainer.__init__.
Define what an iteration is. There are 2 ways to define the iteration:
Set Trainer.train_op to a TensorFlow operation. This op will be run by default.
Subclass Trainer and override the run_step() method. This way you can do something more than running an op.
Note that the trainer has self.sess and self.hooked_sess: only the hooked session will trigger the before_run/after_run callbacks. If you need more than one Session.run in one step, special care needs to be taken to choose which session to use, because many states (global steps, StagingArea, summaries) are maintained through before_run/after_run.
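The two ways of defining an iteration, and the difference between the two sessions, can be pictured with a small plain-Python sketch. All classes here (`ToySession`, `ToyHookedSession`, `StepCounter`, `ToyTrainer`, `TwoOpTrainer`) are hypothetical; "ops" are ordinary callables, and the real trainer works with TensorFlow sessions and tensorpack callbacks.

```python
class ToySession:
    """Runs an op (here: any callable) without triggering callbacks."""

    def run(self, op):
        return op()


class ToyHookedSession:
    """Wraps a session and fires before_run/after_run around every run call."""

    def __init__(self, sess, callbacks):
        self.sess = sess
        self.callbacks = callbacks

    def run(self, op):
        for cb in self.callbacks:
            cb.before_run()
        result = self.sess.run(op)
        for cb in self.callbacks:
            cb.after_run()
        return result


class StepCounter:
    """Callback that counts steps in after_run, similar in spirit to a global step."""

    def __init__(self):
        self.steps = 0

    def before_run(self):
        pass

    def after_run(self):
        self.steps += 1


class ToyTrainer:
    def __init__(self, callbacks):
        self.sess = ToySession()
        self.hooked_sess = ToyHookedSession(self.sess, callbacks)
        self.train_op = None

    def run_step(self):
        # Default iteration: run the single train_op through the hooked session.
        self.hooked_sess.run(self.train_op)


class TwoOpTrainer(ToyTrainer):
    """Runs two ops per iteration but fires the callbacks only once."""

    def __init__(self, callbacks, first_op, second_op):
        super().__init__(callbacks)
        self.first_op, self.second_op = first_op, second_op

    def run_step(self):
        self.hooked_sess.run(self.first_op)  # callbacks fire here
        self.sess.run(self.second_op)        # plain session: no callbacks


log = []

# Way 1: set train_op and rely on the default run_step.
simple_counter = StepCounter()
simple = ToyTrainer([simple_counter])
simple.train_op = lambda: log.append("train")
simple.run_step()

# Way 2: override run_step to run more than one op per iteration.
counter = StepCounter()
trainer = TwoOpTrainer([counter], lambda: log.append("d"), lambda: log.append("g"))
for _ in range(3):
    trainer.run_step()
```

Running both ops through the hooked session would make the counter advance twice per iteration, which is the kind of inconsistency the note about choosing a session warns against.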
If you want to write a new trainer, the Tensorpack examples include several different GAN trainers for reference.