Role of Trainer
Tensorpack follows the “define-and-run” paradigm. Therefore a training script has two steps:
1. Define: Build the graph for the model. Users can call any TensorFlow functions to set up the graph, and may or may not use tensorpack ModelDesc or other utilities to build it. The goal of this step is to define “what to run” in the later training steps, and it can happen either inside or outside the tensorpack trainer.
2. Run: Train the model (the Trainer.train() method):
   1. Finalize the graph and initialize the session.
   2. Run the training loop.
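To make the two phases concrete, here is a toy, framework-free analogue; it does not use TensorFlow or tensorpack. The "define" phase only records what to compute, and nothing executes until the "run" phase finalizes that description and loops over it. All names (ToyGraph, add_op, finalize, run, increment_step) are hypothetical and chosen for illustration only.

```python
class ToyGraph:
    """Hypothetical stand-in for a TensorFlow graph: it only records ops."""

    def __init__(self):
        self.ops = []            # deferred computations, in definition order
        self.finalized = False

    def add_op(self, fn):
        # Define phase: store the function without calling it.
        if self.finalized:
            raise RuntimeError("graph is finalized; no new ops allowed")
        self.ops.append(fn)
        return fn

    def finalize(self):
        # Freeze the graph, as the run phase does before training starts.
        self.finalized = True


def run(graph, num_iterations, state):
    """Run phase: finalize the graph, then execute its ops in a loop."""
    graph.finalize()
    for _ in range(num_iterations):
        for op in graph.ops:
            op(state)
    return state


def increment_step(state):
    state["step"] += 1


# Define: describe "what to run"; nothing is executed yet.
graph = ToyGraph()
graph.add_op(increment_step)

# Run: execute the recorded description three times.
state = run(graph, num_iterations=3, state={"step": 0})
print(state["step"])  # 3
```

Calling graph.add_op() after run() raises an error in this toy, loosely mirroring how a finalized TensorFlow graph rejects new ops.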
Assumptions of Base Trainer
Q: What types of training can you do with tensorpack?
A: Anything that runs in a loop.
In research we do training of various kinds.
Tensorpack trainers avoid making assumptions about what type of training
you want to do (e.g., it doesn’t have to be batched, SGD-like, or have X (inputs) and y (outputs)).
The only assumption is that your training follows this pattern:
```python
for epoch_num in range(starting_epoch, max_epoch):
    for local_step in range(steps_per_epoch):
        run_step()
```
Training is running some iterations. The tensorpack base trainer implements the logic of running the iterations; users or derived trainers implement what one iteration is.
The trainer assumes the existence of “epochs”, i.e. that the iterations run in a double for-loop. But steps_per_epoch can be any number you set, and it only affects the schedule of callbacks. In other words, an “epoch” in tensorpack is the default period for running callbacks (validation, summary, checkpoint, etc.).
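The loop pattern and the role of epochs as the callback period can be sketched in plain Python. This is a hypothetical toy, not tensorpack's implementation: ToyTrainer, CountingTrainer, ToyCallback and trigger_epoch are illustrative names. The base class owns the double loop, a subclass supplies run_step(), and callbacks fire once per epoch.

```python
class ToyCallback:
    """Hypothetical callback that records when it was triggered."""

    def __init__(self):
        self.triggered_at = []

    def trigger_epoch(self, epoch_num, global_step):
        # Stand-in for validation, summaries, checkpoints, etc.
        self.triggered_at.append((epoch_num, global_step))


class ToyTrainer:
    """Hypothetical base trainer: it only knows how to run iterations."""

    def __init__(self, callbacks=()):
        self.callbacks = list(callbacks)
        self.global_step = 0

    def run_step(self):
        # What one iteration does is left to derived trainers.
        raise NotImplementedError

    def train(self, starting_epoch, max_epoch, steps_per_epoch):
        # Same double for-loop as the pattern above.
        for epoch_num in range(starting_epoch, max_epoch):
            for _ in range(steps_per_epoch):
                self.run_step()
                self.global_step += 1
            # An "epoch" is simply the period at which callbacks run.
            for cb in self.callbacks:
                cb.trigger_epoch(epoch_num, self.global_step)


class CountingTrainer(ToyTrainer):
    """A derived trainer that defines what the iteration is."""

    def __init__(self, callbacks=()):
        super().__init__(callbacks)
        self.counter = 0

    def run_step(self):
        self.counter += 1


cb = ToyCallback()
trainer = CountingTrainer(callbacks=[cb])
trainer.train(starting_epoch=1, max_epoch=4, steps_per_epoch=5)
print(trainer.counter)   # 15
print(cb.triggered_at)   # [(1, 5), (2, 10), (3, 15)]
```

Doubling steps_per_epoch while halving the number of epochs would run the same number of iterations but trigger the callbacks half as often, which is the sense in which an epoch is only a scheduling unit.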
How Existing (Single-Cost) Trainers Work
Most neural network training tasks are single-cost optimization. Tensorpack provides some trainer implementations for such tasks. These trainers will take care of step 1 (define the graph), with the following arguments:
1. tf.TensorSpec objects, the signature of the inputs.
2. An InputSource, where the input comes from. See Input Pipeline.
3. A function which takes input tensors and returns the cost.
4. A function which returns an optimizer.
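As a rough, framework-free illustration of how these four arguments fit together, the sketch below uses plain Python floats instead of tensors and runs eagerly instead of building a graph. Everything in it is hypothetical: input_signature stands in for the tf.TensorSpec list, a generator stands in for the InputSource, build_cost returns the cost together with its gradient (TensorFlow would derive the gradient automatically), and get_optimizer returns a plain SGD update rule.

```python
import itertools

# 1. Hypothetical input signature: names of the input fields.
input_signature = ("x", "y")


# 2. Hypothetical input source: an endless stream of (x, y) with y = 3 * x.
def input_source():
    for x in itertools.cycle([1.0, 2.0, 3.0, 4.0]):
        yield {"x": x, "y": 3.0 * x}


# 3. Cost function: squared error of a one-parameter linear model.
#    It also returns d(cost)/dw, which TensorFlow would compute for us.
def build_cost(inputs, params):
    err = params["w"] * inputs["x"] - inputs["y"]
    return err * err, {"w": 2.0 * err * inputs["x"]}


# 4. Optimizer factory: plain SGD with a fixed learning rate.
def get_optimizer():
    lr = 0.01

    def apply_gradients(params, grads):
        for name, g in grads.items():
            params[name] -= lr * g

    return apply_gradients


class ToySingleCostTrainer:
    """Hypothetical trainer that wires the four arguments into one step."""

    def __init__(self, signature, source, cost_fn, optimizer_fn):
        self.signature = signature
        self.source = source
        self.cost_fn = cost_fn
        self.apply_gradients = optimizer_fn()
        self.params = {"w": 0.0}

    def run_step(self):
        values = next(self.source)
        # Keep only the fields declared in the signature.
        inputs = {name: values[name] for name in self.signature}
        cost, grads = self.cost_fn(inputs, self.params)
        self.apply_gradients(self.params, grads)
        return cost


trainer = ToySingleCostTrainer(input_signature, input_source(),
                               build_cost, get_optimizer)
for _ in range(200):
    trainer.run_step()
print(round(trainer.params["w"], 3))  # 3.0
```

The point of the design is that the trainer never needs to know what the model is: it only pulls inputs, asks for a cost, and hands gradients to whatever optimizer the factory returned.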
Write a Trainer
The existing trainers should be enough for data-parallel single-cost optimization tasks. If you just want to do some extra work during training, first consider writing it as a callback, or open an issue to see if there is a better solution than creating a new trainer. If your task is fundamentally different from single-cost optimization, you will need to write a trainer.
You can customize the trainer by either using or inheriting the base Trainer class.
You will need to do two things for a new Trainer:
1. Define the graph. There are 2 ways you can do this:
   1. Create any tensors and ops you need before creating the trainer.
   2. Create them inside Trainer.__init__.
2. Define what the iteration is. There are 2 ways to define the iteration:
   1. Set Trainer.train_op to a TensorFlow operation. This op will be run by default.
   2. Subclass Trainer and override the run_step() method. This way you can do more than run a single op.
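The two ways of defining the iteration can be contrasted in a toy, framework-free sketch. Assumptions: a plain Python callable plays the role of a TensorFlow op, and BaseToyTrainer and AlternatingTrainer are hypothetical names, not tensorpack classes. The base class's default run_step() just runs self.train_op; a subclass that needs more overrides run_step() instead.

```python
class BaseToyTrainer:
    """Hypothetical base class whose default iteration runs self.train_op."""

    train_op = None

    def run_step(self):
        # Default behavior: run the single op that was assigned.
        self.train_op()

    def train(self, num_steps):
        for _ in range(num_steps):
            self.run_step()


# Option 1: keep the default run_step() and only set train_op.
counter = {"updates": 0}


def increment():
    counter["updates"] += 1


trainer = BaseToyTrainer()
trainer.train_op = increment
trainer.train(4)
print(counter["updates"])  # 4


# Option 2: override run_step() to do more than run one op, here
# interleaving two kinds of updates (a made-up pattern, loosely in the
# spirit of separate generator/discriminator updates in GAN training).
class AlternatingTrainer(BaseToyTrainer):
    def __init__(self):
        self.log = []
        self.step = 0

    def run_step(self):
        self.log.append("d_step" if self.step % 2 == 0 else "g_step")
        self.step += 1


alt = AlternatingTrainer()
alt.train(4)
print(alt.log)  # ['d_step', 'g_step', 'd_step', 'g_step']
```

Option 1 is enough whenever one op fully describes an iteration; Option 2 is for iterations with control flow in Python between runs.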
Note that the trainer has both self.sess and self.hooked_sess: only the hooked session will trigger the before_run/after_run callbacks. If you need more than one Session.run in one step, special care needs to be taken when choosing which session to use, because many states (global steps, StagingArea, summaries) are maintained through before_run/after_run.
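Why the choice of session matters can be illustrated with a simplified, hypothetical model of hooks: the raw session just runs a fetch, while the hooked wrapper also calls each hook's before_run() and after_run() around every run. This is not TensorFlow's SessionRunHook API, and RawSession, HookedSession and GlobalStepHook are illustrative names; the sketch only shows that state maintained by hooks, such as a global step counter, advances only for runs made through the hooked session.

```python
class RawSession:
    """Hypothetical plain session: runs a callable and returns its result."""

    def run(self, fetch):
        return fetch()


class HookedSession:
    """Hypothetical wrapper that triggers hooks around every run."""

    def __init__(self, sess, hooks):
        self.sess = sess
        self.hooks = hooks

    def run(self, fetch):
        for h in self.hooks:
            h.before_run()
        result = self.sess.run(fetch)
        for h in self.hooks:
            h.after_run(result)
        return result


class GlobalStepHook:
    """Hypothetical hook that maintains a global step counter."""

    def __init__(self):
        self.global_step = 0

    def before_run(self):
        pass  # a real hook might add extra fetches here

    def after_run(self, result):
        self.global_step += 1


step_hook = GlobalStepHook()
sess = RawSession()
hooked_sess = HookedSession(sess, [step_hook])


def train_op():
    return "trained"


def debug_fetch():
    return "inspected"


# One iteration with two runs: the update goes through the hooked
# session, an auxiliary fetch through the raw session.
hooked_sess.run(train_op)
sess.run(debug_fetch)
print(step_hook.global_step)  # 1

# Routing both runs through the hooked session counts them both.
hooked_sess.run(train_op)
hooked_sess.run(debug_fetch)
print(step_hook.global_step)  # 3
```

Under this simplified model, a reasonable rule of thumb is to send exactly the run that should count as "the step" through the hooked session and use the raw session for auxiliary fetches.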
There are several different GAN trainers for reference.