tensorpack.callbacks package

Everything other than the training iterations happens in the callbacks. Most of the fancy things you want to do will probably end up here. See relevant tutorials: Callbacks.

class tensorpack.callbacks.Callback[source]

Bases: object

Base class for all callbacks. See Write a Callback for more detailed explanation of the callback methods.














the trainer.




the graph.




These attributes are available only after (and including) _setup_graph().


Called before finalizing the graph. Override this method to set up the ops used in the callback. This is the same as tf.train.SessionRunHook.begin().


Called right before the first iteration. The main difference from setup_graph is that at this point the graph is finalized and a default session is initialized. Override this method to, e.g., run some operations under the session.

This is similar to tf.train.SessionRunHook.after_create_session(), but different: it is called after the session is initialized by tfutils.SessionInit.


Called after training.


It is called before every hooked_sess.run() call, and it registers some extra ops/tensors to run in the next call. This method is the same as tf.train.SessionRunHook.before_run(). Refer to the TensorFlow docs for more details.

_after_run(run_context, run_values)[source]

It is called after every hooked_sess.run() call, and it processes the values requested by the corresponding before_run(). It is equivalent to tf.train.SessionRunHook.after_run(). Refer to the TensorFlow docs for more details.


Called right before each epoch. Usually you should use the trigger() callback to run something between epochs. Use this method only when something really needs to be run immediately before each epoch.


Called right after each epoch. Usually you should use the trigger() callback to run something between epochs. Use this method only when something really needs to be run immediately after each epoch.


Called after each Trainer.run_step() completes. Defaults to no-op.

You can override it to implement, e.g. a ProgressBar.


Called after the completion of every epoch. Defaults to calling self.trigger().


Override this method to define a general trigger behavior, to be used with trigger schedulers. Note that the schedulers (e.g. PeriodicTrigger) might call this method both inside an epoch and after an epoch.

When used without the scheduler, this method by default will be called by trigger_epoch().
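The call order described by the methods above can be sketched in plain Python. This is a toy loop, not tensorpack's trainer: `RecordingCallback` and `run_training` are hypothetical names, the per-step session hooks are omitted, and the relative order of the after-epoch hook and the epoch trigger is an assumption.

```python
class RecordingCallback:
    """Hypothetical stand-in for a callback; it only records which hook ran."""

    def __init__(self):
        self.calls = []

    def setup_graph(self):
        self.calls.append("setup_graph")

    def before_train(self):
        self.calls.append("before_train")

    def before_epoch(self):
        self.calls.append("before_epoch")

    def trigger_step(self):
        self.calls.append("trigger_step")

    def after_epoch(self):
        self.calls.append("after_epoch")

    def trigger_epoch(self):
        # Without a scheduler, the epoch trigger simply forwards to trigger().
        self.trigger()

    def trigger(self):
        self.calls.append("trigger")

    def after_train(self):
        self.calls.append("after_train")


def run_training(cb, epochs, steps_per_epoch):
    """Toy training loop that calls the hooks in the documented order."""
    cb.setup_graph()       # the graph is still being built
    cb.before_train()      # the graph is finalized and a session exists
    for _ in range(epochs):
        cb.before_epoch()
        for _ in range(steps_per_epoch):
            # a real trainer would run one training step here
            cb.trigger_step()
        cb.after_epoch()
        cb.trigger_epoch()  # assumed to follow after_epoch
    cb.after_train()


cb = RecordingCallback()
run_training(cb, epochs=1, steps_per_epoch=2)
print(cb.calls)
```

Overriding only `trigger()` in such a class is enough to run code once per epoch, which is why the text above recommends it over the before/after-epoch hooks.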

property chief_only

Only run this callback on the chief training process.

Returns: bool


Get tensors in the graph with the given names. Will automatically check for the first training tower if no existing tensor is found with the name.



name_scope = ''

A name scope for ops created inside this callback. Defaults to the name of the class, but can be set per-instance.


Sets the chief_only property and returns the callback itself.

class tensorpack.callbacks.ProxyCallback(cb)[source]

Bases: tensorpack.callbacks.base.Callback

A callback which proxies all methods to another callback. It’s useful as a base class for callbacks which decorate other callbacks.


cb (Callback) – the underlying callback

class tensorpack.callbacks.CallbackFactory(setup_graph=None, before_train=None, trigger=None, after_train=None)[source]

Bases: tensorpack.callbacks.base.Callback

Create a callback with some lambdas.

__init__(setup_graph=None, before_train=None, trigger=None, after_train=None)[source]

Each lambda takes self as the only argument.

class tensorpack.callbacks.StartProcOrThread(startable, stop_at_last=True)[source]

Bases: tensorpack.callbacks.base.Callback

Start some threads or processes before training.

__init__(startable, stop_at_last=True)[source]
  • startable (list) – list of processes or threads which have a start() method. Can also be a single instance of a process or thread.

  • stop_at_last (bool) – whether to stop the processes or threads after training. It will use Process.terminate() or StoppableThread.stop(), but will do nothing on normal threading.Thread or other startable objects.

class tensorpack.callbacks.RunOp(op, run_before=True, run_as_trigger=True, run_step=False, verbose=False)[source]

Bases: tensorpack.callbacks.base.Callback

Run an Op.

__init__(op, run_before=True, run_as_trigger=True, run_step=False, verbose=False)[source]
  • op (tf.Operation or function) – an Op, or a function that returns the Op in the graph. The function will be called after the main graph has been created (in the setup_graph() callback).

  • run_before (bool) – run the Op before training

  • run_as_trigger (bool) – run the Op on every trigger() call.

  • run_step (bool) – run the Op every step (along with training)

  • verbose (bool) – print logs when the op is run.


The DQN Example uses this callback to update the target network.

class tensorpack.callbacks.RunUpdateOps(collection=None)[source]

Bases: tensorpack.callbacks.graph.RunOp

Run ops from the collection UPDATE_OPS every step. The ops will be hooked to trainer.hooked_sess and run along with each hooked_sess.run call.

Be careful when using UPDATE_OPS if your model contains more than one sub-network. Perhaps not all updates are supposed to be executed in every iteration.

This callback is one of the DEFAULT_CALLBACKS().


collection (str) – collection of ops to run. Defaults to tf.GraphKeys.UPDATE_OPS

class tensorpack.callbacks.ProcessTensors(names, fn)[source]

Bases: tensorpack.callbacks.base.Callback

Fetch extra tensors along with each training step, and call some function over the values. It uses the _{before,after}_run methods to inject tf.train.SessionRunHooks into the session. You can use it to print tensors, save tensors to file, etc.


ProcessTensors(['mycost1', 'mycost2'], lambda c1, c2: print(c1, c2, c1 + c2))

__init__(names, fn)[source]
  • names (list[str]) – names of tensors

  • fn – a function taking all requested tensors as input

class tensorpack.callbacks.DumpTensors(names)[source]

Bases: tensorpack.callbacks.graph.ProcessTensors

Dump some tensors to a file. Every step this callback fetches tensors and writes them to an npz file under logger.get_logger_dir(). The dump can be loaded by dict(np.load(filename).items()).


names (list[str]) – names of tensors

class tensorpack.callbacks.DumpTensorAsImage(tensor_name, prefix=None, map_func=None, scale=255)[source]

Bases: tensorpack.callbacks.base.Callback

Dump a tensor as image(s) to logger.get_logger_dir() once triggered.

Note that the tensor must be directly evaluable, i.e. either it does not depend on the inputs (e.g. the weights of the model), or the inputs are feed-free (in which case this callback will take an extra datapoint from the input pipeline).

__init__(tensor_name, prefix=None, map_func=None, scale=255)[source]
  • tensor_name (str) – the name of the tensor.

  • prefix (str) – the filename prefix for saved images. Defaults to the Op name.

  • map_func – map the value of the tensor to an image or list of images of shape [h, w] or [h, w, c]. If None, will use identity.

  • scale (float) – a multiplier on pixel values, applied after map_func.

class tensorpack.callbacks.CheckNumerics(run_as_trigger=True, run_step=False)[source]

Bases: tensorpack.callbacks.graph.RunOp

Check variables in the graph for NaN and Inf. Raise an exception if such an error is found.

__init__(run_as_trigger=True, run_step=False)[source]

Args: same as in RunOp.

class tensorpack.callbacks.Callbacks(cbs)[source]

Bases: tensorpack.callbacks.base.Callback

A container to hold all callbacks, and trigger them iteratively.

This is only used by the base trainer to run all the callbacks. Users do not need to use this class.


cbs (list) – a list of Callback instances.

class tensorpack.callbacks.CallbackToHook(cb)[source]

Bases: tensorflow.python.training.session_run_hook.SessionRunHook

Hooks are less powerful than callbacks so the conversion is incomplete. It only converts the before_run/after_run calls.

This is only for internal implementation of before_run/after_run callbacks. You shouldn’t need to use this.

class tensorpack.callbacks.HookToCallback(hook)[source]

Bases: tensorpack.callbacks.base.Callback

Make a tf.train.SessionRunHook into a callback. Note that when SessionRunHook.after_create_session is called, the coord argument will be None.


hook (tf.train.SessionRunHook) –

class tensorpack.callbacks.TFLocalCLIDebugHook(*args, **kwargs)[source]

Bases: tensorpack.callbacks.hooks.HookToCallback

Use the hook tfdbg.LocalCLIDebugHook in tensorpack.

__init__(*args, **kwargs)[source]

args, kwargs – arguments to create tfdbg.LocalCLIDebugHook. Refer to the TensorFlow documentation for details.

add_tensor_filter(*args, **kwargs)[source]

Wrapper of tfdbg.LocalCLIDebugHook.add_tensor_filter. Refer to tensorflow documentation for details.

class tensorpack.callbacks.ScalarStats(names, prefix='validation')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Statistics of some scalar tensor. The value will be averaged over all given datapoints.

Note that the average of accuracy over all batches is not necessarily the accuracy of the whole dataset. See ClassificationError for details.

__init__(names, prefix='validation')[source]
  • names (list or str) – list of names or just one name. The corresponding tensors have to be scalar.

  • prefix (str) – a prefix for logging

class tensorpack.callbacks.Inferencer[source]

Bases: tensorpack.callbacks.base.Callback

Base class of Inferencer. Inferencer is a special kind of callback that should be called by InferenceRunner. It has the methods _get_fetches and _on_fetches which are like SessionRunHooks, except that they will be used only by InferenceRunner.


Called before a new round of inference starts.


Called after a round of inference ends. Returns a dict of scalar statistics which will be logged to monitors.


To be implemented by subclasses


To be implemented by subclasses


Return a list of tensor names (guaranteed not op name) this inferencer needs.


Called after each new datapoint finishes forward inference.


results (list) – list of results this inferencer fetched. Has the same length as self._get_fetches().

class tensorpack.callbacks.ClassificationError(wrong_tensor_name='incorrect_vector', summary_name='validation_error')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Compute true classification error in batch mode, from a wrong tensor.

The wrong tensor is supposed to be a binary vector indicating whether each sample in the batch is incorrectly classified. You can use tf.nn.in_top_k to produce this vector.

This Inferencer produces the “true” error, which could be different from ScalarStats('error_rate'). It takes account of the fact that batches might not have the same size in testing (because the size of test set might not be a multiple of batch size). Therefore the result can be different from averaging the error rate of each batch.

You can also use the “correct prediction” tensor, in which case this inferencer will give you “classification accuracy” instead of error.
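The difference between the two numbers is easy to see with a small calculation. The sketch below is plain Python with made-up batches (1 marks a misclassified sample); the helper names are hypothetical and it does not use tensorpack.

```python
def mean_of_batch_errors(batches):
    """Average the per-batch error rates, ignoring batch sizes."""
    rates = [sum(batch) / len(batch) for batch in batches]
    return sum(rates) / len(rates)


def true_error(batches):
    """Count wrong samples over the whole dataset, then divide once."""
    wrong = sum(sum(batch) for batch in batches)
    total = sum(len(batch) for batch in batches)
    return wrong / total


# Nine test samples with batch size 4: the last batch holds a single sample.
batches = [[0, 0, 1, 0], [0, 0, 0, 0], [1]]

print(mean_of_batch_errors(batches))  # the small last batch is over-weighted
print(true_error(batches))            # 2 wrong samples out of 9
```

The single-sample batch counts as much as a full batch in the first function, which is why averaging per-batch scalars can overstate or understate the dataset-level error.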

__init__(wrong_tensor_name='incorrect_vector', summary_name='validation_error')[source]
  • wrong_tensor_name (str) – name of the wrong binary vector tensor.

  • summary_name (str) – the name to log the error with.

class tensorpack.callbacks.BinaryClassificationStats(pred_tensor_name, label_tensor_name, prefix='val')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Compute precision / recall in binary classification, given the prediction vector and the label vector.

__init__(pred_tensor_name, label_tensor_name, prefix='val')[source]
  • pred_tensor_name (str) – name of the 0/1 prediction tensor.

  • label_tensor_name (str) – name of the 0/1 label tensor.

class tensorpack.callbacks.InferenceRunnerBase(input, infs)[source]

Bases: tensorpack.callbacks.base.Callback

Base class for inference runner.


  1. InferenceRunner will use input.size() to determine how many iterations to run, so you’re responsible for ensuring that input.size() is accurate.

  2. Only works with instances of TowerTrainer.

__init__(input, infs)[source]

hook (tf.train.SessionRunHook) –

class tensorpack.callbacks.InferenceRunner(input, infs, tower_name='InferenceTower', tower_func=None, device=0)[source]

Bases: tensorpack.callbacks.inference_runner.InferenceRunnerBase

A callback that runs a list of Inferencer on some InputSource.

__init__(input, infs, tower_name='InferenceTower', tower_func=None, device=0)[source]
  • input (InputSource or DataFlow) – The InputSource to run inference on. If given a DataFlow, will use FeedInput.

  • infs (list) – a list of Inferencer instances.

  • tower_name (str) – the name scope of the tower to build. If multiple InferenceRunner are used, each needs a different tower_name.

  • tower_func (tfutils.TowerFunc or None) – the tower function to be used to build the graph. Defaults to calling trainer.tower_func under a training=False TowerContext, but you can change it to a different tower function if you need to run inference with several different graphs.

  • device (int) – the device to use

class tensorpack.callbacks.DataParallelInferenceRunner(input, infs, gpus, tower_name='InferenceTower', tower_func=None)[source]

Bases: tensorpack.callbacks.inference_runner.InferenceRunnerBase

Inference with data-parallel support on multiple GPUs. It will build one predict tower on each GPU, and run prediction with a large total batch in parallel on all GPUs. It will run the remainder (when the total size of input is not a multiple of #GPU) sequentially.

__init__(input, infs, gpus, tower_name='InferenceTower', tower_func=None)[source]
  • input (DataFlow or QueueInput) –

  • gpus (int or list[int]) – #gpus, or list of GPU id

  • tower_name (str) – the name scope of the tower to build. If multiple InferenceRunner are used, each needs a different tower_name.

  • tower_func (tfutils.TowerFunc or None) – the tower function to be used to build the graph. The tower function will be called under a training=False TowerContext. The default is trainer.tower_func, but you can change it to a different tower function if you need to run inference with several different models.


Args: hook (tf.train.SessionRunHook):

class tensorpack.callbacks.SendStat(command, names)[source]

Bases: tensorpack.callbacks.base.Callback

An equivalent of SendMonitorData, but as a normal callback.

class tensorpack.callbacks.InjectShell(file='INJECT_SHELL.tmp', shell='ipython')[source]

Bases: tensorpack.callbacks.base.Callback

Allow users to create a specific file as a signal to pause and interactively debug the training. Once the trigger() method is called, it checks whether the file exists, and opens an IPython/pdb shell if it does. In the shell, self is this callback, self.trainer is the trainer, and from that you can access everything else.


callbacks=[InjectShell('/path/to/pause-training.tmp'), ...]

# the following command will pause the training and start a shell when the epoch finishes:
$ touch /path/to/pause-training.tmp

__init__(file='INJECT_SHELL.tmp', shell='ipython')[source]
  • file (str) – if this file exists, will open a shell.

  • shell (str) – one of ‘ipython’, ‘pdb’

class tensorpack.callbacks.EstimatedTimeLeft(last_k_epochs=5, median=True)[source]

Bases: tensorpack.callbacks.base.Callback

Estimate the time left until completion of training.

__init__(last_k_epochs=5, median=True)[source]
  • last_k_epochs (int) – Use the time spent on the last k epochs to estimate the total time left.

  • median (bool) – If True, use the median of the time spent on the last k epochs; otherwise use the mean.
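A rough sketch of the estimate, in plain Python. It assumes the time left is the typical epoch time multiplied by the number of remaining epochs; the function name and its arguments other than `last_k_epochs` are hypothetical, and the callback's actual formula may differ.

```python
from statistics import mean, median


def estimate_time_left(epoch_times, epochs_done, max_epoch,
                       last_k_epochs=5, use_median=True):
    """Estimate remaining seconds from the most recent epoch durations."""
    recent = epoch_times[-last_k_epochs:]
    per_epoch = median(recent) if use_median else mean(recent)
    return per_epoch * (max_epoch - epochs_done)


# Six finished epochs out of ten; one epoch was unusually slow.
times = [100.0, 102.0, 98.0, 300.0, 101.0, 99.0]

eta_median = estimate_time_left(times, epochs_done=6, max_epoch=10)
eta_mean = estimate_time_left(times, epochs_done=6, max_epoch=10,
                              use_median=False)
print(eta_median, eta_mean)
```

The median is less sensitive to a single slow epoch (for example one that included a long validation run), which is a plausible reason for it being the default.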

class tensorpack.callbacks.MonitorBase[source]

Bases: tensorpack.callbacks.base.Callback

Base class for monitors which monitor training progress by processing different types of summaries/statistics from the trainer.


Override this method to setup the monitor.

process(name, val)[source]

Process a key-value pair.


evt (tf.Event) – the most basic format acceptable by tensorboard. It could include Summary, RunMetadata, LogMessage, and more.

process_image(name, val)[source]

val (np.ndarray) – 4D (NHWC) numpy array of images in range [0,255]. If channel is 3, assumed to be RGB.

process_scalar(name, val)[source]

val – a scalar


Process a tf.Summary.

class tensorpack.callbacks.Monitors(monitors)[source]

Bases: tensorpack.callbacks.base.Callback

Merge monitors together for trainer to use.

In training, each trainer will create a Monitors instance, and you can access it through trainer.monitors. You should use trainer.monitors for logging and it will dispatch your logs to each sub-monitor.


Get a history of the scalar value of some data.

If you run multiprocess training, keep in mind that the data may only be available on the chief process.


a list of (global_step, value) pairs – history data for this scalar


Get latest scalar value of some data.

If you run multiprocess training, keep in mind that the data may only be available on the chief process.




Put a tf.Event. The step and wall_time fields of tf.Event will be filled automatically.


evt (tf.Event) –

put_image(name, val)[source]

Put an image.

  • name (str) –

  • val (np.ndarray) – 2D, 3D (HWC) or 4D (NHWC) numpy array of images in range [0,255]. If channel is 3, assumed to be RGB.

put_scalar(name, val)[source]

Put a scalar.


Put a tf.Summary.

class tensorpack.callbacks.TFEventWriter(logdir=None, max_queue=10, flush_secs=120, split_files=False)[source]

Bases: tensorpack.callbacks.monitor.MonitorBase

Write summaries to TensorFlow event file.

__init__(logdir=None, max_queue=10, flush_secs=120, split_files=False)[source]
  • logdir – logger.get_logger_dir() by default.

  • max_queue, flush_secs – Same as in tf.summary.FileWriter.

  • split_files – if True, split events to multiple files rather than append to a single file. Useful on certain filesystems where append is expensive.

class tensorpack.callbacks.JSONWriter[source]

Bases: tensorpack.callbacks.monitor.MonitorBase

Write all scalar data to a JSON file under logger.get_logger_dir(), grouped by their global step. If an earlier JSON history file is found, new data will be appended to it.

FILENAME = 'stats.json'

The name of the json file. Do not change it.

static load_existing_epoch_number(dir=None)[source]

Try to load the latest epoch number from an existing json stats file (if any). Returns None if not found.

static load_existing_json(dir=None)[source]

Look for an existing json under dir (defaults to logger.get_logger_dir()) named “stats.json”, and return the loaded list of statistics if found. Returns None otherwise.
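The lookup described above can be sketched with the standard library alone. The record layout used here (a list of dicts with `epoch_num` and `global_step` keys) is an assumption for illustration, and `load_stats` and `latest_epoch` are hypothetical helpers, not the static methods themselves.

```python
import json
import os
import tempfile


def load_stats(directory, filename="stats.json"):
    """Return the parsed list of statistics, or None if the file is absent."""
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return json.load(f)


def latest_epoch(stats):
    """Return the epoch number of the last record, or None if unavailable."""
    if not stats:
        return None
    return stats[-1].get("epoch_num")  # assumed key name


with tempfile.TemporaryDirectory() as d:
    missing = load_stats(d)  # no stats.json yet
    with open(os.path.join(d, "stats.json"), "w") as f:
        json.dump([
            {"epoch_num": 1, "global_step": 100, "val-error": 0.31},
            {"epoch_num": 2, "global_step": 200, "val-error": 0.27},
        ], f)
    stats = load_stats(d)

print(missing, latest_epoch(stats))
```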

class tensorpack.callbacks.ScalarPrinter(enable_step=False, enable_epoch=True, whitelist=None, blacklist=None)[source]

Bases: tensorpack.callbacks.monitor.MonitorBase

Print scalar data into terminal.

__init__(enable_step=False, enable_epoch=True, whitelist=None, blacklist=None)[source]
  • enable_step, enable_epoch – whether to print the monitor data (if any) between steps or between epochs.

  • whitelist (list[str] or None) – A list of regex. Only names matching some regex will be allowed for printing. Defaults to match all names.

  • blacklist (list[str] or None) – A list of regex. Names matching any regex will not be printed. Defaults to match no names.
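The interaction of the two lists can be sketched as follows. This is a plain-Python illustration with a hypothetical `should_print` helper; in particular, using `re.search` (a match anywhere in the name) is an assumption about the matching mode.

```python
import re


def should_print(name, whitelist=None, blacklist=None):
    """Decide whether a scalar name passes the whitelist and blacklist."""
    # A missing whitelist allows every name.
    if whitelist is not None:
        if not any(re.search(pattern, name) for pattern in whitelist):
            return False
    # A missing blacklist rejects no name.
    if blacklist is not None:
        if any(re.search(pattern, name) for pattern in blacklist):
            return False
    return True


names = ["train/cost", "val-error", "val-cost", "learning_rate"]
shown = [n for n in names
         if should_print(n, whitelist=[r"^val"], blacklist=[r"cost$"])]
print(shown)
```

A name must pass both filters, so the blacklist can be used to carve exceptions out of a broad whitelist.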

class tensorpack.callbacks.SendMonitorData(command, names)[source]

Bases: tensorpack.callbacks.monitor.MonitorBase

Execute a command with some specific scalar monitor data. This is useful for, e.g. building a custom statistics monitor.

It will try to send once it has received all the stats.

__init__(command, names)[source]
  • command (str) – a command to execute. Use format string with stat names as keys.

  • names (list or str) – data name(s) to use.
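How a command template with stat names as keys might be filled in is sketched below. `build_command` is a hypothetical helper, not part of tensorpack; it only illustrates Python's `str.format` with keyword arguments and the rule of waiting until every requested stat is available.

```python
def build_command(command, names, stats):
    """Fill the command template once all requested stats are present."""
    if isinstance(names, str):
        names = [names]
    # Wait until every requested stat has been received.
    if not all(name in stats for name in names):
        return None
    values = {name: stats[name] for name in names}
    return command.format(**values)


template = "echo validation error is {validation_error}"

not_ready = build_command(template, "validation_error", {"cost": 1.0})
ready = build_command(template, "validation_error",
                      {"cost": 1.0, "validation_error": 0.25})
print(not_ready, ready)
```

Because the template goes through `str.format`, literal braces in the command would need to be doubled (`{{` and `}}`).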


Send the stats to your phone through pushbullet:

SendMonitorData('curl -u your_id: https://api.pushbullet.com/v2/pushes \
         -d type=note -d title="validation error" \
         -d body={validation_error} > /dev/null 2>&1',
         'validation_error')

class tensorpack.callbacks.CometMLMonitor(experiment=None, tags=None, **kwargs)[source]

Bases: tensorpack.callbacks.monitor.MonitorBase

Send scalar data and the graph to https://www.comet.ml.


  1. comet_ml requires you to import comet_ml before importing tensorflow or tensorpack.

  2. The “automatic output logging” feature of comet_ml will make the training progress bar appear to freeze. Therefore the feature is disabled by default.

__init__(experiment=None, tags=None, **kwargs)[source]
  • experiment (comet_ml.Experiment) – if provided, all other arguments are ignored

  • tags (list[str]) – experiment tags

  • kwargs – arguments used to initialize comet_ml.Experiment, such as project name, API key, etc. Refer to its documentation for details.

property experiment

The comet_ml.Experiment instance.

class tensorpack.callbacks.HyperParam[source]

Bases: object

Base class for a hyperparam.

abstract get_value()[source]

Get the value of the param.

property readable_name

A name to display

abstract set_value(v)[source]

Set the value of the param.


v – the value to be set


Set up the graph in the setup_graph callback stage, if necessary.

class tensorpack.callbacks.GraphVarParam(name, shape=)[source]

Bases: tensorpack.callbacks.param.HyperParam

A variable in the graph (e.g. learning_rate) can be a hyperparam.

__init__(name, shape=)[source]
  • name (str) – name of the variable.

  • shape (tuple) – shape of the variable.


Evaluate the variable.


Assign the variable a new value.


Will set up the assign operator for that variable.

class tensorpack.callbacks.ObjAttrParam(obj, attrname, readable_name=None)[source]

Bases: tensorpack.callbacks.param.HyperParam

An attribute of an object can be a hyperparam.

__init__(obj, attrname, readable_name=None)[source]
  • obj – the object

  • attrname (str) – the attribute

  • readable_name (str) – The name to display and set with. Defaults to attrname.


Get the value of the param.


Set the value of the param.


v – the value to be set

class tensorpack.callbacks.HyperParamSetter(param)[source]

Bases: tensorpack.callbacks.base.Callback

An abstract base callback to set hyperparameters.

Once the trigger() method is called, the method _get_value_to_set() will be used to get a new value for the hyperparameter.


param (HyperParam or str) – if it is a str, it is assumed to be a GraphVarParam.


The current value of the param.


The value to assign to the variable.


Subclasses will implement the abstract method _get_value_to_set(), which should return a new value to set, or return None to do nothing.

class tensorpack.callbacks.HumanHyperParamSetter(param, file_name='hyper.txt')[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set a hyperparameter by loading its value from a file each time the callback is called. This is useful for manually tuning some parameters (e.g. learning_rate) without interrupting the training.

__init__(param, file_name='hyper.txt')[source]
  • param – same as in HyperParamSetter.

  • file_name (str) – a file containing the new value of the parameter. Each line in the file is a k:v pair, for example, learning_rate:1e-4. If the pair is not found, the param will not be changed.
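Parsing such a file is straightforward. The sketch below is plain Python with a hypothetical `read_param` helper; converting the value with `float` is an assumption, and a missing key yields None, mirroring "the param will not be changed".

```python
def read_param(text, name):
    """Return the value for `name` from k:v lines, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        if key.strip() == name:
            return float(value)  # assumes numeric hyperparameters
    return None


contents = "learning_rate:1e-4\nmomentum:0.9\n"

print(read_param(contents, "learning_rate"))
print(read_param(contents, "weight_decay"))
```

Editing the file while training is running then takes effect the next time the callback is triggered.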

class tensorpack.callbacks.ScheduledHyperParamSetter(param, schedule, interp=None, step_based=False, set_at_beginning=True)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set hyperparameters by a predefined epoch-based schedule.

__init__(param, schedule, interp=None, step_based=False, set_at_beginning=True)[source]
  • param – same as in HyperParamSetter.

  • schedule (list) – with the format [(epoch1, val1), (epoch2, val2), (epoch3, val3)]. Each (ep, val) pair means to set the param to “val” after the completion of epoch ep. If ep == 0, the value will be set before the first epoch (because by default the first is epoch 1). The epoch numbers have to be increasing.

  • interp (str or None) – Either None or ‘linear’. If None, the parameter will only be set when the specific epoch or step is reached exactly. If ‘linear’, perform linear interpolation (but no extrapolation) every time this callback is triggered.

  • step_based (bool) – interpret schedule as (step, value) instead of (epoch, value).

  • set_at_beginning (bool) – at the start of training, the current value may be different from the expected value according to the schedule. If this option is True, set the value anyway even though the current epoch/step is not at the scheduled time. If False, the value will only be set according to the schedule, i.e. it will only be set if the current epoch/step is at the scheduled time.
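The effect of `interp` can be sketched in plain Python. The function `scheduled_value` is hypothetical and follows only the description above: exact schedule points always yield their value, `'linear'` interpolates between neighbouring points, and nothing is returned outside the scheduled range.

```python
def scheduled_value(schedule, point, interp=None):
    """Return the value to set at `point`, or None to leave the param alone."""
    # An exact hit on a scheduled epoch/step always sets the value.
    for p, v in schedule:
        if p == point:
            return v
    if interp != "linear":
        return None
    # Interpolate between neighbouring schedule points, never beyond them.
    for (p0, v0), (p1, v1) in zip(schedule, schedule[1:]):
        if p0 < point < p1:
            frac = (point - p0) / (p1 - p0)
            return v0 + frac * (v1 - v0)
    return None


schedule = [(30, 1e-2), (60, 1e-3)]

print(scheduled_value(schedule, 30))            # exact point
print(scheduled_value(schedule, 45))            # None without interpolation
print(scheduled_value(schedule, 45, "linear"))  # halfway between the values
print(scheduled_value(schedule, 90, "linear"))  # None: no extrapolation
```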


                          [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5)]),
class tensorpack.callbacks.StatMonitorParamSetter(param, stat_name, value_func, threshold, last_k, reverse=False)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Change the param by monitoring the change of a scalar statistic. The param will be changed when the scalar does not decrease/increase enough.

Once triggered, this callback observes the latest value of stat_name from the monitor backend.

This callback will then change a hyperparameter param by new_value = value_func(old_value), if: min(history) >= history[0] - threshold, where history = [the most recent k observations of stat_name]


The statistic of interest must be produced at least as often as this callback is triggered. For example, using PeriodicTrigger(StatMonitorParamSetter(...), every_k_steps=100) is meaningless if the statistic to be monitored is only updated every 500 steps.

Callbacks are executed in order. Therefore, if the statistic to be monitored is created after this callback, the behavior of this callback may be delayed.


If validation error wasn’t decreasing for 5 epochs, decay the learning rate by 0.2:

StatMonitorParamSetter('learning_rate', 'val-error',
                        lambda x: x * 0.2, threshold=0, last_k=5)

__init__(param, stat_name, value_func, threshold, last_k, reverse=False)[source]
  • param – same as in HyperParamSetter.

  • stat_name (str) – name of the statistics.

  • value_func (float -> float) – a function which returns a new value taking the old value.

  • threshold (float) – change threshold.

  • last_k (int) – use last k observations of statistics.

  • reverse (bool) – monitor increasing instead of decreasing. If True, param will be changed when max(history) <= history[0] + threshold.
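The two conditions can be checked with a few lines of plain Python. `should_change` and `maybe_update` are hypothetical helpers that apply the inequalities quoted above to a history list ordered from oldest to newest; how the callback builds that window internally is not shown here.

```python
def should_change(history, threshold, reverse=False):
    """Apply the documented stagnation test to the recent observations."""
    if reverse:
        # Monitoring an increasing stat: it failed to rise by more than threshold.
        return max(history) <= history[0] + threshold
    # Monitoring a decreasing stat: it failed to drop by more than threshold.
    return min(history) >= history[0] - threshold


def maybe_update(old_value, history, value_func, threshold, reverse=False):
    """Return the new param value, or the old one if the stat still improves."""
    if should_change(history, threshold, reverse):
        return value_func(old_value)
    return old_value


stalled = [0.30, 0.31, 0.30, 0.32, 0.30]    # validation error never improved
improving = [0.30, 0.28, 0.27, 0.27, 0.26]  # validation error went down

print(maybe_update(0.1, stalled, lambda x: x * 0.5, threshold=0))
print(maybe_update(0.1, improving, lambda x: x * 0.5, threshold=0))
```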

class tensorpack.callbacks.HyperParamSetterWithFunc(param, func)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set the parameter by a function of epoch num and old value.

__init__(param, func)[source]
  • param – same as in HyperParamSetter.

  • func – param will be set by new_value = func(epoch_num, old_value). epoch_num is the number of epochs that have finished.


Decrease by a factor of 0.9 every two epochs:

                         lambda e, x: x * 0.9 if e % 2 == 0 else x)
class tensorpack.callbacks.GPUUtilizationTracker(devices=None)[source]

Bases: tensorpack.callbacks.base.Callback

Summarize the average GPU utilization within an epoch.

It will start a process to obtain GPU utilization through NVML every second within the epoch (the trigger_epoch time is not included), and write the average utilization to monitors.

This callback creates a process, therefore it’s not safe to be used with MPI.


devices (list[int]) – physical GPU ids to monitor. If None, will guess from the environment.

static worker(evt, rst_queue, stop_evt, devices)[source]

devices (list[int]) –

class tensorpack.callbacks.GraphProfiler(dump_metadata=False, dump_tracing=True, dump_event=False)[source]

Bases: tensorpack.callbacks.base.Callback

Enable profiling by installing session hooks, and write tracing files / events / metadata to logger.get_logger_dir().

The tracing files can be loaded from chrome://tracing. The metadata files can be processed by tfprof command line utils. The event is viewable from tensorboard.


Note that the profiling is by default enabled for every step and is expensive. You probably want to schedule it less frequently, e.g.:

    GraphProfiler(dump_tracing=True, dump_event=True),
    lambda self: self.trainer.global_step > 20 and self.trainer.global_step < 30)

__init__(dump_metadata=False, dump_tracing=True, dump_event=False)[source]
  • dump_metadata (bool) – Dump tf.RunMetadata to be used with tfprof.

  • dump_tracing (bool) – Dump chrome tracing files.

  • dump_event (bool) – Dump to an event that is processed by FileWriter and shown in TensorBoard.

class tensorpack.callbacks.GPUMemoryTracker(devices=0)[source]

Bases: tensorpack.callbacks.base.Callback

Track peak memory used on each GPU device every epoch, by tf.contrib.memory_stats. The peak memory comes from the MaxBytesInUse op, which is the peak memory used in recent session.run calls. See https://github.com/tensorflow/tensorflow/pull/13107.


devices ([int] or [str]) – list of GPU devices to track memory on.

class tensorpack.callbacks.HostMemoryTracker[source]

Bases: tensorpack.callbacks.base.Callback

Track free RAM on the host.

When triggered, it writes the size of free RAM into monitors.

class tensorpack.callbacks.ThroughputTracker(samples_per_step=None)[source]

Bases: tensorpack.callbacks.base.Callback

This callback writes the training throughput (in terms of either steps/sec or samples/sec) to the monitors every time it is triggered. The throughput is computed from the time elapsed between consecutive triggers.

The time spent on callbacks after each epoch is excluded.


samples_per_step (int or None) – total number of samples processed in each step (i.e., your total batch size in each step). If not provided, this callback will record “steps/sec” instead of “samples/sec”.
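The reported quantity amounts to a simple division. The sketch below uses a hypothetical `throughput` helper and made-up numbers; how the callback measures the elapsed time (and excludes time spent in epoch-end callbacks) is not reproduced.

```python
def throughput(steps, seconds, samples_per_step=None):
    """Return (unit, value) for the steps run during `seconds`."""
    steps_per_sec = steps / seconds
    if samples_per_step is None:
        return "steps/sec", steps_per_sec
    return "samples/sec", steps_per_sec * samples_per_step


# 200 steps took 50 seconds between two triggers.
print(throughput(200, 50.0))
# Same interval with a total batch size of 64 samples per step.
print(throughput(200, 50.0, samples_per_step=64))
```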

class tensorpack.callbacks.ModelSaver(max_to_keep=10, keep_checkpoint_every_n_hours=0.5, checkpoint_dir=None, var_collections=None)[source]

Bases: tensorpack.callbacks.base.Callback

Save the model once triggered.

__init__(max_to_keep=10, keep_checkpoint_every_n_hours=0.5, checkpoint_dir=None, var_collections=None)[source]
  • max_to_keep (int) – the same as in tf.train.Saver.

  • keep_checkpoint_every_n_hours (float) – the same as in tf.train.Saver. Note that “keep” does not mean “create”, but means “don’t delete”.

  • checkpoint_dir (str) – Defaults to logger.get_logger_dir().

  • var_collections (str or list of str) – collection of the variables (or list of collections) to save.

class tensorpack.callbacks.MinSaver(monitor_stat, reverse=False, filename=None, checkpoint_dir=None)[source]

Bases: tensorpack.callbacks.base.Callback

Separately save the model with minimum value of some statistics.

__init__(monitor_stat, reverse=False, filename=None, checkpoint_dir=None)[source]
  • monitor_stat (str) – the name of the statistics.

  • reverse (bool) – if True, will save the maximum.

  • filename (str) – the name for the saved model. Defaults to min-{monitor_stat}.tfmodel.

  • checkpoint_dir (str) – the directory containing checkpoints.


Save the model with minimum validation error to “min-val-error.tfmodel”:

MinSaver('val-error')

  1. It assumes that ModelSaver is used with the same checkpoint_dir and appears earlier in the callback list. The default for both ModelSaver and MinSaver is checkpoint_dir=logger.get_logger_dir()

  2. Callbacks are executed in the order they are defined. Therefore you’d want to use this callback after the callback (e.g. InferenceRunner) that produces the statistics.

class tensorpack.callbacks.MaxSaver(monitor_stat, filename=None, checkpoint_dir=None)[source]

Bases: tensorpack.callbacks.saver.MinSaver

Separately save the model with maximum value of some statistics.

See docs of MinSaver for details.

__init__(monitor_stat, filename=None, checkpoint_dir=None)[source]
  • monitor_stat (str) – the name of the statistics.

  • filename (str) – the name for the saved model. Defaults to max-{monitor_stat}.tfmodel.

class tensorpack.callbacks.TensorPrinter(names)[source]

Bases: tensorpack.callbacks.base.Callback

Prints the value of some tensors in each step. It’s an example of how before_run/after_run works.


names (list) – list of strings, the names of the tensors to print.

class tensorpack.callbacks.ProgressBar(names=())[source]

Bases: tensorpack.callbacks.base.Callback

A progress bar based on tqdm.

This callback is one of the DEFAULT_CALLBACKS().


names (tuple[str]) – the names of the tensors to monitor on the progress bar.

class tensorpack.callbacks.SessionRunTimeout(timeout_in_ms)[source]

Bases: tensorpack.callbacks.base.Callback

Add timeout option to each sess.run call.


timeout_in_ms (int) –

class tensorpack.callbacks.MovingAverageSummary(collection='MOVING_SUMMARY_OPS', train_op=None)[source]

Bases: tensorpack.callbacks.base.Callback

Maintain the moving average of summarized tensors in every step, by running the ops added to the collection. Note that it only maintains the moving averages by updating the relevant variables in the graph; the actual summary should be done in other callbacks.

This callback is one of the DEFAULT_CALLBACKS().

__init__(collection='MOVING_SUMMARY_OPS', train_op=None)[source]
  • collection (str) – the collection of EMA-maintaining ops. The default value would work with the tensors you added by tfutils.summary.add_moving_summary(), but you can use other collections as well.

  • train_op (tf.Operation or str) – the (name of the) training op to associate the EMA-maintaining ops with. If not provided, the EMA-maintaining ops will be hooked to trainer.hooked_session and be executed in every iteration. Otherwise, the EMA-maintaining ops will be executed whenever the training op is executed.

tensorpack.callbacks.MergeAllSummaries(period=0, run_alone=False, key=None)[source]

Evaluate all summaries by tf.summary.merge_all, and write them to logs.

This callback is one of the DEFAULT_CALLBACKS().

  • period (int) – by default the callback summarizes once every epoch. This option (if not set to 0) makes it additionally summarize every period steps.

  • run_alone (bool) – whether to evaluate the summaries alone. If True, summaries will be evaluated on their own after each epoch. If False, summaries will be evaluated together with the sess.run calls, in the last step of each epoch. For SimpleTrainer, it needs to be False because summaries may depend on inputs.

  • key (str) – the collection of summary tensors. Same as in tf.summary.merge_all. Default is tf.GraphKeys.SUMMARIES.

class tensorpack.callbacks.SimpleMovingAverage(tensors, window_size)[source]

Bases: tensorpack.callbacks.base.Callback

Monitor Simple Moving Average (SMA), i.e. an average within a sliding window, of some tensors.

__init__(tensors, window_size)[source]
  • tensors (str or [str]) – names of tensors

  • window_size (int) – size of the moving window
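
As a reminder of what a simple moving average computes, here is a minimal plain-Python sketch. It is not the callback's code: the function name simple_moving_average is hypothetical, and averaging over a partially filled window during warm-up is an assumption about how to handle the first few steps.

```python
from collections import deque


def simple_moving_average(values, window_size):
    """Return the running average over the last `window_size` values."""
    window = deque(maxlen=window_size)  # old values fall out automatically
    averages = []
    for v in values:
        window.append(v)
        # Assumption: before the window is full, average what is available.
        averages.append(sum(window) / len(window))
    return averages


sma = simple_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window_size=3)
```

Unlike the exponential moving average maintained by MovingAverageSummary, every value inside the window carries the same weight and values outside it carry none.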

class tensorpack.callbacks.PeriodicTrigger(triggerable, every_k_steps=None, every_k_epochs=None, before_train=False)[source]

Bases: tensorpack.callbacks.base.ProxyCallback

Trigger a callback every k global steps or every k epochs by its trigger() method.

Most existing callbacks that do something every epoch are implemented with the trigger() method. By default the trigger() method will be called every epoch. This wrapper can make the callback run at a different frequency.

All other methods (before/after_run, trigger_step, etc) of the given callback are unaffected. They will still be called as-is.

__init__(triggerable, every_k_steps=None, every_k_epochs=None, before_train=False)[source]
  • triggerable (Callback) – a Callback instance with a trigger method to be called.

  • every_k_steps (int) – trigger when global_step % k == 0. Set to None to ignore.

  • every_k_epochs (int) – trigger when epoch_num % k == 0. Set to None to ignore.

  • before_train (bool) – trigger in the before_train() method.

every_k_steps and every_k_epochs can both be set, but they cannot both be None unless before_train is True.
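
The two modulo conditions and the argument check can be sketched as follows. These are hypothetical helper functions written for illustration only, not tensorpack code, and numbering steps and epochs from 1 is an assumption.

```python
def check_periodic_trigger_args(every_k_steps, every_k_epochs, before_train):
    """Mirror the documented rule: both periods may be None only if before_train is True."""
    if every_k_steps is None and every_k_epochs is None and not before_train:
        raise ValueError("set every_k_steps, every_k_epochs, or before_train=True")


def triggers_at_step(global_step, every_k_steps):
    # None means "ignore this condition".
    return every_k_steps is not None and global_step % every_k_steps == 0


def triggers_at_epoch(epoch_num, every_k_epochs):
    return every_k_epochs is not None and epoch_num % every_k_epochs == 0


# Which of the first 10 global steps and first 6 epochs would fire trigger()?
step_hits = [s for s in range(1, 11) if triggers_at_step(s, every_k_steps=4)]
epoch_hits = [e for e in range(1, 7) if triggers_at_epoch(e, every_k_epochs=2)]
```

With every_k_steps=4 the wrapped callback's trigger() would fire at global steps 4 and 8; with every_k_epochs=2 it would fire at epochs 2, 4 and 6.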

class tensorpack.callbacks.PeriodicCallback(callback, every_k_steps=None, every_k_epochs=None)[source]

Bases: tensorpack.callbacks.trigger.EnableCallbackIf

The {before,after}_epoch, {before,after}_run, trigger_{epoch,step} methods of the given callback will be enabled only when global_step % every_k_steps == 0 or epoch_num % every_k_epochs == 0. The other methods are unaffected.

Note that this can only make a callback run less frequently than it already does. If you have a callback that by default runs every epoch through its trigger() method, use PeriodicTrigger to schedule it more frequently.

__init__(callback, every_k_steps=None, every_k_epochs=None)[source]
  • callback (Callback) – a Callback instance.

  • every_k_steps (int) – enable the callback when global_step % k == 0. Set to None to ignore.

  • every_k_epochs (int) – enable the callback when epoch_num % k == 0. Also enable when the last step finishes (epoch_num == max_epoch and local_step == steps_per_epoch - 1). Set to None to ignore.

every_k_steps and every_k_epochs can both be set, but they cannot both be None.
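
Putting the documented conditions together, the enabling rule might look like the sketch below. The function periodic_enabled and its keyword arguments are hypothetical; only the conditions themselves come from the parameter descriptions above.

```python
def periodic_enabled(global_step, epoch_num, local_step, *,
                     steps_per_epoch, max_epoch,
                     every_k_steps=None, every_k_epochs=None):
    """Return True if the wrapped callback's gated methods should run."""
    if every_k_steps is None and every_k_epochs is None:
        raise ValueError("every_k_steps and every_k_epochs cannot both be None")
    if every_k_steps is not None and global_step % every_k_steps == 0:
        return True
    if every_k_epochs is not None:
        if epoch_num % every_k_epochs == 0:
            return True
        # Also enabled when the very last step of training finishes.
        if epoch_num == max_epoch and local_step == steps_per_epoch - 1:
            return True
    return False


# Epoch 3 is not a multiple of 2, so only the final step of the final epoch is enabled.
mid_epoch = periodic_enabled(12, 3, 1, steps_per_epoch=5, max_epoch=3, every_k_epochs=2)
last_step = periodic_enabled(15, 3, 4, steps_per_epoch=5, max_epoch=3, every_k_epochs=2)
```

The final-step clause means that a callback gated by every_k_epochs still gets one last chance to run when training ends on an epoch that is not a multiple of k.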

class tensorpack.callbacks.EnableCallbackIf(callback, pred)[source]

Bases: tensorpack.callbacks.base.ProxyCallback

Disable the {before,after}_epoch, {before,after}_run, trigger_{epoch,step} methods of a callback, unless some condition is satisfied. The other methods are unaffected.

A more accurate name for this callback should be “DisableCallbackUnless”, but that’s too ugly.


If you use {before,after}_run, pred will be evaluated only in before_run.

__init__(callback, pred)[source]
  • callback (Callback) –

  • pred (self -> bool) – a callable predicate. Has to be a pure function. The callback is disabled unless this predicate returns True.
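
The gating behaviour, including the rule that pred is evaluated only in before_run, can be illustrated with stand-in classes. Nothing below is tensorpack code: RecordingCallback and GatedCallback are hypothetical, their method signatures are simplified, and the step attribute is an assumption made so that the predicate has something to inspect.

```python
class RecordingCallback:
    """Stand-in for a real callback; records which methods were invoked."""
    def __init__(self):
        self.calls = []

    def before_run(self, step):
        self.calls.append(("before_run", step))

    def after_run(self, step):
        self.calls.append(("after_run", step))

    def trigger_step(self, step):
        self.calls.append(("trigger_step", step))


class GatedCallback:
    """Hypothetical proxy: forwards calls only when pred(self) returns True."""
    def __init__(self, callback, pred):
        self.cb = callback
        self.pred = pred
        self.step = 0
        self._enabled_for_run = False

    def before_run(self):
        # pred is evaluated here only; after_run reuses the stored result
        # so that the before/after pair is always consistent.
        self._enabled_for_run = self.pred(self)
        if self._enabled_for_run:
            self.cb.before_run(self.step)

    def after_run(self):
        if self._enabled_for_run:
            self.cb.after_run(self.step)

    def trigger_step(self):
        if self.pred(self):
            self.cb.trigger_step(self.step)


inner = RecordingCallback()
gated = GatedCallback(inner, pred=lambda self: self.step % 3 == 0)
for step in range(1, 7):
    gated.step = step
    gated.before_run()
    gated.after_run()
    gated.trigger_step()
```

Only steps 3 and 6 reach the inner callback. Because the predicate may be called several times per step, it should not have side effects, which is why it has to be a pure function.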