tensorpack.callbacks package

class tensorpack.callbacks.Callback[source]

Bases: object

Base class for all callbacks.

epoch_num

int – the number of the current epoch.

global_step

int – the number of global steps that have finished or are currently running.

local_step

int – the local steps within the current epoch.

trainer

Trainer – the trainer.

graph

tf.Graph – the graph.

Note

These attributes are available only after (and including) _setup_graph().

_setup_graph()[source]

Called before finalizing the graph. Override this method to set up the ops used in the callback. This is the same as tf.train.SessionRunHook.begin().

_before_train()[source]

Called right before the first iteration. The main difference from _setup_graph() is that at this point the graph is finalized and a default session is initialized. Override this method to, e.g., run some operations under the session.

This is similar to tf.train.SessionRunHook.after_create_session(), but different: it is called after the session is initialized by tfutils.SessionInit.

_after_train()[source]

Called after training.

_before_run(ctx)[source]

It is called before every hooked_sess.run() call, and it registers some extra ops/tensors to run in the next call. This method is the same as tf.train.SessionRunHook.before_run(). Refer to the TensorFlow docs for more details.

_after_run(run_context, run_values)[source]

It is called after every hooked_sess.run() call, and it processes the values requested by the corresponding before_run(). It is equivalent to tf.train.SessionRunHook.after_run(), refer to TensorFlow docs for more details.

_before_epoch()[source]

Called right before each epoch. Usually you should use the trigger() callback to run something between epochs. Use this method only when something really needs to be run immediately before each epoch.

_after_epoch()[source]

Called right after each epoch. Usually you should use the trigger() callback to run something between epochs. Use this method only when something really needs to be run immediately after each epoch.

_trigger_step()[source]

Called after each Trainer.run_step() completes. Defaults to no-op.

You can override it to implement, e.g. a ProgressBar.

_trigger_epoch()[source]

Called after the completion of every epoch. Defaults to calling self.trigger().

_trigger()[source]

Override this method to define a general trigger behavior, to be used with trigger schedulers. Note that the schedulers (e.g. PeriodicTrigger) might call this method both inside an epoch and after an epoch.

When used without the scheduler, this method by default will be called by trigger_epoch().

chief_only

Only run this callback on chief training process.

Returns: bool
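The lifecycle above can be illustrated with a minimal sketch. Note this is a simplified stand-in class, not a real tensorpack callback: in real use you would subclass tensorpack.callbacks.Callback and the trainer would drive the calls for you.

```python
# A sketch of a callback that counts steps per epoch, illustrating which
# hook runs when. (Simplified: a plain class stands in for
# tensorpack.callbacks.Callback so the sketch is self-contained.)
class StepCounterCallback:
    def _setup_graph(self):      # called once, before the graph is finalized
        self.steps_in_epoch = 0

    def _before_epoch(self):     # called right before each epoch
        self.steps_in_epoch = 0

    def _trigger_step(self):     # called after each Trainer.run_step()
        self.steps_in_epoch += 1

    def _trigger_epoch(self):    # called after each epoch completes
        print("steps this epoch:", self.steps_in_epoch)

# Simulate the call order a trainer would use:
cb = StepCounterCallback()
cb._setup_graph()
for epoch in range(2):
    cb._before_epoch()
    for step in range(3):
        cb._trigger_step()
    cb._trigger_epoch()   # prints "steps this epoch: 3" twice
```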

class tensorpack.callbacks.ProxyCallback(cb)[source]

Bases: tensorpack.callbacks.base.Callback

A callback which proxies all methods to another callback. It’s useful as a base class for callbacks that decorate other callbacks.

__init__(cb)[source]
Parameters:cb (Callback) – the underlying callback
class tensorpack.callbacks.CallbackFactory(setup_graph=None, before_train=None, trigger=None, after_train=None, trigger_epoch=None)[source]

Bases: tensorpack.callbacks.base.Callback

Create a callback with some lambdas.

__init__(setup_graph=None, before_train=None, trigger=None, after_train=None, trigger_epoch=None)[source]

Each lambda takes self as the only argument.

trigger_epoch is deprecated.
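The factory idea ("each lambda takes self as the only argument") can be sketched in plain Python. This is an illustrative stand-in, not tensorpack's implementation:

```python
# A simplified stand-in for the CallbackFactory idea: each optional function
# takes `self` (the callback) as its only argument and is wired into the
# corresponding lifecycle method.
class TinyCallbackFactory:
    def __init__(self, setup_graph=None, before_train=None,
                 trigger=None, after_train=None):
        self._fns = dict(setup_graph=setup_graph, before_train=before_train,
                         trigger=trigger, after_train=after_train)

    def _call(self, name):
        fn = self._fns[name]
        if fn is not None:          # unset lambdas are simply no-ops
            fn(self)

    def _setup_graph(self):  self._call('setup_graph')
    def _before_train(self): self._call('before_train')
    def _trigger(self):      self._call('trigger')
    def _after_train(self):  self._call('after_train')

events = []
cb = TinyCallbackFactory(before_train=lambda self: events.append('before'),
                         trigger=lambda self: events.append('trigger'))
cb._before_train()
cb._trigger()
cb._after_train()        # no after_train lambda given -> nothing happens
# events == ['before', 'trigger']
```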

tensorpack.callbacks.Triggerable

alias of Callback

class tensorpack.callbacks.StartProcOrThread(startable, stop_at_last=True)[source]

Bases: tensorpack.callbacks.base.Callback

Start some threads or processes before training.

__init__(startable, stop_at_last=True)[source]
Parameters:
  • startable (list) – list of processes or threads that have a start() method. Can also be a single instance of a process or thread.

  • stop_at_last (bool) – whether to stop the processes or threads after training. It will use Process.terminate() or StoppableThread.stop(), but will do nothing on a normal threading.Thread or other startable objects.

class tensorpack.callbacks.RunOp(setup_func, run_before=True, run_as_trigger=True, run_step=False, verbose=False)[source]

Bases: tensorpack.callbacks.base.Callback

Run an Op.

__init__(setup_func, run_before=True, run_as_trigger=True, run_step=False, verbose=False)[source]
Parameters:
  • setup_func – a function that returns the Op in the graph

  • run_before (bool) – run the Op before training

  • run_as_trigger (bool) – run the Op on every trigger

  • run_step (bool) – run the Op every step (along with training)

  • verbose (bool) – print logs when the op is run.

Examples

The DQN example uses this callback to update the target network.

class tensorpack.callbacks.RunUpdateOps(collection='update_ops')[source]

Bases: tensorpack.callbacks.graph.RunOp

Run ops from the collection UPDATE_OPS every step.

class tensorpack.callbacks.Callbacks(cbs)[source]

Bases: tensorpack.callbacks.base.Callback

A container that holds all callbacks and executes them in the right order (e.g. StatPrinter will be executed last).

__init__(cbs)[source]
Parameters:cbs (list) – a list of Callback instances.
class tensorpack.callbacks.CallbackToHook(cb)[source]

Bases: tensorflow.python.training.session_run_hook.SessionRunHook

This is only for internal implementation of before_run/after_run callbacks. You shouldn’t need to use this.

class tensorpack.callbacks.HookToCallback(hook)[source]

Bases: tensorpack.callbacks.base.Callback

Make a tf.train.SessionRunHook into a callback. Note that the coord argument in after_create_session will be None.

__init__(hook)[source]
Parameters:hook (tf.train.SessionRunHook) –
class tensorpack.callbacks.ScalarStats(names, prefix='validation')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Statistics of some scalar tensor. The value will be averaged over all given datapoints.

__init__(names, prefix='validation')[source]
Parameters:
  • names (list or str) – list of names or just one name. The corresponding tensors have to be scalar.

  • prefix (str) – a prefix for logging

class tensorpack.callbacks.Inferencer[source]

Bases: object

Base class of Inferencer. To be used with InferenceRunner.

after_inference()[source]

Called after a round of inference ends. Returns a dict of scalar statistics which will be logged to monitors.

before_inference()[source]

Called before a new round of inference starts.

datapoint(output)[source]

Called after each new datapoint finishes the forward inference.

Parameters:output (list) – list of output this inferencer needs. Has the same length as self.get_output_tensors().
get_output_tensors()[source]

Return a list of tensor names (guaranteed to be tensor names, not op names) that this inferencer needs.
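The four-method contract can be sketched with a plain-Python inferencer that averages a scalar over all datapoints. This is a simplified stand-in (real code subclasses tensorpack.callbacks.Inferencer, and the tensor name is illustrative):

```python
# A sketch of the Inferencer contract: average a scalar over all datapoints.
class MeanLossInferencer:
    def get_output_tensors(self):
        return ['total_loss:0']      # tensor names, not op names (illustrative)

    def before_inference(self):      # reset state for a new round
        self._sum, self._count = 0.0, 0

    def datapoint(self, output):     # output has one entry per output tensor
        self._sum += output[0]
        self._count += 1

    def after_inference(self):       # stats returned here get logged to monitors
        return {'val_loss': self._sum / self._count}

inf = MeanLossInferencer()
inf.before_inference()
for loss in [1.0, 2.0, 3.0]:
    inf.datapoint([loss])
stats = inf.after_inference()   # {'val_loss': 2.0}
```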

class tensorpack.callbacks.ClassificationError(wrong_tensor_name='incorrect_vector', summary_name='val_error')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Compute classification error in batch mode, from a wrong tensor.

The wrong tensor is supposed to be a binary vector indicating whether each sample in the batch is incorrectly classified. You can use tf.nn.in_top_k to produce this vector.

This Inferencer produces the “true” error, taking into account the fact that batches might not have the same size in testing (because the size of the test set might not be a multiple of the batch size). Therefore the result can be different from averaging the error rate of each batch.
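A small worked example of why the distinction matters (numbers are illustrative):

```python
# Test set of 10 samples, batch size 4 -> batches of sizes 4, 4, 2.
# Per-sample "wrong" indicators (1 = misclassified):
batches = [[1, 0, 0, 0],   # 1/4 wrong
           [0, 0, 0, 0],   # 0/4 wrong
           [1, 1]]         # 2/2 wrong

# Naively averaging per-batch error rates over-weights the small last batch:
naive = sum(sum(b) / len(b) for b in batches) / len(batches)   # ~0.4167

# The "true" error weights every sample equally, as ClassificationError does:
true_error = sum(sum(b) for b in batches) / sum(len(b) for b in batches)  # 0.3
```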

__init__(wrong_tensor_name='incorrect_vector', summary_name='val_error')[source]
Parameters:
  • wrong_tensor_name (str) – name of the wrong tensor. The default is the same as the default output name of prediction_incorrect().

  • summary_name (str) – the name to log the error with.

class tensorpack.callbacks.BinaryClassificationStats(pred_tensor_name, label_tensor_name, prefix='val')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Compute precision / recall in binary classification, given the prediction vector and the label vector.

__init__(pred_tensor_name, label_tensor_name, prefix='val')[source]
Parameters:
  • pred_tensor_name (str) – name of the 0/1 prediction tensor.

  • label_tensor_name (str) – name of the 0/1 label tensor.

class tensorpack.callbacks.InferenceRunner(input, infs, tower_name='InferenceTower', extra_hooks=None)[source]

Bases: tensorpack.callbacks.inference_runner.InferenceRunnerBase

A callback that runs a list of Inferencer on some InputSource.

__init__(input, infs, tower_name='InferenceTower', extra_hooks=None)[source]
Parameters:
  • input (InputSource or DataFlow) – The InputSource to run inference on. If given a DataFlow, will use FeedInput.

  • infs (list) – a list of Inferencer instances.

class tensorpack.callbacks.DataParallelInferenceRunner(input, infs, gpus)[source]

Bases: tensorpack.callbacks.inference_runner.InferenceRunnerBase

Inference by feeding datapoints in a data-parallel way to multiple GPUs.

Doesn’t support remapped InputSource for now.

__init__(input, infs, gpus)[source]
Parameters:
  • input (DataParallelFeedInput or DataFlow) –

  • gpus (list[int]) – list of GPU id

class tensorpack.callbacks.TrainingMonitor[source]

Bases: tensorpack.callbacks.base.Callback

Monitor training progress by processing different types of summary/statistics from the trainer.

_setup_graph()[source]

Override this method to setup the monitor.

put(name, val)[source]

Process a key-value pair.

put_event(evt)[source]
Parameters:evt (tf.Event) – the most basic format, could include Summary, RunMetadata, LogMessage, and more.
put_image(name, val)[source]
Parameters:val (np.ndarray) – 4D (NHWC) numpy array of images in range [0,255]. If channel is 3, assumed to be RGB.
put_summary(summary)[source]

Process a tf.Summary.

class tensorpack.callbacks.Monitors(monitors)[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Merge monitors together for trainer to use.

get_history(name)[source]

Get a history of the scalar value of some data.

get_latest(name)[source]

Get latest scalar value of some data.

put_event(evt)[source]

Simply call put_event() on each monitor. The step and wall_time fields of this proto will be filled automatically.

Parameters:evt (tf.Event) –
put_image(name, val)[source]
Parameters:
  • name (str) –

  • val (np.ndarray) – 2D, 3D (HWC) or 4D (NHWC) numpy array of images in range [0,255]. If channel is 3, assumed to be RGB.

class tensorpack.callbacks.TFEventWriter[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Write summaries to TensorFlow event file.

class tensorpack.callbacks.JSONWriter[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Write all scalar data to a json, grouped by their global step.

class tensorpack.callbacks.ScalarPrinter(enable_step=False, enable_epoch=True)[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Print scalar data into terminal.

__init__(enable_step=False, enable_epoch=True)[source]
Parameters:enable_step, enable_epoch (bool) – whether to print the monitor data (if any) between steps or between epochs, respectively.
class tensorpack.callbacks.SendMonitorData(command, names)[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Execute a command with some specific scalar monitor data. This is useful for, e.g. building a custom statistics monitor.

It will try to send once it has received all the stats.

__init__(command, names)[source]
Parameters:
  • command (str) – a command to execute. Use format string with stat names as keys.

  • names (list or str) – data name(s) to use.

Example

Send the stats to your phone through pushbullet:

SendMonitorData('curl -u your_id: https://api.pushbullet.com/v2/pushes \
         -d type=note -d title="validation error" \
         -d body={validation_error} > /dev/null 2>&1',
         'validation_error')
class tensorpack.callbacks.HyperParam[source]

Bases: object

Base class for a hyperparam.

get_value()[source]

Get the value of the param.

readable_name

A name to display

set_value(v)[source]

Set the value of the param.

Parameters:v – the value to be set
setup_graph()[source]

Set up the graph in the setup_graph callback stage, if necessary.

class tensorpack.callbacks.GraphVarParam(name, shape=[])[source]

Bases: tensorpack.callbacks.param.HyperParam

A variable in the graph (e.g. learning_rate) can be a hyperparam.

__init__(name, shape=[])[source]
Parameters:
  • name (str) – name of the variable.

  • shape (list) – shape of the variable.

get_value()[source]

Evaluate the variable.

set_value(v)[source]

Assign the variable a new value.

setup_graph()[source]

Set up the assign operator for the variable.

class tensorpack.callbacks.ObjAttrParam(obj, attrname, readable_name=None)[source]

Bases: tensorpack.callbacks.param.HyperParam

An attribute of an object can be a hyperparam.

__init__(obj, attrname, readable_name=None)[source]
Parameters:
  • obj – the object

  • attrname (str) – the attribute

  • readable_name (str) – The name to display and set with. Defaults to be attrname.

class tensorpack.callbacks.HyperParamSetter(param)[source]

Bases: tensorpack.callbacks.base.Callback

An abstract base callback to set hyperparameters.

__init__(param)[source]
Parameters:param (HyperParam or str) – if a str, it is assumed to be a GraphVarParam.
get_current_value()[source]
Returns:The current value of the param.
get_value_to_set()[source]
Returns:The value to assign to the variable.

Note

Subclasses will implement the abstract method _get_value_to_set(), which should return a new value to set, or return None to do nothing.

class tensorpack.callbacks.HumanHyperParamSetter(param, file_name='hyper.txt')[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set a hyperparameter by loading the value from a file each time it gets called. This is useful for manually tuning some parameters (e.g. learning_rate) without interrupting the training.

__init__(param, file_name='hyper.txt')[source]
Parameters:
  • param – same as in HyperParamSetter.

  • file_name (str) – a file containing the new value of the parameter. Each line in the file is a k:v pair, for example, learning_rate:1e-4. If the pair is not found, the param will not be changed.
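The k:v file format described above can be sketched with a small parser. This is an illustrative sketch, not tensorpack's actual parsing code; the function name is hypothetical:

```python
def read_hyperparam(file_name, param_name):
    """Return the new value for `param_name` from a k:v file, or None
    if the file has no entry for it (a sketch of the format above)."""
    with open(file_name) as f:
        for line in f:
            k, _, v = line.strip().partition(':')
            if k == param_name:
                return float(v)
    return None   # pair not found -> the param would not be changed

# demo: write a file in the documented format and read it back
with open('hyper.txt', 'w') as f:
    f.write('learning_rate:1e-4\n')
read_hyperparam('hyper.txt', 'learning_rate')   # -> 0.0001
```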

class tensorpack.callbacks.ScheduledHyperParamSetter(param, schedule, interp=None)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set hyperparameters by a predefined epoch-based schedule.

__init__(param, schedule, interp=None)[source]
Parameters:
  • param – same as in HyperParamSetter.

  • schedule (list) – with the format [(epoch1, val1), (epoch2, val2), (epoch3, val3)]. Each (ep, val) pair means to set the param to “val” __after__ the completion of epoch ep. If ep == 0, the value will be set before the first epoch (by default the first is epoch 1).

  • interp – None: no interpolation. ‘linear’: linear interpolation

Example

ScheduledHyperParamSetter('learning_rate',
                          [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5)]),
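The schedule semantics above ("set the param to val after the completion of epoch ep", with optional linear interpolation between points) can be sketched in plain Python. This is an illustrative sketch, not tensorpack's implementation; the exact boundary behavior of the interpolation is an assumption:

```python
# Sketch: what value the param would have at a given epoch under a schedule
# of (epoch, value) pairs.
def value_at(epoch, schedule, interp=None):
    schedule = sorted(schedule)
    value = None                      # None -> before the first point: no change
    for ep, val in schedule:
        if ep <= epoch:
            value = val               # last point whose epoch has completed
    if interp == 'linear' and value is not None:
        # linearly interpolate between the surrounding schedule points
        for (e0, v0), (e1, v1) in zip(schedule, schedule[1:]):
            if e0 <= epoch < e1:
                value = v0 + (v1 - v0) * (epoch - e0) / (e1 - e0)
    return value

sched = [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5)]
value_at(45, sched)                    # 0.01 (step schedule, no interpolation)
value_at(45, sched, interp='linear')   # halfway between 1e-2 and 1e-3
```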
class tensorpack.callbacks.StatMonitorParamSetter(param, stat_name, value_func, threshold, last_k, reverse=False)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Change the param by monitoring the change of a statistic. Change it when the statistic wasn’t decreasing/increasing enough.

__init__(param, stat_name, value_func, threshold, last_k, reverse=False)[source]
Parameters:
  • param – same as in HyperParamSetter.

  • stat_name (str) – name of the statistics.

  • value_func (float -> float) – a function which returns a new value taking the old value.

  • threshold (float) – change threshold.

  • last_k (int) – last k epochs.

  • reverse (bool) – monitor increasing instead of decreasing.

This callback will change param by new_value = value_func(old_value), when: min(stats) >= stats[0] - threshold, where stats = [stat_name in last k epochs]

Example

If validation error wasn’t decreasing for 5 epochs, anneal the learning rate:

StatMonitorParamSetter('learning_rate', 'val-error', lambda x: x * 0.2, 0, 5)
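The trigger condition above can be sketched in plain Python. This is an illustrative sketch of the documented rule, not tensorpack's code; how reverse=True is implemented internally is an assumption:

```python
# Change the param when the statistic hasn't improved by more than
# `threshold` over the last k epochs. reverse=False monitors a statistic
# that is supposed to decrease; reverse=True negates it to monitor increase.
def should_change(stats, threshold, reverse=False):
    if reverse:
        stats = [-s for s in stats]
    # stats = [value of stat_name in the last k epochs], oldest first
    return min(stats) >= stats[0] - threshold

should_change([0.30, 0.31, 0.30, 0.32, 0.31], threshold=0)   # True: stuck
should_change([0.30, 0.28, 0.27, 0.26, 0.25], threshold=0)   # False: improving
```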
class tensorpack.callbacks.HyperParamSetterWithFunc(param, func)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set the parameter by a function of epoch num and old value.

__init__(param, func)[source]
Parameters:
  • param – same as in HyperParamSetter.

  • func – param will be set by new_value = func(epoch_num, old_value). epoch_num is the number of epochs that have finished.

Example

Decrease by a factor of 0.9 every two epochs:

HyperParamSetterWithFunc('learning_rate',
                         lambda e, x: x * 0.9 if e % 2 == 0 else x)
class tensorpack.callbacks.GPUUtilizationTracker(devices=None)[source]

Bases: tensorpack.callbacks.base.Callback

Summarize the average GPU utilization within an epoch.

__init__(devices=None)[source]
Parameters:devices (list[int]) – physical GPU ids. If None, will use CUDA_VISIBLE_DEVICES
class tensorpack.callbacks.GraphProfiler(dump_metadata=False, dump_tracing=True, dump_event=False)[source]

Bases: tensorpack.callbacks.base.Callback

Enable profiling by installing session hooks, and write metadata or tracing files to logger.LOG_DIR.

The tracing files can be loaded from chrome://tracing. The metadata files can be processed by tfprof command line utils.

Note that the profiling is enabled for every step. You probably want to schedule it less frequently by PeriodicRunHooks.

__init__(dump_metadata=False, dump_tracing=True, dump_event=False)[source]
Parameters:
  • dump_metadata (bool) – Dump tf.RunMetadata to be used with tfprof.

  • dump_tracing (bool) – Dump chrome tracing files.

  • dump_event (bool) – Dump to an event processed by FileWriter.

class tensorpack.callbacks.ModelSaver(max_to_keep=10, keep_checkpoint_every_n_hours=0.5, checkpoint_dir=None, var_collections='variables', keep_recent=None, keep_freq=None)[source]

Bases: tensorpack.callbacks.base.Callback

Save the model every epoch.

__init__(max_to_keep=10, keep_checkpoint_every_n_hours=0.5, checkpoint_dir=None, var_collections='variables', keep_recent=None, keep_freq=None)[source]
Parameters:
  • max_to_keep, keep_checkpoint_every_n_hours – the same as in tf.train.Saver.

  • checkpoint_dir (str) – Defaults to logger.LOG_DIR.

  • var_collections (str or list of str) – collection of the variables (or list of collections) to save.

class tensorpack.callbacks.MinSaver(monitor_stat, reverse=False, filename=None)[source]

Bases: tensorpack.callbacks.base.Callback

Separately save the model with minimum value of some statistics.

__init__(monitor_stat, reverse=False, filename=None)[source]
Parameters:
  • monitor_stat (str) – the name of the statistics.

  • reverse (bool) – if True, will save the maximum.

  • filename (str) – the name for the saved model. Defaults to min-{monitor_stat}.tfmodel.

Example

Save the model with minimum validation error to “min-val-error.tfmodel”:

MinSaver('val-error')

Note

It assumes that ModelSaver is used with checkpoint_dir=logger.LOG_DIR (the default). And it will save the model to that directory as well.

class tensorpack.callbacks.MaxSaver(monitor_stat, filename=None)[source]

Bases: tensorpack.callbacks.saver.MinSaver

Separately save the model with maximum value of some statistics.

__init__(monitor_stat, filename=None)[source]
Parameters:
  • monitor_stat (str) – the name of the statistics.

  • filename (str) – the name for the saved model. Defaults to max-{monitor_stat}.tfmodel.

class tensorpack.callbacks.SendStat(command, names)[source]

Bases: tensorpack.callbacks.base.Callback

An equivalent of SendMonitorData, but as a normal callback.

class tensorpack.callbacks.DumpParamAsImage(tensor_name, prefix=None, map_func=None, scale=255)[source]

Bases: tensorpack.callbacks.base.Callback

Dump a tensor to image(s) to logger.LOG_DIR after every epoch.

Note that it requires the tensor to be directly evaluable, i.e. either the inputs are not among its dependencies (e.g. the tensor is the weights of the model), or the inputs are feedfree (in which case this callback will take an extra datapoint from the input pipeline).

__init__(tensor_name, prefix=None, map_func=None, scale=255)[source]
Parameters:
  • tensor_name (str) – the name of the tensor.

  • prefix (str) – the filename prefix for saved images. Defaults to the Op name.

  • map_func – map the value of the tensor to an image or list of images of shape [h, w] or [h, w, c]. If None, will use identity.

  • scale (float) – a multiplier on pixel values, applied after map_func.

class tensorpack.callbacks.InjectShell(file='INJECT_SHELL.tmp', shell='ipython')[source]

Bases: tensorpack.callbacks.base.Callback

When triggered, opens an IPython/pdb shell if a file exists. Useful for interactive debug during training.

Using this callback requires ipython to be installed.

__init__(file='INJECT_SHELL.tmp', shell='ipython')[source]
Parameters:
  • file (str) – if this file exists, will open a shell.

  • shell (str) – one of ‘ipython’, ‘pdb’

class tensorpack.callbacks.StepTensorPrinter(names)[source]

Bases: tensorpack.callbacks.base.Callback

It prints the value of some tensors in each step. It’s just a demo of how trigger_step works; in general you should use symbolic_functions.print_stat() or tf.Print() instead.

__init__(names)[source]
Parameters:names (list) – list of string, the names of the tensors to print.
class tensorpack.callbacks.MaintainStepCounter[source]

Bases: tensorpack.callbacks.base.Callback

It maintains the global step in the graph, making sure it’s increased by one in every run_step call. This callback is always enabled by the trainer, and you won’t need to use it yourself.

class tensorpack.callbacks.ProgressBar(names=[])[source]

Bases: tensorpack.callbacks.base.Callback

A progress bar based on tqdm. Enabled by default.

__init__(names=[])[source]
Parameters:names (list) – list of string, the names of the tensors to monitor on the progress bar.
class tensorpack.callbacks.MovingAverageSummary(collection='MOVING_SUMMARY_OPS')[source]

Bases: tensorpack.callbacks.base.Callback

Maintain the moving average of the tensors in every step, and summarize them. Enabled by default.

__init__(collection='MOVING_SUMMARY_OPS')[source]
Parameters:collection (str) – the collection of EMA-maintaining ops. The default would work with add_moving_summary(), but you can use some others.
tensorpack.callbacks.MergeAllSummaries(period=0, run_alone=False, key='summaries')[source]

Evaluate all summaries by tf.summary.merge_all, and write to logs.

Parameters:
  • period (int) – by default the callback summarizes once every epoch. This option (if not set to 0) makes it additionally summarize every period steps.

  • run_alone (bool) – whether to evaluate the summaries alone. If True, summaries will be evaluated after each epoch alone. If False, summaries will be evaluated together with other sess.run calls, in the last step of each epoch. For SimpleTrainer, it needs to be False because summary may depend on inputs.

  • key (str) – the collection of summary tensors. Same as in tf.summary.merge_all.

Returns:

a Callback.

class tensorpack.callbacks.PeriodicTrigger(triggerable, every_k_steps=None, every_k_epochs=None)[source]

Bases: tensorpack.callbacks.base.ProxyCallback

Schedule a callback to be triggered, via its trigger() method, every k global steps or every k epochs.

__init__(triggerable, every_k_steps=None, every_k_epochs=None)[source]
Parameters:
  • triggerable (Callback) – a Callback instance with a _trigger method to be called.

  • every_k_steps (int) – trigger when global_step % k == 0. Set to None to disable.

  • every_k_epochs (int) – trigger when epoch_num % k == 0. Set to None to disable.

every_k_steps and every_k_epochs can both be set, but cannot both be None.
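The scheduling rule can be sketched in plain Python. This is an illustrative sketch of the documented behavior, not tensorpack's implementation:

```python
# Sketch of PeriodicTrigger's rule: the wrapped callback's trigger() runs
# when either period divides the corresponding counter. At least one of the
# two periods must be set.
def should_trigger(global_step, epoch_num,
                   every_k_steps=None, every_k_epochs=None):
    assert every_k_steps is not None or every_k_epochs is not None
    if every_k_steps is not None and global_step % every_k_steps == 0:
        return True
    if every_k_epochs is not None and epoch_num % every_k_epochs == 0:
        return True
    return False

should_trigger(global_step=500, epoch_num=3, every_k_steps=100)   # True
should_trigger(global_step=501, epoch_num=3,
               every_k_steps=100, every_k_epochs=2)               # False
```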

class tensorpack.callbacks.PeriodicRunHooks(callback, every_k_steps)[source]

Bases: tensorpack.callbacks.base.ProxyCallback

Schedule the {before,after}_run methods of a callback every k global steps. All other methods are untouched.

__init__(callback, every_k_steps)[source]
Parameters:
  • callback (Callback) – the callback whose {before,after}_run methods are to be scheduled.

  • every_k_steps (int) – run the {before,after}_run methods every k global steps.