tensorpack.models package

Relevant tutorials: Symbolic Layers.

tensorpack.models.BatchNorm(variable_scope_name, inputs, axis=None, *, training=None, momentum=0.9, epsilon=1e-05, center=True, scale=True, beta_initializer=tf.zeros_initializer(), gamma_initializer=tf.ones_initializer(), virtual_batch_size=None, data_format='channels_last', ema_update='default', sync_statistics=None)

A more powerful version of tf.layers.batch_normalization. It differs from the official one in the following aspects:

Accepts an alternative data_format option when axis is None. For 2D input, this argument is ignored.
Default values for momentum and epsilon are different.
The default value for training is obtained automatically from tensorpack’s TowerContext. A user-provided value overrides this behavior.
Supports the ema_update option, which covers broader use cases than the standard EMA update.
Supports the sync_statistics option, which implements “SyncBN” and is very useful in small-batch models.
Better support for the virtual_batch_size option, without the bugs present in tf.layers.
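To make the virtual_batch_size idea concrete, here is a minimal pure-Python sketch of "Ghost BatchNorm" on a batch of scalars: the batch is split into consecutive chunks of virtual_batch_size samples, and each chunk is normalized with its own mean and variance. The helper name ghost_batch_norm is hypothetical. The real layer operates on tensors and also applies gamma and beta.

```python
import math


def ghost_batch_norm(batch, virtual_batch_size, epsilon=1e-5):
    """Normalize each chunk of `virtual_batch_size` samples with its own statistics.

    `batch` is a flat list of scalars (one feature per sample) for simplicity.
    """
    if len(batch) % virtual_batch_size != 0:
        raise ValueError("virtual_batch_size must divide the batch size")
    out = []
    for start in range(0, len(batch), virtual_batch_size):
        chunk = batch[start:start + virtual_batch_size]
        mean = sum(chunk) / len(chunk)
        var = sum((v - mean) ** 2 for v in chunk) / len(chunk)
        out.extend((v - mean) / math.sqrt(var + epsilon) for v in chunk)
    return out


# Two ghost batches of size 2: [1, 3] and [10, 30] are normalized independently.
print([round(v, 3) for v in ghost_batch_norm([1.0, 3.0, 10.0, 30.0], 2)])
# [-1.0, 1.0, -1.0, 1.0]
```

Normalizing the same four values as one batch of size 4 would give different outputs, which is the point of the option: the statistics come from the smaller virtual batches.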

Parameters

training (bool) – if True, use per-batch statistics to normalize. Otherwise, use the stored EMA to normalize. By default, it is equal to get_current_tower_context().is_training. This is not a good argument name, but it is what the TensorFlow layer uses.
virtual_batch_size (int) –
Implements “Ghost BatchNorm”, which normalizes the data with a smaller batch size than the input. Only effective when training is True. The value has to be a divisor of the actual batch size.

It does not use the buggy TensorFlow implementation, which has two problems: (1) wrong behavior at inference; (2) it creates variables with unnecessary size-1 dimensions. Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/23050
ema_update (str) –
Only effective when training=True. It has the following options:
- “default”: same as “collection”, because this is the default behavior in TensorFlow.
- “skip”: do not update the EMA. This can be useful when you reuse a batch norm layer in several places but do not want all of them to update your EMA.
- “collection”: add EMA update ops to the collection tf.GraphKeys.UPDATE_OPS in the first training tower. The ops in the collection are run automatically by the callback RunUpdateOps, along with your training iterations. This can waste compute if your training iterations do not always depend on the BatchNorm layer.
- “internal”: the EMA is updated in the first training tower inside this layer itself, by control dependencies. In standard scenarios, it has similar speed to “collection”, but it supports more scenarios:
  1. BatchNorm is used inside dynamic control flow. The collection-based update does not support dynamic control flow.
  2. The BatchNorm layer is sometimes unused (e.g., in GANs you have two networks to train alternately). Putting all update ops into a single collection would waste a lot of compute.
  3. Other parts of the model rely on the “updated” EMA. The collection-based method does not update the EMA immediately.
  4. It is less likely to trigger TensorFlow bugs in a graph with complicated control flow.
  Therefore this option is preferred over the TensorFlow default. Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/14699
sync_statistics (str or None) –
One of None, “nccl”, or “horovod”. It determines how to compute the “per-batch statistics” when training==True.
- None: use the statistics of the input tensor to normalize during training. This is the standard way BatchNorm is implemented in most frameworks.
- “nccl”: this layer must be used under tensorpack’s multi-GPU trainers. It uses the aggregated statistics of the whole batch (across all GPUs) to normalize.
- “horovod”: this layer must be used under tensorpack’s HorovodTrainer. It uses the aggregated statistics of the whole batch (across all MPI ranks) to normalize. Note that on a single machine this has been found to be slower than the “nccl” implementation.
When not None, each GPU computes its own E[x] and E[x^2], which are then averaged among all GPUs to compute the global mean and variance. Therefore each GPU needs to have the same batch size.

The synchronization is based on the current variable scope plus the name of the layer (BatchNorm(‘name’, input)). Therefore, you need to make sure that:
1. The BatchNorm layer on different GPUs has the same name, so that statistics can be synchronized. If names do not match, this layer will hang.
2. A BatchNorm layer is not reused within one tower.
3. A BatchNorm layer is executed the same number of times by all GPUs. If different GPUs execute one BatchNorm layer a different number of times (e.g., if some GPUs do not execute it), this layer may hang.
This option is also known as “SyncBN” or “Cross-GPU BatchNorm”, as mentioned in: MegDet: A Large Mini-Batch Object Detector. Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/18222.

When sync_statistics is enabled, ema_update is set to “internal” automatically. This is to avoid running UPDATE_OPS, which requires synchronization.
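The aggregation used by sync_statistics relies on the identity Var[x] = E[x^2] - (E[x])^2. The pure-Python sketch below is not the layer's NCCL or Horovod code, and the function names are hypothetical. It shows that averaging per-GPU E[x] and E[x^2] reproduces the whole-batch mean and variance when every GPU has the same batch size, which is why equal batch sizes are required.

```python
def local_moments(shard):
    """Per-GPU statistics: E[x] and E[x^2] over this GPU's samples."""
    n = len(shard)
    return sum(shard) / n, sum(v * v for v in shard) / n


def synced_mean_var(shards):
    """Average the per-GPU moments (as an all-reduce would), then derive mean/variance."""
    moments = [local_moments(s) for s in shards]
    mean = sum(m for m, _ in moments) / len(moments)
    mean_sq = sum(sq for _, sq in moments) / len(moments)
    return mean, mean_sq - mean * mean


gpu_batches = [[1.0, 2.0], [3.0, 4.0]]  # two GPUs, same per-GPU batch size
print(synced_mean_var(gpu_batches))  # global mean 2.5, variance 1.25 (same as for [1, 2, 3, 4])
```

With unequal shard sizes such as [[1.0], [2.0, 3.0]], the plain average of per-GPU moments no longer matches the statistics of the concatenated batch.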

Variable Names:

beta: the bias term. Will be zero-inited by default.
gamma: the scale term. Will be one-inited by default.
mean/EMA: the moving average of mean.
variance/EMA: the moving average of variance.
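The mean/EMA and variance/EMA variables are exponential moving averages of the per-batch statistics. The sketch below assumes the conventional rule used by TensorFlow's batch normalization, ema = momentum * ema + (1 - momentum) * batch_stat, with the default momentum=0.9. The function name is hypothetical. ema_update='skip' corresponds to simply never applying this step.

```python
def update_ema(ema, batch_stat, momentum=0.9):
    """One EMA step, as applied to mean/EMA and variance/EMA during training."""
    return momentum * ema + (1.0 - momentum) * batch_stat


ema_mean = 0.0  # the moving mean typically starts at zero
for batch_mean in [1.0, 1.0, 1.0]:
    ema_mean = update_ema(ema_mean, batch_mean)
print(round(ema_mean, 3))  # 0.271: the EMA approaches 1.0 only gradually
```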

Note

This layer is more flexible than the standard “BatchNorm” layer and provides more features:

Whether or not you are training, you can set the training argument to use either batch statistics or EMA statistics, i.e., you can use batch statistics during inference, or EMA statistics during training. Using EMA statistics in training is useful when you load a pre-trained BN and don’t want to update it.
As long as training=True, the sync_statistics and ema_update options take effect.
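A rough pure-Python illustration of what the training argument selects. With training=True the input is normalized by its own batch statistics. Otherwise, it is normalized by the stored EMA statistics. The function and its arguments are hypothetical simplifications with one feature and no ghost batches or synchronization.

```python
import math


def batch_norm_1d(xs, moving_mean, moving_var, training,
                  gamma=1.0, beta=0.0, epsilon=1e-5):
    """Normalize a list of scalars with batch stats (training) or EMA stats (otherwise)."""
    if training:
        mean = sum(xs) / len(xs)
        var = sum((v - mean) ** 2 for v in xs) / len(xs)
    else:
        mean, var = moving_mean, moving_var
    return [gamma * (v - mean) / math.sqrt(var + epsilon) + beta for v in xs]


xs = [2.0, 4.0]
print([round(v, 3) for v in batch_norm_1d(xs, 0.0, 1.0, training=True)])   # [-1.0, 1.0]
print([round(v, 3) for v in batch_norm_1d(xs, 0.0, 1.0, training=False)])  # [2.0, 4.0]
```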

tensorpack.models.BatchRenorm(variable_scope_name, x, rmax, dmax, *, momentum=0.9, epsilon=1e-05, center=True, scale=True, gamma_initializer=None, data_format='channels_last')

Batch Renormalization layer, as described in the paper: Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models. This implementation is a wrapper around tf.layers.batch_normalization.

Parameters

x (tf.Tensor) – a NHWC or NC tensor.
rmax, dmax (tf.Tensor) – scalar tensors, the maximum allowed corrections.
momentum (float) – decay rate of the moving average.
epsilon (float) – epsilon to avoid divide-by-zero.
center, scale (bool) – whether to use the extra affine transformation or not.

Returns

tf.Tensor – a tensor named output with the same shape as x.

Variable Names:

beta: the bias term.
gamma: the scale term. Input will be transformed by x * gamma + beta.
moving_mean, renorm_mean, renorm_mean_weight: See TF documentation.
moving_variance, renorm_stddev, renorm_stddev_weight: See TF documentation.
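Batch Renormalization corrects the minibatch statistics toward the moving statistics with two clipped factors. The scalar sketch below follows the paper's formulation: r = clip(sigma_B / sigma, 1/rmax, rmax), d = clip((mu_B - mu) / sigma, -dmax, dmax), and x_hat = (x - mu_B) / sigma_B * r + d. Gradients are ignored here; the paper treats r and d as constants during backpropagation. The helper names are hypothetical.

```python
import math


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def batch_renorm_1d(xs, moving_mean, moving_var, rmax, dmax, epsilon=1e-5):
    """Normalize with batch statistics, corrected toward the moving statistics."""
    mu_b = sum(xs) / len(xs)
    sigma_b = math.sqrt(sum((v - mu_b) ** 2 for v in xs) / len(xs) + epsilon)
    sigma = math.sqrt(moving_var + epsilon)
    r = clip(sigma_b / sigma, 1.0 / rmax, rmax)
    d = clip((mu_b - moving_mean) / sigma, -dmax, dmax)
    return [(v - mu_b) / sigma_b * r + d for v in xs]


# rmax=1, dmax=0 disables the correction (plain batch norm) ...
print([round(v, 3) for v in batch_renorm_1d([2.0, 4.0], 0.0, 1.0, rmax=1.0, dmax=0.0)])
# [-1.0, 1.0]
# ... while very loose bounds reproduce normalization by the moving statistics.
print([round(v, 3) for v in batch_renorm_1d([2.0, 4.0], 0.0, 1.0, rmax=100.0, dmax=100.0)])
# [2.0, 4.0]
```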

tensorpack.models.layer_register(log_shape=False, use_scope=True)

Parameters

log_shape (bool) – log input/output shape of this layer
use_scope (bool or None) – Whether to call this layer with an extra first argument as variable scope. When set to None, it can be called either with or without the scope name argument, depending on whether the first argument is a string or not.

Returns

A decorator used to register a layer.

Example:

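import tensorflow as tf
from tensorpack.models import layer_register

@layer_register(use_scope=True)
def add10(x):
    return x + tf.get_variable('W', shape=[10])

# use it:
x = tf.placeholder(tf.float32, [None, 10])
output = add10('layer_name', x)  # the function will be called under variable scope "layer_name".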
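The use_scope=None rule (a leading string is treated as the scope name, otherwise there is no scope) can be modelled without TensorFlow. The decorator below, toy_layer_register, and its SCOPE_STACK list are hypothetical stand-ins that mimic only this argument dispatch. They are not how layer_register is implemented.

```python
import functools

SCOPE_STACK = []  # stand-in for TensorFlow's variable-scope stack


def toy_layer_register(use_scope=True):
    """Hypothetical stand-in for layer_register that models only the scope-name argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # use_scope=True: first arg is always the name; None: only if it is a str.
            takes_name = use_scope or (use_scope is None and bool(args)
                                       and isinstance(args[0], str))
            if not takes_name:
                return func(*args, **kwargs)
            SCOPE_STACK.append(args[0])
            try:
                return func(*args[1:], **kwargs)
            finally:
                SCOPE_STACK.pop()
        return wrapper
    return decorator


@toy_layer_register(use_scope=None)
def add10(x):
    # Return the result together with the scope it was computed under.
    return x + 10, "/".join(SCOPE_STACK)


print(add10("layer_name", 1))  # (11, 'layer_name')
print(add10(1))                # (11, '')
```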

class tensorpack.models.VariableHolder(**kwargs)

Bases: object

A proxy to access variables defined in a layer.

__init__(**kwargs)

Parameters: kwargs – {name:variable}

all()

Returns: list of all variables

tensorpack.models.rename_tflayer_get_variable()

Rename all variables created by tf.get_variable(), using rules that transform tf.layers-style names into tensorpack-style names.

Returns: A context where the variables are renamed.

Example:

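import tensorflow as tf
from tensorpack.models import rename_tflayer_get_variable
x = tf.placeholder(tf.float32, [None, 32, 32, 3])  # NHWC input
with rename_tflayer_get_variable():
    y = tf.layers.conv2d(x, 3, 3, name='conv0')
    # variables will be named 'conv0/W', 'conv0/b'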
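The renaming can be pictured as a mapping applied to the last component of each variable name. The example above implies kernel -> W and bias -> b. The sketch below encodes only those two rules as an assumed subset, and its function name is hypothetical. The real context manager intercepts tf.get_variable calls rather than rewriting strings afterwards.

```python
# Assumed subset of the rename rules, taken from the example above (kernel -> W, bias -> b).
RENAME_RULES = {"kernel": "W", "bias": "b"}


def rename_tflayer_style(var_name):
    """Map a tf.layers-style variable name such as 'conv0/kernel' to tensorpack style."""
    scope, _, base = var_name.rpartition("/")
    new_base = RENAME_RULES.get(base, base)
    return f"{scope}/{new_base}" if scope else new_base


print([rename_tflayer_style(n) for n in ["conv0/kernel", "conv0/bias"]])  # ['conv0/W', 'conv0/b']
```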

tensorpack.models.disable_layer_logging()

Disable the shape logging for all layers from this moment on. Can be useful when creating multiple towers.

tensorpack.models.Conv2D(variable_scope_name, inputs, filters, kernel_size, strides=(1, 1), padding='same', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=tf.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, split=1)

Similar to tf.layers.Conv2D, but with some differences:

Default kernel initializer is variance_scaling_initializer(2.0).
Default padding is ‘same’.
Supports the ‘split’ argument for group convolution.
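With split=g, the input channels are divided into g groups, and each group of output channels sees only its own group of input channels. A 1x1 convolution on a single pixel reduces to a matrix-vector product. The sketch below uses that fact to show the channel bookkeeping of group convolution in plain Python. The helper is hypothetical, omits the bias, and is not tensorpack's implementation.

```python
def group_conv_1x1(pixel, group_weights):
    """1x1 group convolution on one pixel.

    pixel: list of C_in input channel values.
    group_weights: one weight matrix per group, each of shape
        [C_in / split][C_out / split], so split == len(group_weights).
    """
    split = len(group_weights)
    if len(pixel) % split != 0:
        raise ValueError("input channels must be divisible by split")
    in_per_group = len(pixel) // split
    out = []
    for g, w in enumerate(group_weights):
        chunk = pixel[g * in_per_group:(g + 1) * in_per_group]
        # Each output channel of group g is a weighted sum over group g's inputs only.
        out.extend(sum(chunk[i] * w[i][j] for i in range(in_per_group))
                   for j in range(len(w[0])))
    return out


# C_in = 4, C_out = 2, split = 2: output 0 depends on inputs 0-1, output 1 on inputs 2-3.
weights = [[[1.0], [1.0]], [[1.0], [-1.0]]]
print(group_conv_1x1([1.0, 2.0, 3.0, 5.0], weights))  # [3.0, -2.0]
```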

Variable Names:

W: weights
b: bias

tensorpack.models.Conv2DTranspose(variable_scope_name, inputs, filters, kernel_size, strides=(1, 1), padding='same', data_format='channels_last', activation=None, use_bias=True, kernel_initializer=None, bias_initializer=tf.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None)

A wrapper around tf.layers.Conv2DTranspose. Some differences, kept for backward compatibility:

Default kernel initializer is variance_scaling_initializer(2.0).
Default padding is ‘same’.

Variable Names:

W: weights
b: bias

tensorpack.models.FullyConnected(variable_scope_name, inputs, units, activation=None, use_bias=True, kernel_initializer=None, bias_initializer=tf.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None)

A wrapper around tf.layers.Dense. One difference, kept for backward compatibility: the default weight initializer is variance_scaling_initializer(2.0).

Variable Names:

W: weights of shape [in_dim, out_dim]
b: bias

tensorpack.models.LayerNorm(variable_scope_name, x, epsilon=1e-05, *, center=True, scale=True, gamma_initializer=tf.ones_initializer(), data_format='channels_last')

Layer Normalization layer, as described in the paper: Layer Normalization.

Parameters

x (tf.Tensor) – a 4D or 2D tensor. When 4D, the layout should match data_format.
epsilon (float) – epsilon to avoid divide-by-zero.
center, scale (bool) – whether to use the extra affine transformation or not.
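Layer normalization computes one mean and variance per sample over that sample's features, so it does not depend on the rest of the batch. The sketch below covers the 2D (NC) case in plain Python and omits the optional affine step. The helper name is hypothetical.

```python
import math


def layer_norm_2d(batch, epsilon=1e-5):
    """Normalize each row (sample) of an NC batch over its C features."""
    out = []
    for row in batch:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + epsilon) for v in row])
    return out


print([[round(v, 3) for v in row] for row in layer_norm_2d([[1.0, 3.0], [10.0, 30.0]])])
# [[-1.0, 1.0], [-1.0, 1.0]]
```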

tensorpack.models.InstanceNorm(variable_scope_name, x, epsilon=1e-05, *, center=True, scale=True, gamma_initializer=tf.ones_initializer(), data_format='channels_last', use_affine=None)

Instance Normalization, as in the paper: Instance Normalization: The Missing Ingredient for Fast Stylization.

Parameters

x (tf.Tensor) – a 4D tensor.
epsilon (float) – epsilon to avoid divide-by-zero.
center, scale (bool) – whether to use the extra affine transformation or not.
use_affine – deprecated. Don’t use.
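Instance normalization computes statistics per sample and per channel, over the spatial dimensions only. The sketch below works on an NHWC nested list in plain Python. It assumes channels_last and omits gamma and beta, and the helper name is hypothetical.

```python
import math


def instance_norm_nhwc(x, epsilon=1e-5):
    """x is indexed as x[n][h][w][c]; each (n, c) pair is normalized over H and W."""
    out = []
    for image in x:
        channels = len(image[0][0])
        stats = []
        for c in range(channels):
            vals = [pix[c] for row in image for pix in row]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            stats.append((mean, math.sqrt(var + epsilon)))
        out.append([[[(pix[c] - stats[c][0]) / stats[c][1] for c in range(channels)]
                     for pix in row] for row in image])
    return out


# One 1x2 image with two channels: channel 0 = [1, 3], channel 1 = [10, 30].
y = instance_norm_nhwc([[[[1.0, 10.0], [3.0, 30.0]]]])
print([[round(v, 3) for v in pix] for pix in y[0][0]])  # [[-1.0, -1.0], [1.0, 1.0]]
```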

class tensorpack.models.LinearWrap(tensor)

Bases: object

A simple wrapper to easily create a “linear” graph, consisting of layers / symbolic functions with only one input & output.

__call__()

Returns: tf.Tensor – the underlying wrapped tensor.

__init__(tensor)

Parameters: tensor (tf.Tensor) – the tensor to wrap

apply(func, *args, **kwargs)

Apply a function on the wrapped tensor.

Returns: LinearWrap – LinearWrap(func(self.tensor(), *args, **kwargs)).

apply2(func, *args, **kwargs)

Apply a function on the wrapped tensor. The tensor will be the second argument of func.

This is because many symbolic functions (such as tensorpack’s layers) take ‘scope’ as the first argument.

Returns: LinearWrap – LinearWrap(func(args[0], self.tensor(), *args[1:], **kwargs)).

print_tensor()

Print the underlying tensor and return self. Can be useful to get the name of tensors inside LinearWrap.

Returns: self

tensor()

Equivalent to self.__call__().

Returns: tf.Tensor – the underlying wrapped tensor.
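The argument order of apply and apply2 can be shown without TensorFlow. The hypothetical toy class below wraps a plain Python value instead of a tf.Tensor and mirrors only the documented return expressions of apply and apply2. The scaled function plays the role of a layer that takes a name first.

```python
class ToyLinearWrap:
    """Toy stand-in for LinearWrap that wraps any Python value."""

    def __init__(self, value):
        self._value = value

    def __call__(self):
        return self._value

    def apply(self, func, *args, **kwargs):
        # Wrapped value goes first: func(value, *args, **kwargs).
        return ToyLinearWrap(func(self._value, *args, **kwargs))

    def apply2(self, func, *args, **kwargs):
        # Wrapped value goes second, after a leading argument such as a layer name.
        return ToyLinearWrap(func(args[0], self._value, *args[1:], **kwargs))


def scaled(name, x, factor):  # hypothetical "layer" that takes a name first
    return f"{name}:{x * factor}"


result = ToyLinearWrap(3).apply(lambda x, k: x + k, 1).apply2(scaled, "fc", factor=2)()
print(result)  # fc:8
```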

tensorpack.models.Maxout([variable_scope_name, ]x, num_unit)

Maxout as in the paper Maxout Networks.

Parameters

x (tf.Tensor) – a NHWC or NC tensor. Channel has to be known.
num_unit (int) – an int. C must be divisible by num_unit.

Returns

tf.Tensor – of shape NHW(C/num_unit) (or N(C/num_unit) for an NC input), named output.
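Maxout reduces C channels to C/num_unit by taking a maximum over groups of num_unit channels. The sketch below handles a single sample's channel vector in plain Python. It assumes the groups are consecutive channels, as if the last axis were reshaped from [C] to [C/num_unit, num_unit]. The helper name is hypothetical.

```python
def maxout(channels, num_unit):
    """Maxout over one sample's channel vector of length C."""
    if len(channels) % num_unit != 0:
        raise ValueError("C must be divisible by num_unit")
    return [max(channels[i:i + num_unit]) for i in range(0, len(channels), num_unit)]


print(maxout([1.0, 5.0, -2.0, 0.5, 4.0, 3.0], num_unit=3))  # [5.0, 4.0]
```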

tensorpack.models.PReLU(variable_scope_name, x, init=0.001, name=None)

Parameterized ReLU as in the paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Parameters

x (tf.Tensor) – input
init (float) – initial value for the learnable slope.
name (str) – deprecated argument. Don’t use.

Variable Names:

alpha: learnable slope.
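PReLU keeps non-negative inputs unchanged and multiplies negative inputs by the learnable slope alpha, i.e. f(x) = max(0, x) + alpha * min(0, x). The scalar sketch below fixes alpha to the default init value 0.001. The helper name is hypothetical.

```python
def prelu(x, alpha=0.001):
    """Parametric ReLU for a scalar: identity for x >= 0, slope alpha for x < 0."""
    return max(0.0, x) + alpha * min(0.0, x)


print([prelu(v) for v in [-2.0, 0.0, 3.0]])  # [-0.002, 0.0, 3.0]
```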

tensorpack.models.BNReLU([variable_scope_name, ]x, name=None)

A shorthand for BatchNorm + ReLU.

Parameters

x (tf.Tensor) – the input
name – deprecated, don’t use.

tensorpack.models.MaxPooling(variable_scope_name, inputs, pool_size, strides=None, padding='valid', data_format='channels_last')

Same as tf.layers.MaxPooling2D. The default strides are equal to pool_size.
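Because strides defaults to pool_size, the pooling windows tile the input without overlap. With padding='valid', any leftover elements at the end are dropped. The 1-D max-pooling sketch below follows these defaults in plain Python, and the helper is hypothetical. Average pooling differs only in taking the mean instead of the max.

```python
def max_pool_1d(xs, pool_size, strides=None):
    """'valid' max pooling over a list; strides defaults to pool_size (non-overlapping)."""
    if strides is None:
        strides = pool_size
    return [max(xs[i:i + pool_size])
            for i in range(0, len(xs) - pool_size + 1, strides)]


print(max_pool_1d([1, 3, 2, 8, 5, 4, 7], pool_size=2))  # [3, 8, 5]  (the trailing 7 is dropped)
```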

tensorpack.models.FixedUnPooling(variable_scope_name, x, shape, unpool_mat=None, data_format='channels_last')

Unpool the input by computing the Kronecker product of each feature map with a fixed matrix.

Parameters

x (tf.Tensor) – a 4D image tensor
shape – int or (h, w) tuple
unpool_mat – a tf.Tensor or np.ndarray 2D matrix with size=shape. If None, a matrix with 1 at the top-left corner is used.

Returns

tf.Tensor – a 4D image tensor.
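Taking the Kronecker product of a feature map with unpool_mat replaces every input value v by the block v * unpool_mat. The sketch below applies this to one 2D feature map in plain Python. It uses the default matrix described above for shape=2, assumed to be 1 at the top-left and 0 elsewhere. The helper name is hypothetical.

```python
def kron_unpool(feature_map, unpool_mat):
    """Kronecker product of a 2D feature map (list of rows) with a 2D unpool matrix."""
    sh, sw = len(unpool_mat), len(unpool_mat[0])
    out = []
    for row in feature_map:
        for i in range(sh):
            out.append([v * unpool_mat[i][j] for v in row for j in range(sw)])
    return out


default_mat = [[1, 0], [0, 0]]  # assumed default for shape=2 when unpool_mat is None
print(kron_unpool([[1, 2], [3, 4]], default_mat))
# [[1, 0, 2, 0], [0, 0, 0, 0], [3, 0, 4, 0], [0, 0, 0, 0]]
```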

tensorpack.models.AvgPooling(variable_scope_name, inputs, pool_size, strides=None, padding='valid', data_format='channels_last')

Same as tf.layers.AveragePooling2D. The default strides are equal to pool_size.

tensorpack.models.GlobalAvgPooling(variable_scope_name, x, data_format='channels_last')

Global average pooling as in the paper Network In Network.

Parameters: x (tf.Tensor) – a 4D tensor.
Returns: tf.Tensor – a NC tensor named output.
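Global average pooling averages each channel over all spatial positions, turning an NHWC tensor into an NC tensor. The plain-Python sketch below assumes channels_last, and the helper name is hypothetical.

```python
def global_avg_pool_nhwc(x):
    """Average each channel over H and W; x[n][h][w][c] -> result[n][c]."""
    result = []
    for image in x:
        pixels = [pix for row in image for pix in row]
        channels = len(pixels[0])
        result.append([sum(p[c] for p in pixels) / len(pixels) for c in range(channels)])
    return result


# One 2x2 image with two channels.
print(global_avg_pool_nhwc([[[[1.0, 10.0], [3.0, 30.0]], [[5.0, 50.0], [7.0, 70.0]]]]))
# [[4.0, 40.0]]
```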

tensorpack.models.regularize_cost(regex, func, name='regularize_cost')

Apply a regularizer on trainable variables matching the regex, and print the matched variables (only print once in multi-tower training). In replicated mode, it will only regularize variables within the current tower.

If called under a TowerContext with is_training==False, this function returns a zero constant tensor.

Parameters

regex (str) – a regex to match variable names, e.g. “conv.*/W”
func – the regularization function, which takes a tensor and returns a scalar tensor. E.g., tf.nn.l2_loss, tf.contrib.layers.l1_regularizer(0.001).

Returns

tf.Tensor – a scalar, the total regularization cost.

Example

from tensorflow.contrib.layers import l2_regularizer
cost = cost + regularize_cost("fc.*/W", l2_regularizer(1e-5))
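Conceptually, regularize_cost selects variables by regex and sums func over them. The framework-free sketch below uses a dict mapping names to lists of floats in place of trainable variables. Its L2 function follows tf.nn.l2_loss's definition, sum(t**2) / 2. The helper names and the matching mode (re.search) are assumptions for illustration, not necessarily tensorpack's exact behavior.

```python
import re


def l2_loss(values):
    """Same definition as tf.nn.l2_loss: sum of squares divided by 2."""
    return sum(v * v for v in values) / 2


def toy_regularize_cost(regex, func, variables):
    """Sum func(value) over variables whose name matches regex (re.search assumed)."""
    matched = sorted(name for name in variables if re.search(regex, name))
    print("regularizing:", matched)
    return sum(func(variables[name]) for name in matched)


variables = {"fc0/W": [1.0, 2.0], "fc0/b": [5.0], "conv0/W": [3.0]}
print(toy_regularize_cost("fc.*/W", l2_loss, variables))  # only fc0/W matches: 2.5
```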

tensorpack.models.regularize_cost_from_collection(name='regularize_cost')

Get the cost from the regularizers in tf.GraphKeys.REGULARIZATION_LOSSES. In replicated mode, it only regularizes variables created within the current tower.

Parameters: name (str) – the name of the returned tensor
Returns: tf.Tensor – a scalar, the total regularization cost.

tensorpack.models.Dropout([variable_scope_name, ]x, *args, **kwargs)

Same as tf.layers.dropout. However, for historical reasons, the first positional argument is interpreted as keep_prob rather than drop_prob. Explicitly pass the rate= keyword argument to avoid this ambiguity.
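The keep_prob/rate distinction is easy to mix up: rate is the fraction of units dropped, so rate = 1 - keep_prob. The plain-Python sketch below assumes the standard inverted-dropout scheme used by TensorFlow, where kept units are scaled by 1 / keep_prob so the expected value is unchanged. The helper is hypothetical and takes an explicit random seed.

```python
import random


def inverted_dropout(xs, rate, seed=0):
    """Drop each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in xs]


out = inverted_dropout([1.0] * 8, rate=0.5)
print(out)  # each entry is either 0.0 or 2.0
```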

tensorpack.models.ConcatWith([variable_scope_name, ]x, tensor, dim)

A wrapper around tf.concat to cooperate with LinearWrap.

Parameters

x (tf.Tensor) – input
tensor (list[tf.Tensor]) – a tensor or list of tensors to concatenate with x. x will be at the beginning.
dim (int) – the dimension along which to concatenate

Returns

tf.Tensor – tf.concat([x] + tensor, dim)