tensorpack.dataflow.dataset package¶
-
class
tensorpack.dataflow.dataset.
BSDS500
(name, data_dir=None, shuffle=True)[source]¶ Bases:
tensorpack.dataflow.base.RNGDataFlow
Berkeley Segmentation Data Set and Benchmarks 500 dataset.
Produces an (image, label) pair, where image has shape (321, 481, 3) in BGR order with values in [0, 255]. label is a floating-point image of shape (321, 481) with values in [0, 1]. The value of each pixel is the number of times it is annotated as an edge divided by the total number of annotators for this image.
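As a rough illustration of the label definition, the sketch below averages a stack of binary edge maps, one per annotator. The helper name `edge_probability` and the toy 2x3 masks are assumptions made for this example; they are not part of tensorpack, which reads the annotations from the dataset files.

```python
import numpy as np


def edge_probability(annotations):
    """Average binary edge maps (one per annotator) into a [0, 1] label.

    `annotations` is a hypothetical list of equally shaped 0/1 arrays.
    """
    stacked = np.stack([np.asarray(a, dtype=np.float32) for a in annotations])
    # Per pixel: times marked as edge / total number of annotators.
    return stacked.mean(axis=0)


# Three toy annotators on a 2x3 image.
masks = [
    np.array([[1, 0, 0], [0, 1, 0]]),
    np.array([[1, 0, 0], [0, 0, 0]]),
    np.array([[1, 1, 0], [0, 0, 0]]),
]
label = edge_probability(masks)
```

A pixel marked by every annotator gets the value 1, and a pixel marked by nobody gets 0.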
-
class
tensorpack.dataflow.dataset.
Caltech101Silhouettes
(name, shuffle=True, dir=None)[source]¶ Bases:
tensorpack.dataflow.base.RNGDataFlow
Produces [image, label] from the Caltech101 Silhouettes dataset. image is 28x28 with values in [0, 1]; label is an int in the range [0, 100].
-
class
tensorpack.dataflow.dataset.
CifarBase
(train_or_test, shuffle=None, dir=None, cifar_classnum=10)[source]¶ Bases:
tensorpack.dataflow.base.RNGDataFlow
Produces [image, label] from the Cifar10/100 dataset. image is 32x32x3 with values in [0, 255]; label is an int.
-
class
tensorpack.dataflow.dataset.
Cifar10
(train_or_test, shuffle=None, dir=None)[source]¶ Bases:
tensorpack.dataflow.dataset.cifar.CifarBase
Produces [image, label] from the Cifar10 dataset. image is 32x32x3 with values in [0, 255]; label is an int.
-
class
tensorpack.dataflow.dataset.
Cifar100
(train_or_test, shuffle=None, dir=None)[source]¶ Bases:
tensorpack.dataflow.dataset.cifar.CifarBase
Similar to Cifar10
-
class
tensorpack.dataflow.dataset.
ILSVRCMeta
(dir=None)[source]¶ Bases:
object
Provides methods to access metadata for the ILSVRC12 dataset.
-
get_image_list
(name, dir_structure='original')[source]¶
- Parameters
name (str) – ‘train’ or ‘val’ or ‘test’
dir_structure (str) – same as in ILSVRC12.__init__().
- Returns
list – list of (image filename, label)
-
-
class
tensorpack.dataflow.dataset.
ILSVRC12
(dir, name, meta_dir=None, shuffle=None, dir_structure=None)[source]¶ Bases:
tensorpack.dataflow.dataset.ilsvrc.ILSVRC12Files
The ILSVRC12 classification dataset, also known as the commonly used 1000-class ImageNet subset. This dataflow produces uint8 images of shape [h, w, 3(BGR)], and a label in [0, 999]. The label map follows the synsets.txt file in http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz, which can also be queried using ILSVRCMeta.
-
__init__
(dir, name, meta_dir=None, shuffle=None, dir_structure=None)[source]¶
- Parameters
dir (str) – A directory containing a subdir named name, containing the images in a structure described below.
name (str) – One of ‘train’ or ‘val’ or ‘test’.
shuffle (bool) – shuffle the dataset. Defaults to True if name==’train’.
dir_structure (str) – One of ‘original’ or ‘train’. The directory structure for the ‘val’ directory. ‘original’ means the original decompressed directory, which only contains a flat list of image files (as below). If set to ‘train’, it expects the same two-level directory structure as ‘dir/train/’. By default, it tries to automatically detect the structure. You probably do not need to care about this option because ‘original’ is what people usually have.
Example:
When dir_structure==’original’, dir should have the following structure:
    dir/
      train/
        n02134418/
          n02134418_198.JPEG
          ...
        ...
      val/
        ILSVRC2012_val_00000001.JPEG
        ...
      test/
        ILSVRC2012_test_00000001.JPEG
        ...
With the downloaded ILSVRC12_img_*.tar, you can use the following command to build the above structure:
    mkdir val && tar xvf ILSVRC12_img_val.tar -C val
    mkdir test && tar xvf ILSVRC12_img_test.tar -C test
    mkdir train && tar xvf ILSVRC12_img_train.tar -C train && cd train
    find -type f -name '*.tar' | parallel -P 10 'echo {} && mkdir -p {/.} && tar xf {} -C {/.}'
When dir_structure==’train’, dir should have the following structure:
    dir/
      train/
        n02134418/
          n02134418_198.JPEG
          ...
        ...
      val/
        n01440764/
          ILSVRC2012_val_00000293.JPEG
          ...
        ...
      test/
        ILSVRC2012_test_00000001.JPEG
        ...
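The dir_structure description says the layout of the ‘val’ directory is detected automatically by default. One way such a check could work is sketched below, assuming that a ‘train’-style layout is recognizable by the presence of per-class subdirectories. The helper `guess_val_structure` is hypothetical and is not claimed to match tensorpack's own detection logic.

```python
import os
import tempfile


def guess_val_structure(val_dir):
    """Return 'train' if val_dir holds subdirectories, else 'original'."""
    for entry in sorted(os.listdir(val_dir)):
        if os.path.isdir(os.path.join(val_dir, entry)):
            return "train"
    return "original"


# Build two tiny throwaway layouts to exercise the helper.
with tempfile.TemporaryDirectory() as root:
    flat = os.path.join(root, "flat_val")
    os.makedirs(flat)
    open(os.path.join(flat, "ILSVRC2012_val_00000001.JPEG"), "w").close()

    nested = os.path.join(root, "nested_val")
    os.makedirs(os.path.join(nested, "n01440764"))
    open(os.path.join(nested, "n01440764", "ILSVRC2012_val_00000293.JPEG"), "w").close()

    flat_result = guess_val_structure(flat)      # flat list of image files
    nested_result = guess_val_structure(nested)  # one subdirectory per class
```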
-
-
class
tensorpack.dataflow.dataset.
ILSVRC12Files
(dir, name, meta_dir=None, shuffle=None, dir_structure=None)[source]¶ Bases:
tensorpack.dataflow.base.RNGDataFlow
Same as ILSVRC12, but produces filenames of the images instead of NumPy arrays. This could be useful when cv2.imread is a bottleneck and you want to decode the images in smarter ways (e.g. in parallel).
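A minimal sketch of the parallel-decoding idea follows, assuming each datapoint is a (filename, label) pair. The function `decode_in_parallel` and the placeholder `fake_decode` are hypothetical names introduced for this example; in practice `decode` would be an image reader such as cv2.imread, and tensorpack's own parallel mapping utilities may be a better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_in_parallel(samples, decode, num_threads=4):
    """Apply `decode` to the filename of each (filename, label) pair.

    `decode` stands in for an image reader such as cv2.imread.
    The order of the input is preserved.
    """
    filenames = [fname for fname, _ in samples]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        images = list(pool.map(decode, filenames))
    return [(img, label) for img, (_, label) in zip(images, samples)]


# Placeholder decoder so the sketch runs without image files or OpenCV.
def fake_decode(filename):
    return "decoded:" + filename


pairs = [("a.JPEG", 0), ("b.JPEG", 1), ("c.JPEG", 2)]
decoded = decode_in_parallel(pairs, fake_decode)
```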
-
class
tensorpack.dataflow.dataset.
TinyImageNet
(dir, name, shuffle=None)[source]¶ Bases:
tensorpack.dataflow.base.RNGDataFlow
The TinyImageNet classification dataset, with 200 classes and 500 images per class. See https://tiny-imagenet.herokuapp.com/.
It produces [image, label] where image is a 64x64x3(BGR) image, label is an integer in [0, 200).
-
class
tensorpack.dataflow.dataset.
Mnist
(train_or_test, shuffle=True, dir=None)[source]¶ Bases:
tensorpack.dataflow.base.RNGDataFlow
Produces [image, label] from the MNIST dataset. image is 28x28 with values in [0, 1]; label is an int.
-
class
tensorpack.dataflow.dataset.
FashionMnist
(train_or_test, shuffle=True, dir=None)[source]¶ Bases:
tensorpack.dataflow.dataset.mnist.Mnist
Same API as Mnist, but more fashion.
-
class
tensorpack.dataflow.dataset.
Places365Standard
(dir, name, shuffle=None)[source]¶ Bases:
tensorpack.dataflow.base.RNGDataFlow
The Places365-Standard Dataset, in low resolution format only. Produces BGR images of shape (256, 256, 3) in range [0, 255].
-
__init__
(dir, name, shuffle=None)[source]¶ - Parameters
dir – path to the Places365-Standard dataset in its “easy directory structure”. See http://places2.csail.mit.edu/download.html
name – one of “train” or “val”
shuffle (bool) – shuffle the dataset. Defaults to True if name==’train’.
-
-
class
tensorpack.dataflow.dataset.
RNGDataFlow
[source]¶ Bases:
tensorpack.dataflow.base.DataFlow
A DataFlow with RNG
-
rng
= None¶
self.rng is a np.random.RandomState instance that is initialized correctly (with different seeds in each process) in RNGDataFlow.reset_state().
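The pattern described for rng can be sketched with a small stand-alone class. `ToyRNGFlow` and its seeding formula are assumptions for illustration only; the actual seed derivation used by RNGDataFlow.reset_state() is not shown on this page.

```python
import os
import time

import numpy as np


class ToyRNGFlow:
    """Hypothetical stand-in for a DataFlow that owns a random state."""

    rng = None  # populated by reset_state(), mirroring the attribute above

    def __init__(self, size):
        self.size = size

    def reset_state(self):
        # Assumed seeding scheme: mix the process id with the clock so that
        # separate worker processes end up with different seeds.
        seed = (os.getpid() * 1000003 + time.time_ns()) % (2 ** 32)
        self.rng = np.random.RandomState(seed)

    def shuffled_indices(self):
        idxs = np.arange(self.size)
        self.rng.shuffle(idxs)
        return idxs


flow = ToyRNGFlow(5)
before = flow.rng    # still None: reset_state() has not run yet
flow.reset_state()   # call once in the process that will iterate the data
order = flow.shuffled_indices()
```

Creating the random state inside reset_state(), rather than in the constructor, means a copy of the object sent to another process gets its own seed once that process resets it.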
-