The `tf.data`

API of Tensorflow is a great way to build a pipeline for sending data to the GPU. Setting up data augmentation can be a bit tricky though. In this tutorial I will go through the steps of setting up a data augmentation pipeline. The table of contents for this post:

*Note:* I have used Tensorflow eager in this post, but the same approach can also be used for the graph mode of Tensorflow.

## Some sample data

To illustrate the different augmentation techniques we need some demo data. CIFAR10 is available through Tensorflow so that is an easy start. For illustration purposes I take the first 8 examples of CIFAR10 and build a dataset with this. In real world scenarios this would be replaced by your own data loading logic.

```
import tensorflow as tf
tf.enable_eager_execution()
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
dataset = tf.data.Dataset.from_tensor_slices((x_train[0:8] / 255).astype(np.float32))
```

With a simple function we can plot images from this dataset:

```
import numpy as np
import matplotlib.pyplot as plt
def plot_images(dataset, n_images, samples_per_image):
output = np.zeros((32 * n_images, 32 * samples_per_image, 3))
row = 0
for images in dataset.repeat(samples_per_image).batch(n_images):
output[:, row*32:(row+1)*32] = np.vstack(images.numpy())
row += 1
plt.figure()
plt.imshow(output)
plt.show()
```

## Implementing augmentations

To augment the dataset it can beneficial to make augmenter functions: a function that receives an image (a `tf.Tensor`

) and returns a new augmented image. By defining functions for each augmentation operation we can easily attach them to datasets and control when they are evaluated. The augmenter functions I use are based on the following signature:

```
def augment(x: tf.Tensor) -> tf.Tensor:
"""Some augmentation
Args:
x: Image
Returns:
Augmented image
"""
x = .... # augmentation here
return x
```

With this basic recipe for an augmenter function we can implement the augmenters itself. Here I will show examples of:

- Orientation (flipping and rotation)
- Color augmentations (hue, saturation, brightness, contrast)
- Zooming

Not all of these augmentations are necessarily applicable to CIFAR10; e.g. learning to detect flipped trucks is maybe not that beneficial for the task at hand. Nevertheless, I show them here as an example as they can be useful for tasks that are more orientation invariant. Of course, there are many more augmentations that could be useful, but most of them follow the same approach.

### Rotation and flipping

One of the most simplest augmentations is rotating the image 90 degrees. For this we can use the `rot90`

function of Tensorflow. To get a new random rotation for each image we need to use a random function from Tensorflow itself. Random functions from Tensorflow are evaluated for every input, functions from numpy or basic python only once which would result in a static augmentation.

```
def rotate(x: tf.Tensor) -> tf.Tensor:
"""Rotation augmentation
Args:
x: Image
Returns:
Augmented image
"""
# Rotate 0, 90, 180, 270 degrees
return tf.image.rot90(x, tf.random_uniform(shape=[], minval=0, maxval=4, dtype=tf.int32))
```

Flipping is another easy-to-implement augmentation. For these augmentations we do not have to use a random number generator as Tensorflow has a built-in function that does this for us: random_flip_left_right and random_flip_up_down.

```
def flip(x: tf.Tensor) -> tf.Tensor:
"""Flip augmentation
Args:
x: Image to flip
Returns:
Augmented image
"""
x = tf.image.random_flip_left_right(x)
x = tf.image.random_flip_up_down(x)
return x
```

### Color augmentations

Color augmentations are applicable to almost every image learning task. In Tensorflow there are four color augmentations readily available: hue, saturation, brightness and contrast. These functions only require a range and will result in an unique augmentation for each image.

```
def color(x: tf.Tensor) -> tf.Tensor:
"""Color augmentation
Args:
x: Image
Returns:
Augmented image
"""
x = tf.image.random_hue(x, 0.08)
x = tf.image.random_saturation(x, 0.6, 1.6)
x = tf.image.random_brightness(x, 0.05)
x = tf.image.random_contrast(x, 0.7, 1.3)
return x
```

### Zooming

Zooming is a powerful augmentation that can make a network robust to (small) changes in object size. This augmentation is a bit harder to implement as there is no single function that performs this operation completely. The Tensorflow function crop_and_resize function comes close as it can crop an image and then resize it to an arbitrary size. The function requires a list of ‘crop boxes’ that contain normalized coordinates (between 0 and 1) for cropping.

In the augmentation function below we first create 20 crop boxes using numpy. These boxes are created once and then passed on to the `crop_and_resize`

function. This function returns a new image for each crop box, resulting in 20 potential cropped images for each input image. By using `tf.random_uniform`

we can randomly select one of these crops. `tf.random_uniform`

will give new random numbers during training so is safe to use here.

To make sure that some part of our data retains it original dimension, a `tf.cond`

call can be used. `tf.cond`

expects three parameters: a predicate (or condition), a true function `true_fn`

and a false function `false_fn`

. The predicate should be an operation that evaluates to true or false, after which `true_fn`

or `false_fn`

is called respectively. In our case we use a random number generator to return true in 50% of the calls. `true_fn`

is set to the cropping function and `false_fn`

to a identity function returning the original image.

**Note:** Do not use `np.random`

functions for generating random numbers in these augmenter functions. These are only evaluated once in the TF data pipeline and will result in the same augmentation applied to all images.

```
def zoom(x: tf.Tensor) -> tf.Tensor:
"""Zoom augmentation
Args:
x: Image
Returns:
Augmented image
"""
# Generate 20 crop settings, ranging from a 1% to 20% crop.
scales = list(np.arange(0.8, 1.0, 0.01))
boxes = np.zeros((len(scales), 4))
for i, scale in enumerate(scales):
x1 = y1 = 0.5 - (0.5 * scale)
x2 = y2 = 0.5 + (0.5 * scale)
boxes[i] = [x1, y1, x2, y2]
def random_crop(img):
# Create different crops for an image
crops = tf.image.crop_and_resize([img], boxes=boxes, box_ind=np.zeros(len(scales)), crop_size=(32, 32))
# Return a random crop
return crops[tf.random_uniform(shape=[], minval=0, maxval=len(scales), dtype=tf.int32)]
choice = tf.random_uniform(shape=[], minval=0., maxval=1., dtype=tf.float32)
# Only apply cropping 50% of the time
return tf.cond(choice < 0.5, lambda: x, lambda: random_crop(x))
```

## Augmenting the Dataset

With all functions defined we can combine them in to a single pipeline. Applying these functions to a Tensorflow Dataset is very easy using the `map`

function. The `map`

function takes a function and returns a new and augmented dataset. When this new dataset is evaluated, the data operations defined in the function will be applied to all elements in the set. Chaining map functions makes it very easy to iteratively add new data mapping operations, like augmentations.

To drastically increase the speed of these operations we can execute them in parallel, practically all Tensorflow operations support this. With the tf.Data API this is done using the `num_parallel_calls`

parameter of the `map`

function. When this parameter is higher than one functions will be executed in parallel. It is advised to set this parameter to the number of CPUs available.

*Note:* Some of these operations can result in images that have values outside the normal range of [0, 1]. To make sure that these ranges are not exceeded a clipping function such as `tf.clip_by_value`

is recommended.

```
# Add augmentations
augmentations = [flip, color, zoom, rotate]
# Add the augmentations to the dataset
for f in augmentations:
# Apply the augmentation, run 4 jobs in parallel.
dataset = dataset.map(f, num_parallel_calls=4)
# Make sure that the values are still in [0, 1]
dataset = dataset.map(lambda x: tf.clip_by_value(x, 0, 1), num_parallel_calls=4)
plot_images(dataset, n_images=10, samples_per_image=15)
```

If applying all augmentations is a bit to much – which it is in the example above – it is also possible to only apply them to a certain percentage of the data. For this we can use the same approach as for the zooming augmentation: a combination of a `tf.cond`

and `tf.random_uniform`

call.

```
for f in augmentations:
# Apply an augmentation only in 25% of the cases.
dataset = dataset.map(lambda x: tf.cond(tf.random_uniform([], 0, 1) > 0.75, lambda: f(x), lambda: x), num_parallel_calls=4)
```

That’s it! Adding more augmentations is as simple as writing a new function and adding them to the list of augmenters. Any other tips for data augmentation using the tf.Data pipeline? Let me know!

## Full code example used in this post

For convenience, all code in this post is repeated below, combined as a single script:

```
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
def plot_images(dataset, n_images, samples_per_image):
output = np.zeros((32 * n_images, 32 * samples_per_image, 3))
row = 0
for images in dataset.repeat(samples_per_image).batch(n_images):
output[:, row*32:(row+1)*32] = np.vstack(images.numpy())
row += 1
plt.figure()
plt.imshow(output)
plt.show()
def flip(x: tf.Tensor) -> tf.Tensor:
"""Flip augmentation
Args:
x: Image to flip
Returns:
Augmented image
"""
x = tf.image.random_flip_left_right(x)
x = tf.image.random_flip_up_down(x)
return x
def color(x: tf.Tensor) -> tf.Tensor:
"""Color augmentation
Args:
x: Image
Returns:
Augmented image
"""
x = tf.image.random_hue(x, 0.08)
x = tf.image.random_saturation(x, 0.6, 1.6)
x = tf.image.random_brightness(x, 0.05)
x = tf.image.random_contrast(x, 0.7, 1.3)
return x
def rotate(x: tf.Tensor) -> tf.Tensor:
"""Rotation augmentation
Args:
x: Image
Returns:
Augmented image
"""
return tf.image.rot90(x, tf.random_uniform(shape=[], minval=0, maxval=4, dtype=tf.int32))
def zoom(x: tf.Tensor) -> tf.Tensor:
"""Zoom augmentation
Args:
x: Image
Returns:
Augmented image
"""
# Generate 20 crop settings, ranging from a 1% to 20% crop.
scales = list(np.arange(0.8, 1.0, 0.01))
boxes = np.zeros((len(scales), 4))
for i, scale in enumerate(scales):
x1 = y1 = 0.5 - (0.5 * scale)
x2 = y2 = 0.5 + (0.5 * scale)
boxes[i] = [x1, y1, x2, y2]
def random_crop(img):
# Create different crops for an image
crops = tf.image.crop_and_resize([img], boxes=boxes, box_ind=np.zeros(len(scales)), crop_size=(32, 32))
# Return a random crop
return crops[tf.random_uniform(shape=[], minval=0, maxval=len(scales), dtype=tf.int32)]
choice = tf.random_uniform(shape=[], minval=0., maxval=1., dtype=tf.float32)
# Only apply cropping 50% of the time
return tf.cond(choice < 0.5, lambda: x, lambda: random_crop(x))
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
data = (x_train[0:8] / 255).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices(data)
# Add augmentations
augmentations = [flip, color, zoom, rotate]
for f in augmentations:
dataset = dataset.map(lambda x: tf.cond(tf.random_uniform([], 0, 1) > 0.75, lambda: f(x), lambda: x), num_parallel_calls=4)
dataset = dataset.map(lambda x: tf.clip_by_value(x, 0, 1))
plot_images(dataset, n_images=8, samples_per_image=10)
```

## 33 comments

## Kaushik Acharya on

Typo: dataset.map(lambda x: tf.cond(tf.random_uniform([], 0, 1) > 0.75, lambda: f(x), lambda: x), num_parallel_calls=4))

There seems to be an extra end parentheses in the end.

## Wouter Bulten on

Good find, thanks! I've updated the article.

## Germinate on

Could someone explain what this function does?

I get the error of TypeError: <lambda>() takes 1 positional argument but 2 were given. I am not sure which lambda is throwing this error.

## Wouter Bulten on

Hi! Do u you have a bit more information on that error? Maybe stacktrace?

## _faizan_ on

How do I use this augment strategy with tf.keras ? tf.keras allows to use datasets as given here: https://www.tensorflow.org/...

But the issue is that once you augment you get one new dataset. So I need to train loop myself and train. Instead I am asking how to do something like Keras ImageDataGenerator thing.

## Wouter Bulten on

As shown in the example on the TF website, you can inject the dataset. Something like:

dataset = (code from above)

dataset = dataset.batch(28)

model.fit(train_dataset, epochs=3)

## Sam on

can I use the augmented “dataset” in a multi output model such as: model.fit(dataset, y[label_a, label_b, label_c], epochs=10)

where labels a,b,c represent multiple label output.

## Wouter Bulten on

Hi Sam, yes you should be able to use it for multi-output models. If I’m correct, for tf.keras you should pass the dataset completely. See also: Training & evaluation from tf.data Datasets

## samra irshad on

Hi Wouter, thanks for writing this article. I have been using tensorflow data generators and I have switched to dataset APIs due to their fast processing. However, I noticed a difference in performance between the models generated using generators and dataset API. I am trying to ensure the data augmentation I am applying in both the generators and dataset APIs is similar and I have a question relate to that: In tf.keras.preprocessing.image.ImageDataGenerator, the augmentations are applied randomly, like sometimes they are applied on images and sometimes they are not. I am wondering if the operations like tf.image.random_flip_left_right and tf.random_rotation offer the same behavior (augment the image with some probability) or do they always augment all the images?

## Wouter Bulten on

Hi Samra, yes they offer the same behaviour. For example, with the flip function there is a 50% chance that the image is passed on as-is without flipping.

## Bruno Jerkovic on

Hi Wouter, is there a way to change that probability. I don’t see it in the official documentation.

## Wouter Bulten on

Hi Bruno, You mean for

`random_flip_left_right`

? This is fixed to a 50% chance according to the documentation. If you want to change the probability, you have to implement that yourself. Hope that helps!## Olga F on

Hi Wouter, thanks for the article. As far as I understand, data augmentation does not change the number of samples in the dataset. The image transformations are applied to the original image, which is substituted by the augmented versions. But, in some cases where we are encountering class imbalance, our goal is to extend the existent dataset with new samples.

## Wouter Bulten on

Hi Olaf, yes that is true. Though, you can sample as often from the dataset as you want. Every time you sample will generate new images.

## Anirudh on

Hey Mr.Bulten, I had the same question about the image augmentation in the post not increasing the size of the dataset. But, can we create multiple dataset objects for each augmentation type by calling the map function for that particular augmentation type. And then merge all the created augmented datasets along with the original one by concatenation. Is that a possible way to expand the dataset using augmentation?

## Wouter Bulten on

Hi Anirudh, I’m not 100% sure if I understand what you mean. Do you want to generate multiple versions of the same image using data augmentation? If that’s the case you can just call them in the map function, or indeed as you say, merge them. Something like:

Note that there is no need to ‘expand’ the dataset for most scenarios. Everytime you request a batch, a new batch is generated using the augmentations.

## Mubin Tirsaiwala on

Hey Anirudh and Wouter, Yes you can merge multiple datasets as long as they are concatenable i.e. match the dimensions. This can be done using tf.data.expermental.sample_from_datasets as suggested in the following documentation :- “https://www.tensorflow.org/api_docs/python/tf/data/experimental/sample_from_datasets”.

## Wouter Bulten on

Also a good suggestion! Thanks for the comment, Wouter

## Reza Bojnordi on

error code AttributeError: in user code:

## Wouter Bulten on

Hi Reza,

You are right. In newer versions of Tensorflow the random version has been changed to “tf.random.uniform”. I will test the code again with TF 2.0 and then update the post. For now, you can use the new function.

Regards, Wouter

## Ingrid Xu on

Hi Wouter, thank you so much for the tutorial. How can I turn the dataset into a numpy array? I need my x_train to be in numpy array to reshape and encode. I tried

`dataset = np.array(dataset)`

, but it returned an empty object. I’ve been stuck on this all day and I was wondering if you would know the solution!Thanks

## Wouter Bulten on

Hi Ingrid,

Normally you don’t want to turn the dataset into a numpy array for training. If you do so you will lose a lot of the performance gains tf.data can offer you. If you really need to get (a part of) the dataset as a numpy array you could do something like this:

The snippet above will return the first 10 elements from the dataset as a list.

Regards, Wouter

## Ingrid Xu on

Thank you so much Wouter! It worked :)

## Alexander P on

Hey,

thanks for sharing.

In tf version 2.2 I got the error:

TypeError: crop_and_resize_v2() got an unexpected keyword argument ‘box_ind’

for the crop_and_resize function you have to replace ‘box_ind’ by ‘box_indices’ in 2.2 (see here: https://www.tensorflow.org/api_docs/python/tf/image/crop_and_resize)

## Wouter Bulten on

Thanks! You are right, some things have changed since the last update of TF. I will have to update the post to reflect those changes.

Regards, Wouter

## Tolga Dincer on

Thank you so much for the tutorial. Could you possibly show an example of elastic deformation that incorporates with the tf.data api?

## Wouter Bulten on

Hi! Thanks for your comment. I don’t have any code available of elastic deformation using tf.Data unfortunately. Regards, Wouter

## Pedro Galvez on

Hi,

Thank you very much for your detailed explanation on data augmentation using Tensorflow. I have a question regaring the function “map”.

Does using “map” replaces the original image in the dataset, or adds an additional image (the modified version of the input image) to the dataset and keeps the original image?

Thank you! Pedro

## Wouter Bulten on

Hi Pedro, thanks for your comment!

I think it’s best to see the

`map`

function as part of the pipeline. Every time you run`map`

, nothing happens with the images yet. Instead, a chain of operations is made in the background. The TF docs explain it like this:Source: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map

Note that these transformations are executed lazy. Only when you actually request images from the dataset (like with a for loop and

`batch`

), the source images from your dataset are fed through the pipeline. At that time, each map is applied on the image or on the batch.So to answer your question: Yes, in a sense the original image is replaced. You are making a new dataset with the mapped images. The original images are not part of the dataset anymore. That’s why you sometimes want to apply the augmentation to only a part of the images (for example with

`tf.cond`

).## Kleyson Rios on

Thanks for this.

My augmenter function receives two parameters, like:

May you help me to update the code below ?

Running the code above I get the following error:

Thanks.

## Wouter Bulten on

Hi Kleyson,

It depends on your dataset. You will have to update the map call. If your dataset returns two variables you can do something like:

Now the function is applied on both the data and the label. If your augment function does not need the label, you can also do this:

## Shaobo on

Hi Wouter, thank you very much for sharing. I am a beginner working on a task of image localization. I am wondering how to update the bounding box (like (x1, x2, y1, y2)) in train_labels accordingly, while using functions like tf.image.random_crop, random_flip_left_right, or random_flip_up_down to the train_images. Thanks.

## Wouter Bulten on

Hi Shaobo, thanks! If you also need to update labels, you will have to pass them to the augment functions too. See my reply to this comment for an example! Good luck :)