Exercise 2. Write training and evaluation set generators.

Now that we have a collection of examples, we need a convenient way to pull them out for use in our model. When we fire up the autoencoder for real, we'll want to split our data set into two: a training set and a separate evaluation set. In order to set a good precedent and get the right pieces in place, it will be helpful to make our examples look like two data sets, one for training, and the other for evaluation.

Generators

A very nice way to do this is with generators: special Python functions that can give you a long sequence of results, but only one at a time and only when you ask for one. Here's a great tutorial if you'd like to take the detour. The biggest concrete difference from an ordinary function is the use of a yield statement in place of a return statement.
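To make the yield-versus-return difference concrete, here is a minimal sketch. The count_up() function is a made-up name for illustration and is not part of the course code.

```python
def count_up(limit):
    """Yield the integers 0, 1, ..., limit - 1, one per request."""
    i = 0
    while i < limit:
        # yield hands back a value and pauses here until the next request.
        yield i
        i += 1


counter = count_up(3)   # Nothing runs yet; this only creates the generator.
first = next(counter)   # Runs the function body up to the first yield.
second = next(counter)  # Resumes right after the yield and loops again.
rest = list(counter)    # Drains whatever values are left.
print(first, second, rest)
```

Each call to next() runs the function only as far as the next yield, so the function keeps its place between requests instead of starting over.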

Without generators, you would have to load the entire data set into memory at once. That's fine when you have a handful of examples, but you can run out of space quickly when you have a million images. Also, when you have some pre-processing step, like adding noise or a rotation, it can take forever to do that to your whole data set. Generators are lazy, and they just do it one at a time, as needed.
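As a rough sketch of that laziness, the generator below adds noise to one example at a time, only when it is requested. The noisy_examples() name, the noise_scale parameter, and the tiny stand-in data are all assumptions made for illustration.

```python
import random


def noisy_examples(examples, noise_scale=0.1):
    """Yield one noisy copy of a randomly chosen example per request."""
    while True:
        example = random.choice(examples)
        # The noise is computed only when someone asks for the next example,
        # so the full data set is never pre-processed up front.
        yield [pixel + random.gauss(0, noise_scale) for pixel in example]


# Hypothetical stand-in data: three tiny "images" of four pixels each.
examples = [
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
stream = noisy_examples(examples)
batch = [next(stream) for _ in range(5)]
print(len(batch), len(batch[0]))
```

Only the five requested examples are ever processed, no matter how large the underlying collection is.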

Functions are just another type of object

Another useful feature of Python we get to exploit is that everything is an object: variables, classes, lists, and even functions. That means that you can write a function that creates another function and returns it as a result.

We can demonstrate here with a toy example. Let's say we want to create a function called add_a_few() that adds a number to whatever number you give it. The only trouble is that at the time we write the code we don't know how much we're going to want to increment by. We can create another function, called create_add_a_few(), that takes in our increment step and uses it to build the add_a_few() function we want. Then create_add_a_few() returns the add_a_few() function, and we can use it exactly the same as if we had hard-coded the right increment value from the beginning.

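def create_add_a_few(step):

    # The inner function remembers the value of step it was built with.
    def add_a_few(value):
        total = value + step
        return total

    # Return the function itself, not the result of calling it.
    return add_a_few

add_a_few = create_add_a_few(4)
print(add_a_few(3))  # 3 + 4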

In this example, creating a function within a function is needlessly complex, but there are times when it will simplify our lives. In this coding exercise, for instance, we'll use it to create two generator functions, one for our training data and one for our evaluation data.

Coding challenge

  • Create two functions within the get_data_sets() function, training_set() and evaluation_set(). These should be generators; that is, they use a yield statement instead of a return statement, giving a randomly selected example each time.
  • Return them both as the output of get_data_sets().

My solution

Here is all the code we've written up to this point.
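For orientation, one possible shape for the challenge is sketched below. It is not necessarily the course's solution: passing examples in as an argument, the stand-in example values, and drawing both sets from the same collection are assumptions made to keep the sketch self-contained.

```python
import random


def get_data_sets(examples):
    """Build and return a training generator and an evaluation generator."""

    def training_set():
        # Loop forever, handing out one randomly selected example per request.
        while True:
            yield random.choice(examples)

    def evaluation_set():
        # Assumption: for now both sets draw from the same examples.
        while True:
            yield random.choice(examples)

    # Return the generator functions themselves, not the result of calling them.
    return training_set, evaluation_set


# Hypothetical stand-in examples.
examples = ["example_a", "example_b", "example_c"]
training_set, evaluation_set = get_data_sets(examples)
training_generator = training_set()
evaluation_generator = evaluation_set()
training_example = next(training_generator)
evaluation_example = next(evaluation_generator)
print(training_example, evaluation_example)
```

Because get_data_sets() returns functions, each call to training_set() or evaluation_set() starts a fresh stream of examples.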
