Adversarially generated Julia sets

I thought I’d follow up my first article/tutorial about Julia by showcasing another side of the language’s ecosystem: its machine learning libraries.

Namely, I’ll be demonstrating Knet and AutoGrad by implementing a generative adversarial network that produces images. For the sake of continuity, I will use the Julia set generator from the previous article. We’ll feed its images to this GAN and try to get it to generate its own fractal-ish patterns.

If you don’t know what a GAN is, I suggest you go read the original paper; it’s one of the easier and more interesting reads as far as machine learning papers go.

Assuming you don’t have the time to do that, but know some basic machine learning concepts, here’s a quick rundown.

A GAN is essentially two models: one that generates, let’s call it G, and one that discriminates (i.e. classifies), D. The goal of D is to tell examples from the training data apart from “fakes”; the goal of G is to produce fake images that fool D.
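
Formally, the original paper frames this as a minimax game over the value function V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))], with D trying to maximize it and G trying to minimize it. The discriminator loss we’ll write later is just this expression negated and averaged over a batch; for the generator we’ll use the non-saturating variant from the same paper, maximizing log D(G(z)) instead of minimizing log(1 − D(G(z))).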

Getting started

First, we need to import and modify the code from the previous article so that it generates a single 4-dimensional array, with the individual Julia sets stacked along the last dimension. This array will serve as the real examples for training the discriminator. We also need to install the dependencies for this model, namely Knet (which includes AutoGrad) and Plots + GR for image generation.

Pkg.add("Knet")
Pkg.add("Plots")
Pkg.add("GR")

function julia(x, y, width, height, c)
  z = ((y/width)*2.7 - 1.3) + ((x/height)*4.5 - 2.5)im
  for i = 1:254
      z = z^2 + c
      if abs(z) >= 4
         return Float32(i)
      end
  end
  return Float32(255)
end


julia_set(height, width, c) = [julia(x, y, width, height, c) for x = 1:height, y = 1:width]

# Generate n random Julia sets and stack them into a single 4-D training array
function get_training_data(n,size)
  data = []
  for i=1:n
    c = 1.2 * exp((rand()/10 + (n/14)*π)*im) # a point of magnitude 1.2 with a slightly randomized angle
    push!(data, julia_set(size, size, c))
  end
  cat(4, map(x -> cat(3,x),data)...) # stack the size×size sets into a size×size×1×n array
end

Next, we’ll include our dependencies and initialize the plotting backend. More importantly, however, we’ll also define a variable called atype here, our first use of Knet.

If our machine has a GPU, we can use the KnetArray type, which causes algebraic operations on that array to be executed on the GPU via Knet’s CUDA kernels. If you don’t have a GPU at your disposal, atype is set to a normal Julia Array, and everything is executed on the CPU.

I’d like to note here that you can realistically run this model on a CPU; I personally ran it on my laptop, on a mobile Kaby Lake i5.

using Knet, Plots;
gr()
global atype = gpu()>=0 ? KnetArray{Float32} : Array{Float32}

The model

Now that that’s out of the way, let’s look at the code that will generate all of this stuff. This model is based on this repository, written by Ekin Akyürek, so shoutouts to him.

Note: After writing this, I was informed by Ekin that he had already written a blog post about this GAN model before mine. I encourage you to go check out his blog.

First, we must create a function to actually build our networks, in this case some basic multi-layer perceptrons. This essentially amounts to initializing a set of arrays (the weight matrices and bias vectors) with small random values and zeros.

#Initialize weights and biases, original code: https://github.com/ekinakyurek/GAN-70-Lines-of-Julia/blob/66a60a6ea4532841ee647f08759ae9b1ace0c892/gan.jl#L25
function initmodel(hidden, input, output)
  𝗪 = [];
  x = input
  for h in [hidden... output]
    push!(𝗪, atype(xavier(h,x)), atype(zeros(h, 1))) #FC Layers weights and bias
    x = h
  end
  𝗪
end

Next, we’ll define a simple function to do the forward propagation through our models. This function is generic and can be used with any of the models we define.

The function takes two arguments: W, an array holding the layer parameters, with the odd-indexed elements being the weight matrices and the even-indexed elements the biases, and X, the input features. It also takes an optional keyword argument, the dropout probability applied to each layer’s inputs during the forward pass.

leakyrelu(x;α=Float32(0.2)) = max(0,x) + α*min(0,x) # LeakyRelu activation
# A generic MLP forward prop function, original code: https://github.com/ekinakyurek/GAN-70-Lines-of-Julia/blob/66a60a6ea4532841ee647f08759ae9b1ace0c892/gan.jl#L6
function forward_prop(W,X;dropout_p=0.0)
    for i=1:2:length(W)
        X = W[i]*dropout(mat(X),dropout_p) .+ W[i+1] # mat(X) flattens X to a 2-D matrix (features × batch)
        i < length(W)-1 && (X = leakyrelu.(X)) # LeakyReLU on every layer except the last
    end
    sigm.(X)
end
# Forward prop for the discriminator and generator respectively
D(w,x;dropout_p=0.0) = forward_prop(w,x;dropout_p=dropout_p)  #  Discriminator
G(w,z;dropout_p=0.0) = forward_prop(w,z;dropout_p=dropout_p)  #  Generator
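
Just to get a feel for the shapes involved, here’s a quick illustrative check you could run once everything above is defined (the hidden layer size and the random inputs are placeholders, not the values we’ll actually train with):

𝗪d_test = initmodel([100], 58^2, 1)           # a throwaway discriminator with one hidden layer of 100 units
x_test  = atype(rand(Float32, 58, 58, 1, 32)) # a dummy batch of 32 58×58 "images"
D(𝗪d_test, x_test)                            # returns a 1×32 row of probabilities in (0, 1)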

Ok, here’s where we see how the libraries really come together to produce some beautiful code, almost indistinguishable from the actual equations we’d write down to define this model. This is the part of our code where we define our loss functions.

global const ε=Float32(1e-8) # a small number to prevent NaNs in the logs
𝑱d(𝗪d,x,Gz) = -mean(log.(D(𝗪d,x)+ε)+log.(1-D(𝗪d,Gz)+ε))/2 # Discriminator loss
𝑱g(𝗪g, 𝗪d, z) = -mean(log.(D(𝗪d,G(𝗪g,z))+ε))             # Generator loss

∇d  = grad(𝑱d) # Discriminator gradient
∇g  = grad(𝑱g) # Generator gradient

Next, we need some auxiliary code to help us save the images and to produce the noise the generator takes as input, which is what introduces randomness into the images it generates. Ω is a dictionary we’ll use to carry around the hyperparameters and other assorted information, and 𝒩 is our noise sampler.

𝒩(input, batch) = atype(randn(Float32, input, batch)) # SampleNoise
function generate_and_save(𝗪,number,Ω,gen;fldr="save_image_directory/")
    Gz = G(𝗪[1],𝒩(Ω[:ginp],number)) #.> 0.5
    Gz = permutedims(reshape(Gz,(Ω[:size],Ω[:size],number)),(2,1,3))
    [png(heatmap(Gz[:,:,i]), "$(fldr)$(gen)-$(i).png") for i=1:number]
end

The final piece of this puzzle is a function that takes some data and trains our models over a number of epochs. The essential part is the call to update!, which takes the weights of the respective model, their gradient (generated by AutoGrad) and an optimizer.

In this case we use Adam (Adaptive Moment Estimation), but we could just as well have used plain SGD, AdaGrad, or one of Knet’s other optimizers.
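
Swapping the optimizer just means changing the corresponding call in the main function further down; for instance, with a purely illustrative learning rate:

Ψpd = optimizers(𝗪d, Sgd; lr=0.01) # plain SGD instead of Adam for the discriminator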

With this information, update! adjusts the weights of our models in order to minimize their loss functions: AutoGrad computes the gradients via backpropagation, and update! applies the optimizer’s step.

Keep in mind that the loss functions are interdependent; in an ideal model there should be some balance between the losses of D and G, that is to say, they should stay at similar values and neither should converge to zero.

#original code: https://github.com/ekinakyurek/GAN-70-Lines-of-Julia/blob/66a60a6ea4532841ee647f08759ae9b1ace0c892/gan.jl#L42
function train_model(𝗪, data, Ω, optim)
    for generation=1:Ω[:epochs]
        for n=1:32:(length(data[1,1,1,:]) - 33)
            x = atype(data[:,:,:,n:n+31]) # move this minibatch to the GPU if one is available

            # Update the discriminator on real images and a batch of generated ones
            Gz = G(𝗪[1],𝒩(Ω[:ginp],Ω[:batchsize]))
            update!(𝗪[2], ∇d(𝗪[2],x,Gz), optim[2])

            # Update the generator on fresh noise
            z = 𝒩(Ω[:ginp],2*Ω[:batchsize])
            update!(𝗪[1], ∇g(𝗪[1], 𝗪[2], z), optim[1])
        end
        # Compute the total losses, log them and save some images
        log_model(𝗪, data, Ω, generation)
    end
end

We’ll define a function that will save the images generated during various epochs and print out the losses for both models in each epoch.

function log_model(𝗪, data, Ω, generation)
    println("Running logging function for generation $(generation)")
    println("-----------------------------------------------------")
    gloss=dloss=counter=0.0

    for n=1:32:(length(data[1,1,1,:]) - 33)
        x = atype(data[:,:,:,n:n+31])

        counter += 2*Ω[:batchsize]

        Gz = G(𝗪[1],𝒩(Ω[:ginp],Ω[:batchsize]))
        dloss += 2Ω[:batchsize]*𝑱d(𝗪[2],x,Gz)

        z = 𝒩(Ω[:ginp],2*Ω[:batchsize])
        gloss += 2Ω[:batchsize]*𝑱g(𝗪[1],𝗪[2],z)
    end

    println("Generator average loss: $(gloss/counter)")
    println("Discriminator average loss: $(dloss/counter)")
    generate_and_save(𝗪,6,Ω,generation)
    println("Saved images for generation: $(generation)")
    println("------------------------------------------------\n")
end

Let’s combine all the pieces in our main function. This simply consists of defining the various hyperparameters and the shape of our networks, then running the training function.

function main(size, data)
    Ω = Dict(:batchsize=>32,:epochs=>80,:ginp=>256,:genh=>[Int64(round(size^2*2/3))],:disch=>[Int64(round(size^2*2/3))],:optim=>Adam,:size=>size)
    println("Using hidden layers for discriminator: $(Ω[:disch]) and hidden layers for generator: $(Ω[:genh])")
    𝗪 = (𝗪g,𝗪d)  = initmodel(Ω[:genh], Ω[:ginp], size^2), initmodel(Ω[:disch], size^2, 1)
    Ψ = (Ψpg,Ψpd) = optimizers(𝗪g,Ω[:optim];lr=0.003), optimizers(𝗪d,Ω[:optim];lr=0.0003)
    train_model(𝗪, data, Ω, Ψ)
end

Now, we just need to generate some Julia sets, feed them as the input data to our model, and watch the pretty images being created.

size = 58
X = get_training_data(6000,size)
main(size, X)

Using various training samples and sizes, here are some of the images I’ve managed to generate.

[Sample images generated by the GAN]

Keep in mind that there’s some element of randomness in the above code, and you can easily slip into a situation where either the G or the D loss converges to zero. That means either the discriminator becomes unable to distinguish real images from fakes, giving the generator no useful signal to improve on, or the discriminator gets “too good” to ever be fooled by the generator, which again leaves the generator with nothing to learn from.

The usual fix is adjusting the various hyperparameters. This is where the hard and expensive part of machine learning comes in: models have to be tuned through a combination of theory, intuition, luck and trial & error.
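
For example, these are a few of the knobs inside main you could start turning; the values below are purely illustrative starting points, not settings I’ve validated:

Ω = Dict(:batchsize=>32,
         :epochs=>120,          # train for longer
         :ginp=>128,            # a smaller noise vector
         :genh=>[1024, 512],    # a deeper generator
         :disch=>[256],         # a weaker discriminator
         :optim=>Adam,
         :size=>size)

Lowering the generator’s learning rate in the optimizers call, or passing a non-zero dropout_p to the discriminator, are other common first steps.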

Obviously, this model doesn’t have recurrent or convolutional layers, which are a staple of most modern neural networks, especially when dealing with images.

However, if you dig a bit into Knet’s docs, I think you’ll find that adapting this model to use a CNN or an LSTM won’t be that hard.
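
As a rough sketch, a convolutional discriminator forward pass built on Knet’s conv4 and pool primitives might look like the snippet below. The weight layout is an assumption made for illustration (a single conv layer followed by a fully connected one), and the initialization and reshaping code elsewhere would need matching changes:

# 𝗪 is assumed to hold a 4-D conv filter, its bias, then an FC weight matrix and bias
function D_conv(𝗪, x; dropout_p=0.0)
    x = pool(leakyrelu.(conv4(𝗪[1], x) .+ 𝗪[2]))  # convolution + bias, LeakyReLU, then 2×2 pooling
    x = 𝗪[3] * dropout(mat(x), dropout_p) .+ 𝗪[4] # flatten the feature maps and apply an FC layer
    sigm.(x)
end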

The beauty of Knet is that it offers a very nice balance between low-level primitives and higher-level abstractions, making it very useful if you need models that are somewhat unique. The other impressive thing about it is its performance compared to the leading frameworks in this domain, such as MXNet and TensorFlow.

