Understanding Convolution by Implementing in Julia

A story about my experiments in implementing convolution in Julia and what I learned from it.

Yosi Pramajaya
Towards Data Science


Photo by Roman Mager on Unsplash

In part 2 of the Fast.ai Deep Learning Course, I learned that it’s important not only to be able to use Deep Learning libraries such as TensorFlow or PyTorch, but to really understand the ideas and what’s actually happening behind them. And there’s no better way to understand something than to try to implement it ourselves.

In Machine Learning practice, convolution is something we’re all very familiar with. So I thought, why not give it a try? Here’s the result of my experiments in implementing convolution in Julia.

Why Julia?

When I took Andrew Ng’s Machine Learning course, I mostly used MATLAB to understand the algorithms. But MATLAB is a commercial product and can be very expensive.

Julia is very fast, and specifically designed to be very good at numerical and scientific computing, which is what we need in implementing Machine Learning algorithms. It’s also very easy to learn. Most importantly, it’s free and open-source.

Julia compared to other languages. Source

Start to Experiment

Alright, now it’s time to start working on it. There are already so many articles about CNNs, so I’m not going to explain the concept again. If you’re looking for the basics of CNNs, I recommend this YouTube playlist by Andrew Ng.

It’s always good to start with simple things. So for my experiments, I will use a 6x6 matrix as the input, a 3x3 matrix as the filter, and expect a 4x4 matrix as the output. I start with small matrices so that I can check the correctness of my function by hand.

Convolution through Spreadsheets.

I define the input, the filter, and the expected output through the Julia CLI. The input is a 6x6 matrix, the filter is a 3x3 matrix, and the expected output is a 4x4 matrix with the same values as in the picture above.

julia> input = [[3,2,4,2,7,6] [8,0,2,1,7,8] [2,2,10,4,1,9] [1,5,4,6,5,0] [5,4,1,7,5,6] [5,0,2,7,6,8]]
6×6 Array{Int64,2}:
3 8 2 1 5 5
2 0 2 5 4 0
4 2 10 4 1 2
2 1 4 6 7 7
7 7 1 5 5 6
6 8 9 0 6 8
julia> filter = [[1,1,1] [0,0,0] [-1,-1,-1]]
3×3 Array{Int64,2}:
1 0 -1
1 0 -1
1 0 -1
julia> output = [[-5,-8,-2,1] [0,-12,-5,5] [4,4,2,-4] [3,6,0,-10]]
4×4 Array{Int64,2}:
-5 0 4 3
-8 -12 4 6
-2 -5 2 0
1 5 -4 -10

Iteration #1

Since we already know the input (6x6) and the filter (3x3), the output should be 4x4. In the first iteration, I do the computation manually, assuming the filter is always 3x3. The code looks like this:

First iteration
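Here is a minimal sketch of what that first iteration could look like (the exact code is in the repository linked at the end). Note that, following the deep learning convention, the filter is applied without flipping:

function conv_1(input, filter)
    rows, cols = size(input)
    output = zeros(eltype(input), rows - 2, cols - 2)
    for i in 1:rows-2, j in 1:cols-2
        # Hard-coded 3x3 window: multiply each overlapping cell and sum
        output[i, j] =
            input[i, j]   * filter[1, 1] + input[i, j+1]   * filter[1, 2] + input[i, j+2]   * filter[1, 3] +
            input[i+1, j] * filter[2, 1] + input[i+1, j+1] * filter[2, 2] + input[i+1, j+2] * filter[2, 3] +
            input[i+2, j] * filter[3, 1] + input[i+2, j+1] * filter[3, 2] + input[i+2, j+2] * filter[3, 3]
    end
    return output
end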

We can paste the whole function into the CLI and see the result.

julia> conv_1(input, filter) == output
true

And it works! Another awesome feature of Julia is that we can benchmark our functions. We can do that using the BenchmarkTools package.

julia> using BenchmarkTools
julia> @benchmark conv_1(input, filter)
BenchmarkTools.Trial:
memory estimate: 208 bytes
allocs estimate: 1
--------------
minimum time: 196.880 ns (0.00% GC)
median time: 200.165 ns (0.00% GC)
mean time: 212.749 ns (0.82% GC)
maximum time: 1.828 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 616

And as you can see, Julia is amazingly fast! But what if the filter is larger than 3x3?

Iteration #2

In the second iteration, let’s handle that by looping through the filter matrix. The code looks like this:

Second iteration
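A sketch of the second iteration, where an inner loop over the filter replaces the nine hard-coded terms (again, assuming the structure of the original code):

function conv_2(input, filter)
    filter_rows, filter_cols = size(filter)
    rows, cols = size(input)
    output = zeros(eltype(input), rows - filter_rows + 1, cols - filter_cols + 1)
    for i in 1:size(output, 1), j in 1:size(output, 2)
        # Accumulate the window-filter products one element at a time
        acc = zero(eltype(input))
        for k in 1:filter_rows, l in 1:filter_cols
            acc += input[i+k-1, j+l-1] * filter[k, l]
        end
        output[i, j] = acc
    end
    return output
end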
julia> conv_2(input, filter) == output
true

The code works, and now it can accept a filter of any size. What about its performance?

julia> @benchmark conv_2(input, filter)
BenchmarkTools.Trial:
memory estimate: 208 bytes
allocs estimate: 1
--------------
minimum time: 485.372 ns (0.00% GC)
median time: 488.586 ns (0.00% GC)
mean time: 517.087 ns (0.33% GC)
maximum time: 13.017 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 191

It’s still very fast! But can I make it faster? Maybe there is another way, instead of looping through the filter, that is both faster and more concise to read.

Iteration #3

In Julia, we can do this with the dot (broadcasting) syntax. So instead of looping through the filter, we can do this:

Third iteration
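A sketch of the third iteration, where slicing the window and broadcasting the multiplication replace the inner loop:

function conv_3(input, filter)
    filter_rows, filter_cols = size(filter)
    rows, cols = size(input)
    output = zeros(eltype(input), rows - filter_rows + 1, cols - filter_cols + 1)
    for i in 1:size(output, 1), j in 1:size(output, 2)
        # Slice out the window and let .* do the element-wise product
        output[i, j] = sum(input[i:i+filter_rows-1, j:j+filter_cols-1] .* filter)
    end
    return output
end

Note that each slice allocates a new array, which explains the higher allocation count in the benchmark below.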

The third iteration is now more concise. Let’s give it a try:

julia> conv_3(input, filter) == output
true
julia> @benchmark conv_3(input, filter)
BenchmarkTools.Trial:
memory estimate: 5.20 KiB
allocs estimate: 33
--------------
minimum time: 2.378 μs (0.00% GC)
median time: 2.482 μs (0.00% GC)
mean time: 2.679 μs (2.05% GC)
maximum time: 71.501 μs (95.64% GC)
--------------
samples: 10000
evals/sample: 9

I was surprised that instead of getting faster, it actually got slower, and by a significant margin! Why is that? This is what the documentation says:

In other languages, vectorization is also often required for performance: if loops are slow, the “vectorized” version of a function can call fast library code written in a low-level language. In Julia, vectorized functions are not required for performance, and indeed it is often beneficial to write your own loops.

So the second iteration is the best implementation of convolution so far. But it’s way too basic. What if I take it further?

With Padding

Another concept that’s commonly used in convolution is padding. If we have a 6x6 input matrix and a 3x3 filter matrix, instead of getting a 4x4 matrix as the output, with padding we can get a 6x6 matrix. This approach is usually called “same” convolution. If you want to know more about the concept of padding, watch video C4W1L04 from Andrew Ng.

How do we implement padding in our convolution function? One way is to re-create the input matrix with padding around the real values. The formula for how much padding we need is:

padding = (filter - 1) ÷ 2

The formula above assumes that the filter matrix has an odd size, and in practice it mostly, if not always, does. So in my implementation, I assume the filter dimensions are odd, and I fill the padding with 0. For a 3x3 filter, for example, padding = (3 - 1) ÷ 2 = 1.

With Padding
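A sketch of padding_conv_1, assuming it builds the zero-padded matrix and then reuses conv_2 for the actual convolution (the helper names pad_r and pad_c are mine):

function padding_conv_1(input, filter, mode="valid")
    if mode == "same"
        rows, cols = size(input)
        filter_rows, filter_cols = size(filter)
        # ÷ is integer division; this assumes odd filter dimensions
        pad_r = (filter_rows - 1) ÷ 2
        pad_c = (filter_cols - 1) ÷ 2
        # Re-create the input with zeros around the real values
        padded = zeros(rows + 2 * pad_r, cols + 2 * pad_c)
        padded[pad_r+1:pad_r+rows, pad_c+1:pad_c+cols] = input
        return conv_2(padded, filter)
    end
    return conv_2(input, filter)
end

Because zeros defaults to Float64, the “same” output below comes back as a Float64 matrix even though the input is Int64.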

You probably noticed the ÷ operator when I calculate the padding value. This is because of Julia’s type-stability feature: the / operator always returns a Float, while ÷ (integer division) always returns an Integer. Type stability is one of the main reasons why Julia is super fast!

Let’s try this function to get a “same” convolution with the input and filter matrices that we have. Notice that the values from [2,2] to [5,5] are the same as the “valid” convolution done with the previous function.

julia> padding_conv_1(input, filter, "same")
6×6 Array{Float64,2}:
-8.0 1.0 2.0 -5.0 1.0 9.0
-10.0 -5.0 0.0 4.0 3.0 10.0
-3.0 -8.0 -12.0 4.0 6.0 12.0
-10.0 -2.0 -5.0 2.0 0.0 13.0
-16.0 1.0 5.0 -4.0 -10.0 18.0
-15.0 3.0 10.0 -1.0 -9.0 11.0
julia> padding_conv_1(input, filter, "same")[2:5,2:5] == conv_2(input, filter)
true

The performance of our padding_conv function is also not bad.

julia> @benchmark padding_conv_1(input, filter, "same")
BenchmarkTools.Trial:
memory estimate: 992 bytes
allocs estimate: 2
--------------
minimum time: 1.746 μs (0.00% GC)
median time: 1.803 μs (0.00% GC)
mean time: 1.865 μs (0.72% GC)
maximum time: 70.076 μs (96.21% GC)
--------------
samples: 10000
evals/sample: 10

It’s somewhat slower than the “valid” convolution, which is expected, but the memory allocation is still very low.

Strided Convolution

Again, I want to improve my convolution, this time by implementing “strided” convolution. Usually, stride=1. If you want to know more about the concept, watch video C4W1L05 from Andrew Ng.

Implementing strided convolution is a bit tricky. First, I need to find the size of the output matrix based on the input, the filter, and the stride. The formula for the size is:

result = (input-filter) ÷ stride + 1

So if we have a 7x7 input matrix with a 3x3 filter matrix and stride=2, we will get a 3x3 output (instead of 5x5 with stride=1). The tricky part is iterating over the input matrix. In our simple convolution, the input and output indices move at the same rate (a ratio of 1), which is why we can use the same variables i and j to iterate through the input matrix.

But when the stride is not 1, we need an additional variable to iterate through the input matrix. Here’s the code:

With Strides
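A sketch of stride_conv_1, where m and n are the additional variables that move the window’s top-left corner by stride steps (the default stride=1 is inferred from the calls below):

function stride_conv_1(input, filter, stride=1)
    filter_rows, filter_cols = size(filter)
    rows, cols = size(input)
    # Output size: (input - filter) ÷ stride + 1, per the formula above
    out_rows = (rows - filter_rows) ÷ stride + 1
    out_cols = (cols - filter_cols) ÷ stride + 1
    output = zeros(eltype(input), out_rows, out_cols)
    for i in 1:out_rows, j in 1:out_cols
        # The window's top-left corner jumps by `stride` for each output cell
        m = (i - 1) * stride + 1
        n = (j - 1) * stride + 1
        acc = zero(eltype(input))
        for k in 1:filter_rows, l in 1:filter_cols
            acc += input[m+k-1, n+l-1] * filter[k, l]
        end
        output[i, j] = acc
    end
    return output
end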

Let’s see the performance. For this check, I use a 7x7 input matrix with a 3x3 filter, so that I can try both stride=1 (expecting a 5x5 output) and stride=2 (expecting a 3x3 output).

julia> input7 = rand(7,7)
julia> @benchmark stride_conv_1(input7, filter)
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 1
--------------
minimum time: 742.395 ns (0.00% GC)
median time: 747.081 ns (0.00% GC)
mean time: 772.902 ns (0.40% GC)
maximum time: 5.709 μs (86.29% GC)
--------------
samples: 10000
evals/sample: 124
julia> @benchmark stride_conv_1(input7, filter, 2)
BenchmarkTools.Trial:
memory estimate: 160 bytes
allocs estimate: 1
--------------
minimum time: 319.876 ns (0.00% GC)
median time: 322.438 ns (0.00% GC)
mean time: 327.930 ns (0.56% GC)
maximum time: 2.753 μs (87.78% GC)
--------------
samples: 10000
evals/sample: 233

A larger stride actually increases the speed! This makes sense because we iterate over fewer positions than before.

Putting It All Together

We have seen how to do a simple convolution, a convolution with padding, and a convolution with stride > 1. Let’s see if we can put them all together in a single function.

Basic 2d convolution
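A sketch of conv2d that composes the two pieces: pad the input first when mode is “same”, then run the strided convolution (here simply by reusing the padding logic from above together with stride_conv_1; the argument order matches the call below):

function conv2d(input, filter, stride=1, mode="valid")
    if mode == "same"
        rows, cols = size(input)
        filter_rows, filter_cols = size(filter)
        pad_r = (filter_rows - 1) ÷ 2
        pad_c = (filter_cols - 1) ÷ 2
        # Zero-pad the input so the output keeps the input's size (for stride=1)
        padded = zeros(rows + 2 * pad_r, cols + 2 * pad_c)
        padded[pad_r+1:pad_r+rows, pad_c+1:pad_c+cols] = input
        input = padded
    end
    return stride_conv_1(input, filter, stride)
end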

And we can try it out on the CLI.

julia> @benchmark conv2d(input7, filter, 2, "same")
BenchmarkTools.Trial:
memory estimate: 944 bytes
allocs estimate: 2
--------------
minimum time: 980.100 ns (0.00% GC)
median time: 1.044 μs (0.00% GC)
mean time: 1.072 μs (1.00% GC)
maximum time: 55.501 μs (97.63% GC)
--------------
samples: 10000
evals/sample: 10

Conclusion

What I did was just a very basic 2-dimensional convolution algorithm. In practice, we usually use 3-dimensional convolutions with multiple layers, usually paired with a pooling algorithm. But for me, it was already very exciting to learn about and implement a very basic convolution in Julia.

I’m still learning Julia too. Feel free to let me know how I can improve my code. Thanks for reading.

If you’re interested in the code, here’s the link to the repository on GitHub.
