Deep Neural Networks are just awesome!

Background

If you have followed DNN news recently, you might have heard about Capsule Networks (Geoffrey Hinton). And even if you have, there is a remarkable article on Medium that you should give a read.

Capsule Networks sort of fix the “temporal or contextual” problems, but scale and rotational invariance still haunt prediction and recognition algorithms. There are ways to solve some of the scale problems, like Spatial Pyramid Pooling or Pyramidal Residual Units, but they tend to be compute-intensive. And they still do not guarantee full accuracy at all scales, only at quantized levels.

I also read a very fascinating paper last year, Harmonic Networks, which took a very different approach to achieving rotational invariance.

Simplification

If we were to simplify DNNs grossly, we could say that “in DNNs we take a small image filter (a learned kernel) and look at a target image through this filter. Whenever a feature or part of the image matches this filter, we confirm our match and raise some of the values in our embeddings to help the classifier!” Refer to this image:

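That “look at the image through a filter” picture is just 2-D cross-correlation. Here is a minimal NumPy sketch of the idea, with a hypothetical toy image and a hand-made vertical-edge filter (not taken from any real network):

```python
import numpy as np

def correlate2d(image, kernel):
    """Slide `kernel` over `image` and record how strongly each patch matches.

    Minimal sketch: no padding, stride 1, single channel.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)  # large when the patch matches the filter
    return out

# A vertical-edge filter "lights up" exactly where the image has a vertical edge.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(correlate2d(image, kernel))
# [[0. 2. 0.]
#  [0. 2. 0.]]
```

The response map peaks on the column where the dark-to-bright edge sits, which is all a convolutional feature detector is doing at heart.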

Experiments

Last week in our labs, we had a theoretical discussion on what convolutions are actually doing. There wasn’t a summed-up conclusion, but this blog post from NVIDIA was close to what we also concluded. Here is what it said: “While it is unknown which interpretation of convolution is correct for deep learning, the cross-correlation interpretation is currently the most useful: convolutional filters can be interpreted as feature detectors, that is, the input (feature map) is filtered for a certain feature (the kernel) and the output is large if the feature is detected in the image. This is exactly how you interpret cross-correlation for an image.”

So we collected these thoughts and performed a thought experiment:

  1. a pyramidal approach for scale invariance
  2. harmonic kernels for rotational invariance
  3. convolution as correlation
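Point 3 is easy to check by hand: a true convolution is just a cross-correlation with the kernel flipped on both axes, so for symmetric kernels the two are identical. A quick NumPy sketch with toy arrays (helper names are ours):

```python
import numpy as np

def corr2d(image, kernel):
    # Slide the kernel over the image with no flipping (cross-correlation).
    kh, kw = kernel.shape
    return np.array([[np.sum(image[y:y + kh, x:x + kw] * kernel)
                      for x in range(image.shape[1] - kw + 1)]
                     for y in range(image.shape[0] - kh + 1)])

def conv2d(image, kernel):
    # True convolution: correlate with the kernel flipped on both axes.
    return corr2d(image, np.flip(kernel))

img = np.arange(16, dtype=float).reshape(4, 4)
asym = np.array([[1., 2.],
                 [3., 4.]])
sym = np.array([[0., 1., 0.],
                [1., 2., 1.],
                [0., 1., 0.]])

print(np.allclose(conv2d(img, asym), corr2d(img, asym)))  # False: asymmetric kernel
print(np.allclose(conv2d(img, sym), corr2d(img, sym)))    # True: symmetric kernel
```

Since learned kernels have no fixed orientation anyway, swapping convolution for correlation changes nothing about what the network can express.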

A thought experiment shouldn’t take long to test, so we came up with a quick and dirty approach. Here is what we did:

We thought about the above two problems as a designer would and asked: “what should we do to a kernel so it can be pyramidal as well as harmonic at the same time?” And as always, nature already had the answer: fractals!


Training a whole new network and then running the network to create kernel visualizations would be too much for this short thought experiment. But Distill had done this work for us already! We downloaded the visualization images there and selected this particular one for our science project:

We were still too lazy to create a fractal ourselves, so we went over to Malinc.se and built this fractalized kernel.
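If you would rather script a fractal than draw one on Malinc.se, a classic binary fractal mask (a Sierpinski carpet, purely illustrative and not the kernel we actually built) takes only a few lines:

```python
import numpy as np

def sierpinski_carpet(order):
    """Generate a 3**order x 3**order binary fractal mask (Sierpinski carpet).

    Illustrative only: our actual kernel was built interactively on Malinc.se.
    """
    m = np.ones((1, 1), dtype=np.uint8)
    for _ in range(order):
        # Tile 3x3 copies of the current mask, hollowing out the centre block.
        m = np.block([[m,     m, m],
                      [m, 0 * m, m],
                      [m,     m, m]])
    return m

mask = sierpinski_carpet(2)
print(mask.shape, int(mask.sum()))  # (9, 9) 64
```

The appeal for our experiment is that the same motif repeats at every scale, which is exactly the pyramidal-plus-harmonic property we were after.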

Finally, taking inspiration from “convolutions are performing correlations”, we re-used this template-matching code written in Python with OpenCV.

import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread('messi5.jpg',0)
img2 = img.copy()
template = cv2.imread('template.jpg',0)
w, h = template.shape[::-1]

# All 6 comparison methods, referenced by name
methods = ['TM_CCOEFF', 'TM_CCOEFF_NORMED', 'TM_CCORR',
           'TM_CCORR_NORMED', 'TM_SQDIFF', 'TM_SQDIFF_NORMED']

for meth in methods:
    img = img2.copy()
    method = getattr(cv2, meth)  # look up the cv2 constant by name (avoids eval)
    # Apply template Matching
    res = cv2.matchTemplate(img,template,method)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
    if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(img,top_left, bottom_right, 255, 2)
    plt.subplot(121),plt.imshow(res,cmap = 'gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122),plt.imshow(img,cmap = 'gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle(meth)
    plt.show()

Results

Here is the first result, using a normal kernel visualization we took from Distill:


Some correlation can be seen, but the face is not detected properly.

Here is the result of our Fractalized Kernel:

BINGO!!

And to make sure we weren’t hallucinating or getting fooled by 1 sample, we performed it multiple times:

Normal Kernel :


Fractalized Kernel :


Isn’t this exciting?

Ask the Fox!

We spent a cool 3-4 hours trying out different kernels, shapes, and images, and frankly, we were delighted! There are many takeaways from this thought experiment. Correlation could become a new way to go about convolution, especially where compute resources are low. Fractals applied to learned kernels may help with scale and rotational invariance. We will work further on the mathematical/theoretical background of this research, integrate it into networks, and share details with those who are interested! Do remember these are not claims, just fun experiments we did over a beer. Need proof? Ask this fox!


Let us know what you think!

Rohan Shravan

Not an English Professor
