Welcome!!

Hello potential employers,

The following is a brief introduction to one of my current side projects, which I call FFT Pose Tags. The goal of this notebook is to explain how my implementation works.

Background: how ArUco tags work

Fiducial pose tags are images that help cameras identify their position and orientation. They are often used in robotics to determine the world location of a moving robot, and in augmented reality applications to project images onto an object. A popular library for producing and finding these tags is ArUco; functions for detecting the tags and estimating their poses are integrated into the popular OpenCV library.

This is an image of an ArUco tag:

In [1]:
from imageio import imread
import matplotlib.pyplot as plt
aruco_tag_img = imread('./aruco_tag.jpg')

plt.figure(figsize=(10,10))
plt.imshow(aruco_tag_img)
Out[1]:
<matplotlib.image.AxesImage at 0x108bad470>

Identify the marker corners

ArUco finds potential markers by looking for quadrilaterals, verifies each candidate using its internal pattern, and identifies the corner locations.

In [2]:
from cv2 import aruco
#the aruco dictionary defines the set of valid marker patterns (here, 50 4x4-bit markers)
ARUCO_DICT = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
(aruco_marker_corners,_,_) = aruco.detectMarkers(aruco_tag_img, ARUCO_DICT)
marked_aruco_img = aruco.drawDetectedMarkers(aruco_tag_img, aruco_marker_corners, borderColor = (255,0,0))

plt.figure(figsize=(10,10))
plt.imshow(marked_aruco_img)
Out[2]:
<matplotlib.image.AxesImage at 0x10559c9e8>

Identify the pose

It then uses Perspective-n-Point (PnP) to calculate the pose.

In [3]:
import numpy as np
import pickle
#load the camera parameters which define how pixels translate into directions out of the camera 
CAMERA_PARAMS = pickle.load(open('camera_params.pickle', 'rb'))
aruco_pose = aruco.estimatePoseSingleMarkers(corners = aruco_marker_corners, 
                                markerLength = 2, 
                                cameraMatrix = CAMERA_PARAMS['mtx'], 
                                distCoeffs = CAMERA_PARAMS['dist'])

aruco_pose_img = aruco.drawAxis(image = aruco_tag_img,
                                cameraMatrix = CAMERA_PARAMS['mtx'],
                                distCoeffs = CAMERA_PARAMS['dist'],
                                rvec = aruco_pose[0],
                                tvec = aruco_pose[1],
                                length = 2.);
plt.figure(figsize=(10,10))
plt.imshow(aruco_pose_img)
Out[3]:
<matplotlib.image.AxesImage at 0x105e79e10>

My FFT Pose Tags

I have designed my own tags that attempt to alleviate some of the shortcomings of ArUco and other tags. The idea is to track the changes in frequency that occur as you distort an image by applying affine transformations. This lets us use the entire image to track the pose, which should help the tags stay robust to occlusion, and lets us leverage some Fourier transform techniques to identify the pose with high accuracy.
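
The math underpinning this is the standard affine theorem for Fourier transforms (stated here for context rather than derived): warping a pattern by an invertible linear map $A$ warps its spectrum by the inverse transpose of $A$,

$$g(x) = f(Ax) \implies \hat{g}(\xi) = \frac{1}{\lvert \det A \rvert}\, \hat{f}\!\left(A^{-\mathsf{T}} \xi\right),$$

so watching how the frequency peaks move tells us how the tag was warped.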

My Tag structure:

My tag has a few features. The black square border is designed so that we can use existing libraries to identify potential tag locations. The blue and red markers on the sides make it easy to check whether a potential quadrilateral is actually a tag. The large area in the center combines a sinusoidal signal extending vertically with a sinusoid extending horizontally. This section is used to decompose the orientation based on how the frequencies change as the tag rotates.

In [4]:
fft_tag = imread('fft_tag.png')
plt.imshow(fft_tag)
Out[4]:
<matplotlib.image.AxesImage at 0x106f4d400>

Fourier transforms

Fourier transforms convert a signal made up of sinusoids into its frequency-domain representation. Below you can see the undistorted tag's 2D discrete Fourier transform (the FFT is a fast algorithm for computing it). Notice that there are two points in the resulting frequency domain. Each point's direction should be interpreted as the direction its sinusoid extends in, and its distance from the origin as the frequency of that signal.

In [5]:
#crop out the border
freq_section = fft_tag[130:340,130:340,:]
freq_section = np.sum(freq_section, axis = 2)
freq_section = freq_section - np.mean(freq_section)

frequencies = np.fft.fftshift(np.fft.fft2(freq_section))

#remove the 0 frequency section
(rows, cols) = frequencies.shape
frequencies[rows//2,cols//2] = 0

#zoom in 
frequencies = frequencies[90:-90, 90:-90]

plt.figure(figsize=(10,10))
plt.subplot(121); plt.title("Tag"); plt.imshow(freq_section)
plt.subplot(122); plt.title("Frequencies"); plt.imshow(abs(frequencies.real))
Out[5]:
<matplotlib.image.AxesImage at 0x110799438>

Distorted Tag transform

Observe below that as we warp the tag, the frequency domain changes as well.

In [6]:
import cv2

rows,cols = freq_section.shape

pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[0,80],[160,90],[70,250]])

M = cv2.getAffineTransform(pts1,pts2)

distorted_freq_section = cv2.warpAffine(freq_section,M,(cols,rows))

frequencies = np.fft.fftshift(np.fft.fft2(distorted_freq_section))
frequencies[rows//2,cols//2] = 0

#zoom in 
frequencies = frequencies[90:-90, 90:-90]

plt.figure(figsize=(10,10))
plt.subplot(121); plt.title("Tag"); plt.imshow(distorted_freq_section)
plt.subplot(122); plt.title("Frequencies"); plt.imshow(abs(frequencies.real))
Out[6]:
<matplotlib.image.AxesImage at 0x11191afd0>
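
As a quick sanity check (my own sketch, not part of the pipeline), the affine theorem above predicts where the peaks should land: the linear part of M, inverted and transposed, maps old peak locations to new ones. Assuming, purely for illustration, a peak at frequency (0, 8) in the undistorted spectrum:

A = M[:, :2]                              #linear part of the affine warp from the cell above
old_peak = np.array([0., 8.])             #illustrative peak from the undistorted spectrum
new_peak = np.linalg.inv(A).T @ old_peak  #where the theorem predicts this peak lands
print(new_peak)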

Pose Detection Example:

We will now try to decompose the pose of a tag from a sample image.

In [7]:
fft_img = imread('test_fft.jpg')
plt.figure(figsize=(10,10))
plt.imshow(fft_img)
Out[7]:
<matplotlib.image.AxesImage at 0x111d274a8>

Find the frequency section of the tag

This is done manually here, as there is nothing interesting or original in my implementation.

In [8]:
print(fft_img.shape)
fft_img = np.sum(fft_img.copy(), axis = 2)

#entering corners manually as identifying them programmatically
#isn't all that interesting and takes time to explain
#corners = np.array([[256, 423], [224, 487], [298,527], [324,446]])
corners = np.array([[423, 256],[487, 224],[524,290],[460,324]])

#crop around the corners
x,y,w,h = cv2.boundingRect(corners)
cropped_img = fft_img[y:y+h, x:x+w]
cropped_corners = [(i - x, j - y) for (i,j) in corners]
mask = np.zeros(cropped_img.shape)
cv2.fillConvexPoly(img = mask,
                   points = np.array([cropped_corners], dtype = np.int32),
                   color = 1)

masked_fft = cropped_img * mask

plt.figure(figsize=(5,5))
plt.title("Tag"); plt.imshow(masked_fft)
(720, 1080, 3)
Out[8]:
<matplotlib.image.AxesImage at 0x1119414e0>

Find the frequencies that make up this pattern

We can find the frequencies that make up this pattern using the Fourier transform. Below, we run the FFT on the image embedded in a buffer of zeros. This zero-padding interpolates the frequency domain and gives a more precise reading of where the peaks lie.
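
Here is a minimal 1-D sketch of the same zero-padding idea (the signal and numbers are made up for illustration):

import numpy as np

N = 64
true_freq = 5.3                         #cycles over the window; not an integer bin
x = np.sin(2 * np.pi * true_freq * np.arange(N) / N)

coarse = np.abs(np.fft.rfft(x))         #without padding, the peak snaps to the nearest bin
print(np.argmax(coarse))                #-> 5

fine = np.abs(np.fft.rfft(x, n=10*N))   #padding to 10x the length interpolates the spectrum
print(np.argmax(fine) / 10)             #-> roughly 5.3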

In [9]:
shapef = np.array(masked_fft.shape)

#buffer with 0s for better resolution 
#https://ccrma.stanford.edu/~jos/mdft/Causal_Zero_Padding.html#1
size_multiplier = 10
masked_fft_zero = np.zeros(shapef * size_multiplier)
shapez = masked_fft_zero.shape
masked_fft_zero[shapez[0]//2-shapef[0]//2:1+shapez[0]//2+shapef[0]//2,
                shapez[1]//2-shapef[1]//2:shapez[1]//2+shapef[1]//2] = masked_fft

frequencies = np.fft.fftshift(np.fft.fft2(masked_fft_zero))
(rows, cols) = frequencies.shape
frequencies[rows//2 - 40:rows//2 + 40,
            cols//2 - 40:cols//2 + 40] = 0
frequencies = frequencies[400:-400, 400:-400]

plt.figure(figsize=(15,15))
plt.subplot(131); plt.title("Tag"); plt.imshow(masked_fft)
plt.subplot(132); plt.title("Frequencies"); plt.imshow(abs(frequencies.real))

freq_points = [[7.03, 4.67], [2.8,-5.95]] #the two main peak locations, entered manually
plt_size = (-10,10)

plt.subplot(133); 
plt.title("Main Frequency Points"); 
plt.ylim(plt_size)
plt.xlim(plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')

plt.scatter(*zip(*freq_points))
Out[9]:
<matplotlib.collections.PathCollection at 0x115e8d2b0>

Period representation

We will now transform our frequency representation into a "period representation". Each point keeps the direction of its frequency, but is rescaled so that its magnitude equals the period of that sinusoid.
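
Concretely, each frequency point $f$ is mapped to

$$p = \frac{f}{\lVert f \rVert^{2}}, \qquad \text{so that} \qquad \lVert p \rVert = \frac{1}{\lVert f \rVert},$$

which is exactly what the list comprehension below computes.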

In [10]:
from numpy.linalg import norm

freq_points = [[7.03, 4.67], [2.8,-5.95]]
period_points = [[x/(norm([x,y])**2), y/(norm([x,y])**2)] for (x,y) in freq_points]

plt.figure(figsize=(10,10))

plt.subplot(121); 
freq_plt_size = (-10,10)
plt.title("Frequency Points"); 
plt.ylim(freq_plt_size)
plt.xlim(freq_plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*freq_points))

plt.subplot(122); 
period_plt_size = (-.5,.5)
plt.title("Period Points"); 
plt.ylim(period_plt_size)
plt.xlim(period_plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*period_points))
print(period_points)
[[0.0986946474649655, 0.06556244717800695], [0.06475111290975313, -0.13759611493322543]]

Get Constructive Interference Points

We now have the directions and the periods of the sine waves that make up our signal. If we imagine these as two waves originating from the origin with their associated periods and directions, we can find the points where they constructively interfere with each other. The first occurrence is at the origin. The others lie on lines orthogonal to each period point: one line through the origin and one through the period point itself.
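
To see why, note that a plane wave $\cos(2\pi\, f \cdot x)$ takes its maxima on the lines

$$f \cdot x = k, \qquad k \in \mathbb{Z},$$

which run orthogonal to $p = f / \lVert f \rVert^{2}$ and are spaced one period point apart. Two such waves reinforce wherever their lines of maxima cross, which is what the intersection code below finds.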

In [11]:
def get_orthogonal(p):
    return (-p[1],p[0])

orthogonals = [get_orthogonal(p) for p in period_points]

def plot_line(point, direction):
    """plot from a point in a direction"""
    axes = plt.gca()
    xs = np.array(axes.get_xlim())
    slope = direction[1]/direction[0]
    ys = point[1]+(slope * (xs - point[0]))
    plt.plot(xs, ys, '--')
    
plt_size = (-.5,.5)
plt.figure(figsize=(10,10))

plt.subplot(121)
plt.title("Period Points"); 
plt.ylim(plt_size)
plt.xlim(plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*period_points))

plt.subplot(122)
plt.title("Orothogonal"); 
plt.ylim(plt_size)
plt.xlim(plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*period_points))

for period_pt, ortho in zip(period_points, orthogonals):
    plot_line(period_pt, ortho)
    plot_line([0,0], ortho)

We now take the intersections of all of these lines and plot them over the original tag image.

In [12]:
from itertools import product

def line_intersection(a1, a2, b1, b2):
    """get the intersection of two lines, each defined by two points"""
    (a1, a2, b1, b2) = [np.array(x) for x in [a1, a2, b1, b2]]
    da = a2 - a1              #direction of line a
    db = b2 - b1              #direction of line b
    dp = a1 - b1
    dap = get_orthogonal(da)
    denom = np.dot(dap, db)
    num = np.dot(dap, dp)
    return (num / denom.astype(float)) * db + b1

interference_points = np.array([line_intersection(a, np.add(a, orthogonals[0]), b, np.add(b, orthogonals[1]))
                       for a,b in product([[0,0], period_points[0]], [[0,0], period_points[1]])])
fft_size,_ = masked_fft.shape
plt.figure(figsize=(10,10))
plt.subplot(121)
plt.title("Calculated Interference Points"); 
plt.ylim((-30,30))
plt.xlim((-30,30))
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(interference_points[:,0] * fft_size, interference_points[:,1] * fft_size, c='r')

plt.subplot(122)
plt.imshow(masked_fft)
plt.title("Calculated Interference Points Over Original Image"); 
plt.scatter(40 + (interference_points[:,0] * fft_size), 40 + (interference_points[:,1] * -fft_size), c='r')
Out[12]:
<matplotlib.collections.PathCollection at 0x1163eb6a0>

Correlating Points and PnP

As we can see, the period points and the intersections of their orthogonals have exactly the same spacing as the constructive interference points on the frequency section of the tag in the original image. This means we have very precise points in the image, and since we constructed the tag ourselves, we know the corresponding points in the real world. We can then use Perspective-n-Point (PnP) to recover the rotation and translation between the two sets of points.

In [13]:
from cv2 import solvePnP

org_period = -5.
obj_points = np.array([[0, 0, 0],
                       [0, org_period, 0],
                       [org_period, 0, 0],
                       [org_period, org_period, 0]])
#flip one axis to match the image coordinate convention
interference_points[:,0] *= -1
middle_of_tag = (np.array(corners[0]) + np.array(corners[2])) / 2
img_points = middle_of_tag + (interference_points * fft_size)

(_, rvect, tvect) = solvePnP(obj_points, img_points.astype(np.float32), CAMERA_PARAMS['mtx'], None)
fft_img = imread('test_fft.jpg')
fft_tag_pose_img = aruco.drawAxis(image = fft_img,
                                cameraMatrix = CAMERA_PARAMS['mtx'],
                                distCoeffs = CAMERA_PARAMS['dist'],
                                rvec = rvect,
                                tvec = tvect,
                                length = 25);
plt.figure(figsize=(15,15))
plt.imshow(fft_tag_pose_img)
Out[13]:
<matplotlib.image.AxesImage at 0x116549828>

Things left out

This notebook has only attempted to cover the most digestible and fun sections of my project. Lots of work went into things like finding an appropriate windowing function for the FFT and designing the tags themselves. Some lengthier but still interesting discussions could be had on topics like:

  • Vectorizing numpy code for a fast implementation
  • Pain points in correlating period points with world coordinates
  • How accurate FFT frequency points are obtained
  • Assumptions made about transformations
  • FFT windowing functions
  • Failed attempts
  • Things I learned in the process
  • Plans going forward

Thanks for reading!!