Hello potential employers,
The following is a breif introduction to one of my current side projects which I call FFT Pose Tags. The goal of this notebook is to explain how my implementation works.
Feducial pose tags are images that are used to help cameras identify their position and orientation. They are often used in robotics to help identify world location of a moving robot, and in augmented relity applications to project images onto a object. A popular library for producing and finding these tags is ArUco. Functions for identifying and finding the poses are integrated in the popular OpenCV library.
from imageio import imread
import matplotlib.pyplot as plt
aruco_tag_img = imread('./aruco_tag.jpg')
plt.figure(figsize=(10,10))
plt.imshow(aruco_tag_img)
ArUco finds potential marker by looking for quadrilaterals, verifies the marker using the internal pattern, and identify each of the corner locations.
from cv2 import aruco
#the aruco dictionary defines
ARUCO_DICT = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
(aruco_marker_corners,_,_) = aruco.detectMarkers(aruco_tag_img, ARUCO_DICT)
marked_aruco_img = aruco.drawDetectedMarkers(aruco_tag_img, aruco_marker_corners, borderColor = (255,0,0))
plt.figure(figsize=(10,10))
plt.imshow(marked_aruco_img)
import numpy as np
import pickle
#load the camera parameters which define how pixels translate into directions out of the camera
CAMERA_PARAMS = pickle.load(open('camera_params.pickle', 'rb'))
aruco_pose = aruco.estimatePoseSingleMarkers(corners = aruco_marker_corners,
markerLength = 2,
cameraMatrix = CAMERA_PARAMS['mtx'],
distCoeffs = CAMERA_PARAMS['dist'])
aruco_pose_img = aruco.drawAxis(image = aruco_tag_img,
cameraMatrix = CAMERA_PARAMS['mtx'],
distCoeffs = CAMERA_PARAMS['dist'],
rvec = aruco_pose[0],
tvec = aruco_pose[1],
length = 2.);
plt.figure(figsize=(10,10))
plt.imshow(aruco_pose_img)
I have designed my own tags that attempt to alleviate some of the shortcomeings of ArUco and other tags. The idea is to track the changes in frequency that occur as you distort an image by applying affine transformations. This allows us to use the entire image to track the pose which will hopefully help the tags become robust to occlusion, and let us leverage some Fourie transform tecniques to identify the pose with high accuracy.
My tag has a few features. The black square boarder is designed so that we can use existing libraries to identify potential locations tags. The blue and red tags on the side are used to easily identify if a potential quadrilateral is a tag. The largest area in the center is a sinusoidal signal extending vertically combined with a sinusoid extending horizontally. This section will be used to decompose the orientation based on how the frequency changes as we rotate the tag.
fft_tag = imread('fft_tag.png')
plt.imshow(fft_tag)
Fourier transforms convert a signal consisting of sinusoids into their a frequency domain. Below you can see the undistorted tag's 2D discreet Fourier transform (an fast implementation an algorithm for solving this iis called FFT). Notice that there are two points on the resulting frequency domain. The points direction should be interpreted as the direction the frequency is extending in, and its magnitude is frequency of that signal.
#crop out the boarder
freq_section = fft_tag[130:340,130:340,:]
freq_section = np.sum(freq_section, axis = 2)
freq_section = freq_section - np.mean(freq_section)
frequencies = np.fft.fftshift(np.fft.fft2(freq_section))
#remove the 0 frequency section
(rows, cols) = frequencies.shape
frequencies[rows//2,cols//2] = 0
#zoom in
frequencies = frequencies[90:-90, 90:-90]
plt.figure(figsize=(10,10))
plt.subplot(121); plt.title("Tag"); plt.imshow(freq_section)
plt.subplot(122); plt.title("Frequencies"); plt.imshow(abs(frequencies.real))
Observe below that as we warp the tag the frequency domain changes as well
import cv2
rows,cols = freq_section.shape
pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[0,80],[160,90],[70,250]])
M = cv2.getAffineTransform(pts1,pts2)
distorted_freq_section = cv2.warpAffine(freq_section,M,(cols,rows))
frequencies = np.fft.fftshift(np.fft.fft2(distorted_freq_section))
frequencies[rows//2,cols//2] = 0
#zoom in
frequencies = frequencies[90:-90, 90:-90]
plt.figure(figsize=(10,10))
plt.subplot(121); plt.title("Tag"); plt.imshow(distorted_freq_section)
plt.subplot(122); plt.title("Frequencies"); plt.imshow(abs(frequencies.real))
We will now try to decompose the pose of a tag from a sample image.
fft_img = imread('test_fft.jpg')
plt.figure(figsize=(10,10))
plt.imshow(fft_img)
This is done manually here as there is nothing interesting or original in my implmentation.
print(fft_img.shape)
fft_img = np.sum(fft_img.copy(), axis = 2)
#entering corners manually as idneitifying them programatically
#isn't all that ineteresting and takes time to explain
#corners = np.array([[256, 423], [224, 487], [298,527], [324,446]])
corners = np.array([[423, 256],[487, 224],[524,290],[460,324]])
#crop around the corners
x,y,w,h = cv2.boundingRect(corners)
cropped_img = fft_img[y:y+h, x:x+w]
cropped_corners = [(i - x, j - y) for (i,j) in corners]
mask = np.zeros(cropped_img.shape)
cv2.fillConvexPoly(img = mask,
points = np.array([cropped_corners], dtype = np.int32),
color = 1)
masked_fft = cropped_img * mask
plt.figure(figsize=(5,5))
plt.title("Tag"); plt.imshow(masked_fft)
We can find the frequencies that make up this pattern using the Fourier transform. We can see below that we have run the FFT across the image set within a buffer of zeros. This helps us get a more accurate reading.
shapef = np.array(masked_fft.shape)
#buffer with 0s for better resolution
#https://ccrma.stanford.edu/~jos/mdft/Causal_Zero_Padding.html#1
size_multiplier = 10
masked_fft_zero = np.zeros(shapef * size_multiplier)
shapez = masked_fft_zero.shape
masked_fft_zero[shapez[0]//2-shapef[0]//2:1+shapez[0]//2+shapef[0]//2,
shapez[1]//2-shapef[1]//2:shapez[1]//2+shapef[1]//2] = masked_fft
frequencies = np.fft.fftshift(np.fft.fft2(masked_fft_zero))
(rows, cols) = frequencies.shape
frequencies[rows//2 - 40:rows//2 + 40,
cols//2 - 40:cols//2 + 40] = 0
frequencies = frequencies[400:-400, 400:-400]
plt.figure(figsize=(15,15))
plt.subplot(131); plt.title("Tag"); plt.imshow(masked_fft)
plt.subplot(132); plt.title("Frequencies"); plt.imshow(abs(frequencies.real))
freq_points = [[7.03, 4.67], [2.8,-5.95]]
plt_size = (-10,10)
plt.subplot(133);
plt.title("Main Frequency Points");
plt.ylim(plt_size)
plt.xlim(plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*freq_points))
We will now tranform our frequency representation, into a "period represntation". These will maintain the direction of the frequency, but be scaled by their relative period.
from numpy.linalg import norm
freq_points = [[7.03, 4.67], [2.8,-5.95]]
period_points = [[x/(norm([x,y])**2), y/(norm([x,y])**2)] for (x,y) in freq_points]
plt.figure(figsize=(10,10))
plt.subplot(121);
freq_plt_size = (-10,10)
plt.title("Frequency Points");
plt.ylim(freq_plt_size)
plt.xlim(freq_plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*freq_points))
plt.subplot(122);
period_plt_size = (-.5,.5)
plt.title("Period Points");
plt.ylim(period_plt_size)
plt.xlim(period_plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*period_points))
print(period_points)
We now have the directions and the periods of the sine waves that make up our signal. If we imagine these as two waves originating from the origin with associated periods and directions we can attempt to find the points where they constructively interefere with eachother. The first occurance of this will happen at the origin. The others will occur at points that are orthogonal to each period point.
def get_orthogonal(p):
return (-p[1],p[0])
orthogonals = [get_orthogonal(p) for p in period_points]
def plot_line(point, direction):
"""plot from a point in a direction"""
axes = plt.gca()
xs = np.array(axes.get_xlim())
slope = direction[1]/direction[0]
ys = point[1]+(slope * (xs - point[0]))
plt.plot(xs, ys, '--')
plt_size = (-.5,.5)
plt.figure(figsize=(10,10))
plt.subplot(121)
plt.title("Period Points");
plt.ylim(plt_size)
plt.xlim(plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*period_points))
plt.subplot(122)
plt.title("Orothogonal");
plt.ylim(plt_size)
plt.xlim(plt_size)
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(*zip(*period_points))
for period_pt, ortho in zip(period_points, orthogonals):
plot_line(period_pt, ortho)
plot_line([0,0], ortho)
If we now get the intersections of all of these lines and plot them onto the the original frequency points
from numpy import dot
from itertools import product
def line_intersection(a1,a2,b1,b2) :
"""get the intersection of two lines defined by point and direction"""
(a1,a2,b1,b2) = [np.array(x) for x in [a1,a2,b1,b2]]
# if np.equal(a1,b1).all() or np.equal(a1,b2).all():
# return a1
# if np.equal(a2,b1).all() or np.equal(a2,b2).all():
# return a2
da = a2-a1
db = b2-b1
dp = a1-b1
dap = get_orthogonal(da)
denom = np.dot( dap, db)
num = np.dot( dap, dp )
return (num / denom.astype(float))*db + b1
interference_points = np.array([line_intersection(a, np.add(a, orthogonals[0]), b, np.add(b, orthogonals[1]))
for a,b in product([[0,0], period_points[0]], [[0,0], period_points[1]])])
fft_size,_ = masked_fft.shape
plt.figure(figsize=(10,10))
plt.subplot(121)
plt.title("Calculated Interference Points");
plt.ylim((-30,30))
plt.xlim((-30,30))
plt.gca().set_aspect('equal')
plt.gca().grid(True)
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.scatter(interference_points[:,0] * fft_size, interference_points[:,1] * fft_size, c='r')
plt.subplot(122)
plt.imshow(masked_fft)
plt.title("Calculated Interference Points Over Original Image");
plt.scatter(40 + (interference_points[:,0] * fft_size), 40 + (interference_points[:,1] * -fft_size), c='r')
As we can see, the period points and the intersections of their orthogonals have the exact same spacing as the constructive interference points on the frequency tag from the original image. This means that we have very precise points in the image, and since we know how we constructed the tag ourselves we have the correlating points in the real world. We can then use Persepective-n-Point (PnP) to find the homography, and therefore the translation and rotation between the two sets of points.
from cv2 import solvePnP
org_period = -5.
obj_points = np.array([[0, 0, 0],
[0, org_period, 0],
[org_period, 0, 0],
[org_period, org_period, 0]])
#image y coordinates are reversed
interference_points[:,0] *= -1
middle_of_tag = (np.array(corners[0]) + np.array(corners[2])) / 2
img_points = middle_of_tag + (interference_points * fft_size)
(_, rvect, tvect) = solvePnP(obj_points, img_points.astype(np.float32), CAMERA_PARAMS['mtx'], None)
fft_img = imread('test_fft.jpg')
fft_tag_pose_img = aruco.drawAxis(image = fft_img,
cameraMatrix = CAMERA_PARAMS['mtx'],
distCoeffs = CAMERA_PARAMS['dist'],
rvec = rvect,
tvec = tvect,
length = 25);
plt.figure(figsize=(15,15))
plt.imshow(fft_tag_pose_img)
This notebook has only attempted to cover the most digestable and fun sections of my project. Lots of work went into finding things like finding the appropriate windowing function for the FFT, and designing the tags. Some more lengthy but still interesting discussions can be had on things like
Thanks for reading!!