Hello potential employers,
The following is a brief introduction to one of my current side projects, which I call FFT Pose Tags. The goal of this notebook is to explain how my implementation works.
Fiducial pose tags are images used to help cameras identify their position and orientation. They are often used in robotics to determine the world location of a moving robot, and in augmented reality applications to project images onto an object. A popular library for producing and detecting these tags is ArUco; functions for identifying the tags and estimating their poses are integrated into the popular OpenCV library.
from imageio import imread
import matplotlib.pyplot as plt

aruco_tag_img = imread('./aruco_tag.jpg')
plt.figure(figsize=(10,10))
plt.imshow(aruco_tag_img)
ArUco finds potential markers by looking for quadrilaterals, verifies each marker using its internal pattern, and identifies the corner locations.
from cv2 import aruco

# the ArUco dictionary defines the family of marker patterns to search for
ARUCO_DICT = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)

(aruco_marker_corners, _, _) = aruco.detectMarkers(aruco_tag_img, ARUCO_DICT)
marked_aruco_img = aruco.drawDetectedMarkers(aruco_tag_img, aruco_marker_corners, borderColor = (255,0,0))
plt.figure(figsize=(10,10))
plt.imshow(marked_aruco_img)
import numpy as np
import pickle

# load the camera parameters, which define how pixels translate into directions out of the camera
CAMERA_PARAMS = pickle.load(open('camera_params.pickle', 'rb'))

aruco_pose = aruco.estimatePoseSingleMarkers(corners = aruco_marker_corners,
                                             markerLength = 2,
                                             cameraMatrix = CAMERA_PARAMS['mtx'],
                                             distCoeffs = CAMERA_PARAMS['dist'])
# estimatePoseSingleMarkers returns the rotation and translation vectors separately
rvecs, tvecs = aruco_pose[0], aruco_pose[1]

aruco_pose_img = aruco.drawAxis(image = aruco_tag_img,
                                cameraMatrix = CAMERA_PARAMS['mtx'],
                                distCoeffs = CAMERA_PARAMS['dist'],
                                rvec = rvecs[0],
                                tvec = tvecs[0],
                                length = 2.)
plt.figure(figsize=(10,10))
plt.imshow(aruco_pose_img)
I have designed my own tags that attempt to alleviate some of the shortcomings of ArUco and other tags. The idea is to track the changes in frequency that occur as you distort an image by applying affine transformations. This lets us use the entire image to track the pose, which should make the tags more robust to occlusion, and lets us leverage Fourier transform techniques to identify the pose with high accuracy.
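The core idea, that transforming an image predictably moves its frequency-domain peaks, can be sketched with plain NumPy. Below I synthesize a pure sinusoid (a stand-in for the tag's pattern, not the actual tag), and show that rotating the sinusoid rotates its FFT peak by the same amount. The image size and frequency are arbitrary choices for illustration.

```python
import numpy as np

def peak_frequency(img):
    """Return the (row, col) offset of the strongest FFT peak from the center."""
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    r, c = np.unravel_index(np.argmax(np.abs(f)), f.shape)
    return r - img.shape[0] // 2, c - img.shape[1] // 2

n = 128
y, x = np.mgrid[0:n, 0:n]

# A sinusoid with 8 cycles across the image, extending horizontally:
# its FFT peak sits 8 bins from the center along the column axis,
# i.e. at (0, +/-8) (a real signal produces a conjugate-symmetric pair).
horiz = np.sin(2 * np.pi * 8 * x / n)
print(peak_frequency(horiz))

# The same sinusoid rotated 90 degrees: the peak moves to the row axis, (+/-8, 0).
vert = np.sin(2 * np.pi * 8 * y / n)
print(peak_frequency(vert))
```

Affine transformations of the image act on these peak locations in a correspondingly predictable way, which is what makes the pose recoverable from the frequency domain.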
My tag has a few features. The black square border is designed so that we can use existing libraries to identify potential tag locations. The blue and red markings on the sides are used to quickly verify whether a candidate quadrilateral is a tag. The large central area is a sinusoidal signal extending vertically combined with a sinusoid extending horizontally. This section is used to decompose the orientation based on how the frequencies change as we rotate the tag.
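The central section described above can be synthesized in a few lines. This is a sketch, not the actual tag generator: the section size and the frequencies are placeholder values, since the real tag's design parameters are a separate choice.

```python
import numpy as np
import matplotlib.pyplot as plt

n = 210                  # size of the central section (placeholder value)
fx, fy = 5, 5            # cycles across the section (placeholder values)
y, x = np.mgrid[0:n, 0:n]

# a sinusoid extending horizontally plus a sinusoid extending vertically
pattern = np.sin(2 * np.pi * fx * x / n) + np.sin(2 * np.pi * fy * y / n)

plt.imshow(pattern, cmap='gray')
plt.title("Synthetic tag center")
```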
fft_tag = imread('fft_tag.png')
plt.imshow(fft_tag)
Fourier transforms decompose a signal into its constituent sinusoids, representing it in the frequency domain. Below you can see the undistorted tag's 2D discrete Fourier transform (the FFT is a fast algorithm for computing it). Notice that there are two points in the resulting frequency domain. Each point's direction from the center should be interpreted as the direction its sinusoid extends in, and its distance from the center is that signal's frequency.
# crop out the border
freq_section = fft_tag[130:340, 130:340, :]
freq_section = np.sum(freq_section, axis = 2)
freq_section = freq_section - np.mean(freq_section)

frequencies = np.fft.fftshift(np.fft.fft2(freq_section))

# remove the 0-frequency (DC) component at the center
(rows, cols) = frequencies.shape
frequencies[rows//2, cols//2] = 0

# zoom in on the center of the frequency domain
frequencies = frequencies[90:-90, 90:-90]

plt.figure(figsize=(10,10))
plt.subplot(121); plt.title("Tag"); plt.imshow(freq_section)
# plot the magnitude of each frequency component
plt.subplot(122); plt.title("Frequencies"); plt.imshow(np.abs(frequencies))
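To make the "direction and magnitude" reading concrete, here is a sketch of pulling the peak locations out of the FFT numerically. It runs on a synthetic stand-in for the cropped tag section (so the notebook's `fft_tag.png` is not required); the size and frequency are placeholder values.

```python
import numpy as np

# synthetic stand-in for the cropped tag center: horizontal + vertical sinusoids
n = 210
y, x = np.mgrid[0:n, 0:n]
section = np.sin(2 * np.pi * 5 * x / n) + np.sin(2 * np.pi * 5 * y / n)

f = np.fft.fftshift(np.fft.fft2(section - section.mean()))
f[n // 2, n // 2] = 0                     # drop the DC component

# take the strongest bins; each real sinusoid contributes a conjugate pair
mag = np.abs(f)
flat = np.argsort(mag.ravel())[-4:]
peaks = np.column_stack(np.unravel_index(flat, f.shape)) - n // 2

for dy, dx in peaks:
    angle = np.degrees(np.arctan2(dy, dx))  # direction the sinusoid extends in
    freq = np.hypot(dy, dx)                 # cycles across the section
    print(f"direction {angle:6.1f} deg, frequency {freq:.1f}")
```

Applying an affine distortion to `section` and re-running this extraction shows how the peaks shift, which is the signal the pose decomposition is built on.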