Revisiting Autofocus for Smartphone Cameras

Abdullah Abuolaim, Abhijith Punnappurath, and Michael S. Brown

Department of Electrical Engineering and Computer Science

Lassonde School of Engineering, York University, Canada

{abuolaim, pabhijith, mbrown}@eecs.yorku.ca

AF 4D dataset description

Based on our observations, we settled on 10 representative scenes that are categorized into three types: (1) scenes containing no face (NF), (2) scenes with a face in the foreground (FF), and (3) scenes with faces in the background (FB). For each of these scenes, we allowed different arrangements in terms of textured backgrounds, whether the camera moves, and how many types of objects in the scene change their directions (referred to as motion switches). The table below summarizes this information.

The 10 scenes/image sequences in our AF dataset. The final table row, discrete time points, denotes the number of full focal stacks per captured temporal image sequence.

AF 4D dataset

Below are the 10 scenes in our AF 4D dataset. Clicking a scene's number or icon takes you to a page where you can download its .rar archive.

[Scene1] [Scene2] [Scene3] [Scene4] [Scene5]

[Scene6] [Scene7] [Scene8] [Scene9] [Scene10]

Once you extract the .rar archive, you will get a folder with 50 x time_points JPEG images that follow the naming convention IMAGE_xx_yy.jpg, where xx is the zero-padded time_point number and yy is the zero-padded focus distance in units of diopters (1/meter).
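The naming convention above can be parsed with a short helper. This is a minimal sketch, not part of the released code: the function name parse_image_name and the sample file names are hypothetical, and because the exact digit widths of xx and yy are not stated, the pattern accepts any number of digits.

```python
import re

# Assumed pattern: digit widths are not specified on this page,
# so any number of digits (and an optional fraction) is accepted.
_NAME_RE = re.compile(r"^IMAGE_(\d+)_(\d+(?:\.\d+)?)\.jpg$", re.IGNORECASE)


def parse_image_name(filename):
    """Return (time_point, focus_diopters) parsed from IMAGE_xx_yy.jpg."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1)), float(match.group(2))


# Hypothetical file names, used only to illustrate sorting.
names = [
    "IMAGE_01_10.00000000.jpg",
    "IMAGE_00_10.00000000.jpg",
    "IMAGE_00_00.19795460.jpg",
]

# Sort by time point first, then by focus distance in diopters.
ordered = sorted(names, key=parse_image_name)
```

Sorting on the parsed pair rather than on the raw string keeps the order correct even if the zero padding turns out to differ from what is assumed here.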

REMEMBER:
We captured a focal stack of 50 images at each time point, with the camera lens moved in linear steps from its minimum to maximum position (00.19795460–10.00000000). On Android, devices with APPROXIMATE or CALIBRATED focus distance calibration report the focus metadata in units of diopters (1/meter), so 0.0f represents focusing at infinity, and increasing positive numbers represent focusing closer and closer to the camera device. The focus distance control also uses diopters on these devices. Click here for more details.
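Since a diopter is the reciprocal of a distance in meters, the focus values can be converted to metric distances. The sketch below assumes the two range endpoints quoted above are expressed in diopters, as the file names suggest; diopters_to_meters is a hypothetical helper name.

```python
import math


def diopters_to_meters(diopters):
    """Convert a focus distance in diopters (1/meter) to meters."""
    if diopters < 0:
        raise ValueError("focus distance in diopters must be non-negative")
    if diopters == 0:
        return math.inf  # 0.0 diopters means focus at infinity
    return 1.0 / diopters


# Endpoints of the range quoted above, assumed to be in diopters.
near = diopters_to_meters(10.0)        # closest focus, 0.1 m
far = diopters_to_meters(0.19795460)   # farthest focus, a little over 5 m
```

Under that assumption, the focal stack spans focus distances from about 10 cm to roughly 5 m from the camera.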

Prepare the data for our API and data browser

To run our API and data browser, you need to convert each scene's images into one large 4D NumPy array of size (t x s) x h x w x c, where:
t: Number of time points.
s: Focal stack size (s = 50).
h: Image height.
w: Image width.
c: RGB channels (c = 3).
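To illustrate how a single image might be addressed in such an array, here is a small sketch. It assumes time-major ordering, meaning that all 50 focal slices of time point 0 come first, then those of time point 1, and so on; the page does not state this ordering. The sizes for t, h, and w are tiny stand-ins, and frame_index is a hypothetical helper.

```python
import numpy as np

# Tiny stand-in sizes; only s = 50 and c = 3 come from the text above.
t, s, h, w, c = 3, 50, 4, 6, 3

# Stand-in for a scene array of shape (t * s, h, w, c).
frames = np.zeros((t * s, h, w, c), dtype=np.uint8)


def frame_index(time_point, slice_index, stack_size=50):
    """Flat index of one image, assuming time-major ordering."""
    return time_point * stack_size + slice_index


# Equivalent 5D view: one axis for time, one for the focal slice.
# For a contiguous array, reshape returns a view that shares memory.
stacks = frames.reshape(t, s, h, w, c)

# Mark one image through the flat index; the 5D view sees the change.
frames[frame_index(2, 7)] = 255
```

With this layout, stacks[i] is the full focal stack captured at time point i, and stacks[i, j] is the image at focal slice j within it.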

The Python script "sceneToArray.py" reads each scene's images and converts them into a 4D NumPy array. It also downscales the images to a width of w = 1500 while preserving the aspect ratio. First, download the script together with the corresponding focus distance NumPy array "focusDisArr.npy". Second, create a new directory named "Data" and add all the unzipped scenes to it (Scene1–10). Keep "sceneToArray.py", "focusDisArr.npy", and "Data" in the same directory, since "sceneToArray.py" uses relative paths to these folders and files.
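Before running the script, it may help to check that the directory layout matches the description above. The sketch below is not taken from "sceneToArray.py": missing_items and downscaled_size are hypothetical helpers, the scene folder names Scene1 through Scene10 are assumed from the text, and the rounding used for the resized height is an assumption.

```python
from pathlib import Path


def missing_items(root, num_scenes=10):
    """List expected files and folders that are absent under `root`."""
    root = Path(root)
    expected = [root / "sceneToArray.py", root / "focusDisArr.npy"]
    # Scene folder names are assumed to be Scene1 ... Scene10.
    expected += [root / "Data" / f"Scene{i}" for i in range(1, num_scenes + 1)]
    return [str(p.relative_to(root)) for p in expected if not p.exists()]


def downscaled_size(width, height, new_width=1500):
    """Target (width, height) after resizing to `new_width`, keeping aspect ratio."""
    # Rounding to the nearest integer is an assumption, not the script's rule.
    return new_width, round(height * new_width / width)
```

Calling missing_items(".") from the directory that holds the script returns an empty list when everything is in place. As an example of the resize arithmetic, a 3000 x 2000 image would become 1500 x 1000.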

[sceneToArray.py] [focusDisArr.npy]