Online Lens Motion Smoothing for Video Autofocus

Abdullah Abuolaim and Michael S. Brown

Department of Electrical Engineering and Computer Science

Lassonde School of Engineering, York University, Canada

{abuolaim, mbrown}@eecs.yorku.ca

Abstract

Autofocus (AF) is the process of moving the camera’s lens such that desired scene content is in focus. AF for single image capture is a well-studied research topic and most modern cameras have hardware support that allows quick lens movements to optimize image sharpness. How to best perform AF for video is less clear. Conventional wisdom would suggest that each temporal frame should be as sharp as possible. However, unlike single image capture, the effect of the lens movement is visible in the captured video. As a result, there are two parameters to consider in AF for video: sharpness and lens movement. In this paper, we show that users preferred videos with smooth lens movement, even if it results in less overall sharpness. Based on this observation, we propose two novel AF algorithms for video that strive for both smooth lens movement and sharp scene content. Specifically, we introduce (1) a bidirectional long short-term memory (BLSTM) module trained on smooth lens trajectories and (2) a simple weighted moving average (WMA) method that factors in prior lens motion. Both of these methods have demonstrated excellent results in terms of reducing lens movements (up to 64% reduction) without greatly affecting the sharpness (less than 5.2% change in sharpness). Moreover, videos produced using our methods are more preferred by users over conventional AF that aims only for maximizing sharpness.
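The weighted moving average (WMA) idea described above can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the window size and the linearly increasing weights are assumptions, and `wma_smooth` is a hypothetical helper name. The key property is that each emitted lens position blends the current AF target with the recently emitted (already smoothed) positions, damping abrupt lens jumps.

```python
def wma_smooth(requested_positions, window=4):
    """Smooth a sequence of AF-requested lens positions online.

    Each output position is a weighted average of the current AF target
    and the last few previously emitted (smoothed) positions, so that
    prior lens motion damps sudden jumps. Window size and weighting
    scheme here are illustrative assumptions.
    """
    smoothed = []
    for target in requested_positions:
        history = smoothed[-window:]
        # Linearly increasing weights: older motion counts least,
        # the current AF target gets the largest weight.
        values = history + [target]
        weights = list(range(1, len(values) + 1))
        smoothed.append(
            sum(w * v for w, v in zip(weights, values)) / sum(weights)
        )
    return smoothed
```

For a step change in the AF target (e.g. the subject suddenly moving to a different depth), the output approaches the new target over several frames instead of jumping immediately, which is the smooth-trajectory behavior the paper argues users prefer.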

Presentation

Lens motion effect on video frame sharpness and user preference

Fig. 1

The left-hand plot shows lens motion and per-frame sharpness for a conventional AF algorithm, alongside its corresponding smoothed lens movements. The right-hand plot shows the preferences of 32 users between six videos produced with conventional AF and versions of the same videos with smoothed lens movement.

4D dataset

Fig. 2

This figure shows an example of the 4D temporal focal stack data. The highlighted frames indicate the images that would appear in the output video; their positions within the stack correspond to the lens movements over time.
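The structure of the 4D temporal focal stack can be sketched as an array indexed by (time, lens position, height, width); a video is then formed by selecting one focal slice per time step, with the sequence of selected indices playing the role of the lens trajectory. This is a minimal sketch under those assumptions, not the authors' data format, and `render_video` is a hypothetical helper name.

```python
import numpy as np

def render_video(focal_stack, lens_trajectory):
    """Form an output video from a 4D temporal focal stack.

    focal_stack: array of shape (T, L, H, W) -- T time steps, L discrete
    lens positions, each holding an H x W frame (illustrative layout).
    lens_trajectory: one lens-position index per time step; this sequence
    corresponds to the highlighted frames in the figure.
    """
    T = focal_stack.shape[0]
    assert len(lens_trajectory) == T
    # Pick the frame at the chosen lens position for each time step.
    return np.stack([focal_stack[t, lens_trajectory[t]] for t in range(T)])

# Toy example: 3 time steps, 5 lens positions, 2x2 frames.
stack = np.arange(3 * 5 * 2 * 2).reshape(3, 5, 2, 2)
video = render_video(stack, [0, 2, 4])
# video has shape (3, 2, 2): one frame per time step.
```

A smoother lens trajectory simply selects indices that change more gradually across time steps, which is what the smoothing methods above produce.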

Related projects

This page contains files that could be protected by copyright. They are provided here for reasonable academic fair use.
Copyright © 2020