Towards a Computer Vision Shader Language

Computer vision is a complex field which can be challenging for those outside the research community to apply in the real world. The problem preventing widespread adoption is the lack of a formulation which separates the understanding of a task from the knowledge of which algorithm to use as a solution. Successful abstractions in computer graphics, such as OpenGL and DirectX, hide the algorithmic details of rendering behind a powerful interface. We propose a similar abstraction for computer vision, analogous to a shader language, which provides developers with access to sophisticated vision methods without requiring specialist knowledge. Our contribution is an interface which presents developers with mechanisms to describe their vision task; the description is interpreted to select an appropriate method and provide a solution.
Many computer vision problems can be divided up into smaller sub-problems and solved by providing solutions to each sub-problem. This applies conceptually as well as algorithmically and so we base our vision shader language on this principle. We allow the user to describe vision tasks by dividing the conceptual problem into sub-tasks, then the sequence of sub-tasks is analysed to select a suitable method to apply. Our language is made up of these sub-tasks, which we term operations, and we provide a core set to span the range of computer vision problems.
Each of these operations accepts a description of the conditions under which it must operate. For example, when requesting detection, the developer can describe the object to detect (e.g. a "face", mostly front-facing, possible occlusion, multiple instances) and the image (detailed, varied illumination). From this description we can infer which face-detection algorithm will work most effectively under these conditions. Behind the scenes, each algorithm incorporated into our framework is evaluated on how it performs under the described conditions; this allows us to select the best one when the developer describes their problem.
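As a rough illustration of description-driven selection, the Python sketch below picks the candidate with the highest mean score over the conditions the developer lists. It is not the framework's actual interface: the `Method` class, the `select_method` function, the detector names, and all score values are hypothetical, invented only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    """A candidate algorithm with hypothetical per-condition scores in [0, 1]."""
    name: str
    scores: dict = field(default_factory=dict)


def select_method(methods, conditions):
    """Return the method with the highest mean score over the described conditions."""
    if not conditions:
        raise ValueError("at least one condition must be described")

    def mean_score(method):
        # A condition the method was never evaluated on counts as 0.
        return sum(method.scores.get(c, 0.0) for c in conditions) / len(conditions)

    return max(methods, key=mean_score)


# Illustrative scores only; real values would come from offline evaluation.
METHODS = [
    Method("detector_a", {"frontal": 0.9, "occlusion": 0.3, "varied_illumination": 0.5}),
    Method("detector_b", {"frontal": 0.8, "occlusion": 0.7, "varied_illumination": 0.8}),
]

description = ["frontal", "occlusion", "varied_illumination"]
best = select_method(METHODS, description)
print(best.name)
```

With these made-up scores, a description that mentions only "frontal" would favour `detector_a`, while adding occlusion and varied illumination shifts the choice to `detector_b`.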
The power of the abstraction comes from the evaluation of the operations as a sequence instead of individually; this approach allows us to establish the higher-level problem to solve and possibly select a single method capable of solving it more effectively.
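One way to picture sequence-level evaluation is a planner that prefers a single method covering several consecutive operations over one method per operation. The Python sketch below assumes a hypothetical `REGISTRY` and `plan` function; the operation names other than detection, and all method names, are illustrative and not taken from the framework.

```python
# Hypothetical registry: each key is a run of operations that one method can solve.
REGISTRY = {
    ("detect",): "standalone_detector",
    ("track",): "standalone_tracker",
    ("segment",): "standalone_segmenter",
    ("detect", "track"): "tracking_by_detection",
}


def plan(operations):
    """Greedily cover the sequence, preferring methods that span more operations."""
    ops = tuple(operations)
    chosen = []
    i = 0
    while i < len(ops):
        # Try the longest run starting at position i first.
        for j in range(len(ops), i, -1):
            method = REGISTRY.get(ops[i:j])
            if method is not None:
                chosen.append(method)
                i = j
                break
        else:
            raise LookupError(f"no method for operation {ops[i]!r}")
    return chosen


print(plan(["detect", "track", "segment"]))
```

Here the pair "detect" then "track" is handled by one combined method instead of two separate ones. Greedy longest-match is only one possible strategy, chosen for brevity.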
Presented at the SIGGRAPH poster session in Vancouver, August 2011.
@InProceedings{Miller:SIGGRAPH2011,
author = {Gregor Miller and Steve Oldridge and Sidney Fels},
title = {Towards a Computer Vision Shader Language},
booktitle = {Proceedings of the 38th Conference on Computer Graphics and Interactive Techniques Posters},
series = {SIGGRAPH'11},
pages = {40:1},
articleno = {40},
month = {August},
year = {2011},
publisher = {ACM},
address = {New York City, New York, U.S.A.},
isbn = {978-1-4503-0971-4},
location = {Vancouver, British Columbia, Canada},
doi = {http://doi.acm.org/10.1145/2037715.2037761},
url = {http://www.openvl.org.uk/Publications/Publication.php?id=Miller:SIGGRAPH2011}
}