Understanding How Computers See the World

It’s easy to take vision for granted. Most people don’t think about the process of seeing; vision is so intertwined with our experience of the world that we only notice it when it’s gone. But vision is an astonishingly complex process. We’re not consciously aware of it, but looking at anything requires an enormous amount of brainpower, and the human eye is itself quite intricate. In all, the simple act of looking at something such as a chair and realizing that it’s a chair involves a great deal of processing.

This is why computational image recognition is such a difficult task. Researchers have to start from scratch and recreate a complex biological process in a computational medium. Until fairly recently, it was often more cost-effective to simply hire people to recognize images than to have computers do so.

Part of the reason can be understood by considering what’s involved in image recognition. A chair is a comparatively easy item to recognize: it has four legs, a back, and a platform to sit on. It’s probably one of the easiest objects for a computer to identify, yet the software still needs to sort through a vast number of possible definitions. Consider the legs. A human naturally treats the column of the leg as the most important section. A computer doesn’t share that cultural frame of reference; it can’t know that a leg, in many ways, defines what is and isn’t a chair. To a computer, a chair might seem to be defined by the bottom of the leg, in which case a wheeled chair might not register as a chair at all. Or the curve of the chair’s back might seem like the feature to focus on, or the material might receive undue importance. Computer-based image recognition needs not only to see the individual parts of a greater whole, but to understand which aspects of those parts are the most significant. It’s a difficult problem to solve, but once software can make those distinctions, it opens up a whole new world of possibilities.
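The wheeled-chair problem above can be made concrete with a toy sketch. This is not how any real recognition system works; it's a deliberately simplified weighted-feature model, with made-up feature names and weights, that shows how two systems looking at the same object can disagree purely because they weight its parts differently.

```python
# Toy illustration (not a real recognition system): two sets of feature
# weights score the same object, differing only in which parts they
# treat as significant.

def score(features, weights):
    """Weighted sum of detected feature strengths (each 0.0 to 1.0)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical features detected on a wheeled office chair: strong leg
# columns, back, and seat, but no traditional "feet" touching the floor.
wheeled_chair = {"leg_columns": 0.9, "back": 0.8, "seat": 0.9, "leg_feet": 0.1}

# Weights closer to human intuition: the column of the leg matters most.
human_like = {"leg_columns": 0.5, "back": 0.25, "seat": 0.25}

# Weights that over-emphasize the bottom of the leg instead.
naive = {"leg_feet": 0.5, "back": 0.25, "seat": 0.25}

threshold = 0.5
print(score(wheeled_chair, human_like) >= threshold)  # True: recognized as a chair
print(score(wheeled_chair, naive) >= threshold)       # False: the wheeled chair is missed
```

The same detected parts yield opposite verdicts, which is exactly why deciding *which* aspects of an object are significant is the hard part of the problem.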

One of the best examples can be seen with Slyce, one of the leading companies in the field of image recognition. Slyce has moved beyond solving the harder theoretical problems of image recognition and into true real-world application. One of the most impressive aspects of its products is the impact on sales. The software can examine a picture someone has taken on their phone, and usually not only categorize the item by type but determine its exact brand and model. It can then load a store’s matching listing so the user can buy the item on the spot, capturing a large market of impulse buyers. If a user sees a chair they like at a friend’s house, a quick use of the app is all it takes to buy it.
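The photo-to-purchase flow described above can be sketched as a simple pipeline. To be clear, every function name and data value here is a hypothetical stand-in; this is not Slyce's actual API, just an outline of the three stages the text describes: categorize the photo, identify the exact product, and pull up a store listing.

```python
# Hypothetical sketch of a photo-to-purchase pipeline. All names and
# values are illustrative assumptions, not any vendor's real API.

def categorize(photo):
    # Stage 1: a real system would run an image classifier here.
    return "chair"

def identify_product(photo, category):
    # Stage 2: a real system would match the image against a product catalog.
    return {"brand": "ExampleCo", "model": "Lounge Chair 7"}

def find_listing(product):
    # Stage 3: look up the identified item in a retailer's inventory (stubbed).
    return {"price": 149.99, "in_stock": True, **product}

def photo_to_listing(photo):
    """Run the full pipeline: photo -> category -> product -> store listing."""
    category = categorize(photo)
    product = identify_product(photo, category)
    return find_listing(product)

listing = photo_to_listing("friends_chair.jpg")
print(listing["brand"], listing["model"], listing["price"])
```

The value of the design is that each stage narrows the search: a category filter first, then an exact product match, then a single inventory lookup, which is what makes the impulse-buy scenario practical on a phone.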