Computer vision already works well in self-driving cars. However, a significant breakthrough in the safety of unmanned vehicles will come when smart cars start driving on smart roads. In this interview, Dmitry Soshnikov, Microsoft Technical Evangelist and a speaker at Artificial Intelligence Conference 2018, talks about other application areas of computer vision, whether the technology can model analogues of animal sight, and what to expect from it in the near future.

Interviewer: AI Conference (AI)
Respondent: Dmitry Soshnikov (D.S.)

AI: An Uber self-driving car recently killed a woman in Arizona. In your opinion, why did the artificial intelligence fail to avoid the crash, and how will the incident influence the future of unmanned vehicles?

D.S.: That was a very unpleasant incident. It will undoubtedly affect how quickly unmanned vehicles enter our everyday life, and it will prompt more thorough work on safety issues from both technical and legal perspectives. Why it happened is for professionals to judge, and the case still has to be investigated. Olga Uskova, founder of Cognitive Technologies, posted some considerations on her Facebook page.

I do not think the incident will put an end to development in the field of unmanned vehicles, as they are economically viable. Besides, there are reasons to think that the technology will drive more safely than a human, as even now it handles image recognition and speech recognition no worse than people do.


AI: To an extent, this road accident touches on a philosophical problem. If an accident is unavoidable, whom should the vehicle protect: the driver or a pedestrian? How can AI developers solve such moral dilemmas?

D.S.: In fact, such problems should be solved by a whole community of multidisciplinary specialists rather than by developers alone. The problem here is philosophical or moral rather than technical: how to make a choice in a specific situation. For instance, there are attempts to delegate such a moral choice to humankind in general by collecting an averaged opinion through crowdsourcing. That is exactly what MIT researchers are doing on the Moral Machine platform. The website invites everyone to make a decision in several specific situations that involve such a moral choice. Based on the answers collected, it will be possible to draft standards of behavior. It is yet another systematic attempt to draft rules, following Asimov’s Three Laws of Robotics.

Certainly, the problem is also a legal one, as it is necessary to define who bears responsibility for damage to property or health. Some work on this is expected to be carried out as part of the AutoNet direction of the National Technological Initiative.

From a technological standpoint, I believe a significant breakthrough in safety will be achieved when smart cars start driving on smart roads and coordinating their actions with each other, i.e. when the infrastructure moves to a new level and becomes intelligent. In that case, only irrational human behavior will cause accidents.


AI: Computer vision is widely used for face recognition. The technology already makes it possible to detect leaks in roof insulation or to conduct aerial surveys. Can computer vision model analogues of animal sight and, for instance, power an application that lets you see the world through the eyes of other creatures? What is needed for that?

D.S.: To some extent, such applications already exist, based on research by biologists who study the visual mechanisms of animals. This research turns out to be extremely useful for artificial intelligence tasks. For instance, computer vision techniques rely on an understanding of the principles of image perception that are common to animals. It turns out that an image does not appear in our head all at once, but first passes through hierarchical stages of analysis, starting with the simplest filters that detect certain features or brightness jumps in the image, and the whole picture is gradually assembled from these component parts. Convolutional neural networks, which are commonly used to analyze images, work analogously.
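As a rough illustration of the "simplest filters" mentioned above, here is a minimal Python sketch of one hand-written filter that responds to a brightness jump. The toy image, the kernel values, and the helper name `convolve2d` are assumptions made for this example; they are not taken from the interview or from any particular library. A real convolutional network learns many such kernels from data and stacks them in layers.

```python
# Minimal sketch, not a real network: one hand-written filter that
# responds to a horizontal brightness jump (dark -> bright).
# Image and kernel values are invented for illustration.

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# 1x2 kernel: right pixel minus left pixel.
kernel = [[-1, 1]]

edges = convolve2d(image, kernel)
for row in edges:
    print(row)
```

The output is nonzero only in the column where the brightness changes, which is the kind of low-level signal that later stages combine into larger shapes.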

It is also worth noting that we can see something similar to the image a fly sees, for example, but we cannot imagine how the fly perceives that image or how its brain processes it. The task of ‘feeling as if you were a fly’ has not been solved yet, as we know too little about the brain.


AI: Which fields see the most efficient use of computer vision technologies? Where do you see the prospects for the most serious breakthroughs in terms of implementation?

D.S.: Currently, we can distinguish several fields where computer vision turns out to be very efficient:

♦ Self-driving cars, which we have already mentioned, and other means of transportation. For instance, the latest DJI quadcopters can follow a person, avoid obstacles, and recognize faces and gestures. That is almost an autonomous robot that shoots footage for you.

♦ Marketing. Image processing allows you to digitize customer behavior (thanks to face/emotion analysis) as well as the availability of goods on the shelves and their movement. Systems for analyzing customer relationships also fall into this category: for instance, the startup heedbook tracks emotions and the dialogue between a customer and a bank employee, and compiles statistics that show the level of satisfaction, deviations from the dialogue script, etc.

♦ Image cataloguing and statistical analysis of images. As a growing share of social media content takes the form of images or video, efficient analysis requires converting such content into a more convenient symbolic representation. Computer vision systems handle this task very well.

♦ Security. Face recognition makes it possible to track a person’s movements or to restrict access to a facility based on a photo.

♦ Manufacturing, where you need to track the movement of goods along the assembly line, count them, and evaluate their quality.

The breakthrough will most probably lie in the large-scale adoption of such technologies in practice. In my presentation, I will show that it is very simple to start the implementation process, and that the economic effect can be huge in some cases. By integrating even simple technologies that are easily accessible today, you can achieve a breakthrough and a digital transformation.


AI: At Artificial Intelligence Conference 2018, you will speak about solving computer vision tasks with Microsoft technologies. Which of the company’s products already use the technology?

D.S.: Microsoft’s core focus is the democratization of AI technologies: delivering services that partners can use to develop their own solutions. I have already described some of these solutions above, and I will talk about the rest in my presentation.

Of course, we also use AI technologies extensively in our own products. Not everyone knows it, but when you insert an image in PowerPoint, for instance, a caption describing the image is generated automatically. Windows Hello lets you securely sign in to your computer using facial recognition. These are just the first examples that come to mind…


AI: Which tasks and challenges will developers and researchers of computer vision systems have to solve, and what should we expect from the technology in the next two to five years?

D.S.: The main limitation of AI today is the inability to connect reasoning processes (based on explicit concepts) with the low-level computational processes running in neural networks. For instance, a neural network can learn to compute the probability that a bicycle is present in an image, but it cannot build up higher-level knowledge. A driver acquires driving skills partly through practice (by trial and error) and partly by learning examples and rules at driving school. For instance, a person knows that on a hot day the road cannot be wet and will conclude that what they see is a mirage rather than a wet surface. All these pieces of knowledge come together in the head and lead to an efficient solution of the problem. A neural network, by contrast, can only learn from examples, and it is impossible to interfere and influence its behavior.

I believe that combining explicit knowledge representation with machine learning will be very important in the coming years. To some degree, this is done today, although not very efficiently.
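To make the idea of combining explicit knowledge with a learned estimate more concrete, here is a small hypothetical Python sketch built around the mirage example. Everything in it is an assumption made for illustration: the `Context` fields, the function name `interpret_road`, the 0.5 score threshold, and the 30 °C cutoff. The network score is a plain number standing in for a real model's output. It shows only one naive way of layering a hand-written rule on top of a probability, not how production systems do it.

```python
# Hypothetical sketch: a hand-written rule interprets a learned score.
# All names and thresholds are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Context:
    air_temperature_c: float  # explicit knowledge about the weather
    rained_recently: bool


def interpret_road(p_wet_from_network: float, context: Context) -> str:
    """Combine a stand-in network score with an explicit rule."""
    looks_wet = p_wet_from_network >= 0.5  # assumed decision threshold
    # Explicit rule from the mirage example: on a hot, dry day a road
    # that looks wet is more plausibly a mirage than a wet surface.
    hot_and_dry = context.air_temperature_c >= 30.0 and not context.rained_recently
    if looks_wet:
        return "mirage" if hot_and_dry else "wet surface"
    return "dry surface"


print(interpret_road(0.9, Context(air_temperature_c=35.0, rained_recently=False)))
print(interpret_road(0.9, Context(air_temperature_c=12.0, rained_recently=True)))
```

The same network score leads to different conclusions depending on the explicit context, which is the kind of interplay the answer above describes.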


You can learn more about solving computer vision tasks with Microsoft technologies and ask Dmitry Soshnikov additional questions at Artificial Intelligence Conference 2018 on April 19.
