Speech recognition without actual voice?

21 October 2018


Chinese developers have created a smartphone application that recognizes silent speech from the movements of the user's lips and turns commands into actions on the device, such as launching other applications. Unlike ordinary voice assistants, the application can be used in public places without disturbing other people, the developers say.

Almost all modern smartphones are equipped with voice assistants that recognize and execute user commands. In recent years, developers have brought the accuracy of speech recognition algorithms to the level of professional transcriptionists and have taught assistants to maintain a dialogue by remembering the context of previous commands. However, studies show that most people avoid using voice assistants in public places because they feel uncomfortable doing so.

Yuanchun Shi and his colleagues at Tsinghua University have developed a voice assistant for smartphones that recognizes speech from lip movements, even when the user makes no sound.

While running, the application detects the user's face in the smartphone camera frame and then tracks the positions of 20 control points that accurately describe the shape of the lips. It also measures the degree of mouth openness, which lets it detect when a command begins and ends. This data is then passed to another algorithm, based on a convolutional neural network, which performs the actual recognition of speech from lip movements. Notably, for now the developers run recognition not on the smartphone itself but on a separate, fairly powerful computer.
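The openness-based segmentation step described above can be sketched in a few lines. This is a hedged illustration, not the authors' code: the landmark names (`inner_top`, `inner_bottom`, `left_corner`, `right_corner`) and the threshold values are assumptions standing in for four of the 20 tracked lip points.

```python
# Illustrative sketch: segmenting a silent command from lip landmarks by
# mouth openness. Landmark names and thresholds are assumptions, not the
# Lip-Interact implementation.

def mouth_openness(landmarks):
    """Openness = inner-lip vertical gap normalized by mouth width.

    `landmarks` maps hypothetical point names to (x, y) coordinates.
    """
    gap = abs(landmarks["inner_bottom"][1] - landmarks["inner_top"][1])
    width = abs(landmarks["right_corner"][0] - landmarks["left_corner"][0]) or 1.0
    return gap / width

def segment_command(frames, open_thresh=0.15, close_frames=3):
    """Return (start, end) frame indices of the first detected command.

    A command starts when openness rises above `open_thresh` and ends once
    it stays below that threshold for `close_frames` consecutive frames.
    """
    start, quiet = None, 0
    for i, lm in enumerate(frames):
        o = mouth_openness(lm)
        if start is None:
            if o > open_thresh:
                start = i
        elif o <= open_thresh:
            quiet += 1
            if quiet >= close_frames:
                return start, i - close_frames + 1
        else:
            quiet = 0
    return (start, len(frames)) if start is not None else None
```

In the actual system, the frames inside each detected segment would then be fed to the convolutional neural network for classification.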

The authors implemented 44 commands for the application. Some apply to the whole system (for example, turning on Wi-Fi), some are specific to individual applications, and others let the user interact with any application through system services (for example, selecting text). The application also understands the context of commands: if the system shows a pop-up message, the user can respond to it directly.
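The three command scopes and the context-sensitive pop-up handling described above amount to a small dispatch problem. The sketch below is a hypothetical illustration of that routing; the command names, handler functions, and dispatch order are assumptions, not taken from the paper.

```python
# Illustrative sketch of routing a recognized silent command among the three
# scopes the article lists (system-wide, app-specific, system-service),
# with context-sensitive pop-up replies checked first. All names are
# hypothetical examples, not the authors' command set.

SYSTEM_COMMANDS = {"turn on wifi": lambda ctx: "wifi enabled"}
SERVICE_COMMANDS = {"select text": lambda ctx: "text selected via system service"}
APP_COMMANDS = {
    "camera": {"take photo": lambda ctx: "photo taken"},
}

def dispatch(command, ctx):
    """Route `command`; `ctx` holds the foreground app and any pending popup."""
    # Context first: a pending pop-up lets the user answer it immediately.
    if ctx.get("popup") and command in ("yes", "no"):
        return f"popup answered: {command}"
    if command in SYSTEM_COMMANDS:                     # system-wide scope
        return SYSTEM_COMMANDS[command](ctx)
    app_cmds = APP_COMMANDS.get(ctx.get("app"), {})    # app-specific scope
    if command in app_cmds:
        return app_cmds[command](ctx)
    if command in SERVICE_COMMANDS:                    # system-service scope
        return SERVICE_COMMANDS[command](ctx)
    return "unrecognized"
```

Checking the pop-up context before the fixed tables is one way to model the article's claim that a displayed message can be answered right away.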

Sources:

Sun, K., Yu, C., Shi, W., Liu, L. and Shi, Y., 2018, October. Lip-Interact: Improving Mobile Device Interaction with Silent Speech Commands. In The 31st Annual ACM Symposium on User Interface Software and Technology (pp. 581-593). ACM.
