The broader landscape shows that intelligent voice technology is most widely applied in standalone smart products (smart speakers, robots) and smart homes, with speech recognition as its core technology. It is worth noting, however, that as intelligent voice technology enters its growth phase, it is time for it to find some relatively novel deployment scenarios.

From the age of text to the age of images, and now to today’s ubiquitous age of voice, the spread and explosive growth of intelligent voice technology keeps reshaping how people live. The arrival of the Amazon Echo is the most distinctive milestone.

Against this background, this article analyzes how speech recognition technology is applied in the security industry.

“Job selection” and “framing” for speech recognition: the security industry should be an excellent entry point

As artificial intelligence empowers one major industry after another, many companies have shifted their strategy to “AI+”. Given the broad application prospects of the security industry, “AI + security” quickly became the dominant theme in the market. As a major branch of artificial intelligence, intelligent voice technology naturally also needs to do its “job selection” and “framing” in the security industry, and the first to take on that task is speech recognition.

Learning to understand and be understood by machines, that is, human-computer interaction, has always been at the core of intelligence in the security industry. As the core technology of human-computer interaction, speech recognition already has many footholds in the security industry, chiefly in security robots, of which intelligent inspection robots are the typical example.

Like other service robots that can talk, a security robot picks up external sound through a built-in microphone, then recognizes and interprets human speech. Once it determines that the speech points to a dangerous situation, it automatically triggers the alarm system and enters a defensive state, protecting the person concerned.
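The step from recognized speech to an alarm can be pictured with a deliberately simple sketch. It assumes the recognizer has already produced a text transcript, and it uses a hand-written list of distress phrases; both the phrase list and the function names are hypothetical, and a real robot would more plausibly rely on a trained intent or acoustic-event model.

```python
import re

# Hypothetical distress phrases, chosen only for illustration.
DISTRESS_PHRASES = ("help", "call the police", "fire", "stop thief")


def matched_phrases(transcript):
    """Return the distress phrases that occur as whole words in the transcript."""
    text = transcript.lower()
    return [
        phrase
        for phrase in DISTRESS_PHRASES
        if re.search(r"\b" + re.escape(phrase) + r"\b", text)
    ]


def should_trigger_alarm(transcript):
    """Trigger the alarm when at least one distress phrase was recognized."""
    return len(matched_phrases(transcript)) > 0


print(should_trigger_alarm("Somebody help, call the police!"))
print(should_trigger_alarm("Where is the elevator?"))
```

Matching on word boundaries keeps an utterance such as “that was helpful” from being treated as a call for help, which hints at why plain keyword spotting is only a starting point.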

In addition to security robots, speech recognition also plays a key role in the smart hotel scenario of the security industry. In the “future hotel” recently opened by Alibaba, face recognition is the main technology, but the intelligent robots that run through the entire service process are just as indispensable. There, a robot acts as the hotel’s front desk and guides guests throughout their stay, and in the room guests can talk to a Tmall Genie to improve their stay. During check-in, both the front-desk robot and the Tmall Genie acting as a waiter carry out human-computer interaction through speech recognition, and the full-stack voice interaction system built on it creates smart, connected scenarios anytime and anywhere.

Of course, speech recognition in the security industry also extends to other smart scenarios, such as smart finance and smart education.

Alternatively, intelligent voice technology can serve as a “good helper” for face recognition

The security industry’s main application is video surveillance built around face recognition, which needs no further discussion here. But could intelligent voice technology be used in the future to assist face recognition and make video surveillance smarter?

The market talks a great deal about speech recognition, but few companies pay attention to the fact that voiceprint recognition and speech emotion recognition are also part of intelligent voice technology.

Voiceprint recognition, also called speaker recognition, converts acoustic signals into electrical signals and then uses a computer to recognize the speaker. It divides into speaker identification (deciding which of several enrolled speakers is talking) and speaker verification (confirming whether the speaker is who they claim to be). Different scenarios call for different variants: identification may be needed to narrow the range of suspects in a criminal investigation, while verification is what bank transactions require.
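The difference between the two modes can be sketched in a few lines. The example assumes each voice has already been reduced to a fixed-length embedding vector by some upstream model; the three-dimensional vectors, the speaker names and the 0.8 threshold are made-up values for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(probe, claimed_template, threshold=0.8):
    """1:1 check: does the probe match the one identity being claimed?"""
    return cosine(probe, claimed_template) >= threshold


def identify(probe, enrolled, threshold=0.8):
    """1:N search: return the best-matching enrolled speaker, or None."""
    best_name, best_score = None, -1.0
    for name, template in enrolled.items():
        score = cosine(probe, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy enrollment database (assumed values).
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
probe = [0.85, 0.15, 0.25]

print(identify(probe, enrolled))       # search over all enrolled speakers
print(verify(probe, enrolled["bob"]))  # check a single claimed identity
```

Identification searches the whole enrollment set, which suits narrowing down suspects, while verification compares against a single claimed identity, which is the shape of a bank login.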

Speech emotion recognition is one form of emotion recognition: the computer automatically recognizes the emotional state conveyed by the input speech. Using sensors, it measures and analyzes how speech signals with different emotional tones differ in their structural characteristics and distributions, in terms of temporal structure, amplitude structure, fundamental frequency structure and formant structure, in order to identify the emotional content implicit in the speech.
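Two of the feature families named above, amplitude and fundamental frequency, can be computed with very little code. The sketch below is an assumption-laden illustration rather than a production front end: it measures short-time energy and estimates the fundamental frequency by picking the autocorrelation peak, on a synthetic 200 Hz tone instead of real speech. Formant and temporal features are not covered.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame (an amplitude-structure feature)."""
    return sum(s * s for s in frame) / len(frame)


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


# Synthetic test frame: 50 ms of a 200 Hz sine sampled at 8 kHz (assumed values).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]

print(short_time_energy(frame))
print(estimate_f0(frame, SAMPLE_RATE))
```

An emotion classifier would track how such values change from frame to frame, for example rising pitch and energy in agitated speech, rather than looking at a single frame.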

Although current face recognition achieves recognition rates of 99% or even 99.9%, the remaining 1% or 0.1% is a hard problem that current technology cannot overcome. Imagine adding voiceprint recognition and speech emotion recognition to today’s face-recognition-based video surveillance systems, and fusing audio with images (lip reading) so that the thoughts and actions of a target can be predicted and identified even when the target is silent. Would video surveillance then not be raised to a new level of intelligence and truly “prevent trouble before it happens”?

Undeniably, a multimodal interaction system that combines face recognition, voiceprint recognition and speech emotion recognition should open up many new applications in the security industry, such as scene analysis and event detection. In the new round of AI-driven industrial change, multimodal technology will also become a key to winning.
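One simple way to picture such a multimodal system is late fusion, where each modality produces its own match score and the scores are combined afterwards. The sketch below is a hypothetical illustration, not a description of any deployed product: the modality names, weights and scores are assumed, and a modality that produced nothing (for example, the voiceprint of a silent target) is simply left out of the average.

```python
def fuse_scores(scores, weights):
    """Weighted average over the modalities that actually produced a score."""
    available = {name: s for name, s in scores.items() if s is not None}
    if not available:
        raise ValueError("no modality produced a score")
    total_weight = sum(weights[name] for name in available)
    weighted_sum = sum(weights[name] * s for name, s in available.items())
    return weighted_sum / total_weight


# Assumed weights: face is trusted somewhat more than voiceprint.
weights = {"face": 0.6, "voiceprint": 0.4}

# Both modalities available: the voice score lifts a borderline face match.
print(fuse_scores({"face": 0.7, "voiceprint": 0.9}, weights))

# Silent target: only the face score contributes.
print(fuse_scores({"face": 0.7, "voiceprint": None}, weights))
```

Renormalizing by the available weights keeps the fused score on the same scale whether or not every sensor contributed.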

However, difficulties remain before smart voice can “frame” the security industry

“No voice, no security” sounds like a fine vision. Unfortunately, as things stand, many difficulties still have to be resolved before smart voice can “frame” the security industry.

It is widely held that deploying artificial intelligence in the security industry still has four “hows” to address: how to create scenario-based AI applications that meet user needs, how to build industry intelligence systems that solve the industry’s practical problems, how to improve infrastructure, industry standards and security protection mechanisms, and how to build a new smart industry ecosystem that is mutually beneficial. All four apply equally well to intelligent voice technology in the security industry.

Far-field speech recognition is arguably the most critical speech technology for the security industry. It still faces three major technical bottlenecks, however: echo, noise and reverberation. The most intuitive example is a security robot on duty in a public area that receives so many voice signals that it cannot separate out the target voice and therefore cannot recognize it properly.
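The public-area example can be made concrete with the signal-to-noise ratio, the usual measure of how far a target voice stands above everything else the microphone picks up. The snippet below is a toy calculation with invented sample values; it only illustrates how the ratio collapses once interfering sound is as loud as the target.

```python
import math


def power(samples):
    """Average power of a signal (mean squared amplitude)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(target, interference):
    """Signal-to-noise ratio in decibels: 10 * log10(P_target / P_interference)."""
    return 10.0 * math.log10(power(target) / power(interference))


# Toy signals (assumed values): the target voice and two background conditions.
target = [1.0, -1.0, 1.0, -1.0]
quiet_room = [0.1, -0.1, 0.1, -0.1]   # background ten times weaker in amplitude
crowded_hall = [1.0, 1.0, -1.0, -1.0]  # interference as loud as the target

print(snr_db(target, quiet_room))
print(snr_db(target, crowded_hall))
```

At equal loudness the ratio drops to 0 dB, meaning the recognizer receives as much interference as target speech, which is roughly the situation the security robot faces in a busy public space.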

Another example is the speech emotion recognition mentioned above. Characterizing emotion in speech is in fact much harder than characterizing it in facial expressions. Facial expression signals convey personal characteristics and expressions but no linguistic information, whereas speech signals are mixed information that includes speaker characteristics, emotion, and the vocabulary and grammar stressed in what is said, so they require far more training data than face recognition.

Beyond the technical problems of far-field speech recognition and speech emotion recognition, intelligent speech technology itself has many unresolved problems, including accents, target speaker separation, mixed-language speech, efficient transfer and data iteration, industry standards, and attack defense, and these are not confined to the security industry. At this stage, the application of AI smart voice across industries seems better described as “artificial stupidity”.

Summary

The industry widely believes that AI is not about showing off leaderboard rankings but about genuinely driving technological innovation and solving industrial problems. Now that artificial intelligence is entering large-scale application, it is all the more necessary to get the relationship between “job selection” and “framing” right and to move from homogenization to differentiation.

On how to break through the technological bottlenecks of the AI era and empower every industry, the four solutions proposed by Yunzhisheng Chairman/CTO Liang Jiayan may offer a more rational line of thinking: solving the problems deep learning faces in industrial-scale application; solving the problems of non-big-data, end-to-end and sequence-mapping learning; effectively combining data and knowledge to form an efficient iterative closed loop; and fundamentally improving the machine’s cognitive and learning abilities.

In 2019, artificial intelligence gradually returned to rationality, and more and more problems began to surface. For the industry, though, it is both the worst of times and the best of times.
