Enhancing the intelligibility of BC speech without utilizing the spectral information of AC speech

  Published: 1 October 2015
Over the last few decades, extensive research has been conducted on enabling computers to enhance, recognize, and understand speech. Probabilistic speech enhancement methods with noise modeling have gone a long way in addressing the robustness issue. Numerous advances in automatic speech recognition have led to the development of various recognition-based systems. State-of-the-art speech recognition systems often achieve high recognition accuracy through extensive training and computation. This accuracy, however, decreases substantially in the presence of noise, such as non-stationary background conversations. As a consequence, robust speech processing in noisy environments continues to be a popular research topic. Toward this end, bone-conducted (BC) speech has received considerable attention in recent years as an alternative to normal air-conducted (AC) speech. Here the bone conduction pathway is used to record the talker’s voice by placing a bone-conductive microphone on the talker’s head. An important advantage of BC speech is its lower susceptibility to ambient noise. Additionally, since the bone-conductive microphone does not cover the ears, it allows users to maintain a higher level of alertness to surrounding auditory signals. This is particularly significant in military, rescue, and security applications \cite{cat}. Recently, a Microsoft research group proposed a hardware device that combines a regular air-conductive microphone with a bone-conductive microphone. They reported eliminating more than 90% of the background speech. The use of BC speech has also been shown to yield very accurate pitch contours in highly noisy environments.
Although bone conduction improves the signal-to-noise ratio (SNR), the intelligibility of BC speech is compromised. Sounds at lower frequencies are impeded less during bone conduction transmission than sounds at higher frequencies \cite{head}. Therefore, speech captured through bone conduction may sound less intelligible than speech captured through air conduction. This has led many researchers to concentrate on improving the quality of BC speech by incorporating higher frequency components derived from the AC speech. However, in severely noisy conditions such as railway stations, battlefields, and rescue sites, the AC speech is itself corrupted, so features derived from it may introduce noise into the BC speech, which brings back the original problem. A codebook-based mapping approach has also been proposed for improving the quality of BC speech. Since the mapping is learned from paired units, this technique is better suited to a limited vocabulary.
In this study, we propose a technique for enhancing the intelligibility of BC speech without utilizing the spectral information of AC speech. As mentioned, BC speech has significant spectral energy in the lower frequency regions, followed by a spectral roll-off in the higher frequency regions. The proposed technique emphasizes the mid and higher frequency components through spectral enhancement, including adjustment of formant amplitudes and bandwidths by manipulating the linear predictive autoregressive (AR) coefficients. The modified AR coefficients are then used to synthesize BC speech. To our knowledge, this method is the first of its kind. Experimental results on speech from male and female speakers show a significant improvement in intelligibility compared with the original BC speech.
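To make the analysis, modification, and synthesis chain concrete, the following is a minimal sketch and not the method evaluated in this study. The exact coefficient modification rule is not given in this introduction, so the sketch assumes the common scaling a_k -> a_k * gamma^k as a stand-in. That scaling moves every pole of the all-pole model radially by gamma, which narrows all formant bandwidths and raises their peaks when gamma > 1; a rule targeting only mid and high frequencies would have to act on individual poles. All function names, the synthetic 2 kHz resonance, the 8 kHz sampling rate, and gamma = 1.05 are illustrative assumptions.

```python
import cmath
import math

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns a with a[0] == 1, A(z) = sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def scale_poles(a, gamma):
    """Assumed stand-in rule: a[k] * gamma**k moves every pole radially by gamma."""
    return [c * gamma ** k for k, c in enumerate(a)]

def is_stable(a):
    """Step-down recursion: 1/A(z) is stable iff all reflection coefficients have |k| < 1."""
    a = a[:]
    for i in range(len(a) - 1, 0, -1):
        k = a[i]
        if abs(k) >= 1.0:
            return False
        a = [1.0] + [(a[j] - k * a[i - j]) / (1.0 - k * k) for j in range(1, i)]
    return True

def inverse_filter(x, a):
    """Residual e[n] = sum_k a[k] x[n-k] (removes the spectral envelope)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesize(e, a):
    """All-pole synthesis y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

def envelope_db(a, freq_hz, fs):
    """Gain of 1/A(z) in dB at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs
    A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
    return -20.0 * math.log10(abs(A))

# Synthetic test frame (hypothetical): one resonance at 2 kHz with pole radius 0.9,
# driven by a 100 Hz impulse train at fs = 8 kHz.
fs = 8000
true_a = [1.0, 0.0, 0.81]
excitation = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]
frame = synthesize(excitation, true_a)

a_est = levinson(autocorr(frame, 2), 2)       # analysis
a_mod = scale_poles(a_est, 1.05)              # modification (assumed rule)
residual = inverse_filter(frame, a_est)
enhanced = synthesize(residual, a_mod)        # synthesis with modified coefficients
reconstructed = synthesize(residual, a_est)   # sanity check: should equal frame

print("stable after modification:", is_stable(a_mod))
print("gain at 2 kHz before: %.2f dB" % envelope_db(a_est, 2000, fs))
print("gain at 2 kHz after:  %.2f dB" % envelope_db(a_mod, 2000, fs))
```

The stability check matters because gamma > 1 pushes poles toward the unit circle; if `is_stable` returns False, a smaller gamma is needed before synthesis.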

