Improving Speech Intelligibility Using Ideal Binary Mask

Document Type: Original Article

Authors

1 MSc Student, Department of Medical Physics and Medical Engineering, School of Medicine AND Student Research Committee, Isfahan University of Medical Sciences, Isfahan, Iran

2 Assistant Professor, Department of Medical Physics and Medical Engineering, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Abstract

Background: Applying the ideal binary mask (IBM) to speech signals provides remarkable intelligibility improvements for both normal-hearing and hearing-impaired listeners. A binary mask is typically applied to the time-frequency (T–F) representation of a noisy signal; it eliminates the units whose local signal-to-noise ratio (SNR) falls below a threshold and retains the others.

Methods: The present study examined and evaluated the factors underlying the intelligibility of ideal binary-masked speech: the local SNR threshold, the input SNR level, the masker type, and the ideal mask estimator. New estimators, including weighted Euclidean and COSH estimators, were proposed that incorporate the auditory masking effect and other aspects of human auditory perception.

Findings: A high-performance plateau was observed for SNR thresholds ranging from −20 to 5 dB. These findings could be used in hearing-aid and cochlear-implant design.

Conclusion: Speech intelligibility remained high even at an input SNR of −10 dB for all maskers tested. Performance assessment showed that the proposed estimators achieved better noise estimation than the Wiener estimator.
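The masking operation described in the Background can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame length, hop size, Hann window, and the function names `stft`, `ideal_binary_mask`, and `scale_noise_to_snr` are all hypothetical, and a sine tone stands in for clean speech. The −5 dB local threshold is chosen only because it lies inside the −20 to 5 dB plateau reported in the Findings.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def scale_noise_to_snr(speech, noise, snr_db):
    """Rescale the noise so the speech-to-noise energy ratio equals snr_db."""
    gain = np.sqrt(np.sum(speech ** 2) /
                   (np.sum(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    return gain * noise

def ideal_binary_mask(speech, noise, threshold_db=-5.0):
    """Return True for T-F units whose local SNR exceeds threshold_db.

    The mask is 'ideal' because it needs the clean speech and the noise
    separately, which are known only in a controlled experiment.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    speech_power = np.abs(stft(speech)) ** 2
    noise_power = np.abs(stft(noise)) ** 2
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return local_snr_db > threshold_db

# Assumed toy input: a 440 Hz tone (stand-in for speech) in white noise
# mixed at an input SNR of -10 dB.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noise = scale_noise_to_snr(speech, rng.standard_normal(fs), snr_db=-10.0)

mask = ideal_binary_mask(speech, noise, threshold_db=-5.0)
masked_spectrum = mask * stft(speech + noise)  # units below threshold are zeroed
```

In this toy case the mask keeps the units around the tone's frequency and discards units dominated by noise. Resynthesizing a waveform from `masked_spectrum` would additionally require an inverse transform with overlap-add, which is omitted here.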

Keywords

