Welcome to the first post in a four-part series focusing on beamforming microphones and use cases for lobe aiming.
Beamforming microphones combine an array of microphone elements with intelligent DSP to create definable polar patterns. When that DSP includes beam tracking and/or multiple beams, beamforming microphone arrays are excellent at accurately capturing simultaneous talkers, as in audio or video conferencing. What makes these array microphones “beamforming” is that their “beams” (polar patterns) can be controlled and shaped through DSP, allowing each beam to be “aimed” at the appropriate area of the room.
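To make the idea of “aiming” a beam more concrete, here is a minimal delay-and-sum sketch for a uniform linear array. Delay-and-sum is a generic textbook beamforming technique, not necessarily the DSP any particular product uses; the 8-element count, 4 cm spacing, 343 m/s speed of sound, and the function names `steering_delays` and `array_response` are illustrative assumptions. Each element is delayed so that a plane wave from the steering angle lines up across the array, and the normalized sum shows how much sound from other angles is attenuated.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 °C


def steering_delays(num_elements, spacing_m, steer_deg):
    """Delay (s) applied to each element so a wave from steer_deg adds in phase.

    Angles are measured from broadside (perpendicular to the array axis).
    """
    sin_steer = math.sin(math.radians(steer_deg))
    raw = [n * spacing_m * sin_steer / SPEED_OF_SOUND_M_S for n in range(num_elements)]
    # Shift so every delay is non-negative (real-time DSP cannot delay by a negative amount).
    offset = min(raw)
    return [t - offset for t in raw]


def array_response(num_elements, spacing_m, steer_deg, arrival_deg, freq_hz):
    """Normalized magnitude (0 to 1) of the delay-and-sum output for a plane wave
    of frequency freq_hz arriving from arrival_deg."""
    delays = steering_delays(num_elements, spacing_m, steer_deg)
    sin_arrival = math.sin(math.radians(arrival_deg))
    total = 0j
    for n, delay in enumerate(delays):
        # Time the wavefront reaches element n, relative to element 0.
        arrival_time = -n * spacing_m * sin_arrival / SPEED_OF_SOUND_M_S
        total += cmath.exp(-2j * math.pi * freq_hz * (arrival_time + delay))
    return abs(total) / num_elements


if __name__ == "__main__":
    # Hypothetical 8-element array, 4 cm spacing, beam aimed 30° off broadside, 2 kHz tone.
    for angle in (30, 0, -30):
        print(angle, round(array_response(8, 0.04, 30, angle, 2000.0), 3))
```

Under these assumptions the response is 1.0 at the steered angle and drops off at other angles. At low frequencies the beam widens, because a short array is small relative to the wavelength.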
As with all microphones, though, room acoustics and ambient noise have a direct impact on audio quality. If speech originating in the room is unintelligible to listeners in the same room, no microphone can overcome this acoustical handicap. In this technical note, we’ll explore some of the design considerations and trade-offs for lobe design in different use cases involving beamforming microphones. First, let’s recap the primary goal of any conferencing system: speech intelligibility.
Speech Intelligibility (Does the Room Sound Good?)
A conference room either sounds good or it doesn’t (there’s very little gray area), but that’s typically a subjective assessment by the listener. However, if you and others regularly strain to understand the far end of calls, speech intelligibility in the room is low.
Speech intelligibility is the most important consideration for conferencing applications; if the listener(s) can’t understand the talker, the tele- or videoconference becomes ineffective. Proper lobe aiming plays a significant role in speech intelligibility, but other variables can include:
- The talker’s speech level (average is 65–67 dBA at 3 feet [1 meter])
- Frequency response of the transducer
- Background noise level (room noise floor)
- Quality of the sound reproduction equipment
- Echoes (reflections delayed by more than 100 ms)
- Reverberation time (RT60)¹
- Psychoacoustic (masking) effects
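Two of the variables above, talker level versus background noise and reverberation time, can be put into rough numbers. The sketch below assumes free-field inverse-square spreading (about 6 dB lower per doubling of distance) from a talker at 66 dBA at 1 m, the midpoint of the range listed, and uses the Sabine equation to estimate RT60. Both are simplifications: real rooms add reflected energy to the direct sound, and Sabine is least accurate in highly absorptive rooms. The function names and the example room values are hypothetical.

```python
import math

TALKER_LEVEL_AT_1M_DBA = 66.0  # assumed midpoint of the 65-67 dBA range
REFERENCE_DISTANCE_M = 1.0     # distance at which the talker level is quoted


def speech_level_dba(distance_m, level_at_1m_dba=TALKER_LEVEL_AT_1M_DBA):
    """Direct-sound speech level at distance_m, using free-field inverse-square loss.

    Ignores reflections and reverberant build-up, so real rooms measure higher.
    """
    return level_at_1m_dba - 20.0 * math.log10(distance_m / REFERENCE_DISTANCE_M)


def speech_to_noise_db(distance_m, noise_floor_dba, level_at_1m_dba=TALKER_LEVEL_AT_1M_DBA):
    """Margin of direct speech over the room noise floor at the microphone."""
    return speech_level_dba(distance_m, level_at_1m_dba) - noise_floor_dba


def sabine_rt60_s(volume_m3, absorption_area_m2):
    """Sabine estimate of RT60 (s): 0.161 * V / A, V in m^3, A in metric sabins (m^2)."""
    return 0.161 * volume_m3 / absorption_area_m2


if __name__ == "__main__":
    # Hypothetical talker 2 m from the array in a room with a 40 dBA noise floor.
    print(round(speech_level_dba(2.0), 1))          # about 60.0 dBA
    print(round(speech_to_noise_db(2.0, 40.0), 1))  # about 20.0 dB
    # Hypothetical 100 m^3 room with 40 m^2 of equivalent absorption.
    print(round(sabine_rt60_s(100.0, 40.0), 2))     # about 0.4 s
```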
The quality of speech transmission is measured with the Speech Transmission Index (STI), which predicts how likely syllables, words, and sentences are to be understood by the listener. STI is expressed as a value from 0 (bad) to 1 (excellent), and an STI of at least 0.5 is desirable for most applications.
¹ Internal testing indicates that room architecture is the most significant factor affecting speech intelligibility, but your results may vary.
| STI value | Quality according to IEC 60268-16 | Intelligibility of syllables (%) | Intelligibility of words (%) | Intelligibility of sentences (%) |
|---|---|---|---|---|
| 0 – 0.3 | bad | 0 – 34 | 0 – 67 | 0 – 89 |
| 0.3 – 0.45 | poor | 34 – 48 | 67 – 78 | 89 – 92 |
| 0.45 – 0.6 | fair | 48 – 67 | 78 – 87 | 92 – 95 |
| 0.6 – 0.75 | good | 67 – 90 | 87 – 94 | 95 – 96 |
| 0.75 – 1 | excellent | 90 – 96 | 94 – 96 | 96 – 100 |
Figure 1: STI measurements for native speakers
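The quality bands in the table can be applied as a simple lookup, for example when checking a measured or predicted STI value against the 0.5 target mentioned above. Because adjacent ranges in the table share an endpoint (0.3 appears in both “bad” and “poor”), this sketch assumes a boundary value belongs to the higher band. `sti_rating` and `meets_target` are hypothetical helper names.

```python
# Upper band edges from the IEC 60268-16 quality bands in the table above.
STI_BANDS = (
    (0.30, "bad"),
    (0.45, "poor"),
    (0.60, "fair"),
    (0.75, "good"),
)


def sti_rating(sti):
    """Return the qualitative rating for an STI value between 0 and 1.

    A value on a shared boundary (e.g. 0.45) is assigned to the higher band.
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError(f"STI must be between 0 and 1, got {sti}")
    for upper_edge, label in STI_BANDS:
        if sti < upper_edge:
            return label
    return "excellent"


def meets_target(sti, target=0.5):
    """True if the STI reaches the minimum generally desired for conferencing."""
    return sti >= target


if __name__ == "__main__":
    for value in (0.28, 0.45, 0.52, 0.8):
        print(value, sti_rating(value), meets_target(value))
```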
Currently, there aren’t many scenarios where you’ll find yourself measuring STI in a conference room. However, since effective communication is the purpose of a conference room, awareness and measurement of STI can help the designer/integrator provide an optimal system design. In some cases, STI measurements may provide the necessary data to allow a designer to advise against the use of poor acoustic environments for conferencing (or at least indicate the need for acoustical treatments).
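Short of a field measurement, a rough STI figure can be predicted from a room’s RT60 and its speech-to-noise ratio using the widely cited theoretical modulation transfer function for an exponentially decaying sound field with steady background noise. The sketch below is a simplified, single-band, equally weighted version. A standards-compliant IEC 60268-16 calculation works across seven octave bands with band weightings plus auditory-masking and hearing-threshold corrections, so treat this output as a ballpark figure only. The function names `modulation_transfer` and `estimate_sti` are hypothetical.

```python
import math

# The 14 modulation frequencies (Hz) used in STI analysis.
MODULATION_FREQS_HZ = (0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5)


def modulation_transfer(freq_hz, rt60_s, snr_db):
    """Theoretical modulation index m(F): reverberation term times noise term.

    13.8 is approximately 6 * ln(10), from the 60 dB decay that defines RT60.
    """
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * rt60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def estimate_sti(rt60_s, snr_db):
    """Simplified single-band STI: unweighted mean transmission index over all F."""
    indices = []
    for freq in MODULATION_FREQS_HZ:
        m = modulation_transfer(freq, rt60_s, snr_db)
        if m >= 1.0:
            apparent_snr = 15.0  # perfect modulation transfer
        else:
            apparent_snr = 10.0 * math.log10(m / (1.0 - m))
        # Clip to +/-15 dB and map linearly onto a 0..1 transmission index.
        apparent_snr = max(-15.0, min(15.0, apparent_snr))
        indices.append((apparent_snr + 15.0) / 30.0)
    return sum(indices) / len(indices)


if __name__ == "__main__":
    # Hypothetical rooms: same 25 dB speech-to-noise ratio, different reverberation.
    for rt60 in (0.4, 0.8, 1.5):
        print(rt60, round(estimate_sti(rt60, 25.0), 2))
```

Because the reverberation term shrinks as RT60 grows, the estimate falls with longer decay times even when noise is held constant, which is one way to see why room acoustics set a ceiling that no microphone can raise.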
Stay tuned to Component for part two of this series, which will focus on microphone polar patterns.