Two-Dimensional Spatialization of Sound

1. Background

In 1997, a system was designed at CRC to study the use of directional sound as a situational awareness enhancement for dismounted soldiers. Four wearable sets were constructed for outdoor field trials, each set containing a laptop computer, differential GPS, wireless LAN transceiver, headphones and a head-mounted compass. Using the GPS position and head bearing data, the bearing of the other three sets relative to the wearer's head could be calculated. Voice communication (via the wireless LAN) between the four dispersed individuals was spatialized on the horizontal plane so that the sound arriving at the ears of each listener would appear to come from the actual direction of the speaker. Ambient sound was received by artificial ears mounted on the headphones so that it could be presented to the listener without loss of directional cues, while protecting the soldier's hearing.

2. Description

The 2D sound tool is an ANSI C library of functions which are called by a client program. The main functions, whose client-side use is sketched below, are:

- soundscapeUpdate(), which specifies the coordinates of the sound source and listener and the positions of the walls of the room which they occupy, and
- spatialize(), which, given a vector of monaural speech samples, produces a vector of binaural spatialized speech samples based on the previously supplied coordinates.

The soundscape uses Head Related Transfer Functions (HRTFs) generated using the KEMAR head model [3]. These have been calculated in steps of 10 degrees of arc for elevations from -40 degrees to +90 degrees; only the data for 0 degrees elevation has been used. The HRTF data was originally created by Bill Gardner and Keith Martin at the MIT Media Lab.

Differential HRTFs (DHRTFs) are generated in which the shadowed ear response (source on the opposite side of the head from the ear) is inverse filtered with the unshadowed ear response (source on the same side of the head as the ear). The unshadowed ear then receives the unmodified sound, whereas the shadowed ear receives the sound modified using the DHRTF. There were several reasons for doing this:

- the impact on intelligibility is minimized,
- the effect of microphone placement in the ear canals is eliminated,
- delays common to both right and left ear HRTFs are removed, and
- the computational load is substantially decreased.

The KEMAR HRTFs are known to be deficient with regard to front-back reversals, in which a sound placed behind the listener will be perceived to be in front of the listener, and vice versa. To overcome this, a set of "common mode" cues is superimposed on the DHRTFs. These cues exploit the observation that for wavelengths shorter than the diameter of the head, the sound shadow increases gradually with frequency [2]. The shadow varies sinusoidally with respect to azimuth and has a maximum value of 10 dB at the Nyquist frequency ([1] p. 62) at an azimuth of 225 degrees on the shadowed side.

The spatialized sound thus produced lacks the sensation of coming from "outside the head". To overcome this, the first-order reverberations of the sound from the walls of the room are calculated using two-dimensional ray tracing ([1] p. 184) with some assumptions about sound absorption by the walls. The four reflected sound sources are treated as separate sound rays, and the right and left ear responses are calculated as above for each ray. Judicious choice of room dimensions and listener location also helps to reduce front-back reversals.
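The library's actual prototypes are not reproduced in this note, so the following client-side sketch is illustrative only: the argument lists of soundscapeUpdate() and spatialize(), the 16-bit sample type and the interleaved left/right output format are assumptions made for the example, not the library's definitions. It shows the calling pattern described above: update the soundscape geometry whenever positions change, then filter each frame of monaural speech into binaural output.

    /* Illustrative client sketch only: these prototypes are assumed for the
     * example and are not the library's actual header. */
    #include <stddef.h>

    void soundscapeUpdate(float srcX, float srcY,     /* sound source position (m)       */
                          float lstX, float lstY,     /* listener position (m)           */
                          float roomW, float roomL);  /* wall positions, given here as
                                                         the size of a rectangular room  */
    void spatialize(const short *monoIn,              /* monaural speech samples         */
                    short *binauralOut,               /* assumed interleaved L/R output  */
                    size_t nSamples);

    #define FRAME 512

    int main(void)
    {
        static short mono[FRAME];
        static short binaural[2 * FRAME];

        /* Geometry update: source at (3, 4) m, listener at (1, 1) m, in a
         * 10 m x 8 m room.  This is the call in which the composite left and
         * right ear filters (DHRTF + common mode cues + reflections) are built. */
        soundscapeUpdate(3.0f, 4.0f, 1.0f, 1.0f, 10.0f, 8.0f);

        /* Per-frame processing: monaural speech in, binaural speech out. */
        spatialize(mono, binaural, FRAME);

        return 0;
    }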
The range of the sound and its reverberations is also taken into account. Sound intensity is in general proportional to the inverse square of range; however, this implies that a sound source placed very close to the head could be very loud. Instead, the sound level is normalized: the direct sound ray is not scaled, and the reflected rays are attenuated in proportion to the square of the ratio of the range of the direct ray to the range of the reflected ray. (This means that the level of the reflected rays is greatest when the sound source is placed at a distance from the head; a small worked example of this scaling is sketched after the references.)

When soundscapeUpdate() is called, the differential and common mode cues for each of the five rays are convolved with the filters for the right and left ears to create two filters representing the composite response for each ear. Finally, the right and left ear filters are "pruned" to remove taps with very small weights, to reduce the computational load. When spatialize() is called, the input monaural sound is passed through these filters to produce a binaural sound containing the spatialization cues.

3. References

[1] D. R. Begault, "3-D Sound for Virtual Reality and Multimedia", Academic Press, 1994.
[2] J. M. Loomis, C. Hebert and J. G. Cicinelli, "Active Localization of Virtual Sounds", J. Acoust. Soc. Am., Vol. 88, No. 4, October 1990.
[3] W. G. Gardner and K. D. Martin, "HRTF Measurements of a KEMAR", J. Acoust. Soc. Am., Vol. 97, No. 6, pp. 3907-3908, 1995.
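Worked example of the reflected-ray scaling (referenced in Section 2 above). This is a stand-alone sketch for illustration; reflectedRayGain() is a hypothetical helper written for the example, not a function of the library.

    #include <stdio.h>

    /* Hypothetical helper, not part of the library: the direct ray is left
     * unscaled, and each reflected ray is attenuated by the square of the
     * ratio of the direct range to the reflected range, as described above. */
    static double reflectedRayGain(double directRange, double reflectedRange)
    {
        double ratio = directRange / reflectedRange;
        return ratio * ratio;
    }

    int main(void)
    {
        /* Source 2 m from the listener, first-order reflected path 6 m long:
         * the reflected ray is scaled to (2/6)^2 = 1/9 of the direct level.
         * As the source moves away, the direct and reflected path lengths
         * become comparable, so the reflected rays become relatively
         * stronger, as noted in Section 2. */
        printf("near source: gain = %.3f\n", reflectedRayGain(2.0, 6.0));
        printf("far source:  gain = %.3f\n", reflectedRayGain(20.0, 24.0));
        return 0;
    }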