<aside> ❗ Code has been publicly released! 👉 https://github.com/jin-woo-lee/nfs-binaural
</aside>
List of Contents
Authors
Abstract
Related works
Proposed method
This work focuses on modeling binaural speech with a neural network, paying particular attention to time delay and energy reduction.
As sound propagates through the air, its arrival is delayed and its energy is altered, mainly by attenuation and absorption.
We introduce a novel network that models the delay and the energy reduction of binaural speech in the Fourier space.
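To make the idea concrete, here is a minimal NumPy sketch of the underlying signal model (not the NFS network itself; the function name and the numbers are illustrative): a time delay becomes a linear phase shift in the Fourier domain, and energy reduction becomes a frequency-dependent gain.

```python
# Illustrative sketch only: delay = linear phase shift, energy reduction = gain,
# both applied in the Fourier domain. Not the NFS architecture.
import numpy as np

def delay_and_attenuate(mono, delay_sec, gain, sr=48000):
    """Apply a fractional delay and a (possibly per-frequency) gain to a mono signal.

    mono      : (T,) waveform
    delay_sec : scalar delay in seconds (e.g. an interaural time difference)
    gain      : scalar or (T//2 + 1,) magnitude response
    """
    spec = np.fft.rfft(mono)                              # to Fourier space
    freqs = np.fft.rfftfreq(len(mono), d=1.0 / sr)        # bin frequencies in Hz
    phase_lag = np.exp(-2j * np.pi * freqs * delay_sec)   # delay as a linear phase
    return np.fft.irfft(gain * phase_lag * spec, n=len(mono))

# Toy usage: a "near ear" with a short delay and mild attenuation, and a
# "far ear" with a longer delay and stronger attenuation (made-up numbers).
sr = 48000
t = np.arange(sr) / sr
mono = np.sin(2 * np.pi * 440.0 * t)
left = delay_and_attenuate(mono, delay_sec=0.0003, gain=0.9, sr=sr)
right = delay_and_attenuate(mono, delay_sec=0.0009, gain=0.6, sr=sr)
binaural = np.stack([left, right], axis=0)
```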
Visualization of the proposed idea
The proposed system for binaural speech rendering
Conclusion
We present NFS, a lightweight neural network for binaural speech rendering.
Building on the geometric time delay, NFS predicts frame-wise frequency responses and phase lags for multiple early-reflection paths in the Fourier space.
Because it is defined in the Fourier domain, NFS is highly efficient and, by design, operates independently of the source domain.
Experimental results show that NFS outperforms previous methods on the benchmark dataset while using less memory and fewer computations.
NFS is interpretable in that it explicitly exposes the frequency response and phase delay of each acoustic path for each source position.
As future work, we expect that training NFS on generic binaural audio datasets will allow it to generalize to arbitrary domains.
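For a more concrete picture of the frame-wise, multi-path formulation, the following is a rough sketch of how per-path, per-frame frequency responses and phase lags could be applied and summed in the STFT domain. The shapes, names, and STFT parameters are our assumptions, not the released NFS code.

```python
# Hedged sketch: sum several delayed/filtered copies of a mono source in the
# STFT domain, one per early-reflection path. Shapes and names are assumptions.
import numpy as np
from scipy.signal import stft, istft

def render_ear(mono, path_gains, path_delays, sr=48000, nfft=1024, hop=256):
    """Render one ear signal from per-path, per-frame responses.

    mono        : (T,) source waveform
    path_gains  : (P, F, N) magnitude response per path, frequency bin, frame
    path_delays : (P, N)    delay in seconds per path and frame
    """
    freqs, _, spec = stft(mono, fs=sr, nperseg=nfft, noverlap=nfft - hop)  # (F, N)
    acc = np.zeros_like(spec)
    for gains, delays in zip(path_gains, path_delays):
        # Each frame-wise delay becomes a frame-wise linear phase term.
        phase = np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])    # (F, N)
        acc += gains * phase * spec
    _, ear = istft(acc, fs=sr, nperseg=nfft, noverlap=nfft - hop)
    return ear
```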
Ground Truth (binaural)
Ours (NFS)
BinauralGrad
WarpNet
<aside> ☝ Please be sure to listen to the samples with EARPHONES! Listening with headphones or speakers does NOT accurately convey the binaural rendering in the audio.
</aside>
We can also inspect what NFS is doing to render the binaural speech. The videos below visualize the magnitude response and delay of each frame-wise impulse response.
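For reference, the sketch below shows the kind of quantities such a visualization involves: the magnitude response and phase delay of a single impulse response. The helper name and the choice of phase delay (rather than, say, group delay) are our assumptions, not part of the NFS release.

```python
# Illustrative helper: magnitude response (dB) and phase delay (seconds) of an
# impulse response, the per-frame quantities one might plot for inspection.
import numpy as np

def response_and_delay(ir, sr=48000):
    spec = np.fft.rfft(ir)
    freqs = np.fft.rfftfreq(len(ir), d=1.0 / sr)
    magnitude_db = 20.0 * np.log10(np.abs(spec) + 1e-12)
    phase = np.unwrap(np.angle(spec))
    # Phase delay: -phase / (2*pi*f); skip the DC bin to avoid division by zero.
    delay = np.zeros_like(freqs)
    delay[1:] = -phase[1:] / (2.0 * np.pi * freqs[1:])
    return freqs, magnitude_db, delay
```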