
Codes & datasets

Jung-Woo Choi and Franz Zotter

The 6DOF RIR dataset (aka 6DRIRset) includes room impulse responses (RIRs) measured by nine spherical microphone arrays (SMAs; Zylia ZM-1S) distributed in a semi-cuboid room. 6DRIRset is distinguished by its large number of loudspeaker positions (392 locations), which were included for the 6DOF source localization task.
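A common way to use measured RIRs like these is to synthesize a reverberant observation by convolving a dry source signal with an RIR. A minimal sketch (the RIR values below are made up for illustration, not taken from 6DRIRset):

```python
import numpy as np

def simulate_reverberant(dry, rir):
    """Convolve a dry (anechoic) signal with a measured room impulse
    response to synthesize the signal observed at the microphone
    position where the RIR was captured."""
    return np.convolve(dry, rir)

# Toy example: a unit impulse convolved with an RIR reproduces the RIR.
rir = np.array([1.0, 0.6, 0.3, 0.1])   # hypothetical 4-tap RIR
dry = np.array([1.0, 0.0, 0.0])        # unit impulse as the dry signal
wet = simulate_reverberant(dry, rir)
```

With a real entry from the dataset, `rir` would be one channel of a measured response and `dry` a source excitation signal.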

Overview_fig_1.png

Dongheon Lee, Seongrae Kim, Jung-Woo Choi

https://arxiv.org/abs/2111.04312

In this work, we propose an end-to-end multichannel speech enhancement network that handles inter-channel relationships at each layer. In contrast to conventional methods, the network processes data so as to preserve diverse spatial information, and it attends to the data from the perspectives of both the feature and channel dimensions. The proposed method outperforms state-of-the-art multichannel variants of neural networks on the speech enhancement task, even with significantly fewer parameters than conventional methods.

convtas.png

D. Lee and J.-W. Choi, "DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement," IEEE Signal Processing Letters, vol. 30, pp. 155-159, 2023.

DeFT-AN (dense frequency-time attentive network) is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in an input signal's short-time Fourier transform (STFT). The proposed mask estimation network incorporates three blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three different characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. The remarkable performance of DeFT-AN over state-of-the-art multichannel models is demonstrated on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility.
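The final enhancement step, applying a predicted complex mask to the noisy STFT, can be illustrated with a toy example. The STFT and mask values below are invented for illustration only; in DeFT-AN the mask comes from the network:

```python
import numpy as np

# Hypothetical noisy STFT (frequency x time) and an estimated complex mask.
noisy_stft = np.array([[1 + 1j, 2 + 0j],
                       [0 + 2j, 1 - 1j]])
mask = np.array([[1 + 0j, 0 + 0j],
                 [0.5 + 0j, 1 + 0j]])

# Complex masking is an element-wise complex multiplication per TF bin:
# a zero entry removes that bin, a unit entry passes it unchanged.
enhanced_stft = noisy_stft * mask
```

The enhanced time-domain signal is then recovered with an inverse STFT of `enhanced_stft`.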

speech_enhancement.png

Wonjun Yi, Jung-Woo Choi, Jae-Woo Lee

arXiv: https://arxiv.org/abs/2304.11708

Accepted at 29th International Congress on Sound and Vibration (ICSV29). 

To improve the safety of drone operations, mechanical faults of drones should be detected in real time. The drone sound dataset was constructed by collecting the operating sounds of drones using microphones mounted on three different drones in an anechoic chamber. The dataset covers various operating conditions, such as flight directions (front, back, right, left, clockwise, counterclockwise) and faults in propellers and motors. The drone sounds were then mixed with noise recorded at five different spots on the university campus, with signal-to-noise ratios (SNRs) varying from 10 dB to 15 dB.
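The noise-mixing step described above can be sketched as follows; `mix_at_snr` is a hypothetical helper written for illustration, not part of the released dataset code, and the random arrays stand in for actual drone and campus-noise recordings:

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that the mixture has the requested
    signal-to-noise ratio (in dB), then add it to the signal."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise

# Placeholder waveforms standing in for a drone recording and campus noise.
rng = np.random.default_rng(0)
drone = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
mixture = mix_at_snr(drone, noise, snr_db=10.0)
```

Varying `snr_db` between 10 and 15 reproduces the SNR range used in the dataset description.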
