It synthesizes a large-scale sound event localization and detection (SELD) dataset that covers numerous sound event instances and varied acoustic environments. It introduces PSELDNets trained on the large-scale synthetic SELD ...
Abstract: Weakly supervised Temporal Action Localization (WTAL) aims to localize action instances using only video-level labels during training, where two primary issues are localization incompleteness ...
Abstract: Weakly supervised temporal action localization (TAL) aims to localize the action instances in untrimmed videos using only video-level action labels. Without snippet-level labels, this task ...