Abstract: Automatically identifying violence in videos is critical, and combining visual and audio cues is often the most effective approach, because the two modalities provide complementary information for violence ...