Paper-Conference

TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos

Introduces TB-Bench to train and evaluate multimodal agents for understanding complex traffic behaviors captured by dashcams.

Korawat Charoenpitaks

GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features

Presents GRIT, a dual-feature transformer that improves both speed and accuracy for image captioning.

Van-Quang Nguyen

Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks

Enhances interactive instruction-following agents with wide-context perception and iterative reasoning.

Van-Quang Nguyen

Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs

Introduces an efficient attention mechanism that captures all interactions between multiple inputs in visual dialog systems.

Van-Quang Nguyen

Revisiting a Single-Stage Method for Face Detection

Revisits single-stage detectors and boosts their effectiveness on face detection benchmarks.

Van-Quang Nguyen

CapsuleNet for Micro-expression Recognition

Applies capsule networks to the challenging task of recognizing subtle micro-expressions.

Van-Quang Nguyen

A New Text Semi-supervised Multi-label Learning Model Based on Using the Label-Feature Relations

Proposes a semi-supervised multi-label learning framework that explicitly models label-feature relationships.

Quang-Thuy Ha

A New Lifelong Topic Modeling Method and Its Application to Vietnamese Text Multi-label Classification

Introduces a lifelong topic modeling pipeline tailored for Vietnamese multi-label text classification.

Quang-Thuy Ha

MASS: A Semi-supervised Multi-label Classification Algorithm with Specific Features

Presents MASS, a semi-supervised multi-label classification algorithm that uses specific features.

Thi-Ngan Pham