Publications

2025

Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation

Yiwen Guan and Jacob Whitehill

2025

arXiv

Interactive Real-Time Speaker Diarization Correction with Human Feedback

Xinlu He, Yiwen Guan, Badrivishal Paurana, Zilin Dai, and Jacob Whitehill

2025

arXiv

MLLM-based Speech Recognition: When and How is Multimodality Beneficial?

Yiwen Guan, Viet Anh Trinh, Vivek Voleti, and Jacob Whitehill

2025

arXiv

Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?

Yiwen Guan, Viet Anh Trinh, Vivek Voleti, and Jacob Whitehill

In IEEE International Conference on Multimedia and Expo (ICME), 2025

DOI arXiv

2024

Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

Viet Anh Trinh, Rosy Southwell, Yiwen Guan, Xinlu He, Zhiyong Wang, and Jacob Whitehill

2024

arXiv