Publications

2025

  1. Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation
    Yiwen Guan and Jacob Whitehill
    2025
  2. Interactive Real-Time Speaker Diarization Correction with Human Feedback
    Xinlu He, Yiwen Guan, Badrivishal Paurana, Zilin Dai, and Jacob Whitehill
    2025
  3. MLLM-based Speech Recognition: When and How is Multimodality Beneficial?
    Yiwen Guan, Viet Anh Trinh, Vivek Voleti, and Jacob Whitehill
    2025
  4. Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
    Yiwen Guan, Viet Anh Trinh, Vivek Voleti, and Jacob Whitehill
    In IEEE International Conference on Multimedia and Expo (ICME), 2025

2024

  1. Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
    Viet Anh Trinh, Rosy Southwell, Yiwen Guan, Xinlu He, Zhiyong Wang, and Jacob Whitehill
    2024