Research

Research Assistant

(WPI, Aug 2023 - present)

  • Efficient speech translation / machine translation system with an encoder-only tree architecture.
  • Real-time active speaker detection (ASD) using multi-modal input streams.
  • Feedback-driven HCI speech processing system for real-time speech recognition and speaker diarization, utilizing Large Language Models (LLMs) for transcription and speaker correction.
  • Investigation of the impact of each modality on multi-modal speech recognition under various conditions, such as different auditory noise levels and raw versus abstract visual inputs.
  • Mix-supervised training of discrete-token-based, decoder-only multi-modal language models for multiple tasks, such as speech recognition (ASR), speech translation (S2TT, S2ST), and image captioning.
  • Uncertainty-based active learning and pseudo-labeling for efficient fine-tuning of ASR models.

Undergraduate Student Researcher

(USTC, Anhui Province Key Laboratory of Big Data Analysis and Application, Oct 2018 - May 2019)

  • Image processing and recognition of plant diseases with a tree-structured CNN model.