Research
Research Assistant
(WPI, Aug 2023 - present)
- Efficient speech translation / machine translation system with an encoder-only tree architecture.
- Real-time active speaker detection (ASD) using multi-modal input streams.
- Feedback-driven HCI speech processing system for real-time speech recognition and speaker diarization, using large language models (LLMs) for transcription and speaker correction.
- Investigation of the impact of each modality on multi-modal speech recognition under various conditions, such as different auditory noise levels and raw or abstract visual inputs.
- Mix-supervised training of discrete-token-based, decoder-only multi-modal language models for multiple tasks, including speech recognition (ASR), speech translation (S2TT, S2ST), and image captioning.
- Uncertainty-based active learning and pseudo-labeling for efficient fine-tuning of ASR models.
Undergraduate Student Researcher
(USTC, Anhui Province Key Laboratory of Big Data Analysis and Application, Oct 2018 - May 2019)
- Image processing and recognition of plant diseases with a tree-structured CNN model.