Multi-modal Intelligence Group

Research

The Multi-modal Intelligence Group (MIG) focuses on three major research directions at the intersection of cutting-edge artificial intelligence, computer vision, and aerial systems. Each direction is organized as a dedicated sub-team with monthly rotating leadership.


👁‍🗨 Intelligent Visual Capturing Systems (FLY-IN)

Designing low-level vision algorithms that enable aerial vehicles to “see” in complex conditions such as night, glare, haze, cloud, and rain. The focus is on deployment on resource-constrained edge devices aboard drones.

Team Leads (rotating monthly): Yangming Zhang, Junyu Liu, Lanyue Liang, Chaofan Qiao
Team Members: Quan Rui, Dongyu Xie, Zhibin Wang, Jiening Zhang, Zhuoyao Fan, Jiayi Zhou, Zixiao Hu, Wen Bo, Huilin Zhang, Yangyang Feng

Representative Research Topics:


🧠 Intelligent Visual Perception Systems (FLY-FOR)

Developing vision models and multimodal large language models (MLLMs) to understand the world at the pixel, object, and scene levels. These models are deployed on UAVs for object recognition, scene reconstruction, and trend prediction.

Team Leads (rotating monthly): Yupeng Gao, Xi Wu, Pengwei Yang, Jun Zhang
Team Members: Minghang Zhou, Mingfeng Zha, Chenxi Lan, Keli Wang, Yuchen Wu, Yirui Xu, Haixia Li

Representative Research Topics:


🤖 Intelligent Embodied Aerial AI Systems (FLY-WITH)

Designing embodied vision-and-language navigation (VLN) and vision-language-action (VLA) algorithms that enable UAVs and multi-agent systems to interact, self-control, and self-organize autonomously.

Team Leads (rotating monthly): Yunqiang Pei, Kaiyue Zhang, Rongyu Du
Team Members: Hongkun Chen, Ruyu Ye, Mian Zhang

Representative Research Topics: