Kai WU

Researcher at ByteDance

I am a researcher at ByteDance, where I lead initiatives in medical multimodal large language models. My work focuses on building personal, agentic medical AI systems that understand and reason over complex medical data across multiple modalities.

I received my M.S. from the University of Wisconsin–Madison, where I was fortunate to be advised by Prof. Leyuan Shi and Prof. Xin Wang. 🚀 I am also a Kaggle Master.


🎉🎊 MedXIAOHE is hiring! We are seeking talented individuals with expertise in LLMs, MLLMs, Medical AI for scientific applications, and AI Agents. 🎊🎉

news

Feb 15, 2026 We released the Seed 2.0 Model Card — Towards Intelligence Frontier for Real-World Complexity
Feb 13, 2026 We published the MedXIAOHE Tech Report - A Comprehensive Recipe for Building Medical MLLMs
Jan 15, 2026 Two papers accepted by ICLR 2026!
Dec 18, 2025 We released the Seed1.8 Model Card — Towards Generalized Real-World Agency
Oct 15, 2025 🏆 Winner of the LLM Medical Reasoning CURE-Bench - Internal and Agent Track!

Selected Publications

  1. Seed 2.0 Model Card
    ByteDance Seed
    ByteDance Seed Technical Report, 2026
  2. Seed1.8 Model Card: Towards Generalized Real-World Agency
    ByteDance Seed
    arXiv preprint arXiv:2603.20633, 2025
  3. MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
    Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian, Haihua Yang, and 14 more authors
    arXiv preprint arXiv:2602.12705, 2026
  4. BaseReward: A Strong Baseline for Multimodal Reward Model
    Yi-Fan Zhang, Haihua Yang, Huanyu Zhang, Yang Shi, Zezhou Chen, Haochen Tian, and 8 more authors
    In Advances in Neural Information Processing Systems, 2025
  5. CustAny: Customizing Anything from A Single Example
    Lingjie Kong, Kai Wu, Chengming Xu, Xiaobin Hu, Wenhui Han, Jinlong Peng, and 5 more authors
    In CVPR, 2025
  6. Efficient Multimodal Large Language Models: A Survey
    Yizhang Jin, Jian Li, Tianyun Gu, Yexin Liu, Bo Zhao, Jinyuan Lai, and 6 more authors
    Visual Intelligence, 2025
  7. VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
    Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Xiang Peng, Kai Wu, and 5 more authors
    In CVPR, 2025
  8. Tuning-Free Image Customization with Image and Text Guidance
    Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, and 4 more authors
    In European Conference on Computer Vision, 2024
  9. NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models
    Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, and 2 more authors
    arXiv preprint arXiv:2405.20081, 2024
  10. Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt
    Jiaqi Liu, Kai Wu, Qiang Nie, Ying Chen, Bin-Bin Gao, Yong Liu, and 3 more authors
    In AAAI Conference on Artificial Intelligence, 2024
  11. SoftPatch: Unsupervised Anomaly Detection with Noisy Data
    Xi Jiang, Jiaqi Liu, Jinbao Wang, Qiang Nie, Kai Wu, Yong Liu, and 2 more authors
    In Advances in Neural Information Processing Systems, 2022
  12. Class-Aware Contrastive Semi-Supervised Learning
    Fan Yang, Kai Wu, Shuyi Zhang, Guannan Jiang, Yong Liu, Feng Zheng, and 3 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022