Generative Medical AI · Multimodal Learning

Junzhi (Raymond) Ning

I am a Machine Learning Researcher in the General Medical AI (GMAI) group at Shanghai AI Lab, supervised by Dr. Junjun He. My research focuses on generative AI and multimodal learning for medical applications, specializing in large-scale synthetic data generation and deep generative models. I work on developing scalable workflows to create millions of high-quality medical training samples, addressing critical challenges in data scarcity and domain adaptation for healthcare AI.

I completed my MRes with Distinction at Imperial College London (Oct 2023 - Oct 2024), jointly supervised by Dr. Matthieu Komorowski and Dr. Guang Yang. There I developed deep generative models for chest X-ray image translation to improve diagnostic accuracy, collaborated with ICU clinicians, and contributed to research proposals for industry funding.

My educational background includes a Bachelor of Science (Honours) in Data Science with the University Medal from the University of Sydney, together with a concurrent Diploma in Computing, and a Bachelor of Science in Mathematics and Statistics (First-Class Honours) from the University of Melbourne.

PhD-seeking · Fall 2025 / Spring 2026
Generative AI × Medicine

Portrait of a man in quest of the unknown, yet satisfied.

MRes in Machine Learning, Imperial College London. Researching generative models, synthetic data pipelines, and vision–language systems for medical imaging.

Generative AI & Multimodal Medical Intelligence

My work advances Generative AI for healthcare applications, with a focus on multimodal medical AI, large-scale synthetic data generation, and deep generative models for medical imaging.

Multimodal Medical AI

Contributing to GMAI-VL-R1 (RL-enhanced medical reasoning) and vision–language models for text-guided medical image generation and analysis.

Large-scale Synthetic Data

Building RetinaLogos-1400k (1.4M synthetic retinal images) and scalable workflows for generating millions of high-quality medical training samples.

Deep Generative Models

Developing generative models for chest X-ray translation, opacity removal, and anatomical enhancement to support more accurate and robust diagnosis.

  • MICCAI 2025: 1 oral · 1 spotlight
  • Other venues: 5+ · WACV · ISBI · IJCAI · PRL · NeurIPS WS
  • Open to opportunities: PhD 2025–26 · research internships & collaborations

Publications, Awards & Opportunities

A snapshot of selected publications, academic recognition, and current goals.

Publications summary

4× MICCAI 2025 (1 oral, 1 spotlight) · 1× WACV 2025 · 1× IJCAI 2025 · 1× ISBI 2025 · 1× Pattern Recognition Letters · 1× NeurIPS Workshop (oral) · 3× under review (arXiv / technical reports).

Research Focus

Medical imaging · Vision–language · Synthetic data

Awards & recognition

  • University Medal, BSc (Hons) in Data Science, University of Sydney (2023)
  • Dean's Honours List for Data Science, University of Sydney (2023)
  • Melbourne International Undergraduate Scholarship, University of Melbourne (2022)
  • Dean's Honours List, University of Melbourne (2019)

Seeking opportunities

I am actively seeking PhD positions for Fall 2025 and Spring 2026, as well as research internships in generative AI, multimodal learning, and medical imaging.

Feel free to reach out if you are interested in collaborations or have openings in related areas.

"Positivity is the essence of progress. In every challenge, I see an opportunity for learning and growth."

Selected publications

Recent work spanning generative medical imaging, multimodal reasoning, and data-centric healthcare AI.

First-author Publications

UniMedVL medical multimodal model

UniMedVL: Unifying Medical Multimodal Understanding and Generation with Unified Language Modeling

Junzhi Ning*, Wei Li*, Cheng Tang*, Jiashi Lin, Chenglong Ma, Chaoyang Zhang, Jiyao Liu, Ying Chen, Shujian Gao, Lihao Liu, Yuandong Pu, Huihui Xu, Chenhui Gou, Ziyan Huang, Yi Xin, Qi Qin, Zhongying Deng, Diping Song, Bin Fu, Guang Yang, Yuanfeng Ji, Tianbin Li, Yanzhou Su, Jin Ye, Shixiang Tang, Ming Hu, Junjun He†
* Equal contribution · † Corresponding author

arXiv 2025 · Co-first author · Under Review

Medical diagnostic applications require models that can process multimodal inputs (images, patient histories, lab results) and generate both structured predictions and natural language explanations. We present UniMedVL, a unified medical vision-language model that bridges understanding and generation through a novel language modeling paradigm. UniMedVL achieves state-of-the-art performance on multiple medical imaging tasks including visual question answering, report generation, and diagnostic prediction while maintaining strong zero-shot generalization capabilities.

RetinaLogos retinal image synthesis

RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions

Junzhi Ning, Cheng Tang, Kaijing Zhou, Diping Song, Lihao Liu, Ming Hu, Wei Li, Huihui Xu, Yanzhou Su, Tianbin Li, Jiyao Liu, Jin Ye, Sheng Zhang, Yuanfeng Ji, Junjun He

MICCAI 2025 · First author

Large-scale text-to-image synthesis for retinal imaging enables unprecedented control over fine-grained anatomical features. We introduce RetinaLogos, a framework that combines latent diffusion models with the RetinaLogos-1400k dataset containing 1.4M retinal images with detailed textual descriptions. Our approach achieves superior quality in generating photorealistic retinal fundus images with precise control over pathological features, vessel structures, and optic disc characteristics, enabling robust data augmentation for downstream diagnostic tasks.
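
For readers unfamiliar with how caption conditioning steers a latent diffusion sampler, the sketch below shows the standard classifier-free guidance step used in Stable-Diffusion-style models; the `unet` interface, embeddings, and guidance scale are illustrative assumptions, not the RetinaLogos implementation.

```python
import torch

@torch.no_grad()
def cfg_noise_estimate(unet, z_t, t, text_emb, null_emb, guidance_scale=7.5):
    """One classifier-free guidance step for a text-conditioned latent
    diffusion model (illustrative interfaces, not the RetinaLogos code).

    unet     : noise-prediction network, eps = unet(latent, timestep, cond)
    z_t      : current noisy latent, shape (B, C, H, W)
    text_emb : caption embedding from a frozen text encoder
    null_emb : embedding of the empty caption (unconditional branch)
    """
    eps_cond = unet(z_t, t, text_emb)    # caption-conditioned estimate
    eps_uncond = unet(z_t, t, null_emb)  # unconditional estimate
    # Extrapolate toward the caption-consistent direction; larger scales
    # trade sample diversity for tighter adherence to the description.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```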

Chest X-ray opacity removal

Unpaired Translation of Chest X-Ray Images for Lung Opacity Diagnosis

Junzhi Ning, Dominic Marshall, Yijian Gao, Xiaodan Xing, Nan Yang, Yingying Fang, Sheng Zhang, Matthieu Komorowski, Guang Yang

Pattern Recognition Letters 2025 · First author

Lung opacity in chest X-rays often obscures diagnostic features, complicating disease assessment. We propose an unpaired image-to-image translation framework that removes opacity artifacts while preserving critical diagnostic content. Our method employs adaptive activation masks and cross-domain consistency constraints to learn robust mappings between normal and opacity-affected images without requiring paired training data. The approach demonstrates significant improvements in downstream segmentation accuracy and diagnostic confidence across multiple lung disease datasets.
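
The adaptive activation masks and consistency constraints are specific to the paper, but the unpaired backbone follows the familiar CycleGAN recipe. Here is a minimal sketch of the cycle-consistency and identity losses under assumed generator interfaces, not the published code.

```python
import torch.nn.functional as F

def cycle_losses(G_op2no, G_no2op, x_opacity, x_normal, lam=10.0):
    """CycleGAN-style losses for unpaired opacity <-> normal CXR translation
    (a generic sketch; generator interfaces are assumed).

    G_op2no : generator mapping opacity-affected images to opacity-free ones
    G_no2op : generator mapping normal images into the opacity domain
    """
    fake_normal = G_op2no(x_opacity)
    fake_opacity = G_no2op(x_normal)
    # Cycle consistency: translating there and back should reconstruct the
    # input, which preserves diagnostic content without paired supervision.
    cycle = (F.l1_loss(G_no2op(fake_normal), x_opacity)
             + F.l1_loss(G_op2no(fake_opacity), x_normal))
    # Identity term: an already-clean image should pass through unchanged.
    identity = F.l1_loss(G_op2no(x_normal), x_normal)
    # (Adversarial terms against domain discriminators omitted for brevity.)
    return lam * cycle + 0.5 * lam * identity
```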

Latent diffusion for CXR classification

Unveiling the Capabilities of Latent Diffusion Models for Classification of Lung Diseases in Chest X-Rays

Junzhi Ning, Xiaodan Xing, Sheng Zhang, Xiao Ma, Guang Yang

ISBI 2025 · First author

Latent diffusion models have shown remarkable performance in image generation, but their potential for medical image classification remains underexplored. We investigate how conditional latent diffusion models can be adapted for zero-shot lung disease classification in chest X-rays. Our analysis reveals that the intermediate latent representations learned during the denoising process encode rich diagnostic information, producing interpretable lesion localizations that align with radiological findings without explicit supervision for classification tasks.
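
A diffusion-classifier-style criterion makes the idea concrete: score each candidate label by how well conditioning on it helps the model denoise the image latent. The sketch below is a generic illustration with assumed `unet` and noise-schedule inputs, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def diffusion_classify(unet, z0, class_embs, alpha_bars, n_trials=16):
    """Zero-shot classification with a conditional latent diffusion model.

    Each candidate label is scored by the denoising error obtained when the
    model is conditioned on that label; the best-denoising label wins.
    `unet` and `alpha_bars` (cumulative noise schedule) are assumed inputs.
    """
    errors = torch.zeros(len(class_embs))
    for _ in range(n_trials):
        t = torch.randint(0, len(alpha_bars), (1,))
        a = alpha_bars[t]                             # cumulative alpha_bar_t
        noise = torch.randn_like(z0)
        z_t = a.sqrt() * z0 + (1 - a).sqrt() * noise  # forward diffusion
        for k, emb in enumerate(class_embs):
            errors[k] += F.mse_loss(unet(z_t, t, emb), noise).item()
    return int(errors.argmin())                       # index of best label
```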

Deep Generative Models for medical imaging

Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning

Junzhi Ning*, Xiaodan Xing*, Yang Nan, Guang Yang
* Equal contribution

NeurIPS Workshop 2024 · Co-first author

Deep generative models have revolutionized medical image analysis, but their ability to unveil underlying patterns through vision-language conditioning remains underexplored. We present a novel framework that leverages vision-language conditioning to guide deep generative models in discovering and visualizing subtle patterns in medical images. By conditioning on natural language descriptions of anatomical structures and pathological features, our approach enables interpretable pattern discovery across diverse medical imaging modalities. The framework demonstrates strong performance in revealing clinically meaningful patterns that align with radiological expertise while maintaining generative quality.

Key Collaborations – Highlighted

GMAI-VL-R1 medical vision-language model

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

Yanzhou Su, Tianbin Li, Jiyao Liu, Chenglong Ma, Junzhi Ning, Cheng Tang, Sibo Ju, Jin Ye, Pengcheng Chen, Ming Hu, Shixiang Tang, Lihao Liu, Bin Fu, Wenqi Shao, Xiaowei Hu, Xiangwen Liao, Yuanfeng Ji, Junjun He

arXiv 2025 · Key Collaboration · Under Review

Recent advances in general medical AI have made significant strides, but existing models often lack the reasoning capabilities needed for complex medical decision-making. This paper presents GMAI-VL-R1, a multimodal medical reasoning model enhanced by reinforcement learning (RL) to improve its reasoning abilities. Through iterative training, GMAI-VL-R1 optimizes decision-making, significantly boosting diagnostic accuracy and clinical support. We also develop a reasoning data synthesis method, generating step-by-step reasoning data via rejection sampling, which further enhances the model's generalization. Experimental results show that after RL training, GMAI-VL-R1 excels in tasks such as medical image diagnosis and visual question answering.
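
To illustrate the rejection-sampling idea behind the reasoning data synthesis, here is a minimal generic loop: sample several step-by-step traces per problem and keep only those whose final answer verifies against the reference. The `model.generate`, `problem.*`, and `extract_answer` interfaces are hypothetical, not the GMAI-VL-R1 codebase.

```python
def extract_answer(trace: str) -> str:
    # Hypothetical parser: assume the final line of a trace states the answer.
    lines = trace.strip().splitlines()
    return lines[-1] if lines else ""

def synthesize_reasoning_data(model, problems, n_samples=8):
    """Rejection-sampling loop for step-by-step reasoning data (a sketch).

    Sample several candidate traces per problem; keep only traces whose
    final answer matches the reference, yielding verified training data.
    """
    kept = []
    for problem in problems:
        for _ in range(n_samples):
            trace = model.generate(problem.prompt, temperature=0.8)
            if extract_answer(trace) == problem.reference_answer:
                kept.append({"prompt": problem.prompt, "response": trace})
                break  # one verified trace per problem is enough here
    return kept
```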

Ophora ophthalmic surgical video generation

Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model

Wei Li, Ming Hu, Guoan Wang, Lihao Liu, Kaijin Zhou, Junzhi Ning, Xin Guo, Zongyuan Ge, Lixu Gu, Junjun He

MICCAI 2025 · Oral Presentation · Key Collaboration

In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect due to privacy concerns and the labor involved in annotation. Text-guided video generation (T2V) emerges as a promising solution to overcome this issue by generating ophthalmic surgical videos based on surgeon instructions. In this paper, we present Ophora, a pioneering model that can generate ophthalmic surgical videos following natural language instructions. We first propose a Comprehensive Data Curation pipeline to convert narrative ophthalmic surgical videos into a large-scale, high-quality dataset comprising over 160K video-instruction pairs, Ophora-160K.

MedGround-R1 medical grounding model

MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization

Huihui Xu, Yuanpeng Nie, Hualiang Wang, Ying Chen, Wei Li, Junzhi Ning, Lihao Liu, Hongqiu Wang, Lei Zhu, Jiyao Liu, Xiaomeng Li, Junjun He

MICCAI 2025 · Spotlight · Key Collaboration

Medical Image Grounding (MIG), which involves localizing specific regions in medical images based on textual descriptions, requires models to not only perceive regions but also deduce spatial relationships of these regions. Existing Vision-Language Models (VLMs) for MIG often rely on Supervised Fine-Tuning (SFT) with large amounts of Chain-of-Thought (CoT) reasoning annotations, which are expensive and time-consuming to acquire. Recently, DeepSeek-R1 demonstrated that Large Language Models (LLMs) can acquire reasoning abilities through Group Relative Policy Optimization (GRPO) without requiring CoT annotations. In this paper, we adapt the GRPO reinforcement learning framework to VLMs for Medical Image Grounding. We propose the Spatial-Semantic Rewarded Group Relative Policy Optimization to train the model without CoT reasoning annotations.
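
For context, the core of GRPO is a critic-free advantage computed from group statistics. The sketch below shows that step, plus a toy combination of spatial and semantic reward terms; the IoU example and the weighting are assumptions, not the paper's formula.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Critic-free GRPO advantages for one group of sampled responses.

    Each response's reward is normalized against its group's statistics,
    so no learned value function is needed. `rewards`: shape (group_size,).
    """
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def spatial_semantic_reward(iou, semantic_score, w_spatial=0.7):
    # Toy combination of a spatial term (e.g. predicted-vs-reference box IoU)
    # and a semantic-consistency term; the weighting is an assumption.
    return w_spatial * iou + (1 - w_spatial) * semantic_score
```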

Multi-modal MRI translation

Multi-modal MRI Translation via Evidential Regression and Distribution Calibration

Jiyao Liu, Shangqi Gao, Yuxin Li, Lihao Liu, Xin Gao, Zhaohu Xing, Junzhi Ning, Yanzhou Su, Xiao-Yong Zhang, Junjun He, Ningsheng Xu, Xiahai Zhuang

MICCAI 2025 · Key Collaboration

Multi-modal Magnetic Resonance Imaging (MRI) translation leverages information from source MRI sequences to generate target modalities, enabling comprehensive diagnosis while overcoming the limitations of acquiring all sequences. While existing deep-learning-based multi-modal MRI translation methods have shown promising potential, they still face two key challenges: 1) lack of reliable uncertainty quantification for synthesized images, and 2) limited robustness when deployed across different medical centers. To address these challenges, we propose a novel framework that reformulates multi-modal MRI translation as a multi-modal evidential regression problem with distribution calibration. Extensive experiments on three datasets from the BraTS2023 challenge demonstrate that our framework achieves superior performance and robustness across domains.
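
A common building block for evidential regression is the Normal-Inverse-Gamma negative log-likelihood of Amini et al. (2020); the sketch below shows that standard loss as background, while the paper's multi-modal formulation and distribution calibration go beyond it.

```python
import torch

def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of a Normal-Inverse-Gamma evidential head
    (Amini et al., 2020); standard background, not the paper's multi-modal
    formulation or its distribution calibration. All arguments are tensors.

    gamma: predicted mean   nu: evidence for the mean (nu > 0)
    alpha: shape (> 1)      beta: scale (> 0)
    """
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * torch.log(torch.pi / nu)
            - alpha * torch.log(omega)
            + (alpha + 0.5) * torch.log((y - gamma) ** 2 * nu + omega)
            + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5))

# Uncertainty falls out of the same parameters:
#   aleatoric  E[sigma^2] = beta / (alpha - 1)
#   epistemic  Var[mu]    = beta / (nu * (alpha - 1))
```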

DMRN airway segmentation

DMRN: A Dynamical Multi-Order Response Network for the Robust Lung Airway Segmentation

Sheng Zhang, Jinge Wu, Junzhi Ning, Guang Yang

WACV 2025 · Key Collaboration

Robust airway segmentation from CT scans is challenging due to varying tree topology and imaging artifacts. DMRN introduces a dynamical multi-order response architecture that combines supervised segmentation with unsupervised structure learning. The network adaptively adjusts its receptive fields to capture both fine-grained bronchiolar details and large-scale airway topology, achieving state-of-the-art segmentation performance across diverse lung disease datasets including COPD, COVID-19, and lung cancer cohorts.

Cyclic Vision-Language Manipulator

Cyclic Vision-Language Manipulator for Reliable Image Interpretation

Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Zhiling Yue, Yijian Gao, Junzhi Ning, Zhi Li, Simon Walsh, Guang Yang

IJCAI 2025 · Key Collaboration

Automated medical report generation must produce reliable and interpretable outputs. We propose a cyclic manipulation framework that establishes bidirectional consistency between image features and textual reports. The model learns to manipulate visual representations in response to report modifications and vice versa, ensuring that changes in one modality produce expected changes in the other. This cyclic constraint improves both the factual accuracy and clinical reliability of generated reports while providing interpretable attention maps for key diagnostic findings.

Other Contributions

Intern-S1 Scientific Foundation Model

Intern-S1: A Scientific Multimodal Foundation Model

Lei Bai, Zhongrui Cai, Yuhang Cao, Maosong Cao, Weihan Cao, ..., Junzhi Ning, et al. (200+ authors)

Technical Report · arXiv 2025

Scientific discovery requires integrating knowledge across diverse domains and modalities. Intern-S1 is a large-scale mixture-of-experts foundation model with 241B parameters, trained on scientific literature, molecular structures, experimental data, and research code. The model achieves state-of-the-art performance on molecular property prediction, crystal stability forecasting, retrosynthesis planning, and scientific question answering, demonstrating strong zero-shot transfer across chemistry, materials science, and biology.

Scientific LLM Survey

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, ..., Junzhi Ning, et al. (100+ authors)

Survey · arXiv 2025

The rapid development of large language models has opened new possibilities for scientific research automation. This comprehensive survey examines scientific LLMs across the full pipeline from data curation to autonomous agents. We analyze 270+ specialized scientific datasets, 190+ domain-specific benchmarks, and emerging architectures for scientific reasoning. The survey covers applications in physics, chemistry, biology, medicine, and materials science, identifying key challenges including hallucination in scientific contexts, integration of structured scientific knowledge, and ethical considerations for AI-assisted research.

For a complete and up-to-date list of publications, please see my Semantic Scholar profile or OpenReview.

Skills & Expertise

Core competencies in medical AI, generative models, and multimodal learning.

Research Areas

  • Medical Imaging & AI for Healthcare
  • Generative Models & Diffusion Models
  • Multimodal Vision-Language Models
  • Medical Image Translation & Synthesis
  • Data-Centric AI & Synthetic Data Generation

Technical Skills

  • Deep Learning: PyTorch, TensorFlow
  • Medical Imaging: Chest X-ray, Retinal, MRI
  • Computer Vision: Segmentation, Classification
  • Generative AI: Latent Diffusion, GANs, T2I
  • Scientific Computing: Python, NumPy, Pandas

Tools & Platforms

  • High-Performance Computing (HPC)
  • Large-scale Data Generation Pipelines
  • Version Control: Git, GitHub
  • Experiment Tracking: Weights & Biases
  • Cluster & Containerization: SLURM, Docker

Awards & Honors

Academic achievements and recognitions throughout my research journey.

Aug 2023 Highest Honor

University Medal, University of Sydney

Awarded the University Medal in the Bachelor of Science (Honours) for achieving the highest academic distinction (WAM 89.5) in the Data Science Honours program. The honours thesis, on night-to-day image translation, was subsequently accepted at the Australasian Database Conference.

2025 Conference Recognition

Oral Presentations at MICCAI and NeurIPS Workshop

Selected for oral presentations at MICCAI 2025 (Ophora: ophthalmic surgical video generation) and the NeurIPS 2024 Workshop on Advancements in Medical Foundation Models (deep generative models for medical imaging).

2022 Scholarship

Melbourne International Undergraduate Scholarship

Received a merit-based scholarship from the University of Melbourne in recognition of academic excellence in Mathematics and Statistics (overall WAM 86.8, First-Class Honours).

2019 Academic Recognition

Dean's Honours List, University of Melbourne

Placed on the Dean's Honours List for first-year Bachelor of Science students in recognition of exceptional academic performance across the cohort.

Recent milestones

A lightweight lab notebook – recent papers, positions, and project updates.

Oct 2025 Preprint

UniMedVL arXiv preprint released

New preprint on arXiv: UniMedVL, a unified multimodal framework for medical image understanding and generation built around the Observation–Knowledge–Analysis paradigm.

Jun 2025 MICCAI 2025

4 papers accepted at MICCAI 2025

4 papers accepted at MICCAI 2025, including 1 first-author paper (RetinaLogos), 1 oral presentation (Ophora), and 1 spotlight (MedGround-R1), advancing generative and multimodal medical AI.

Apr 2025 IJCAI 2025

Cyclic Vision-Language Manipulator accepted at IJCAI 2025

CVLM paper on reliable and fine-grained image interpretation for automated report generation accepted at IJCAI 2025.

Mar 2025 Pattern Recognition Letters

First-author PRL paper on unpaired CXR translation

Unpaired chest X-ray translation method for lung opacity diagnosis accepted in Pattern Recognition Letters, improving segmentation and classification across multiple datasets.

Nov 2024 Position

Joined Shanghai AI Lab

Started as Machine Learning Researcher at Shanghai AI Lab in the GMAI group, focusing on multimodal medical AI models and large-scale synthetic dataset generation for healthcare applications.

Oct 2024 Graduation

MRes in Machine Learning, Imperial College London

Completed MRes (Distinction), supervised by Dr. Matthieu Komorowski and Dr. Guang Yang, working on deep generative models for chest X-ray image translation in collaboration with ICU clinicians.

Affiliations & training

From Melbourne to Sydney to London to Shanghai – a journey through data science, mathematics, and medical AI.

📝 Reviewer Service

  • Journals: TMI
  • Conferences: ICLR, CVPR, ISBI

🙏 Acknowledgments

I am deeply grateful to my supervisors Dr. Junjun He, Dr. Guang Yang, Dr. Matthieu Komorowski, and Prof. Mingming Gong for their invaluable guidance, mentorship, and support throughout my research journey. Their insights and encouragement have been instrumental in shaping my academic growth.

I also extend my sincere thanks to my collaborators Lihao Liu, Sheng Zhang, Xiaodan Xing, Yingying Fang, Cheng Tang, Wei Li, Jiyao Liu, Huihui Xu, and many others. Their expertise, dedication, and teamwork have been essential to our research achievements.

Say hello

I am always happy to chat about generative models, multimodal systems, and new collaborations in medical AI.

Collaborations & PhD opportunities

If you are working on generative medical AI, multimodal learning, or data-centric healthcare, I would love to connect.

Quick profile

  • Current: Machine Learning Researcher, Shanghai AI Lab (GMAI)
  • MRes: Machine Learning, Imperial College London (Distinction)
  • BSc (Hons): Data Science, University of Sydney (University Medal)
  • Interests: Generative models · Medical imaging · Multimodal reasoning