Suraj Pai
Research Scientist & Machine Learning Engineer
Harvard Medical School · Mass General Cancer Center
Boston, MA · 331-806-7744 · surajpai.tech
I’m a researcher and ML engineer working at the intersection of AI and medicine. I recently defended my PhD in Biomedical Engineering and spend most of my time building foundation models for medical imaging and wrangling LLMs into clinical workflows. Currently a Research Fellow at Harvard Medical School, where I lead projects on multimodal AI for radiation oncology. I care a lot about open science — a few of my tools have found their way into other labs’ pipelines, which is always a nice feeling. 20+ papers, 700+ citations, h-index 10.
News
- 2025.12: Paper accepted at ML4H Findings — extracting one-liner patient summaries from radiation oncology notes using LLMs
- 2025.11: 🎓 Successfully defended my PhD in Biomedical Engineering at Maastricht University! Thesis: Representation Learning in Radiology and Cancer Imaging
- 2025: Lighter — our configuration-driven deep learning framework published in JOSS
- 2025.01: Preprint on vision foundation models for CT imaging is out
- 2024.03: Foundation model for cancer imaging biomarkers published in Nature Machine Intelligence
Experience
- Architecting a code-first agentic multimodal AI framework integrating vision-language models with LLM workflows (RAG, tool use, MCP servers) for radiation therapy planning to reduce planning time from weeks to hours.
- Developed a multi-stage multimodal foundation model combining 3D medical imaging encoders with language models for segmentation of referring objects, improving personalized patient care through clinical context-driven contouring.
- Leading research direction and managing a team of 2 PhD students and 2 interns in building multi-modal foundation models, collaborating with a cross-functional team of medical physicists, radiation oncologists, and dosimetrists — resulting in 3 conference abstracts and 2 in-progress journal submissions.
- Developed domain-specific implementations of self-supervised learning (SimCLR, VICReg, MAE, DINOv2) using 11k+ 3D CT scans, creating robust vision foundation models with 20% performance improvements on lung cancer tasks under limited labelled data constraints.
- Built an AI model to measure immune health using data from 27,000 patients — adopted by 5 clinical studies at Mass General for more personalized patient care.
- Co-created Lighter, an open-source YAML-driven deep learning framework, achieving 200+ GitHub stars and 30% faster iteration cycles.
- Researched CycleGAN architectures for medical image enhancement, implementing custom frequency-domain losses and invertible networks for 3D medical imaging to facilitate more accurate radiation delivery.
- Developed Ganslate, an open-source PyTorch framework for image-to-image translation adopted by multiple research collaborators.
- Successfully integrated research prototypes into radiotherapy workflows, demonstrating practical impact of ML in a clinical setting.
- As a founding ML engineer, built DAG-based workflows for sequential decision-making using detection and classification events to model user interactions for customer support scenarios via an AR-based platform.
- Architected automated ML deployment system (GCP) with cloud triggers for model training and serving, reducing deployment time by 50%.
- Developed data annotation web app and ETL pipelines integrated with Kurento media server for real-time inference.
- Implemented YOLO object detection as a C++ socket service with Node.js client, achieving 5× speedup over baseline through optimized OpenCV inference.
- Deployed edge AI systems (RPI + Intel Neural Compute Stick) for real-time detection with custom CNN models for facial analysis and customer engagement scoring.
- Built cross-platform CV library combining CNNs with classical methods (Hough transforms, edge detection), deployed on Android, iOS, and RPI for automated inventory tracking.
- Developed Kinect-based pose estimation using CNN models for low-resource gait monitoring of patients undergoing physical recovery.
Education
Skills
Achievements
- 700+ citations, h-index 10, 20+ publications — significant impact in top-tier journals
- First author publication in Nature Machine Intelligence; two first-author papers under review at Nature and two at Nature Communications
- Research featured in major media outlets including NYT, Science Magazine, and Science Daily
- Reviewed 19 manuscripts for Nature Scientific Reports, npj Breast Cancer, Nature Biomedical Engineering, QIMS, MICCAI, ML4H, JOSS
- Invited Speaker at DL IndabaX, Microsoft Health Futures, Novartis, Fred Hutchinson Cancer Research Center
- 200+ GitHub stars across self-led open-source projects
- Awarded Brigham Research Institute Microgrant
- Active contributor to NVIDIA’s Project MONAI, nnU-Net, and nnssl
Publications
Foundation Models & Representation Learning
- arXiv 2025: Vision Foundation Models for Computed Tomography. Suraj Pai*, Ibrahim Hadzic*, Dennis Bontempi, Keno Bressem, Benjamin H. Kann, Andriy Fedorov, Raymond H. Mak, Hugo J. W. L. Aerts
- Nature Machine Intelligence 2024: Foundation Model for Cancer Imaging Biomarkers. Suraj Pai, Dennis Bontempi, Ibrahim Hadzic, Vasco Prudente, Mateo Sokač, Tafadzwa L. Chaunzwa, Simon Bernatz, Ahmed Hosny, Raymond H. Mak, Nicolai J. Birkbak, Hugo J. W. L. Aerts
- Scientific Reports 2025: Foundation Model Based Prediction of Lung Cancer Survival Using Temporal Changes in Dual Time Point CT Scans. Jessica Petrochuk, Suraj Pai, John He, Fridolin Haugg, Yiwen Xu, David Christiani, Raymond Mak, Hugo Aerts
- Research Square 2025: Foundation Model Embeddings for Quantitative Tumor Imaging Biomarkers. Hugo Aerts, Suraj Pai, Ibrahim Hadzic, Andrey Fedorov, Raymond Mak
Medical Image Synthesis & Translation
- Sensors 2023: Frequency-Domain-Based Structure Losses for CycleGAN-Based CBCT Translation. Suraj Pai, Ibrahim Hadzic, Chinmay Rao, Ivan Zhovannik, Andre Dekker, Alberto Traverso, Stylianos Asteriadis, Enrique Hortal
- Medical Imaging 2024: Optimizing CycleGAN Design for CBCT-to-CT Translation. Ibrahim Hadzic, Suraj Pai, Vicki Trier Taasti, Dennis Bontempi, Ivan Zhovannik, Richard Canters, Jan Jakob Sonke, Andre Dekker, Jonas Teuwen, Alberto Traverso
- Medical Image Analysis 2024: Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report. Evi M. C. Huijben, Maarten L. Terpstra, Suraj Pai, et al.
Clinical AI & Biomarkers
- Nature Communications 2024: End-to-End Reproducible AI Pipelines in Radiology Using the Cloud. Dennis Bontempi, Leonard Nuernberg, Suraj Pai, et al.
- Annals of Oncology 2025: Thymic Health is Associated with Immunotherapy Outcomes in Patients with Cancer. S. Bernatz, Suraj Pai, V. Prudente, et al.
- MICCAI HECKTOR 2020: Oropharyngeal Tumour Segmentation Using Ensemble 3D PET-CT Fusion Networks. Chinmay Rao, Suraj Pai, Ibrahim Hadzic, Ivan Zhovannik, Dennis Bontempi, Andre Dekker, Jonas Teuwen, Alberto Traverso
- Physics & Imaging in Radiation Oncology 2021: Radiomics Integration into a PACS. Ivan Zhovannik, Suraj Pai, et al.
- IEEE Trans. Radiation & Plasma Medical Sciences 2021: Artificial Intelligence in Radiation Therapy. Yabo Fu, Hao Zhang, …, Suraj Pai, Alberto Traverso, et al.
Open Science & Tools
- JOSS 2025: Lighter: Configuration-Driven Deep Learning. Ibrahim Hadzic, Suraj Pai, Keno Bressem, Borek Foldyna, Hugo J. W. L. Aerts
- ML4H Findings 2025: Give Me The One-liner: Extracting Short Patient Summaries from Radiation Oncology Notes. Thibault Heintz, Suraj Pai, Marion Tonneau, et al. (Submitted Dec 2025)
- Universal Access in Information Society 2020: Automated 3D Sign Language Caption Generation for Video. Nayan Mehta, Suraj Pai, Sanjay Singh