Advancing Data Techniques, Multimodal understanding, Reasoning, and Agent systems.
My recent work focuses on data-centric pretraining and selection, multimodal foundation models, and reasoning methods for language, vision, code, and mathematical problem solving.
- Data Techniques
- Pretraining
- Multimodal
- Reasoning
- Vision-Language
- Agent
Ph.D. in CSE, Hong Kong University of Science and Technology / 2021-Present
M.S. in Computer Science, University of Electronic Science and Technology of China / 2018-2021
B.S. in Mathematics, University of Electronic Science and Technology of China / 2014-2018
Recent Updates
Researcher at Microsoft (active)
Data Techniques, Multimodal understanding, Reasoning, and Agent systems.
ExeSQL and DIDS accepted to EMNLP 2025
Execution-driven Text-to-SQL bootstrapping and domain impact-aware data sampling.
Bridge-Coder, TAGCOS, and ScaleBiO accepted in 2025
Low-resource code generation, coreset selection, and scalable bilevel data reweighting.
Experience
Microsoft
Researcher / Present
Data Techniques, Agent systems, and model reasoning.
TensorOpera
Research Intern / 2024
From-scratch pre-training for a 1.6B decoder-only language model for cloud and edge deployment.
ByteDance
Research Intern / 2022-2023
Multimodal foundation language models with Xinsong Zhang and Hang Li.
Microsoft
Research Intern / 2021
Code generation and pre-training language models for code with Nan Duan.
Living Analytics Research Centre
Research Assistant / 2019-2020
Multimodal knowledge graphs and math word problem solving.
Selected Publications
Raft
Reward-ranked fine-tuning for generative foundation model alignment.
G-LLaVA
Solving geometric problems with multimodal large language models.
Mitigating the Alignment Tax of RLHF
Reducing the capability cost introduced by RLHF alignment.
Graph-to-Tree
Graph-to-tree learning for solving math word problems.
Template-Based MWP Solvers
Recursive neural networks for template-based math word problem solving.
DetGPT
Detecting visual objects through multimodal reasoning.