Papers

2026

Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

Tian Xu, Chenyang Wang, Xiaochen Zhai, Ziniu Li, Yi-Chen Li, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Provably Efficient Policy-Reward Co-Pretraining for Adversarial Imitation Learning

Tian Xu, Zexuan Chen, Zhilong Zhang, Yi-Chen Li, Chenyang Wang, Lei Yuan, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Speedup Patch: Learning a Plug-and-Play Policy to Accelerate Embodied Manipulation

Zhichao Wu, Junyin Ye, Zhilong Zhang, Yihao Sun, Haoxin Lin, Jiaheng Luo, Haoxiang Ren, Lei Yuan, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models

Zhilong Zhang, Haoxiang Ren, Yihao Sun, Yifei Sheng, Haonan Wang, Zhichao Wu, Haoxin Lin, Pierre-Luc Bacon, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation

Nan Tang, Jing-Cheng Pang, Guanlin Li, Chao Qian, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Clipping Low-Probability Tokens in SFT Yields a Generalizable Initialization for RL

Tian-Shuo Liu, Chengxing Jia, Haoyu Liu, Pengyuan Wang, Shiyuan Zhang, Jie Fu, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Offline Multi-agent Continual Cooperation via Skill Partition and Reuse

Yuchen Xiao, Lei Yuan, Ruiqi Xue, Tieyue Yin, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Learning Disentangled Multi-Agent World Model for Decentralized Control

Di Xue, Jing Jiang, Shaowei Zhang, Wenhao Guo, Lei Yuan, Zongzhang Zhang, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Towards Complete Multi-Agent Coordination Policy Learning via Denoising Maximum Entropy Optimization

Guanghao Li, Lei Yuan, Ruiqi Xue, Hengchang Zhang, Jianhong Wang, Yi-Chen Li, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Decentralized and Disentangled Task–Role Representation Learning for Generalizable Offline Multi-Agent Meta Reinforcement Learning

Lei Yuan, Ruiqi Xue, Yang Yu

In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, 2026.

Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control

Zhilong Zhang, Yunpeng Mei, Xinghao Du, Hongjie Cao, Haonan Wang, Pengyuan Min, Chenyu Wang, Pengfei Chen, Chenbo Xin, Yijie Wang, Wenyu Luo, Yihao Sun, Yidi Wang, Lei Yuan, Gang Wang, Yang Yu

In: Proceedings of the 14th International Conference on Learning Representations (ICLR'26), Rio de Janeiro, Brazil, 2026.

ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation

Haoxin Lin, Siyuan Xiao, Yi-Chen Li, Zhilong Zhang, Yihao Sun, Chengxing Jia, Yang Yu

In: Proceedings of the 14th International Conference on Learning Representations (ICLR'26), Rio de Janeiro, Brazil, 2026.

EMFuse: Energy-based Model Fusion for Decision Making.

Kejie He, Yi-Chen Li, Yang Yu

In: Proceedings of the 14th International Conference on Learning Representations (ICLR'26), Rio de Janeiro, Brazil, 2026.

A Study on PAVE Specification for Learnware

Hao-Yu Shi, Zhi-Hao Tan, Zi-Chen Zhao, Yang Yu, Zhi-Hua Zhou

In: Proceedings of the 14th International Conference on Learning Representations (ICLR'26), Rio de Janeiro, Brazil, 2026.

Boosting Offline MARL under Imbalanced Datasets via Compositional Diffusion Models

Lihe Li, Shenghe Hu, Bingxuan Lan, Yuqi Bian, Huan Zhang, ZhaoMing, Chongjie Zhang, Lei Yuan, Yang Yu

In: Proceedings of the 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'26), 2026.

Efficient Teammate Adaptation with Language-assisted Progressive Intention Alignment

Zhichao Wu, Ruiqi Xue, Yi-Chen Li, Cong Guan, Jing-Wen Yang, Lei Yuan, Yang Yu

In: Proceedings of the 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'26), 2026.

An interactive simulation framework by ensemble imitation learning agents for training robust trading policies.

Julian Zhong-Nan Zhang, Yang Yu.

In: Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI'26) IAAI Track, Singapore, 2026.

Multi-agent in-context coordination via decentralized memory retrieval.

Tao Jiang, Zichuan Lin, Lihe Li, Yi-Chen Li, Cong Guan, Lei Yuan, Zongzhang Zhang, Yang Yu, Deheng Ye.

In: Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI'26) Main Track, Singapore, 2026. (Oral)

Reward model evaluation via automatically-ranked policy alignment.

Aoran Wang, Lei Ou, Yang Yu, Zongzhang Zhang.

In: Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI'26) Main Track, Singapore, 2026. (Oral)

2025

Generalizable Multi-Modal Adversarial Imitation Learning for Non-Stationary Dynamics.

Yi-Chen Li, Ningjing Chao, Zongzhang Zhang, Fuxiang Zhang, Lei Yuan, Yang Yu.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(7): 5600-5612, 2025.

Multiagent Continual Coordination via Progressive Task Contextualization.

Lei Yuan, Lihe Li, Ziqian Zhang, Fuxiang Zhang, Cong Guan, Yang Yu.

IEEE Transactions on Neural Networks and Learning Systems, 36(4): 6326-6340, 2025.

Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning.

Cong Guan, Feng Chen, Lei Yuan, Zongzhang Zhang, Yang Yu.

IEEE Transactions on Neural Networks and Learning Systems, 36(5): 9044-9056, 2025.

Learning to Coordinate With Different Teammates via Team Probing.

Hao Ding, Chengxing Jia, Zongzhang Zhang, Cong Guan, Feng Chen, Lei Yuan, Yang Yu.

IEEE Transactions on Neural Networks and Learning Systems, 36(9): 15807-15821, 2025.

Reinforcement Learning With Sparse-Executing Action via Sparsity Regularization.

Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu.

IEEE Transactions on Neural Networks and Learning Systems, 36(9): 16072-16084, 2025.

Learning de-biased environment models for delivery incentive policy optimization on food delivery platforms.

Yu-Ren Liu, Xiong-Hui Chen, Siyuan Xiao, Xinyu Yang, Xintong Qi, Linjun Zhou, Yang Yu, Fangsheng Huang.

Machine Learning, 114:262, 2025.

Constraining an Unconstrained Multi-agent Policy with Offline Data.

Cong Guan, Tao Jiang, Yi-Chen Li, Zongzhang Zhang, Lei Yuan, Yang Yu.

Neural Networks, 186: 107253, 2025.

Offline model-based reinforcement learning with causal structured world models.

Zhengmao Zhu, Hong-Long Tian, Xionghui Chen, Kun Zhang, Yang Yu.

Frontiers of Computer Science, 19(4): 194347, 2025.

Open and real-world human-AI coordination by heterogeneous training with communication.

Cong Guan, Ke Xue, Chunpeng Fan, Feng Chen, Lichao Zhang, Lei Yuan, Chao Qian, Yang Yu.

Frontiers of Computer Science, 19(4): 194314, 2025.

Efficient Multi-Agent Cooperation Learning through Teammate Lookahead.

Feng Chen, Xinwei Chen, Rong-Jun Qin, Cong Guan, Lei Yuan, Zongzhang Zhang, Yang Yu.

Transactions on Machine Learning Research, 2025.

Interactive Large Language Models for Reliable Answering under Incomplete Context.

Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jiahao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Ming-Kun Xie, Xiang Chen, Sheng-Jun Huang, Yang Yu.

Transactions on Machine Learning Research, 2025.

Uncertainty-sensitive privileged learning.

Fan-Ming Luo, Lei Yuan, Yang Yu.

In: Advances in Neural Information Processing Systems 38 (NeurIPS'25), Mexico City, Mexico, 2025.

Adaptable safe policy learning from multi-task data with constraint prioritized decision transformer.

Ruiqi Xue, Ziqian Zhang, Lihe Li, Cong Guan, Lei Yuan, Yang Yu.

In: Advances in Neural Information Processing Systems 38 (NeurIPS'25), Mexico City, Mexico, 2025.

Multi-agent imitation by learning and sampling from factorized soft Q-Function.

Yi-Chen Li, Zhongxiang Ling, Tao Jiang, Fuxiang Zhang, Pengyuan Wang, Lei Yuan, Zongzhang Zhang, Yang Yu.

In: Advances in Neural Information Processing Systems 38 (NeurIPS'25), Mexico City, Mexico, 2025.

Focus-then-reuse: Fast adaptation in visual perturbation environments.

Jiahui Wang, Chao Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu.

In: Advances in Neural Information Processing Systems 38 (NeurIPS'25), Mexico City, Mexico, 2025.

Controlling Large Language Model with Latent Action.

Chengxing Jia, Ziniu Li, Yi-Chen Li, Pengyuan Wang, Zhenyu Hou, Yuxiao Dong, Yang Yu.

In: Proceedings of the 42nd International Conference on Machine Learning (ICML'25), Vancouver, Canada, 2025.

LLM-Assisted Semantically Diverse Teammate Generation for Efficient Multi-agent.

Lihe Li, Lei Yuan, Pengsen Liu, Tao Jiang, Yang Yu.

In: Proceedings of the 42nd International Conference on Machine Learning (ICML'25), Vancouver, Canada, 2025.

Learning to Reuse Policies in State Evolvable Environments.

Ziqian Zhang, Bohan Yang, Lihe Li, Yuqi Bian, Ruiqi Xue, Feng Chen, Yi-Chen Li, Lei Yuan, Yang Yu.

In: Proceedings of the 42nd International Conference on Machine Learning (ICML'25), Vancouver, Canada, 2025.

Improving Reward Model Generalization from Adversarial Process Enhanced Preferences.

Zhilong Zhang, Tian Xu, Xinghao Du, Xingchen Cao, Yihao Sun, Yang Yu.

In: Proceedings of the 42nd International Conference on Machine Learning (ICML'25), Vancouver, Canada, 2025.

Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning.

Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Chenjun Xiao, Yang Yu, Zongzhang Zhang.

In: Proceedings of the 42nd International Conference on Machine Learning (ICML'25), Vancouver, Canada, 2025.

Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning.

Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

Semantic Skill Extraction via Vision-Language Model Guidance for Efficient Reinforcement Learning.

Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen, Lixuan Jin, Pengyuan Wang, Zhilong Zhang, Yang Yu.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching.

Lei Yuan, Yuqi Bian, Lihe Li, Ziqian Zhang, Cong Guan, Yang Yu.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation.

Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models.

Xu-Hui Liu, Yali Du, Jun Wang, Yang Yu.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

Learning View-invariant World Models for Visual Robotic Manipulation.

Jing-Cheng Pang, Nan Tang, Kaiyuan Li, Yuting Tang, Xin-Qiang Cai, Zhen-Yu Zhang, Gang Niu, Masashi Sugiyama, Yang Yu.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch.

Caigao Jiang, Xiang Shu, Hong Qian, Xingyu Lu, Jun Zhou, Aimin Zhou, Yang Yu.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

SOO-Bench: Benchmarks for Evaluating the Stability of Offline Black-Box Optimization.

Hong Qian, Yiyi Zhu, Xiang Shu, Xin An, Yaolin Wen, Shuo Liu, Huakang Lu, Aimin Zhou, Ke Tang, Yang Yu.

In: Proceedings of the 13th International Conference on Learning Representations (ICLR'25), Singapore, 2025.

2024

Understanding, rehearsing, and introspecting: Learn a policy from textual tutorial books in football games

Xiong-Hui Chen, Ziyan Wang, Yali Du, Shengyi Jiang, Meng Fang, Yang Yu, Jun Wang.

In: Advances in Neural Information Processing Systems 37 (NeurIPS'24), Vancouver, Canada, 2024. (Oral)

Knowledgeable agents by offline reinforcement learning from large language model rollouts

Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu.

In: Advances in Neural Information Processing Systems 37 (NeurIPS'24), Vancouver, Canada, 2024.

Efficient recurrent off-policy RL requires a context-encoder-specific learning rate

Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu.

In: Advances in Neural Information Processing Systems 37 (NeurIPS'24), Vancouver, Canada, 2024.

Provably and practically efficient adversarial imitation learning with general function approximation

Tian Xu, Zhilong Zhang, Ruishuo Chen, Yihao Sun, Yang Yu.

In: Advances in Neural Information Processing Systems 37 (NeurIPS'24), Vancouver, Canada, 2024.

Dynamics Adaptive Safe Reinforcement Learning with a Misspecified Simulator

Ruiqi Xue, Ziqian Zhang, Lihe Li, Feng Chen, Yi-Chen Li, Yang Yu, Lei Yuan.

In: Proceedings of the 35th European Conference on Machine Learning (ECML'24), Vilnius, Lithuania, 2024.

Beimingwu: A Learnware dock system

Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiaochuan Zou, Yang Yu, Zhi-Hua Zhou.

In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'24) Applied Data Science Track, Barcelona, Spain, 2024.

Deep Demonstration Tracing: Learning Generalizable Imitator for Runtime One-Shot Imitation

Xiong-Hui Chen, Junyin Ye, Hang Zhao, Yi-Chen Li, Xu-Hui Liu, Haoran Shi, Yu-Yan Xu, Zhihao Ye, Si-Hang Yang, Yang Yu, Kai Xu, Zongzhang Zhang, Anqi Huang.

In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024.

Policy-conditioned Environment Models are More Generalizable

Ruifeng Chen, Xiong-Hui Chen, Yihao Sun, Siyuan Xiao, Minhui Li, Yang Yu.

In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024.

Limited Preference Aided Imitation Learning from Imperfect Demonstrations

Xingchen Cao, Fan-Ming Luo, Junyin Ye, Tian Xu, Zhilong Zhang, Yang Yu.

In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024.

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo.

In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024.

Offline Transition Modeling via Contrastive Energy Learning

Ruifeng Chen, Chengxing Jia, Zefang Huang, Tian-Shuo Liu, Xu-Hui Liu, Yang Yu.

In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024.

Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu.

In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024.

Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics

Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024.

Understanding or manipulation: Rethinking online performance gains of modern recommender systems

Zhengbang Zhu, Rongjun Qin, Junjie Huang, Xinyi Dai, Yang Yu, Yong Yu, Weinan Zhang.

ACM Transactions on Information Systems, 42(4):1-32, 2024, CoRR abs/2210.05662.

Learning in games: A systematic review

Rong-Jun Qin, Yang Yu.

SCIENCE CHINA Information Sciences, 2024.

When is RL better than DPO in RLHF? A representation and optimization perspective / Policy optimization in RLHF: The impact of out-of-preference data

Ziniu Li, Tian Xu, Yang Yu.

In: Proceedings of the 12th International Conference on Learning Representations (ICLR'24) Tiny Papers, Vienna, Austria, 2024.

Distributional reinforcement learning with sample-set Bellman update

Weijian Zhang, Jianshu Wang, Yang Yu.

In: Proceedings of 2024 IEEE International Conference on Robotics and Automation (ICRA'24), Yokohama, Japan, 2024.

Reward-consistent dynamics models are strongly generalizable for offline reinforcement learning

Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu.

In: Proceedings of the 12th International Conference on Learning Representations (ICLR'24), Vienna, Austria, 2024, (Spotlight) https://arxiv.org/abs/2310.05422.

Policy rehearsing: Training generalizable policies for reinforcement learning

Chengxing Jia, Chenxiao Gao, Hao Yin, Fuxiang Zhang, Xiong-Hui Chen, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu, Zhi-Hua Zhou.

In: Proceedings of the 12th International Conference on Learning Representations (ICLR'24), Vienna, Austria, 2024.

Language model self-improvement by reinforcement learning contemplation

Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 12th International Conference on Learning Representations (ICLR'24), Vienna, Austria, 2024, http://arxiv.org/abs/2305.14483.

Flow to better: Offline preference-based reinforcement learning via preferred trajectory generation

Zhilong Zhang, Yihao Sun, Junyin Ye, Tian-Shuo Liu, Jiaji Zhang, Yang Yu.

In: Proceedings of the 12th International Conference on Learning Representations (ICLR'24), Vienna, Austria, 2024.

Cost-aware offline safe meta reinforcement learning with robust in-distribution online task adaptation

Cong Guan, Ruiqi Xue, Ziqian Zhang, Lihe Li, Yi-Chen Li, Lei Yuan, Yang Yu.

In: Proceedings of the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024), 2024.

Foresight distribution adjustment for off-policy reinforcement learning

Ruifeng Chen, Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Feng Xu, Yang Yu.

In: Proceedings of the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024), 2024.

Disentangling policy from offline task representation learning via adversarial data augmentation

Chengxing Jia, Fuxiang Zhang, Yi-Chen Li, Chenxiao Gao, Xu-Hui Liu, Lei Yuan, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024), 2024.

Deep anomaly detection via active anomaly search

Chao Chen, Dawei Wang, Feng Mao, Jiacheng Xu, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024), 2024.

Episodic return decomposition by difference of implicitly assigned sub-trajectory reward

Haoxin Lin, Hongqiu Wu, Jiaji Zhang, Yihao Sun, Junyin Ye, Yang Yu.

In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI'24), 2024.

Focus-Then-Decide: Segmentation-assisted reinforcement learning

Chao Chen, Jiacheng Xu, Weijian Liao, Hao Ding, Zongzhang Zhang, Yang Yu, Rui Zhao.

In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI'24), 2024.

ACT: Empowering decision transformer with dynamic programming via advantage conditioning

Chenxiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI'24), 2024.

Generalizable task representation learning for offline meta-reinforcement learning with data limitations

Renzhe Zhou, Chenxiao Gao, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI'24), 2024.

Model gradient: unified model and policy learning in model-based reinforcement learning

Chengxing Jia, Fuxiang Zhang, Tian Xu, Jing-Cheng Pang, Zongzhang Zhang, Yang Yu.

Frontiers of Computer Science, 18:184339, 2024.

MixLight: Mixed-Agent Cooperative Reinforcement Learning for Traffic Light Control

Ming Yang, Yiming Wang, Yang Yu, Mingliang Zhou, Leong Hou U.

IEEE Transactions on Industrial Informatics, 20(2): 2653-2661, 2024.

2023

Offline model-based adaptable policy learning for decision-making in out-of-support regions

Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12): 15260-15274, 2023.

Learning physically realizable skills for online packing of general 3D shapes

Hang Zhao, Zherong Pan, Yang Yu, Kai Xu.

ACM Transactions on Graphics, 42(5): 165:1-165:21, 2023, https://arxiv.org/abs/2212.02094.

Fully decentralized multiagent communication via causal inference

Han Wang, Yang Yu, Yuan Jiang.

IEEE Transactions on Neural Networks and Learning Systems, 34(12): 10193-10202, 2023.

Memory-efficient transformer-based network model for traveling salesman problem

Hua Yang, Minghao Zhao, Lei Yuan, Yang Yu, Zhenhua Li, Ming Gu.

Neural Networks, 161:589-597, 2023.

Learning to coordinate with anyone

Lei Yuan, Lihe Li, Ziqian Zhang, Feng Chen, Tianyi Zhang, Cong Guan, Yang Yu, Zhi-Hua Zhou.

In: Proceedings of the International Conference on Distributed Artificial Intelligence (DAI'23), 2023, (Best paper award) CoRR abs/2309.12633.

Adversarial counterfactual environment model learning

Xiong-Hui Chen, Yang Yu, Zheng-Mao Zhu, Zhihua Yu, Zhenjun Chen, Chenghe Wang, Yinan Wu, Hongqiu Wu, Rong-Jun Qin, Ruijin Ding, Fangsheng Huang.

In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), New Orleans, LA, 2023, (Spotlight), CoRR abs/2206.04890.

Learning World models with identifiable factorization

Yu-Ren Liu, Biwei Huang, Zheng-Mao Zhu, Honglong Tian, Mingming Gong, Yang Yu, Kun Zhang.

In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), New Orleans, LA, 2023, https://arxiv.org/abs/2306.06561.

Natural language-conditioned reinforcement learning with inside-out task language development and translation

Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Yang Yu.

In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), New Orleans, LA, 2023, CoRR abs/2302.09368.

Imitation learning from imperfection: Theoretical justifications and algorithms

Ziniu Li, Tian Xu, Zeyu Qin, Yang Yu, Zhi-Quan Luo.

In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), New Orleans, LA, 2023, (Spotlight) CoRR abs/2301.11687.

Model-based reinforcement learning with multi-step plan value estimation

Haoxin Lin, Yihao Sun, Jiaji Zhang, Yang Yu.

In: Proceedings of the 26th European Conference on Artificial Intelligence (ECAI'23), Kraków, Poland, 2023, CoRR abs/2209.05530.

Degradation-resistant offline optimization via accumulative risk control

Huakang Lu, Hong Qian, Yupeng Wu, Ziqi Liu, Ya-Lin Zhang, Aimin Zhou, Yang Yu.

In: Proceedings of the 26th European Conference on Artificial Intelligence (ECAI'23), Kraków, Poland, 2023, CoRR abs/2209.05530.

Object-oriented option framework for robotics manipulation in clutter

Jing-Cheng Pang, Si-Hang Yang, Xiong-Hui Chen, Xinyu Yang, Yang Yu, Mas Ma, Ziqi Guo, Howard Yang, Bill Huang.

In: Proceedings of 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'23), 2023.

Internal logical induction for pixel-symbolic reinforcement learning

Jiacheng Xu, Chao Chen, Fuxiang Zhang, Lei Yuan, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'23), Long Beach, CA, 2023.

Provably efficient adversarial imitation learning with unknown transitions

Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo.

In: Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI'23), Pittsburgh, PA, 2023.

Fast teammate adaptation in the presence of sudden policy change

Ziqian Zhang, Lei Yuan, Lihe Li, Ke Xue, Chengxing Jia, Cong Guan, Chao Qian, Yang Yu.

In: Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI'23), Pittsburgh, PA, 2023.

Model-Bellman inconsistency for model-based offline reinforcement learning

Yihao Sun, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, Yang Yu.

In: Proceedings of the 40th International Conference on Machine Learning (ICML'23), Honolulu, HI, 2023.

Policy regularization with dataset constraint for offline reinforcement learning

Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 40th International Conference on Machine Learning (ICML'23), Honolulu, HI, 2023.

AliExpress learning-to-rank: Maximizing online model performance without going online

Guangda Huzhang, Zhen-Jia Pang, Yongqing Gao, Wen-Ji Zhou, Qing Da, Anxiang Zeng, Yang Yu, Zhi-Hua Zhou.

IEEE Transactions on Knowledge and Data Engineering, 35(2): 1214-1226, 2023. CoRR abs/2003.11941.

Sim2Rec: A simulator-based decision-making approach to optimize real-world long-term user engagement in sequential recommender systems

Xiong-Hui Chen, Bowei He, Yang Yu, Qingyang Li, Zhiwei (Tony) Qin, Wenjie Shang, Jieping Ye, Chen Ma.

In: Proceedings of the 39th IEEE International Conference on Data Engineering (ICDE'23), 2023.

Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data

Fuxiang Zhang, Chengxing Jia, Yi-Chen Li, Lei Yuan, Yang Yu, Zongzhang Zhang.

In: Proceedings of the 11th International Conference on Learning Representations (ICLR'23), Kigali, Rwanda, 2023.

How To Guide Your Learner: Imitation Learning with Active Adaptive Expert Involvement

Xu-Hui Liu, Feng Xu, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Ruifeng Chen, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 22nd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'23), 2023, https://dl.acm.org/doi/abs/10.5555/3545946.3598773.

Self-Motivated Multi-Agent Exploration

Shaowei Zhang, Jiahan Cao, Lei Yuan, Yang Yu, De-Chuan Zhan.

In: Proceedings of the 22nd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'23), 2023.

Robust multi-agent coordination via evolutionary generation of auxiliary adversarial attackers

Lei Yuan, Zi-Qian Zhang, Ke Xue, Hao Yin, Feng Chen, Cong Guan, Li-He Li, Chao Qian, Yang Yu.

In: Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI'23), 2023.

Policy-independent behavioral metric-based representation for deep reinforcement learning

Wei-Jian Liao, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI'23), 2023.

2022

Multi-agent policy transfer via task relationship modeling

Rongjun Qin, Feng Chen, Tonghan Wang, Lei Yuan, Xiaoran Wu, Yipeng Kang, Zongzhang Zhang, Chongjie Zhang, Yang Yu.

In: NeurIPS'22 Workshop on Deep RL, 2022.

On efficient reinforcement learning for full-length game of StarCraft II

Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu.

Journal of Artificial Intelligence Research, 75:213-260, 2022.

Cascaded algorithm selection with extreme-region UCB bandit

Yi-Qi Hu, Xu-Hui Liu, Shu-Qiao Li, Yang Yu.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6782-6794, 2022.

Error bounds of imitating policies and environments for reinforcement learning

Tian Xu, Ziniu Li, Yang Yu.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6968-6980, 2022.

NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

Rong-Jun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, Yang Yu.

In: Advances in Neural Information Processing Systems 35 (NeurIPS'22, Datasets and Benchmarks), New Orleans, LA, 2022. CoRR abs/2102.00714.

Efficient Multi-agent Communication via Self-supervised Information Aggregation

Cong Guan, Feng Chen, Lei Yuan, Chenghe Wang, Hao Yin, Zongzhang Zhang, Yang Yu.

In: Advances in Neural Information Processing Systems 35 (NeurIPS'22), New Orleans, LA, 2022.

Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning

Chenyang Wu, Tianci Li, Zongzhang Zhang, Yang Yu.

In: Advances in Neural Information Processing Systems 35 (NeurIPS'22), New Orleans, LA, 2022.

Multi-agent Dynamic Algorithm Configuration

Ke Xue, Jiacheng Xu, Lei Yuan, Miqing Li, Chao Qian, Zongzhang Zhang, Yang Yu.

In: Advances in Neural Information Processing Systems 35 (NeurIPS'22), New Orleans, LA, 2022.

Efficient reinforcement learning for StarCraft by abstract forward models and transfer learning

Ruo-Ze Liu, Haifeng Guo, Xiaozhong Ji, Yang Yu, Zhen-Jia Pang, Zitai Xiao, Yuzhou Wu, Tong Lu.

IEEE Transactions on Games, 14(2): 294-307, 2022.

The teaching dimension of regularized kernel learners

Hong Qian, Xu-Hui Liu, Chen-Xi Su, Aimin Zhou, Yang Yu.

In: Proceedings of the 39th International Conference on Machine Learning (ICML'22), 2022.

Efficient multi-agent communication via Shapley message value

Di Xue, Lei Yuan, Zongzhang Zhang, Yang Yu.

In: Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI'22), Virtual Conference, 2022.

Multi-agent concentrative coordination with decentralized task representation

Lei Yuan, Chenghe Wang, Jianhao Wang, Fuxiang Zhang, Feng Chen, Cong Guan, Zongzhang Zhang, Chongjie Zhang, Yang Yu.

In: Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI'22), Virtual Conference, 2022.

Rethinking ValueDice: Does It Really Improve Performance?

Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo.

In: Blog Track at the 10th International Conference on Learning Representations (ICLR'22 Blog Track), 2022, CoRR abs/2202.02468.

Context-aware sparse deep coordination graphs

Tonghan Wang, Liang Zeng, Weijun Dong, Qianlan Yang, Yang Yu, Chongjie Zhang.

In: Proceedings of the 10th International Conference on Learning Representations (ICLR'22), Virtual Conference, 2022.

Active hierarchical exploration with stable subgoal representation learning

Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang.

In: Proceedings of the 10th International Conference on Learning Representations (ICLR'22), Virtual Conference, 2022.

Learning efficient online 3D bin packing on packing configuration trees

Hang Zhao, Yang Yu, Kai Xu.

In: Proceedings of the 10th International Conference on Learning Representations (ICLR'22), Virtual Conference, 2022.

Improving generative adversarial imitation learning with reward variance regularization

Yi-Feng Zhang, Fan-Ming Luo, Yang Yu.

Machine Learning, 2022.

Adapt to environment sudden changes by learning context sensitive policy

Fan-Ming Luo, Shengyi Jiang, Yang Yu, Zongzhang Zhang, Yi-Feng Zhang.

In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI'22), Virtual Conference, 2022.

Invariant action effect model for reinforcement learning

Zheng-Mao Zhu, Shengyi Jiang, Yu-Ren Liu, Yang Yu, Kun Zhang.

In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI'22), Virtual Conference, 2022.

Multi-agent incentive communication via decentralized teammate modeling

Lei Yuan, Jianhao Wang, Fuxiang Zhang, Chenghe Wang, Zongzhang Zhang, Yang Yu, Chongjie Zhang.

In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI'22), Virtual Conference, 2022.

ZOOpt: Toolbox for derivative-free optimization

Yu-Ren Liu, Yi-Qi Hu, Hong Qian, Yang Yu, and Chao Qian.

SCIENCE CHINA Information Sciences, 65: 207101, 2022. CoRR abs/1801.00329.

2021

More efficient adversarial imitation learning algorithms with known and unknown transitions

Tian Xu, Ziniu Li, Yang Yu.

In: Ecological Theory of RL Workshop at NeurIPS 2021.

Offline model-based adaptable policy learning

Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Wenjie Shang, Jieping Ye.

In: Advances in Neural Information Processing Systems 34 (NeurIPS'21), Virtual Conference, 2021.

Regret minimization experience replay in off-policy reinforcement learning

Xu-Hui Liu, Zhenghai Xue, Jing-Cheng Pang, Shengyi Jiang, Feng Xu, Yang Yu.

In: Advances in Neural Information Processing Systems 34 (NeurIPS'21), Virtual Conference, 2021.

Cross-modal domain adaptation for cost-efficient visual reinforcement learning

Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu.

In: Advances in Neural Information Processing Systems 34 (NeurIPS'21), Virtual Conference, 2021.

Adaptive online packing-guided search for POMDPs

Chenyang Wu, Guoyu Yang, Zongzhang Zhang, Yang Yu, Dong Li, Wulong Liu, Jianye Hao.

In: Advances in Neural Information Processing Systems 34 (NeurIPS'21), Virtual Conference, 2021.

Fast Pareto optimization for subset selection with dynamic cost constraints

Chao Bian, Chao Qian, Frank Neumann, and Yang Yu.

In: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI'21), Virtual Conference, 2021.

QPLEX: Duplex dueling multi-agent Q-Learning

Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, and Chongjie Zhang.

In: Proceedings of the 9th International Conference on Learning Representations (ICLR'21), Virtual Conference, 2021.

Sequential and dynamic constraint contrastive learning for reinforcement learning

Weijie Shen, Lei Yuan, Junfu Huang, Songyi Gao, Yuyang Huang, Yang Yu.

In: Proceedings of the IEEE 2021 International Joint Conference on Neural Networks (IJCNN'21), Shenzhen, China, 2021.

Partially observable environment estimation with uplift inference for reinforcement learning based recommendation

Wenjie Shang, Qingyang Li, Zhiwei Qin, Yang Yu, Yiping Meng, Jieping Ye.

Machine Learning, 110(9): 2603-2640, 2021.

Improving Search Engine Efficiency through Contextual Factor Selection

Anxiang Zeng, Han Yu, Qing Da, Yusen Zhan, Yang Yu, Jingren Zhou, Chunyan Miao.

AI Magazine, 42(2): 50-58, 2021.

Derivative-free reinforcement learning: A review

Hong Qian, Yang Yu.

Frontiers of Computer Science, 15(6): 156336, 2021. CoRR abs/2102.05710.

Machine learning steered symbolic execution framework for complex software code

Lei Bu, Yongjuan Liang, Zhunyi Xie, Hong Qian, Yi-Qi Hu, Yang Yu, Xin Chen, Xuandong Li.

Formal Aspects of Computing, 33(3): 301-323, 2021.

Analysis of Noisy Evolutionary Optimization When Sampling Fails

Chao Qian, Chao Bian, Yang Yu, Ke Tang, and Xin Yao.

Algorithmica, 83(4): 940-975, 2021.

On the robustness of median sampling in noisy evolutionary optimization

Chao Bian, Chao Qian, Yang Yu, Ke Tang.

Science China Information Sciences, 64(5), 2021.

2020

Error bounds of imitating policies and environments

Tian Xu, Ziniu Li, Yang Yu.

In: Advances in Neural Information Processing Systems 33 (NeurIPS'20), Virtual Conference, 2020.

Offline imitation learning with a misspecified simulator

Shengyi Jiang, Jing-Cheng Pang, Yang Yu.

In: Advances in Neural Information Processing Systems 33 (NeurIPS'20), Virtual Conference, 2020.

Running time analysis of the (1+1)-EA for robust linear optimization

Chao Bian, Chao Qian, Ke Tang, Yang Yu.

Theoretical Computer Science, 843: 57-72, 2020.

A technical view on neural architecture search

Yi-Qi Hu and Yang Yu.

International Journal of Machine Learning and Cybernetics, 11(4): 795-811, 2020.

Reinforcement learning with action-specific focuses in video games

Meng Wang, Yingfeng Chen, Tangjie Lv, Yan Song, Kai Guan, Changjie Fan, Yang Yu.

In: Proceedings of the IEEE 2020 Conference on Games (CoG'20), Osaka, Japan, 2020, pp. 9-16.

Derivative-free optimization with adaptive experience for efficient hyper-parameter tuning

Yi-Qi Hu, Zelin Liu, Hua Yang, Yang Yu, and Yunfeng Liu.

In: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI'20), Santiago de Compostela, Spain, 2020.

An efficient evolutionary algorithm for subset selection with general cost constraints

Chao Bian, Chao Feng, Chao Qian, and Yang Yu.

In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI'20), New York, NY, 2020.

Enhancing neural mathematical reasoning by abductive combination with symbolic library

Yangyang Hu, Yang Yu.

In: ICML 2020 Workshop on Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond, Vienna, Austria, 2020.
