
LAMDA RL LAB is a subgroup of LAMDA that focuses on advancing reinforcement learning (RL) and applying it to build general decision-making intelligence. Key areas we are exploring include model-based RL and world model learning, multi-agent and collaborative RL, and planning and learning with large models. Through both fundamental and applied research, we aim to create RL-based systems that exhibit general decision-making capabilities.

Highlights

Supported Product

REVIVE is Polixir's next-generation intelligent decision-making system that simplifies complex processes into easy workflows, enabling reinforcement learning control algorithms to be applied in real-world industrial scenarios.

Recent News

(in Chinese)
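  1. ICML 2025 | Lapse: A Policy Reuse Method for Reinforcement Learning in State-Evolvable Environments

    We are very pleased that our work "Learning to Reuse Policies in State Evolvable Environments" has been accepted to ICML 2025. This work addresses the challenge that, when reinforcement learning agents are deployed in the real world, changes in the environment's state space cause policies to fail...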
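  2. ICML 2025 | APEC: Automatically Generating Preference Data from the Adversarial Imitation Learning Process to Improve Reward Model Generalization

    We are very pleased that our work "Improving Reward Model Generalization from Adversarial Process Enhanced Preferences" has been accepted to ICML 2025! This is our reward modeling...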
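  3. ICML 2025 | CoLA: A Language Model Based on Latent Action Control

    Some time ago, under the guidance of Prof. Yu, we further considered architectures that allow language models to evolve sustainably. Building on our earlier exploration of model architecture in "BWArea Model: Controllable Language Generation from a Decision-Making Perspective", we designed CoLA, a language model architecture on which reinforcement learning can be performed more efficiently. This work has also been accepted to this year's ICML 2025...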
  4. ICML 2025 | LLM-Assisted Semantic-Level Diverse Teammate Generation

    Recently, we proposed an LLM-assisted method for generating semantically diverse teammates: LLM-Assisted Semantically Diverse Teammate Generation for Efficient Multi-agent Coordination (SemDiv)...
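  5. NeoRL-2: An Offline Reinforcement Learning Benchmark for Real-World Scenarios

    Project page: https://github.com/polixir/NeoRL2 Paper: NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios. Research background and significance: reinforcement learning...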
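  6. ICLR 2025 | Temporal Abstraction Based on Vision-Language Models in Reinforcement Learning

    Under the guidance of Prof. Yu (@俞扬), we completed this work on vision-language-model-based temporal abstraction in reinforcement learning, which was accepted to ICLR 2025. In our group's earlier work, both [1] and [2] relied on the general capabilities of LLMs to imagine part of the data, thereby improving the generality of RL agents. This suggests that pretrained large models have learned decision-related knowledge from general corpora...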
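  7. ICLR 2025 | Q-Adapter: Customizing Language Models with Personalized Human Preferences While Avoiding Forgetting

    ICLR decisions were released recently, and our work on language model customization, Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation, was fortunate to be accepted. We would like to share it here...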
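  8. ICLR 2025 | ReViWo: Robot Control Unfazed by Viewpoint Changes

    Sharing our paper recently accepted to ICLR'25: "Learning View-invariant World Models for Visual Robotic Manipulation". This work was mainly supervised by Prof. Yu (@俞扬), and was completed in collaboration during my visit to RIKEN in Japan last year...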
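  9. ICLR 2025 | MADiTS: A New Method for Efficient Cooperative Multi-Agent Reinforcement Learning Based on Diffusion Trajectory Stitching

    Under the guidance of Prof. Yu (@俞扬), we recently completed a work on diffusion-model-based multi-agent trajectory stitching (MADiTS: Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching...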
  10. ICLR 2025 | A Study of the Optimization Landscape of Low-Rank Optimization Algorithms

    Paper title: On the Optimization Landscape of Low-Rank Adaptation Methods for Large Language Models. Paper link: https://openreview.net/pdf...

LAMDA RL LAB
School of Artificial Intelligence
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210023, China

Contact us

yuanl AT lamda DOT nju DOT edu DOT cn

Yi Fu Building, Xianlin Campus