
LAMDA RL LAB is a subgroup of LAMDA that focuses on advancing the field of reinforcement learning (RL) and its application to building general decision-making intelligence. Our key research areas include model-based RL and world-model learning, multi-agent and collaborative RL, and planning and learning with large models. Through both fundamental and applied research, we aim to create RL-based systems with general decision-making capabilities.

Highlights

Supported Product

REVIVE is Polixir's next-generation intelligent decision-making system that simplifies complex processes into easy workflows, enabling reinforcement learning control algorithms to be applied in real-world industrial scenarios.

Recent News

(originally in Chinese)
  1. NeoRL-2: An Offline Reinforcement Learning Benchmark for Real-World Scenarios

    Project page: https://github.com/polixir/NeoRL2. Paper: NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios. Background and significance: reinforcement learning...
  2. ICLR 2025 | Temporal Abstraction in Reinforcement Learning with Vision-Language Models

    Under the guidance of Prof. Yang Yu (@俞扬), we completed this work on temporal abstraction with vision-language models in reinforcement learning, which was accepted at ICLR 2025. In our group's previous work, both [1] and [2] leveraged the general capabilities of LLMs to imagine parts of the data, thereby improving the generality of RL agents. This suggests that pretrained large models have learned decision-relevant knowledge from general-purpose corpora...
  3. ICLR 2025 | Q-Adapter: Customizing Language Models to Personalized Human Preferences While Avoiding Forgetting

    With the recent ICLR decisions out, our work on language model customization, Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation, was accepted, and we are glad to share it here...
  4. ICLR 2025 | ReViWo: Robot Control Robust to Viewpoint Changes

    Sharing our paper recently accepted at ICLR'25: "Learning View-invariant World Models for Visual Robotic Manipulation". This work was supervised by Prof. Yang Yu (@俞扬) and was completed in collaboration during my visit to RIKEN in Japan last year...
  5. ICLR 2025 | MADiTS: Efficient Cooperative Multi-Agent Reinforcement Learning via Diffusion-Based Trajectory Stitching

    Under the guidance of Prof. Yang Yu (@俞扬), we recently completed a work on multi-agent trajectory stitching based on diffusion models (MADiTS: Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching...
  6. ICLR 2025 | The Optimization Landscape of Low-Rank Adaptation Methods

    Paper title: On the Optimization Landscape of Low-Rank Adaptation Methods for Large Language Models. Paper link: https://openreview.net/pdf...
  7. ICLR 2025 | ADMPO: Any-Step Dynamics Model Prediction Effectively Improves Model-Based Reinforcement Learning

    In 2024, under the guidance of Prof. Yang Yu (@俞扬), I completed a work on model-based reinforcement learning (MBRL); the method is simple yet effective and has been accepted at ICLR 2025. This is also the first first-author paper of my PhD, and I share its main content here...
  8. NeurIPS'24 Oral | Teaching Machines to Make Decisions from Tutorial Books (Policy Learning from Tutorial Books)

    Sharing our recent oral paper at NeurIPS'24, "Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting"; this post is also a revised version of our oral presentation. Why learn policies from books? In recent years...
  9. NeurIPS 2024 | KALM: Knowledge from Large Language Models Can Be Used for Reinforcement Learning Training

    Sharing KALM, our NeurIPS 2024 work on LLM-driven reinforcement learning: "Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts"...
  10. BWArea Model: Controllable Language Generation from a Decision-Making Perspective

    Preface: A while ago, under the guidance of Prof. Yang Yu (@俞扬), together with Pengyuan, Ziniu Li (@李子牛), and other members of the group, we explored the direction of controllable language models [1]; here is a brief introduction to this work. As language models advance, expectations for them keep rising, with people hoping large language models can complete increasingly complex and precise tasks...

LAMDA RL LAB
School of Artificial Intelligence
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210023, China

Contact us

yuanl AT lamda DOT nju DOT edu DOT cn

Yi Fu Building, Xianlin Campus