[J34] Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning
Weiye Zhao, Tairan He, Feihan Li and Changliu Liu
Journal of Artificial Intelligence Research, 2025
Abstract:
Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents to consistently meet hard constraints (e.g., safety specifications) at every time step remains a formidable challenge. In contrast, existing work in the field of safe control provides guarantees on persistent satisfaction of hard safety constraints. However, these methods require explicit analytical system dynamics models to synthesize safe control, which are typically inaccessible in DRL settings. In this paper, we present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents that ensure provable safety throughout training. The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamics function (e.g., a digital twin simulator). Moreover, we theoretically prove that the implicit safe set algorithm guarantees finite time convergence to the safe set and forward invariance for both continuous-time and discrete-time systems. We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining 95%±9% cumulative reward compared to state-of-the-art safe DRL methods. Furthermore, the resulting algorithm scales well to high-dimensional systems with parallel computing.
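For intuition, here is a minimal sketch of the safeguard idea (not the paper's exact implementation): sample candidate controls, query the black-box dynamics for each, and execute the control closest to the reference whose next state sufficiently decreases the safety index. The names simulate and phi are illustrative placeholders for the digital-twin query and the safety index.

import numpy as np

def safeguard(x, u_ref, simulate, phi, eta=0.01, n_samples=1000, u_low=-1.0, u_high=1.0):
    """Return the control closest to u_ref whose next state decreases the
    safety index by at least eta; fall back to the steepest decrease."""
    if phi(x) < 0 and phi(simulate(x, u_ref)) < 0:
        return u_ref  # nominal control already keeps the system safe
    u_ref = np.atleast_1d(u_ref)
    candidates = np.random.uniform(u_low, u_high, size=(n_samples, u_ref.size))
    phi_next = np.array([phi(simulate(x, u)) for u in candidates])
    safe = phi_next <= phi(x) - eta  # sampled controls that dissipate enough energy
    if safe.any():
        dists = np.linalg.norm(candidates[safe] - u_ref, axis=1)
        return candidates[safe][np.argmin(dists)]  # minimal deviation from nominal
    return candidates[np.argmin(phi_next)]  # best effort if no sampled control qualifies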
[J29] Physics-Aware Combinatorial Assembly Sequence Planning using Data-free Action Masking
Ruixuan Liu, Alan Chen, Weiye Zhao and Changliu Liu
IEEE Robotics and Automation Letters, 2025
Abstract:
Combinatorial assembly uses standardized unit primitives to build objects that satisfy user specifications. This paper studies assembly sequence planning (ASP) for physical combinatorial assembly. Given the shape of the desired object, the goal is to find a sequence of actions for placing unit primitives to build the target object. In particular, we aim to ensure the planned assembly sequence is physically executable. However, ASP for combinatorial assembly is particularly challenging due to its combinatorial nature. To address the challenge, we employ deep reinforcement learning to learn a construction policy for placing unit primitives sequentially to build the desired object. Specifically, we design an online physics-aware action mask that filters out invalid actions, which effectively guides policy learning and ensures violation-free deployment. Finally, we apply the proposed method to Lego assembly with more than 250 3D structures. The experimental results demonstrate that the proposed method plans physically valid assembly sequences for all structures, achieving a 100% success rate, whereas the best comparable baseline fails on more than 40 structures.
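As a rough illustration of the action-masking mechanism (a sketch, not the paper's code), invalid placements can be filtered by driving their logits to negative infinity before sampling, so the policy never selects a physically infeasible action; physics_check below is a hypothetical stand-in for the paper's online physics-aware validity test.

import torch

def masked_action_distribution(logits, state, physics_check):
    """logits: (num_actions,) raw policy outputs for placing unit primitives."""
    valid = torch.tensor([physics_check(state, a) for a in range(logits.numel())])
    masked = logits.masked_fill(~valid, float("-inf"))  # invalid actions get zero probability
    return torch.distributions.Categorical(logits=masked)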
[C95] Continual Learning and Lifting of Koopman Dynamics for Linear Control of Legged Robots
Feihan Li, Abulikemu Abuduweili, Yifan Sun, Rui Chen, Weiye Zhao and Changliu Liu
Learning for Dynamics and Control Conference, 2025
Abstract:
The control of legged robots, particularly humanoid and quadruped robots, presents significant challenges due to their high-dimensional and nonlinear dynamics. While linear systems can be effectively controlled using methods like Model Predictive Control (MPC), the control of nonlinear systems remains complex. One promising solution is the Koopman Operator, which approximates nonlinear dynamics with a linear model, enabling the use of proven linear control techniques. However, achieving accurate linearization through data-driven methods is difficult due to issues like approximation error, domain shifts, and the limitations of fixed linear state-space representations. These challenges restrict the scalability of Koopman-based approaches. This paper addresses these challenges by proposing a continual learning algorithm designed to iteratively refine Koopman dynamics for high-dimensional legged robots. The key idea is to progressively expand the dataset and latent space dimension, enabling the learned Koopman dynamics to converge towards accurate approximations of the true system dynamics. Theoretical analysis shows that the linear approximation error of our method converges monotonically. Experimental results demonstrate that our method achieves high control performance on robots like Unitree G1/H1/A1/Go2 and ANYmal D, across various terrains using simple linear MPC controllers. This work is the first to successfully apply linearized Koopman dynamics for locomotion control of high-dimensional legged robots, enabling a scalable model-based control solution.
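For reference, the data-driven Koopman fit that the proposed continual-learning loop refines can be sketched in EDMD style as below; the dataset and latent-dimension expansion loop is omitted, and the polynomial lifting is an illustrative choice, not the paper's learned embedding.

import numpy as np

def lift(X, degree=2):
    """Lift states into a simple polynomial dictionary (illustrative)."""
    feats = [X] + [X**d for d in range(2, degree + 1)]
    return np.hstack([np.ones((X.shape[0], 1))] + feats)

def fit_koopman(X, U, X_next):
    """Least-squares fit of z' ~ A z + B u on lifted states."""
    Z, Z_next = lift(X), lift(X_next)
    ZU = np.hstack([Z, U])
    # Solve Z_next = ZU @ W in the least-squares sense, then split W into A, B.
    W, *_ = np.linalg.lstsq(ZU, Z_next, rcond=None)
    A, B = W[:Z.shape[1]].T, W[Z.shape[1]:].T
    return A, B  # linear latent dynamics usable by a linear MPC controller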
[C94] Learn With Imagination: Safe Set Guided State-wise Constrained Policy Optimization
Feihan Li, Yifan Sun, Weiye Zhao, Rui Chen, Tianhao Wei and Changliu Liu
Learning for Dynamics and Control Conference, 2025
Abstract:
Deep reinforcement learning (RL) excels in various control tasks, yet the absence of safety guarantees hampers its real-world applicability. In particular, exploration during learning usually results in safety violations, as the RL agent learns from its mistakes. On the other hand, safe control techniques ensure persistent safety satisfaction but demand strong priors on system dynamics, which are usually hard to obtain in practice. To address these problems, we present Safe Set Guided State-wise Constrained Policy Optimization (S-3PO), a pioneering algorithm that generates state-wise safe optimal policies with zero training violations, i.e., learning without mistakes. S-3PO first employs a safety-oriented monitor with black-box dynamics to ensure safe exploration. It then enforces an "imaginary" cost that drives the RL agent to converge to optimal behaviors within the safety constraints. S-3PO outperforms existing methods on high-dimensional robotics tasks, managing state-wise constraints with zero training violations. This innovation marks a significant stride towards real-world safe RL deployment.
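A minimal sketch of the interaction loop, under the assumption that the imaginary cost is the magnitude of the monitor's correction (the paper's exact cost definition may differ); the monitor plays the role of the safeguard sketched under [J34] above.

import numpy as np

def step_with_monitor(env, policy, safeguard, state):
    """Execute only monitor-certified actions; charge the policy for corrections."""
    u_nominal = policy(state)
    u_safe = safeguard(state, u_nominal)          # safety monitor with black-box dynamics
    next_state, reward, done, info = env.step(u_safe)
    imaginary_cost = float(np.linalg.norm(np.asarray(u_safe) - np.asarray(u_nominal)))
    return next_state, reward, imaginary_cost, done, info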
[C101] SPARK: Safe Protective and Assistive Robot Kit
Yifan Sun, Rui Chen, Kai S Yun, Yikuan Fang, Sebin Jung, Feihan Li, Bowei Li, Weiye Zhao and Changliu Liu
IFAC Symposium on Robotics, 2025
Abstract:
This paper introduces the Safe Protective and Assistive Robot Kit (SPARK), a comprehensive benchmark designed to ensure safety in humanoid autonomy and teleoperation. Humanoid robots pose significant safety risks due to their physical capability to interact with complex environments. The physical structures of humanoid robots further add complexity to the design of general safety solutions. To facilitate the safe deployment of complex robot systems, SPARK can be used as a toolbox that comes with state-of-the-art safe control algorithms in a modular and composable robot control framework. Users can easily configure safety criteria and sensitivity levels to optimize the balance between safety and performance. To accelerate humanoid safety research and development, SPARK provides a simulation benchmark that compares safety approaches in a variety of environments, tasks, and robot models. Furthermore, SPARK allows quick deployment of synthesized safe controllers on real robots. For hardware deployment, SPARK supports Apple Vision Pro (AVP) or a Motion Capture System as external sensors, while also offering interfaces for seamless integration with alternative hardware setups. This paper demonstrates SPARK's capability through both simulation experiments and case studies on a Unitree G1 humanoid robot. Leveraging these advantages of SPARK, users and researchers can significantly improve the safety of their humanoid systems as well as accelerate relevant research. The open-source code is available at https://github.com/intelligent-control-lab/spark
2024
[J26] Improve Certified Training with Signal-to-Noise Ratio Loss to Decrease Neuron Variance and Increase Neuron Stability
Tianhao Wei, Ziwei Wang, Peizhi Niu, Abulikemu Abuduweili, Weiye Zhao, Casidhe Hutchison, Eric Sample and Changliu Liu
Transactions on Machine Learning Research, 2024
Abstract:
Neural network robustness is a major concern in safety-critical applications. Certified robustness provides a reliable lower bound on worst-case robustness, and certified training methods have been developed to enhance it. However, certified training methods often suffer from over-regularization, leading to lower certified robustness. This work addresses this issue by introducing the concepts of neuron variance and neuron stability, examining their impact on over-regularization and model robustness. To tackle the problem, we extend the Signal-to-Noise Ratio (SNR) into the realm of model robustness, offering a novel perspective and developing SNR-inspired losses aimed at optimizing neuron variance and stability to mitigate over-regularization. Through both empirical and theoretical analysis, our SNR-based approach demonstrates superior performance over existing methods on the MNIST and CIFAR-10 datasets. In addition, our exploration of adversarial training uncovers a beneficial correlation between neuron variance and adversarial robustness, leading to an optimized balance between standard and robust accuracy that outperforms baseline methods.
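As a rough, speculative illustration only: one way an SNR-style regularizer over certified pre-activation bounds could look, treating the bound midpoint as signal and the bound width as noise so that stable, low-variance neurons are rewarded. The paper's actual losses differ in form; this only shows the quantity being optimized.

import torch

def snr_loss(lb: torch.Tensor, ub: torch.Tensor, eps: float = 1e-8):
    """lb, ub: per-neuron pre-activation bounds from bound propagation (e.g., IBP)."""
    mid = 0.5 * (lb + ub)        # "signal": nominal pre-activation level
    width = ub - lb              # "noise": certified uncertainty (neuron variance proxy)
    snr = mid.abs() / (width + eps)
    return -snr.mean()           # maximizing SNR = minimizing its negative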
[J24] State-wise Constrained Policy Optimization
Weiye Zhao, Rui Chen, Yifan Sun, Tianhao Wei and Changliu Liu
Transactions on Machine Learning Research, 2024
[J23] GUARD: A Safe Reinforcement Learning Benchmark
Weiye Zhao, Rui Chen, Yifan Sun, Ruixuan Liu, Tianhao Wei and Changliu Liu
Transactions on Machine Learning Research, 2024
[C83] Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills
Tianhao Wei, Liqian Ma, Rui Chen, Weiye Zhao and Changliu Liu
Conference on Robot Learning, 2024
Abstract:
The requirements for real-world manipulation tasks are diverse and often conflicting; some tasks require precise motion while others require force compliance; some tasks require avoidance of certain regions while others require convergence to certain states. Satisfying these varied requirements with a fixed state-action representation and control strategy is challenging, impeding the development of a universal robotic foundation model. In this work, we propose Meta-Control, the first LLM-enabled automatic control synthesis approach that creates customized state representations and control strategies tailored to specific tasks. Our core insight is that a meta-control system can be built to automate the thought process that human experts use to design control systems. Specifically, human experts heavily use a model-based, hierarchical (from abstract to concrete) thought model, and then compose various dynamic models and controllers to form a control system. Meta-Control mimics this thought model and harnesses the LLM's extensive control knowledge with Socrates' "art of midwifery" to automate the thought process. Meta-Control stands out for its fully model-based nature, allowing rigorous analysis, generalizability, robustness, efficient parameter tuning, and reliable real-time execution.
[C77] Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence
Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei and Changliu Liu
International Conference on Machine Learning, 2024
[C70] Real-time Safety Index Adaptation for Parameter-varying Systems via Determinant Gradient Ascent
Rui Chen, Weiye Zhao, Ruixuan Liu, Weiyang Zhang and Changliu Liu
American Control Conference, 2024
[C69] Hybrid Task Constrained Incremental Planner for Robot Manipulators in Confined Environments
Yifan Sun, Weiye Zhao and Changliu Liu
American Control Conference, 2024
[C67] Safety Index Synthesis with State-dependent Control Space
Rui Chen, Weiye Zhao and Changliu Liu
American Control Conference, 2024
[T2] State-wise Safe Learning and Control
Weiye Zhao
PhD Thesis, 2024
Abstract:
Ensuring safety by persistently satisfying hard state constraints is a critical capability in the fields of reinforcement learning (RL) and control. While RL and control have achieved impressive feats in performing challenging tasks, the lack of safety assurance remains a significant obstacle for real-world applications. Consequently, the research focus has shifted towards developing methods that meet stringent safety specifications in uncertain environments, driving the field of safe learning and control.
In the realm of safe control, energy function-based methodologies allocate diminished energy levels to safe states while orchestrating secure control laws to dissipate energy. However, prevailing safe control methods necessitate explicit analytical models of the dynamic system, a constraint often unmet in real-world scenarios. Moreover, current safe control techniques typically presuppose an unbounded control space, a premise divergent from the bounded nature of actual control spaces in reality. This incongruity can lead to the possibility of an empty set of safe controls, thereby jeopardizing the assurance of state-wise safety.
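For concreteness, the standard energy-function (safety-index) formulation underlying these chapters can be sketched as follows; the notation is a common convention rather than the thesis's exact statement. The safe set is a zero-sublevel set of a safety index, and the safe control law dissipates energy whenever the system leaves it:

\[
\mathcal{X}_S = \{\, x : \phi(x) \le 0 \,\}, \qquad
u^*(x) = \arg\min_{u \in \mathcal{U}} \;\| u - u^{\mathrm{ref}} \|^2
\quad \text{s.t.} \quad \dot{\phi}(x,u) \le -\eta \;\; \text{whenever} \;\; \phi(x) \ge 0,
\]

where a nonempty feasible set under the bounded control space \(\mathcal{U}\) is exactly what the synthesized energy functions in contribution (i) below are designed to guarantee.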
In the sphere of safe RL, extensive endeavors have been undertaken to address safety within the framework of Constrained Markov Decision Processes (CMDP), which is not capable of handling state-wise safety constraints. Furthermore, existing safe RL algorithms predominantly acquire policy learning through trial-and-error, a process that introduces inevitable unsafe exploration. This characteristic renders them unsuitable for training in real-world, safety-critical applications.
In this thesis, we present groundbreaking advancements in RL and control, ensuring state-wise safety by effectively addressing these challenges: (i) For safe control, we design energy functions to ensure a nonempty set of safe controls under dynamics limits and different knowledge levels of system dynamics, which can achieve forward invariance and finite time convergence. (ii) In safe learning, we propose a set of novel policy search algorithms for state-wise constrained RL. Specifically, (a) State-wise Constrained Policy Optimization (SCPO) guarantees state-wise constraint satisfaction in expectation per iteration, (b) Absolute Policy Optimization (APO) guarantees monotonic improvement of worst-case performance per iteration, and (c) Absolute State-wise Constrained Policy Optimization (ASCPO) guarantees worst-case state-wise constraint satisfaction per iteration. The proposed approaches accommodate high-dimensional neural network policies. Furthermore, we combine benefits from safe control and learning to pioneer an algorithm generating state-wise safe optimal policies with zero training violations, a learning-without-mistakes paradigm. (iii) Lastly, we introduce a comprehensive and adaptable benchmark, the first of its kind, for safe RL and control. This benchmark caters to diverse agents, tasks, and safety constraints, while offering unified implementations of cutting-edge safe learning and control algorithms within a controlled environment.
Enforcing state-wise safety constraints is critical for the application of reinforcement learning (RL) in real-world problems, such as autonomous driving and robot manipulation. However, existing safe RL methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions. The former does not exclude the probability of safety violations, while the latter is impractical. Our insight is that although it is intractable to guarantee hard state-wise constraints in a model-free setting, we can enforce state-wise safety with high probability while excluding strong assumptions. To accomplish the goal, we propose Absolute State-wise Constrained Policy Optimization (ASCPO), a novel general-purpose policy search algorithm that guarantees high-probability state-wise constraint satisfaction for stochastic systems. We demonstrate the effectiveness of our approach by training neural network policies for extensive robot locomotion tasks, where the agent must adhere to various state-wise safety constraints. Our results show that ASCPO significantly outperforms existing methods in handling state-wise constraints across challenging continuous control tasks, highlighting its potential for real-world applications.
2023
[J16] A hierarchical long short term safety framework for efficient robot manipulation under uncertainty
Suqin He, Weiye Zhao, Chuxiong Hu, Yu Zhu and Changliu Liu
Robotics and Computer-Integrated Manufacturing, 2023
[C60] State-wise safe reinforcement learning: A survey
Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei and Changliu Liu
International Joint Conference on Artificial Intelligence, 2023
[C57] Probabilistic safeguard for reinforcement learning using safety index guided gaussian process models
Weiye Zhao, Tairan He and Changliu Liu
Learning for Dynamics and Control Conference, 2023
[C56] Safety index synthesis via sum-of-squares programming
Weiye Zhao, Tairan He, Tianhao Wei, Simin Liu and Changliu Liu
American Control Conference, 2023
[C54] Autocost: Evolving intrinsic cost for zero-violation reinforcement learning
Tairan He, Weiye Zhao and Changliu Liu
Proceedings of the AAAI Conference on Artificial Intelligence, 2023
[U] Learning predictive safety filter via decomposition of robust invariant set
Zeyang Li, Chuxiong Hu, Weiye Zhao and Changliu Liu
arXiv:2311.06769, 2023
2022
[J15] Persistently feasible robust safe control by safety index synthesis and convex semi-infinite programming
Tianhao Wei, Shucheng Kang, Weiye Zhao and Changliu Liu
IEEE Control Systems Letters, 2022
[J12] Provably Safe Tolerance Estimation for Robot Arms via Sum-of-Squares Programming
Weiye Zhao, Suqin He and Changliu Liu
IEEE Control Systems Letters, 2022
2021
[C38] Model-free Safe Control for Zero-Violation Reinforcement Learning
Weiye Zhao, Tairan He and Changliu Liu
Conference on Robot Learning, 2021
[C37] Safe Adaptation with Multiplicative Uncertainties Using Robust Safe Set Algorithm
Charles Noren, Weiye Zhao and Changliu Liu
Modeling, Estimation, and Control Conference, 2021
Abstract:
Maintaining safety under adaptation has long been considered to be an important capability for autonomous systems. As these systems estimate and change the ego-model of the system dynamics, questions regarding how to develop safety guarantees for such systems continue to be of interest. We propose a novel robust safe control methodology that uses set-based safety constraints to make a robotic system with dynamical uncertainties safely adapt and operate in its environment. The method consists of designing a scalar energy function (safety index) for an adaptive system with parametric uncertainty and an optimization-based approach for control synthesis. Simulation studies on a two-link manipulator are conducted and the results demonstrate the effectiveness of our proposed method in terms of generating provably safe control for adaptive systems with parametric uncertainty.
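A minimal sketch of the optimization-based synthesis step, assuming illustrative interfaces grad_phi, f_hat, and uncertainty_bound for the safety-index gradient, the nominal (estimated) dynamics, and the robust margin induced by the parametric uncertainty; the paper's concrete formulation differs in detail.

import numpy as np
from scipy.optimize import minimize

def robust_safe_control(x, u_ref, grad_phi, f_hat, uncertainty_bound, eta=0.1):
    """min ||u - u_ref||^2  s.t.  worst-case d(phi)/dt <= -eta."""
    def worst_case_phi_dot(u):
        nominal = grad_phi(x) @ f_hat(x, u)           # nominal energy dissipation rate
        return nominal + uncertainty_bound(x, u)      # add robust margin for uncertainty
    cons = {"type": "ineq", "fun": lambda u: -eta - worst_case_phi_dot(u)}
    res = minimize(lambda u: np.sum((u - u_ref) ** 2), u_ref, constraints=cons)
    return res.x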
2020
[C24] Experimental Evaluation of Human Motion Prediction: Toward Safe and Efficient Human Robot Collaboration
Weiye Zhao, Liting Sun, Changliu Liu and Masayoshi Tomizuka
American Control Conference, 2020
Abstract:
Human motion prediction is non-trivial in modern industrial settings. Accurate prediction of human motion can not only improve efficiency in human-robot collaboration, but also enhance human safety in close proximity to robots. Among existing prediction models, the parameterization and identification methods vary. It remains unclear what parameterization a prediction model requires, whether online adaptation of the model is necessary, and whether prediction can help improve safety and efficiency during human-robot collaboration. These questions are difficult to answer because it is hard to quantitatively evaluate various prediction models in a closed-loop fashion in real human-robot interaction settings. This paper develops a method to evaluate the closed-loop performance of different prediction models. In particular, we compare models with different parameterizations and models with or without online parameter adaptation. Extensive experiments were conducted on a human-robot collaboration platform. The experimental results demonstrated that human motion prediction significantly enhanced collaboration efficiency and human safety. Adaptable prediction models parameterized by neural networks achieved the best performance.
2019
[C20] Human motion prediction using semi-adaptable neural networks
Yujiao Cheng, Weiye Zhao, Changliu Liu and Masayoshi Tomizuka
American Control Conference, 2019
Abstract:
Human motion prediction is an important component to facilitate human-robot interaction. Robots need to accurately predict a human's future movement in order to collaborate efficiently with humans and to plan their own motion trajectories safely. Many recent approaches predict future human movement using deep learning methods, such as recurrent neural networks. However, existing methods lack the ability to adapt to time-varying human behaviors. Moreover, many of them do not quantify uncertainties in the prediction. This paper proposes a new approach that uses an adaptable neural network for human motion prediction, in order to accommodate humans' time-varying behaviors and to provide uncertainty bounds on the predictions in real time. In particular, a neural network is trained offline to represent the human motion transition model. The recursive least squares parameter adaptation algorithm (RLS-PAA) is adopted for online parameter adaptation of the neural network and for uncertainty estimation. Experiments on several human motion datasets verify that the proposed method significantly outperforms the state-of-the-art approach in both prediction accuracy and computational efficiency.
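A minimal numpy sketch of the online-adaptation idea: the offline-trained network body is frozen and only the last-layer weights are updated with recursive least squares as new observations arrive (variable names are illustrative, and the uncertainty bookkeeping is simplified).

import numpy as np

class RLSLastLayer:
    def __init__(self, n_features, n_outputs, forgetting=0.99):
        self.W = np.zeros((n_outputs, n_features))   # adaptable last-layer weights
        self.P = np.eye(n_features) * 1e3            # inverse covariance estimate
        self.lam = forgetting                        # forgetting factor for time variation

    def predict(self, feat):
        """feat: frozen-body features of the current human pose history."""
        return self.W @ feat

    def update(self, feat, y_true):
        """One RLS step after observing the true next pose y_true."""
        Pf = self.P @ feat
        k = Pf / (self.lam + feat @ Pf)              # gain vector
        err = y_true - self.W @ feat
        self.W += np.outer(err, k)                   # rank-1 weight correction
        self.P = (self.P - np.outer(k, Pf)) / self.lam
        return err                                   # residual feeds the uncertainty estimate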