Engineering safer machine learning

By Andrew McCollum On Jun 15, 2023

Engineering safer machine learning — Normalized final exposure for µ = 0.1 as a function of , for varying tolerance level α. Larger values of α and achieve lower handicap (which implies faster detection). Credit: *IEEE Transactions on Automatic Control* (2023). DOI: 10.1109/TAC.2023.3240925

Children first learning to walk may go a bit too fast and fall down, or run into a piece of furniture. However, that cause-and-effect element teaches them invaluable information about how their bodies move through space so that they can avoid falling in the future.

Machines learn in a lot of the same ways that humans do, including learning from their mistakes. However, for many machines—like self-driving cars and power systems—learning on the job with human safety at stake presents a problem. As machine learning matures and proliferates, there is a growing interest in applying it to highly complex, safety-critical autonomous systems. The promise of these technologies, however, is hindered by the safety risks inherent in the training process and beyond.

A new research paper challenges the idea that you need an unlimited number of trials to learn safe actions in unfamiliar environments. The paper, published recently in the journal IEEE Transactions on Automatic Control, presents a fresh approach that ensures learning safe actions with complete confidence, while managing the balance between being optimal, encountering dangerous situations, and quickly recognizing unsafe actions.

“Generally, machine learning looks for the most optimized solution, which can result in more errors along the way. That’s problematic when the error could mean crashing into a wall,” explained Juan Andres Bazerque, assistant professor of electrical and computer engineering at the Swanson School of Engineering, who led the research along with Associate Professor Enrique Mallada at Johns Hopkins University.

“In this study, we show that learning safe policies is fundamentally different from learning optimal policies, and that it can be done separately and efficiently.”

The research team conducted studies in two different scenarios to illustrate their concept. By making reasonable assumptions about exploration, they created an algorithm that detects all unsafe actions within a limited number of rounds. The team also tackled the challenge of finding optimal policies for a Markov decision process (MDP) with almost sure constraints.

Their analysis emphasized a tradeoff between the time required to detect unsafe actions in the underlying MDP and the level of exposure to unsafe events. MDP is useful because it provides a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker.

To validate their theoretical findings, the researchers conducted simulations that confirmed the identified tradeoffs. These findings also suggested that incorporating safety constraints can expedite the learning process.

“This research challenges the prevailing belief that learning safe actions requires an unlimited number of trials,” stated Bazerque. “Our results demonstrate that by effectively managing tradeoffs between optimality, exposure to unsafe events, and detection time, we can achieve guaranteed safety without an infinite number of explorations. This has significant implications for robotics, autonomous systems, and artificial intelligence, and more.”

More information:
Agustin Castellano et al, Learning to Act Safely With Limited Exposure and Almost Sure Certainty, IEEE Transactions on Automatic Control (2023). DOI: 10.1109/TAC.2023.3240925

Provided by
University of Pittsburgh

Citation:
Engineering safer machine learning (2023, June 14)
retrieved 14 June 2023
from https://techxplore.com/news/2023-06-safer-machine.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.

For all the latest Technology News Click Here

For the latest news and updates, follow us on Google News.

Read original article here

Denial of responsibility! NewsBit.us is an automatic aggregator around the global media. All the content are available free on Internet. We have just arranged it in one platform for educational purpose only. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials on our website, please contact us by email – [email protected]. The content will be deleted within 24 hours.