AI model 'learns' from patient data to make cancer treatment less toxic

Press releases may be edited for formatting or style | August 13, 2018

Rewards and penalties are basically positive and negative numbers, say +1 or -1. Their values vary by the action taken and are calculated from the probability of succeeding or failing at the outcome, among other factors. The agent is essentially trying to numerically optimize all actions, based on reward and penalty values, to reach a maximum outcome score for a given task.
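As a rough illustration of that scoring idea, the sketch below weights a +1 reward and a -1 penalty by an action's chance of succeeding and picks the action with the highest expected score. The action names and probabilities are hypothetical, and real reinforcement learning systems learn these values through repeated trials rather than being handed them.

```python
def expected_score(p_success, reward=1.0, penalty=-1.0):
    """Expected value of an action: reward weighted by the chance of success,
    penalty weighted by the chance of failure."""
    return p_success * reward + (1.0 - p_success) * penalty


def best_action(success_probabilities):
    """Return the action name whose expected score is highest."""
    return max(
        success_probabilities,
        key=lambda action: expected_score(success_probabilities[action]),
    )


# Hypothetical success probabilities for two driving maneuvers.
probabilities = {"merge_now": 0.3, "wait_for_gap": 0.8}
chosen = best_action(probabilities)
print(chosen)
```

With these made-up numbers, waiting for a gap scores higher than merging immediately, so it is the action selected.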

The approach was used to train AlphaGo, the DeepMind computer program that in 2016 made headlines for beating one of the world's best human players in the game "Go." It's also used to train driverless cars in maneuvers, such as merging into traffic or parking, where the vehicle practices over and over, adjusting its course, until it gets it right.

The researchers adapted an RL model for glioblastoma treatments that use a combination of the drugs temozolomide (TMZ) and procarbazine, lomustine, and vincristine (PVC), administered over weeks or months.

The model's agent combs through traditionally administered regimens. These regimens follow protocols that have been used clinically for decades and are based on animal testing and various clinical trials. Oncologists use these established protocols to predict what doses to give patients based on weight.

As the model explores the regimen, at each planned dosing interval -- say, once a month -- it decides on one of several actions. It can, first, either initiate or withhold a dose. If it does administer, it then decides if the entire dose, or only a portion, is necessary. At each action, it pings another clinical model -- often used to predict a tumor's change in size in response to treatments -- to see if the action shrinks the mean tumor diameter. If it does, the model receives a reward.
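The decision step described above can be sketched roughly as follows. This is not the researchers' code: the dose fractions, the growth and shrinkage rates, and the function names are all assumptions made for illustration, and `toy_tumor_model` is an invented stand-in for the clinical tumor-response model the article mentions.

```python
# Hypothetical action set: fraction of the full dose to give at one interval.
# 0.0 means the dose is withheld; 1.0 means the entire dose is given.
DOSE_FRACTIONS = (0.0, 0.25, 0.5, 1.0)


def toy_tumor_model(diameter_mm, dose_fraction):
    """Invented stand-in for a tumor-response model (numbers are made up)."""
    growth = 1.05                 # assumed 5% growth per interval if untreated
    kill = 0.20 * dose_fraction   # assumed up to 20% shrinkage from a full dose
    return diameter_mm * growth * (1.0 - kill)


def step(diameter_mm, dose_fraction):
    """Apply one dosing decision and reward it only if the tumor shrinks."""
    new_diameter = toy_tumor_model(diameter_mm, dose_fraction)
    reward = 1.0 if new_diameter < diameter_mm else 0.0
    return new_diameter, reward


diameter = 40.0  # hypothetical starting mean tumor diameter, in millimeters
for fraction in DOSE_FRACTIONS:
    new_diameter, reward = step(diameter, fraction)
    print(f"dose fraction {fraction}: {new_diameter:.2f} mm, reward {reward}")
```

In this toy setup, withholding the dose lets the tumor grow and earns no reward, while any nonzero dose shrinks it and earns a reward.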

However, the researchers also had to make sure the model doesn't just dish out a maximum number and potency of doses. Therefore, whenever the model chooses to administer all full doses, it gets penalized, so it instead learns to choose fewer, smaller doses. "If all we want to do is reduce the mean tumor diameter, and let it take whatever actions it wants, it will administer drugs irresponsibly," Shah says. "Instead, we said, 'We need to reduce the harmful actions it takes to get to that outcome.'"
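One simple way to picture this trade-off, offered only as a sketch under stated assumptions, is to subtract a dose-dependent penalty from the reward for shrinking the tumor. The article does not give the researchers' actual penalty scheme; the `toxicity_weight` parameter, the dose fractions, and the diminishing-returns response curve below are all hypothetical.

```python
import math

# Hypothetical action set: fraction of the full dose (0.0 = withhold).
DOSE_FRACTIONS = (0.0, 0.25, 0.5, 1.0)


def predicted_shrinkage_mm(diameter_mm, dose_fraction):
    """Invented tumor-response stand-in with diminishing returns per dose."""
    return diameter_mm * 0.20 * math.sqrt(dose_fraction)


def score(diameter_mm, dose_fraction, toxicity_weight):
    """Reward for predicted shrinkage minus a penalty that grows with dose."""
    shrinkage = predicted_shrinkage_mm(diameter_mm, dose_fraction)
    return shrinkage - toxicity_weight * dose_fraction


def choose_dose(diameter_mm, toxicity_weight):
    """Pick the dose fraction with the best penalized score."""
    return max(
        DOSE_FRACTIONS,
        key=lambda fraction: score(diameter_mm, fraction, toxicity_weight),
    )


print(choose_dose(40.0, toxicity_weight=0.0))  # no penalty: prints 1.0
print(choose_dose(40.0, toxicity_weight=6.0))  # with penalty: prints 0.5
```

With no penalty, the toy agent always picks the full dose, which mirrors the "irresponsible" behavior Shah describes. Once the penalty is switched on, a half dose scores best in this example because the extra shrinkage from a full dose no longer outweighs its cost.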

This represents an "unorthodox RL model, described in the paper for the first time," Shah says, that weighs potential negative consequences of actions (doses) against an outcome (tumor reduction). Traditional RL models work toward a single outcome, such as winning a game, and take any and all actions that maximize that outcome. The researchers' model, on the other hand, has the flexibility at each action to find a dose that doesn't necessarily maximize tumor reduction alone, but that strikes a balance between maximum tumor reduction and low toxicity. This technique, he adds, has various medical and clinical trial applications, where actions for treating patients must be regulated to prevent harmful side effects.
