by John R. Fischer, Senior Reporter | June 11, 2019
More than 50 percent of a data scientist's time is spent cleaning data, according to CrowdFlower's 2017 Data Scientist Report.
Michael Garel, director of data strategy for Accruent, argued in a presentation at AAMI Exchange in Cleveland this weekend that this estimate is actually "quite low."
“From what I’ve seen, data scientists spend most, if not 80 or 90 percent, of their time cleaning data,” he said. “Everybody thinks this data scientist role is the greatest ever. It’s really processing a lot of data.”
Cleaning data refers to the process of detecting and correcting corrupt or inaccurate records so that analyses reflect the true insights contained in the information.
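To make that definition concrete, here is a minimal Python sketch of a few common cleaning steps applied to work-order records. The records, the field names (`id`, `equipment`, `hours`) and the helpers `parse_hours` and `clean_orders` are hypothetical, invented for illustration; they do not reflect Accruent's data or tools. The sketch drops repeated work-order IDs, normalizes inconsistent equipment labels, and sets aside records whose labor hours are unparseable or impossible rather than guessing a value for them.

```python
from typing import Optional

# Hypothetical raw work-order records; field names are illustrative only.
RAW_ORDERS = [
    {"id": "WO-1", "equipment": " Infusion Pump", "hours": "2.5"},
    {"id": "WO-2", "equipment": "infusion  pump", "hours": "3"},
    {"id": "WO-2", "equipment": "infusion  pump", "hours": "3"},  # duplicate entry
    {"id": "WO-3", "equipment": "ECG monitor", "hours": "-1"},    # impossible value
    {"id": "WO-4", "equipment": "Ventilator", "hours": "n/a"},    # unparseable value
]


def parse_hours(value: str) -> Optional[float]:
    """Return labor hours as a float, or None if the value is corrupt."""
    try:
        hours = float(value)
    except ValueError:
        return None
    # A negative duration cannot be correct, so treat it as corrupt.
    return hours if hours >= 0 else None


def clean_orders(raw):
    """Deduplicate by ID, normalize labels, and separate corrupt records."""
    seen_ids = set()
    clean, rejected = [], []
    for order in raw:
        if order["id"] in seen_ids:
            continue  # keep only the first record for each work-order ID
        seen_ids.add(order["id"])
        hours = parse_hours(order["hours"])
        record = {
            "id": order["id"],
            # Collapse case and whitespace variants into one canonical label.
            "equipment": " ".join(order["equipment"].split()).lower(),
            "hours": hours,
        }
        if hours is None:
            rejected.append(record)
        else:
            clean.append(record)
    return clean, rejected


clean, rejected = clean_orders(RAW_ORDERS)
print([r["id"] for r in clean])     # → ['WO-1', 'WO-2']
print([r["id"] for r in rejected])  # → ['WO-3', 'WO-4']
```

Setting corrupt records aside instead of silently repairing them keeps a person in the loop; a team could, for example, route the rejected list back to technicians for correction.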
With data from more than 500 million work orders (more than 230 million of them in healthcare), Accruent uses a number of data analytics, machine learning and deep learning tools to clean data and uncover insights for completing hospital equipment work orders more efficiently. Which tools to use comes down to what information the user is trying to uncover, the type of work order and the variables involved.
In his presentation, entitled "Big Data Insights on Capital Equipment from 500 Million Work Orders," Garel examined specific uses and scenarios that a few of these tools are best suited for addressing:
Data science is the ability to comprehend and process data, extract value from it, and visualize and communicate it. Data analytics can help with this, and which form to apply depends on the scenario users face.
- Descriptive analytics – Describes what has happened in the past to help users understand current conditions, and visualizes and communicates insights extracted from data to peers or management.
- Predictive analytics – Predicts what will happen in the future. This form is inherently probabilistic and uses historical data to anticipate future performance, events and results.
- Prescriptive analytics – Maps out recommendations for next steps to achieve objectives and goals.
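The three forms of analytics can be contrasted on one small, hypothetical series. The monthly failure counts, the `linear_forecast` helper and the `THRESHOLD` value below are assumptions made for illustration, and the least-squares trend line is just one simple predictive technique, not a method described in the presentation. The descriptive step summarizes the history, the predictive step extends a fitted trend one month ahead, and the prescriptive step (reduced here to a single rule) turns that forecast into a recommendation.

```python
from statistics import mean

# Hypothetical monthly counts of repair work orders for one device model.
monthly_failures = [4, 5, 7, 6, 8, 9]

# Descriptive: summarize what has already happened.
avg = mean(monthly_failures)
print(f"average per month: {avg:.1f}, peak: {max(monthly_failures)}")
# → average per month: 6.5, peak: 9


def linear_forecast(series):
    """Fit an ordinary least-squares line to the series and extend it one step."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # predicted value for the next time step


# Predictive: anticipate next month's count from the historical trend.
forecast = linear_forecast(monthly_failures)
print(f"forecast for next month: {forecast:.1f}")  # → forecast for next month: 9.8

# Prescriptive: map the prediction to a recommended next step.
THRESHOLD = 8  # hypothetical capacity limit, chosen only for illustration
action = "schedule preventive maintenance" if forecast > THRESHOLD else "no change"
print(action)  # → schedule preventive maintenance
```

Real prescriptive tools typically weigh several candidate actions against costs and constraints; the single threshold rule here only shows where such a recommendation sits relative to the other two steps.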
Machine learning is a method of data analysis that automates analytical model building. While most software must be told explicitly where to look, the aim of this technology is to uncover hidden insights without being explicitly programmed where to search. Instead, it learns from data by identifying patterns and making predictions.
- Supervised Learning – Using tagged (labeled) data, the machine is trained to identify features in selected images and is then applied to identify those features in images that were not part of its training. This is especially helpful for risk assessment, fraud detection, and image and speech recognition.