From the April 2018 issue of HealthCare Business News magazine
By Bipin Thomas
In the health care industry’s transition to consumer-centric and value-based care, we need plenty of data, ranging from outcomes, biometric and socioeconomic data to genomic and familial data.
This complete ecosystem of data will increase the total amount of health care data exponentially. According to IDC research, the health care industry will generate 44 zettabytes of data by 2020. Even health care organizations that have enterprise data warehouses will be challenged to handle such volumes.
A data lake is an open reservoir for the vast amounts of data inherent in health care. Health care organizations do not have enough time and resources to map all of that data up front. A data lake brings value to health care because it stores all the data in a central repository and maps it only as needs arise. Determining how to structure data before it’s brought in, although common in health care, wastes time, money and resources. When the data is first stored in the data lake, it’s impossible to know how to structure it, since not all the use cases for that data are known yet. Bringing data in first and adding structure as use cases arise is the right approach in health care because it avoids long-term projects that ultimately fail.
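The store-first, structure-later idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular data lake product: the record layout, the `RAW_LAKE` list and the function names are all hypothetical. Raw records are kept exactly as they arrived, and a schema is applied only when a specific question (here, total claim amounts per patient) comes up.

```python
import json

# Hypothetical raw records, stored as-is with no upfront schema.
# The field names and values are invented for illustration only.
RAW_LAKE = [
    '{"source": "claims", "patient_id": "p1", "amount": "120.50"}',
    '{"source": "notes", "patient_id": "p1", "text": "Follow-up in 2 weeks"}',
    '{"source": "claims", "patient_id": "p2", "amount": "80"}',
]


def read_claims(raw_records):
    """Apply a claims schema at read time, once the use case is known."""
    rows = []
    for line in raw_records:
        record = json.loads(line)
        if record.get("source") != "claims":
            continue  # other sources stay in the lake, untouched
        rows.append({
            "patient_id": record["patient_id"],
            "amount": float(record["amount"]),  # type is imposed only now
        })
    return rows


def total_claims_by_patient(raw_records):
    """Answer one specific question using the schema applied on read."""
    totals = {}
    for row in read_claims(raw_records):
        totals[row["patient_id"]] = totals.get(row["patient_id"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(total_claims_by_patient(RAW_LAKE))
```

The physician note in the sample data is never mapped because no question has required it yet; a later use case could add its own reader without changing how the data was stored.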
A data lake plays an essential role in the health care industry by creating broad data access and usability across the enterprise. Its benefits include improved scale, flexible schema, support for varied processing workloads, better data accessibility, the ability to handle complex data and greater data usability. A data lake is the preferred choice for large structured and unstructured datasets coming from multiple internal and external sources, such as radiology, physician notes and claims. It removes data silos and doesn’t demand definitions for the data it ingests. The data can be refined once the questions are known. A data lake also offers great flexibility in the tools and technology used to run queries. These benefits are instrumental to socializing data access and developing a data-driven culture across the health care organization. A data lake is also prepared for the future of health care data, with the ability to integrate patient data from implanted monitors and wearable devices.
A data lake can scale to petabytes of both structured and unstructured data and can ingest data at a variety of speeds, from batch to real time. Unfortunately, these capabilities have led to a negative side effect. Gartner’s hype cycle for 2017 shows that data lakes have passed the peak of inflated expectations and have started the slide into the trough of disillusionment. Initially, data lakes were predicted to solve all of health care’s outcomes problems, but they have ended up just collecting petabytes of data. Now, data lake users see a lot of detritus that can’t be used to build anything. The data lake has become a data swamp.