Reviewed by Prof. John Kane

Professor of Psychiatry at The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell and Director of the Institute of Behavioral Science at the Feinstein Institutes for Medical Research

Big Data in Healthcare

Big Data on The Rise

The data generated through use of the Internet, social media, healthcare records, and purchasing history, referred to here as ‘big data’, is increasing at an exponential rate worldwide. Both the volume and the velocity of data collection continue to rise with advances in technology and with the use of increasingly diverse forms of those technologies.1 This meteoric increase in big data has been made possible by an equally rapid increase in the number of people globally with access to Internet and mobile technologies. In 2014, over 2 billion people worldwide had access to the Internet and over 5 billion people had a mobile phone. Within the next four years, over 5 billion people are projected to have Internet access.

“The result of this dramatic increase in Internet access is a production of data 44 times greater than that seen in 2009.”2

The amount of data being generated is truly staggering, and it grows each day as more people gain Internet access and adopt “smart” technologies such as smartphones and smartwatches. To put this volume into context, research has indicated that if one were to collate all of the data from recorded history through the year 2003, one would have approximately 5 billion gigabytes of data.3 In 2011, that same volume of data was generated every two days. Just four years later, in 2015, that same amount was generated every ten seconds. And the rate continues to increase with every minute of every day (Figure 1).
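The figures above are easier to compare as rates. The short calculation below is a back-of-the-envelope sketch that assumes the quoted numbers at face value: a fixed volume of 5 billion gigabytes, produced once every two days in 2011 and once every ten seconds in 2015.

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
# Assumption: 5 billion GB is a fixed volume; "every two days" (2011) and
# "every ten seconds" (2015) are the times taken to generate that volume.

SECONDS_PER_DAY = 24 * 60 * 60

total_gb = 5e9                      # all data through 2003, in gigabytes
t_2011_s = 2 * SECONDS_PER_DAY      # two days, in seconds
t_2015_s = 10                       # ten seconds

rate_2011 = total_gb / t_2011_s     # average gigabytes generated per second, 2011
rate_2015 = total_gb / t_2015_s     # average gigabytes generated per second, 2015
speedup = t_2011_s / t_2015_s       # how many times faster generation was in 2015

print(f"2011: {rate_2011:,.0f} GB/s")
print(f"2015: {rate_2015:,.0f} GB/s")
print(f"speed-up: {speedup:,.0f}x")
```

Taken at face value, the quoted figures imply that data generation became roughly 17,000 times faster between 2011 and 2015.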

Figure 1. Data generation every minute, 2012-2016

With more and more people throughout the world connected to the Internet every day, the generation of data is growing at a nearly exponential rate. There are now more mobile devices in the world than there are people. With all of this connectivity, just how much data is generated every minute? The numbers are astonishing and are brilliantly illustrated in this graphic by DOMO, Inc.

As this graphic illustrates, people are using Internet-connected devices more and more in their daily lives. Data scientists are attempting to harness the power of the “big data” compiled from those devices for a number of purposes, from business initiatives to improving healthcare.

DOMO, Inc. Graphic used with permission.32

Big Data
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. And the amount of data is ever expanding.  This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is “big data”.
IBM. “What is big data”. Available at http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html

Creating value from big data

Big Data analytics

In recent years, the dramatic increase in the number of users and in the volume of data they generate has prompted innovation in both the private and public sectors, primarily in the development of methods capable of making sense of such large quantities of data. Several companies, including giants such as Uber and Facebook, have been able to create value from the data generated, translating user activity into profit. As these companies and researchers have acknowledged, however, the sheer volume of data outstrips the capacity of the tools currently available to analyze it,4 limiting the ability to answer questions based on big data in a cost- or time-effective manner. Given how quickly data-collecting technologies have proliferated over the past 20 years, and especially over the past 5 years, methodologies and systems able to analyze such large quantities of data are still in continuous development.

Big data has proved profitable in the private sector. With such seemingly unfathomable volumes of data at our fingertips, however, questions arise as to how meaningful information can be extracted, at both the population and the individual level, in sectors such as healthcare to improve patient outcomes, reduce costs, and increase quality of life.5 In other words, what can big data do to improve health, both for healthy individuals and for those in need of care?

Systems biology

In order to treat disease, one must have a thorough understanding of what is occurring both chemically and biologically in the human body. Before discussing the role big data can play in the healthcare industry, therefore, it is prudent to comment on the possibilities big data provides for modeling chemical and biological processes. Using so-called “systems biology,” researchers have begun to explore and better understand how the interactions of biological processes shape human behavior as well as the development and course of various diseases. Such an understanding would revolutionize modern medicine, as it would help physicians and researchers treat disease more effectively at every stage, from the development of medication to the treatment of a patient in the clinic. Fortunately, such a revolution in disease understanding has already become possible. Big data, and the methodologies developed to analyze it, allow researchers to study the interactions of the so-called “omics” (epigenomics, genomics, and proteomics, to name a few) in order to better understand how the body functions from a biological perspective.6 This is of vital importance, since it is widely accepted that the biological processes studied through the various “omics” interact with one another rather than operating as independent constructs. The ability to assess these interactions has led to breakthroughs in our understanding of diseases and could potentially lead to advances in treatment for countless diseases.7

Computer modeling of disease and the ability to test the interactions of a wide range of “omics” data allow research scientists to formulate and test hypotheses far more efficiently, ultimately saving valuable time and money. Taken further, this technology could lead to advances in the drug development process, including target validation, and ultimately to improved quality of life for patients by bringing safe and effective treatments that target the underlying biology of the disease to market.9

“Big data provides an opportunity to use mountains of information from thousands – even millions – of patients to better understand and treat medical illnesses”

Systems medicine

While the possibilities of big data are vast and include many public health arenas, its utility in influencing healthcare may be one of the most encouraging,10 including the potential for advancements in drug discovery and development. Within the healthcare industry, researchers hope that big data can serve to effect positive change in ways which were previously not possible.

There are a number of possibilities and questions surrounding the use of big data in healthcare, not least of which is how such quantities of data can be brought to the level of the individual patient.11,12 How can big data be used to help one person seeking treatment? Can it be used to personalize care?13 With a rapid increase in the number of devices that can collect data, including watches and mobile phones, more data than ever is being generated and used for a multitude of purposes, from promoting healthy behaviors14 to monitoring and improving our understanding of neurodegenerative diseases such as Parkinson’s15 and Alzheimer’s.16

Alongside advances in Internet, mobile, and wearable technologies has come the digitization of medical records. According to the US Centers for Disease Control and Prevention, nearly 80% of physicians used some form of electronic medical record system in 2013, up from 18% in 2001.18 In 2014, 3 out of 4 hospitals had adopted electronic health record systems, a nearly 8-fold increase from 2008.19 As more physicians and hospitals use electronic medical records, researchers will be able to assess more data and consequently better understand population health.
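The adoption figures above can be restated as fold-changes. The sketch below uses only the percentages quoted in the text; treating “nearly 80%” as 80% and “nearly 8-fold” as exactly 8-fold are simplifying assumptions, so the results are approximate bounds rather than reported statistics.

```python
# Fold-changes implied by the electronic record adoption figures quoted above.
# Assumptions: "nearly 80%" is taken as 0.80 and "nearly 8-fold" as exactly 8.

physicians_2001 = 0.18   # share of physicians using an EMR system in 2001
physicians_2013 = 0.80   # "nearly 80%" in 2013 (an upper bound)
hospitals_2014 = 0.75    # "3 out of 4 hospitals" in 2014
hospital_fold = 8        # "nearly 8-fold increase from 2008"

# Physician adoption grew by at most this factor between 2001 and 2013.
physician_fold = physicians_2013 / physicians_2001

# If 2014 adoption was about 8 times the 2008 level, 2008 was about 75% / 8.
implied_hospitals_2008 = hospitals_2014 / hospital_fold

print(f"physician adoption grew about {physician_fold:.1f}-fold (2001-2013)")
print(f"implied hospital adoption in 2008: about {implied_hospitals_2008:.1%}")
```

Because the increase was “nearly” rather than fully 8-fold, the implied 2008 hospital adoption rate would be slightly above the 9.4% this calculation gives.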

While this is clearly advantageous for public health analytics, what benefit could all of this data have for an individual patient? Imagine if a doctor could input a patient’s medical history, including laboratory values, diagnoses, and family history, and obtain a recommendation for care drawn from thousands of scientific journal articles on the topic together with the records of tens of thousands of patients similar to the patient in question. By translating knowledge gained from systems biology into so-called “systems medicine,” and through cognitive-computing systems such as the one being developed in the IBM Watson project, healthcare professionals have begun to see this possibility become a reality.20
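One simple way to picture “learning from similar patients” is a nearest-neighbor lookup. The sketch below is illustrative only and is not how IBM Watson works: the `PatientRecord` type, the `recommend` function, the three-number feature vectors, and the sample records are all hypothetical, and the features are assumed to be pre-scaled to a 0-1 range so that Euclidean distance is meaningful.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical features, assumed already scaled to 0-1
    # (e.g. age, a laboratory value, a family-history flag).
    features: tuple
    treatment: str
    good_outcome: bool


def recommend(query, records, k=3):
    """Suggest the treatment most often linked to a good outcome
    among the k past patients closest to the query patient."""
    # Rank past patients by Euclidean distance to the query patient.
    nearest = sorted(records, key=lambda r: math.dist(r.features, query))[:k]
    # Count only the treatments that worked for those similar patients.
    successes = Counter(r.treatment for r in nearest if r.good_outcome)
    if not successes:
        return None  # no similar patient had a good outcome
    return successes.most_common(1)[0][0]


# Hypothetical historical records, for illustration only.
records = [
    PatientRecord((0.60, 0.20, 1.0), "Treatment A", True),
    PatientRecord((0.62, 0.25, 1.0), "Treatment A", True),
    PatientRecord((0.58, 0.30, 1.0), "Treatment B", False),
    PatientRecord((0.10, 0.90, 0.0), "Treatment B", True),
    PatientRecord((0.15, 0.85, 0.0), "Treatment B", True),
]

print(recommend((0.60, 0.22, 1.0), records))  # -> Treatment A
```

A real system would need far richer patient representations, careful feature weighting, and clinical validation; the point here is only that “patients like this one” can be made concrete as a distance between records.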

“Big data allows physicians to use technology to enhance the treatment of their patients by improving outcomes while reducing time and costs.”17

Computer-assisted healthcare: IBM Watson

IBM Watson is a technology platform that uses a combination of natural language processing and machine learning to derive insights into a particular query from large amounts of unstructured data.22 IBM Watson first came to media attention in 2011, when it competed against human contestants on the American television quiz show Jeopardy! and demonstrated an ability to answer nuanced questions with remarkable accuracy. In fact, Watson quite easily defeated two of the show’s most celebrated champions over a multi-episode competition.

In recent years, Watson has been adapted and reinvented in an effort to transform the healthcare industry, where it has been directed to assist physicians with both diagnosis and treatment planning for cancer, one of the leading causes of death worldwide.

Oncologists from the hospital and doctors from some of the most prominent cancer research and treatment institutions in the world continually “teach” Watson based on clinical outcomes so that Watson “learns” from them for future cases.

How it works

With so much information available for Watson to sort through in order to answer a specific question or give a treatment recommendation, how does it determine the best answer to a question or solution to a problem? Watson uses a multi-step process to interpret information and ultimately answer a question. First, Watson determines what type of question is being asked and, more importantly, what the question is asking for by breaking the question down into parts of speech. Watson then scans its database of information, generating thousands of possible answers. Where Watson excels, and differentiates itself from simpler computer systems, is in the next step: it tests hypotheses against the evidence, gathering both supporting and contradicting evidence for each of the thousands of candidate answers generated in the previous step. In the final step, Watson ranks the candidates based on this hypothesis and evidence testing, as well as on previous experience, ultimately providing a percentage score indicating how likely it is that the answer provided is correct. All of this is done in a matter of minutes.
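The four steps described above (analyze the question, generate candidates, weigh pro and con evidence, rank with a confidence score) can be sketched as a toy pipeline. This is a simplified illustration under stated assumptions, not IBM’s implementation: the knowledge base format, the stopword list, the word-overlap evidence score, and the softmax used to turn net evidence into confidences are all hypothetical choices made for brevity, and “previous experience” is not modeled.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical stopword list standing in for real linguistic analysis.
STOPWORDS = {"what", "which", "is", "the", "a", "an", "of", "for", "to", "in"}


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Candidate:
    answer: str
    support: float = 0.0   # evidence for the answer ("pro")
    refute: float = 0.0    # evidence against the answer ("con")


def analyze_question(question):
    """Step 1: reduce the question to its content words."""
    return tokenize(question) - STOPWORDS


def generate_candidates(terms, knowledge_base):
    """Step 2: every answer attached to a passage sharing a term with the question."""
    answers = {p["answer"] for p in knowledge_base if terms & tokenize(p["text"])}
    return [Candidate(a) for a in sorted(answers)]


def score_evidence(candidates, terms, knowledge_base):
    """Step 3: weigh each passage by its term overlap and file it as pro or con."""
    for c in candidates:
        for p in knowledge_base:
            if p["answer"] != c.answer:
                continue
            overlap = len(terms & tokenize(p["text"]))
            if p["supports"]:
                c.support += overlap
            else:
                c.refute += overlap
    return candidates


def rank(candidates):
    """Step 4: turn net evidence into confidences that sum to 1 (softmax)."""
    exps = [math.exp(c.support - c.refute) for c in candidates]
    total = sum(exps)
    scored = [(c.answer, e / total) for c, e in zip(candidates, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def answer(question, knowledge_base):
    terms = analyze_question(question)
    candidates = generate_candidates(terms, knowledge_base)
    return rank(score_evidence(candidates, terms, knowledge_base))


# Hypothetical three-passage knowledge base, for illustration only.
kb = [
    {"answer": "Treatment A", "supports": True,
     "text": "Treatment A improved survival in stage II tumours"},
    {"answer": "Treatment A", "supports": False,
     "text": "Treatment A showed toxicity in elderly patients"},
    {"answer": "Treatment B", "supports": True,
     "text": "Treatment B improved survival in stage II and stage III tumours"},
]

for name, conf in answer("Which treatment improved survival for stage II tumours?", kb):
    print(f"{name}: {conf:.0%}")  # Treatment B: 73%, then Treatment A: 27%
```

Both candidates share the same supporting evidence, but the contradicting passage about toxicity lowers Treatment A’s net score, so Treatment B is ranked first. The softmax was picked only because it yields scores that read as percentages; it should not be taken as the method Watson uses to compute confidence.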

“The ‘Watson for Oncology’ cognitive-computing system harnesses the power of big data to provide an evidence-based treatment plan for each patient.”23

Clinical Use

As noted earlier, the potential clinical utility of systems medicine, and of cognitive-computing systems like IBM Watson, is exciting.

Systems like IBM’s Watson for Oncology have a number of benefits for the healthcare industry. First and foremost, they provide an objective assessment of a patient based on the patient’s full medical and social history. As any physician knows, there is far more data on each patient than a physician can obtain and assess, and far more new scholarly literature than any physician has the time or access to read. Further, these systems “learn” from each patient whose treatment succeeds or fails, as well as from the articles and textbooks written on the disease of interest, giving some confidence that treatment recommendations are not based on static or outdated information.

From a hospital administration perspective, these systems allow for a virtual, collaborative effort between physicians and researchers worldwide. They can also help fill gaps caused by healthcare workforce shortages, including in areas of the world where specialized physicians are in short supply. For cancer treatment in the United States, for example, the latest report from the American Society of Clinical Oncology notes that while the number of cancer cases is growing, the clinical workforce is aging and concentrated largely in metropolitan areas, facts which may impair the medical community’s ability to meet the clinical demand for care.24 Making such diagnostic and treatment-recommendation technologies, through Watson or similar systems, available to underserved or vulnerable populations, such as patients in rural, prison, or refugee settings, would give physicians who lack the support of a large team of colleagues the ability to obtain a more comprehensive assessment of these patients.

While the benefits noted above of systems medicine approaches such as Watson for Oncology clearly have the potential to advance clinical practice and help patients, a number of challenges are also noteworthy.25 For example, what happens when the clinical recommendation of the treating physician or team of physicians conflicts with that of a system such as IBM Watson?26 Such systems are meant to serve as guidance for the physician, not as the definitive answer on treatment or diagnosis. Yet when such a powerful system suggests a conflicting treatment course, the physician and patient may be uncertain about the correct path to health. Such decisions need to be discussed among the physician, patient, and family in order to determine which course has the greatest likelihood of achieving a better quality of life for the patient.

Ethics in Big Data-Assisted Healthcare

Ethical questions about the way in which data is obtained and used are at the forefront of big data discussions, and will likely remain there until core principles of use are properly implemented throughout the healthcare industry.27,28

More broadly, the question of who “owns” the data gathered through the Internet, mobile and wearable devices, and healthcare information is a very real and valid ethical concern.29,30 With so much data available from so many sources, this question can be difficult to answer. Can the data gathered through these devices be shared with others? Who is responsible for the data? To what extent do people have the “right to be forgotten”; in other words, to what extent can ordinary people control access to and sharing of their data?31

Discussions surrounding the proper use of big data prompt strong opinions from citizens and policy makers, from private companies and public sector agencies, and from physicians within the healthcare industry. Further discussion will be required, and soon, to ensure that the massive volumes of data being obtained throughout the world are used in a manner that benefits society and also accords with sound ethical principles. This will help to ensure the greatest degree of mutual benefit for those wishing to access and use the data as well as for those from whom the data is collected.

“Acknowledging and addressing ethical questions that may arise through the use of big data and systems like IBM Watson is of the utmost urgency”


The rapid expansion of Internet access and mobile technologies worldwide provides opportunities to revolutionize healthcare in ways that were not possible 15 years ago. More people than ever use smartphones, wear smartwatches, and have regular and reliable access to the Internet. These technologies generate vast troves of data, the volume of which continues to grow each day.

It is no surprise that so-called big data is at the forefront of medicine, and cognitive-computing technologies such as those seen with IBM Watson have already started to unlock the power of this data in an effort to aid physicians in the diagnosis and treatment of patients fighting cancer. With further advancements in this and similar technologies, these systems could be expanded to combat other deleterious and complex diseases, including mental health disorders. With a growing need for healthcare services worldwide, multi-faceted collaborations between business and the healthcare industry are required in order to ensure the health of the global population. Harnessing the power of big data may help to fill gaps in care while ensuring better health and quality of life for patients throughout the world.
