Abstract
Big data in health attracts many column inches and many claims. It is closely connected with other areas that also sit near the peak of the Gartner Hype Cycle: Artificial Intelligence, Machine Learning, Precision Medicine and Genomics. Given this exposure, and the potential for great benefit if done right, this chapter sounds a largely cautionary note. We must be aware of, and properly address, the technical, trust, privacy and governance issues that Big Data raises. From Hippocratic times, Primum non nocere has driven medical advance, and Big Data must be no exception. We need to take great care at the outset because, once out, the genie will not go back in the bottle. Answers to some of the technical concerns have existed since the early 1980s; unless these concerns are addressed at the point of care now, they will continue to foster a garbage in, garbage out model. While there is no doubt that a degree of composting can improve matters, machine learning, AI and true precision medicine require high quality, semantically interoperable, structured, coded and curated data. Alongside these technical concerns we consider the potential impact on the data subject, citizen and consumer. We also examine whether clinicians are appropriately led and equipped with education and tools for use at the point of care. Only with these in place will we be able to deliver solutions to the data quality challenges. By ensuring these criteria are met to the highest possible standard, big data will start to meet the many expectations already placed on it.