How bias creeps into data-driven healthcare to impact minority ethnic groups (and how to stop it)

Niki O’Brien & Saira Ghafur, Imperial College London



Inequities in health and healthcare are a major challenge in the UK. Preventable mortality is associated with socioeconomic gradients, with the poorest areas of the UK having the highest rates of preventable mortality(1). Ethnicity-based disparities exist widely, with minority ethnic communities generally experiencing poorer health than the overall population. This trend has been made more visible by the disproportionate toll COVID-19 has had among minority ethnic individuals and communities(2). Similarly, women in the UK faced higher rates of mental health deterioration than men during the first COVID-19 lockdown due to differences in caring responsibilities and wider social factors(3).

Data-driven healthcare has the power to improve disease surveillance, enable earlier detection of health conditions, improve diagnosis, and uncover novel treatments, all of which facilitate more tailored therapy and personalised patient care(4, 5). However, data-driven healthcare, including but not limited to artificial intelligence solutions and large genetic datasets, has historically been developed using unrepresentative data, which has resulted in bias. Such bias has major implications for the underrepresented population groups. Genome UK: The Future of Healthcare highlights that “If people from ethnic minority groups are not appropriately represented in datasets, their information will be compared with people of a different genetic background to them, and the information they receive might not be as personalised”(5). This statement highlights the core challenge: how can we reduce bias in data-driven healthcare to make innovations work well for everyone?

Data-driven healthcare offers the opportunity to develop highly effective and targeted personalised healthcare from health and genomics data, but data, datasets and technology must be optimised to improve health and healthcare provision without perpetuating existing inequities. In this article, we explain the challenge of bias in data-driven healthcare and explore potential solutions, with examples from across critical sectors.

Data is increasingly driving healthcare, but at what cost?

Data-driven healthcare has already shown promise in several areas. Genomic applications, such as newborn screening for treatable inherited conditions, identification of individuals with hereditary cancers, exploration of genetic predisposition to adverse drug effects, and use of polygenic risk scores (PRS) in disease detection and prevention, have been successfully applied in public health and clinical care(6, 7). Some artificial intelligence (AI) technologies have also been shown to outperform healthcare practitioners in diagnosing certain diseases, including malignant tumours(8). However, although AI technologies attract a lot of hype and promise, very few are currently in clinical use. There are several reasons for the lag between technology development and deployment, but one critical factor is the difficulty of translating technical success into clinical impact when both the healthcare data used to develop AI algorithms and the AI developers themselves can introduce bias(9-11). Legacies of institutional, social, and historical discrimination in biomedicine and healthcare further contribute to the challenge of developing unbiased technologies(11, 12).

Longstanding bias in healthcare


Bias is not necessarily a new challenge for the health sector. Medical professionals’ unconscious, or implicit, bias can lead them to make assumptions about diagnoses and treatments for patients, which can further lead to misdiagnosis and harm(13). Such bias may be focused on age, ethnicity, gender, sexual orientation, socioeconomic status, or a combination of these. For example, research studies over the past two decades have shown that men were investigated more thoroughly and treated more extensively than women with the same symptom severity across disease areas such as coronary artery disease, irritable bowel syndrome, knee joint arthrosis, neck pain, Parkinson’s disease, and tuberculosis(14).

Bias and minority ethnic groups


In the same way that bias in the health sector has resulted in disparities in treatments and outcomes across groups, ethnic minorities have faced, and continue to face, a range of poorer health outcomes than White groups in the UK. This has been observed throughout the COVID-19 pandemic. Minority ethnic groups, specifically Black and South Asian groups, had a higher COVID-19 mortality risk than White groups, even when adjusting for location, measures of disadvantage, occupation, living arrangements, and pre-existing health conditions(15). The full extent of disparities may not yet be known, given that data collection on ethnicity in healthcare is poor and inconsistent. A Nuffield Trust report published in June 2021 noted that the proportion of electronic records with a valid ethnic group code varied from 53% to almost 100% across healthcare providers in the UK, and that minority ethnic patients were more likely to have different ethnicity codes assigned to them in different databases(16).
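The two data-quality problems described above, missing or invalid ethnicity codes and conflicting codes across databases, can be measured directly. The following is a minimal sketch in Python, not the method used in the Nuffield Trust report. The code set `VALID_CODES`, the record layout, and the sample records are all hypothetical.

```python
# Hypothetical set of valid ethnic group codes; real coding schemes differ by dataset.
VALID_CODES = {"A", "B", "C", "D"}


def valid_code_rate(records):
    """Return the share of records whose ethnicity code is in VALID_CODES."""
    if not records:
        return 0.0
    valid = sum(1 for r in records if r.get("ethnicity") in VALID_CODES)
    return valid / len(records)


def inconsistent_patients(dataset_a, dataset_b):
    """Return IDs of patients who hold different valid codes in the two datasets."""
    codes_a = {r["patient_id"]: r.get("ethnicity") for r in dataset_a}
    mismatched = set()
    for r in dataset_b:
        a, b = codes_a.get(r["patient_id"]), r.get("ethnicity")
        if a in VALID_CODES and b in VALID_CODES and a != b:
            mismatched.add(r["patient_id"])
    return mismatched


# Illustrative records only; these are not real patient data.
gp_records = [
    {"patient_id": 1, "ethnicity": "A"},
    {"patient_id": 2, "ethnicity": None},  # missing code
    {"patient_id": 3, "ethnicity": "C"},
    {"patient_id": 4, "ethnicity": "Z"},   # code outside the valid set
]
hospital_records = [
    {"patient_id": 1, "ethnicity": "B"},   # conflicts with the GP record
    {"patient_id": 3, "ethnicity": "C"},
]

coverage = valid_code_rate(gp_records)
conflicts = inconsistent_patients(gp_records, hospital_records)
```

Reporting both figures per provider, and per ethnic group where possible, would show where disparities are most likely to be hidden by poor coding.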

Funding allocated for health research has also been historically associated with bias. For example, sickle cell disease (SCD) affects around 15,000 people in England, most of Black African or African-Caribbean, Mediterranean and Asian origin, but SCD is allocated as much as 30 times less research funding than other genetic disorders, such as cystic fibrosis, which affects around 10,000 people in England, mostly of White ethnic origin(17, 18). The disparity is starker in the US context, where SCD occurs in 1 in 365 births among Black individuals, compared with 1 in 2500 births among White individuals for cystic fibrosis(19).

The development of data-driven healthcare has ushered in a new era but brought its own challenges for addressing bias in health, specifically through bias related to data. Genome UK: The Future of Healthcare reported in 2020 that there is currently an ethnic bias in most large genetic datasets and the bioinformatics tools used in healthcare, as the main databases host genetic information from European ancestries(5). The report highlights that work is being undertaken to improve representation in datasets and advocates for over-sampling, where minority ethnic groups are overrepresented in samples to ensure adequate information to inform genetic diversity. In the artificial intelligence space, similar problems persist. Case study 1 outlines the challenge of developing AI from imaging datasets, given persistent bias impacting underrepresented groups.

​Case study 1: Ethnicity-based bias in AI technologies

Data-driven AI technologies developed to treat a range of diseases have been found to be biased. For example, AI-driven software used to read chest X-rays has consistently underperformed in testing owing to a range of technical challenges. Seyyed-Kalantari et al. found bias in all four of the chest X-ray datasets tested, with Hispanic patients the least favoured subgroup and White patients the most favoured(20). The authors further observed bias based on age, gender, and socioeconomic disparities. Similarly, early development of AI to diagnose melanoma has faced challenges with bias, as the datasets used to train the technology are largely made up of images of lighter skin. The International Skin Imaging Collaboration: Melanoma Project, one of the largest and most commonly used open-source, public-access archives of pigmented lesions, relies heavily on patient data collected from lighter-skinned populations in the US, Europe, and Australia(21). Beyond improving representation within datasets for AI development, statistical methods exist to prevent lower performance of algorithms for underrepresented groups, such as data stratification or sub-population weighting, but they cannot always reduce bias(20). Furthermore, they are still not standard practice in the development and testing of algorithms. Disclaimers about the dataset collection process and the potential for algorithmic bias, as well as improved regulation, could improve the assessment of AI-driven software for use in healthcare(20, 22).
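To make the statistical ideas above concrete, the sketch below shows one way to measure a performance gap between subgroups and one simple form of sub-population weighting. It is an illustration in Python and not the method of the cited studies. The function names, group labels, and toy data are assumptions, and inverse-frequency weighting is only one of several possible weighting schemes.

```python
from collections import Counter, defaultdict


def tpr_by_group(y_true, y_pred, groups):
    """True-positive rate (sensitivity) per subgroup for binary labels."""
    positives = defaultdict(int)
    hits = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


def inverse_frequency_weights(groups):
    """Per-sample weights chosen so each subgroup has the same total weight."""
    counts = Counter(groups)
    total, n_groups = len(groups), len(counts)
    return [total / (n_groups * counts[g]) for g in groups]


# Toy data: 1 = disease present / predicted, groups "A" and "B" are hypothetical.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "A", "B"]

rates = tpr_by_group(y_true, y_pred, groups)
tpr_gap = max(rates.values()) - min(rates.values())
weights = inverse_frequency_weights(groups)
```

In this toy example the smaller group has the lower sensitivity. The weights could be passed to a training routine that accepts per-sample weights, although, as noted above, reweighting does not always close the gap.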



If we are to learn from the health sector, where are the positive examples?

Addressing the challenges of bias in developing data-driven technologies for healthcare is urgent if these technologies are going to be optimised for entire populations. There are already several positive initiatives across sub-specialities and organisations. In recognition that underrepresented groups, who have historically suffered from a lack of investment and participation in research(17), have experienced worse COVID-19 health outcomes, major funding bodies in the UK and US have launched funding calls specifically seeking research to understand the social, behavioural, and economic impact of COVID-19 in these groups(23, 24). A similar response to a clear unmet need should be developed by funders focused on the development of data-driven technologies for healthcare. A precedent has already been set in this endeavour; earlier this year the Health Foundation and NHS AI Lab launched a much-needed funding call for research into data-driven technologies and AI(25).

Beyond research funding to address bias challenges in advancing data-driven technologies for healthcare, researchers and developers are continuously increasing the understanding of the challenges associated with bias and working to mitigate them. Case study 2 outlines the development of an AI model to detect breast cancer risk which was developed with equity in mind.



Case study 2: AI built with equity in mind

Mirai, a machine learning model to predict breast cancer risk based on traditional mammograms, performed better than the commonly used Tyrer-Cuzick model and previous deep learning models at identifying both 5-year breast cancer risk and high-risk patients across multiple international cohorts(26). As Black women in the US are 43% more likely to die from breast cancer, testing the technology across ethnic groups was essential(27). The developers validated all risk models for different clinical subgroups of interest, including the computation of model C-indices for patients of different races (White, African American, and Asian American), different age groups, different density categories, and different mammography devices. They found that Mirai performed similarly across race and ethnicity categories, suggesting the potential for improvement in patient care across the US. Following the development of the models, the team sought to ensure they performed consistently in diverse clinical environments(27). To produce consistent predictions across environments, the team used an adversarial scheme(28) in which the model specifically learns mammogram representations that are invariant to the source clinical environment. This was further tested in diverse clinical settings in the US, Sweden, and Taiwan(27). Further developing and evaluating computational approaches to prevent and identify bias across the full scope of AI implementation, building on existing best practice in research and development, can pave the way for AI and ML with minimal bias.
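The subgroup validation described above can be illustrated with a concordance index (C-index) computed separately for each group. The sketch below is a plain Python version of Harrell's C-index. It is not the Mirai developers' code, and the follow-up times, event flags, risk scores, and group labels are invented for illustration.

```python
from itertools import combinations


def c_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    A pair is comparable when the subject with the shorter follow-up time
    had the event. Tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped to keep the sketch simple
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def c_index_by_group(times, events, risks, groups):
    """Compute the C-index separately within each subgroup."""
    result = {}
    for g in sorted(set(groups)):
        idx = [k for k, value in enumerate(groups) if value == g]
        result[g] = c_index(
            [times[k] for k in idx],
            [events[k] for k in idx],
            [risks[k] for k in idx],
        )
    return result


# Invented example: years to diagnosis or censoring, event flag, model risk score.
times = [2, 5, 3, 4, 1, 6]
events = [1, 0, 1, 1, 1, 0]
risks = [0.9, 0.2, 0.4, 0.6, 0.5, 0.1]
groups = ["X", "X", "X", "Y", "Y", "Y"]

by_group = c_index_by_group(times, events, risks, groups)
```

A model that ranks patients well overall can still rank them poorly within one subgroup, which is why the per-group figures matter more here than a single pooled value.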


A major concern related to bias in data-driven technologies is bias in the workforce that develops them, in which minority groups are often underrepresented. A lack of diversity in developer teams, for example, can lead to solutions that may not be representative of or appropriate for the intended end-users, or that perpetuate stereotypes(29). Problems with bias may be more easily overlooked when teams are too homogeneous, for example in teams with a similar educational background, ethnicity, gender, age, or socioeconomic status(30). Similarly, a lack of diversity and equal treatment in the UK NHS may affect whether staff feel able to raise concerns about bias in the application or impacts of data-driven technologies used in clinical practice. Case study 3 outlines progress to develop diversity and equity across the NHS.



Case study 3: Developing more diverse teams in the NHS

To address equity within the NHS, the Workforce Race Equality Standard (WRES) programme was established in 2015. It requires NHS organisations to report against nine indicators(31) of race equality and supports continuous improvement to tackle the root causes of discrimination. The benefits of a more diverse workforce are numerous. For example, if medical professionals and teams are more representative of the diverse patients they treat, patients may feel that their medical needs are better understood and have greater confidence in their treatment(32). The WRES Report 2020 signalled improvements: the total number of BME staff at very senior manager (VSM) pay band increased by 41.7%, from 108 in 2017 to 153 in 2020, and the number of BME board members in trusts increased by 22.2% between 2019 and 2020(31). However, challenges remain. The report also found that White applicants were 1.61 times more likely to be appointed from shortlisting compared to BME applicants (worse than 2019), more BME staff reported experiencing harassment, bullying, or abuse from patients, relatives, or the public, and BME staff were still 1.16 times more likely to enter the formal disciplinary process compared to White staff. These findings highlight the importance of proactively working towards a culture of diversity and inclusion within the workforce, with accountability at the highest levels of the organisation. Organisations should avoid tokenistic inclusion of diversity policies. Crucially, diversity hiring needs to be matched with a culture of diversity and inclusion that enables equal participation in review and feedback processes, provides equal opportunities for upward progression, and encourages open discussion of the challenges among staff and senior leaders.
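The headline figures above rest on two simple calculations: a percentage change and a ratio of two rates. The Python sketch below reproduces the VSM percentage from the counts quoted in the text. The shortlisting counts are hypothetical, chosen only to give a ratio of 1.61, because the underlying WRES counts are not given here.

```python
def percent_change(before, after):
    """Percentage change from an earlier count to a later one."""
    return (after - before) / before * 100


def relative_likelihood(appointed_a, shortlisted_a, appointed_b, shortlisted_b):
    """Ratio of group A's appointment rate to group B's appointment rate."""
    return (appointed_a / shortlisted_a) / (appointed_b / shortlisted_b)


# Counts quoted in the text: BME staff at VSM pay band, 2017 and 2020.
vsm_change = percent_change(108, 153)

# Hypothetical shortlisting counts, for illustration only.
ratio = relative_likelihood(
    appointed_a=322, shortlisted_a=1000,  # White applicants (hypothetical)
    appointed_b=200, shortlisted_b=1000,  # BME applicants (hypothetical)
)
```

A ratio of 1.0 would indicate equal likelihood of appointment from shortlisting, so values above 1.0 indicate an advantage for the first group.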


If we are to learn from best practice elsewhere, where can we look?

There are additional examples of initiatives to address some of the challenges of bias in data-driven technologies in wider society. As noted, a major concern related to bias in data-driven technologies for healthcare is bias in the wider workforce. Efforts to reduce bias in data-driven technology must include diversifying the workforce. Case study 4 outlines a data-driven initiative to develop diversity, accountability, and transparency in the private sector, which could be applied to the technology sector, as well as the NHS.



Case study 4: A data-driven approach to develop diversity, accountability, and transparency at Target Corporation

There are several good examples of initiatives from the private sector, where time and resources have been invested in recent years to develop diversity and inclusion(33). The approach of Target Corporation, the 8th largest retailer in the US, offers an example of an initiative that promotes not only diversity but also accountability and transparency. Target Corporation has used data to drive real-time transparency, recruiting a dedicated development and inclusion analytics team(33). The team tracks progress with multiple dashboards and, through quarterly and annual processes, the organisation reassesses its goals and adjusts its tactics. To encourage a positive culture of diversity within the organisation, business leaders are expected to make use of this disaggregated data to drive decision-making when setting pay or advancement. A culture of diversity is also encouraged through a Diversity Action Committee, a volunteer group in each business area that works with the development and inclusion team on initiatives for their part of the organisation. Progress has been made as part of these efforts, with women and minority ethnic groups well represented at all levels of the company, including on the board of directors, where women make up one third and Black and Latino members make up nearly half of its membership(34). However, there is still work to be done to continue to encourage sustainable and meaningful diversity. Underpinning future strategic planning around diversity, using data to track and analyse diversity against set goals and commitments encourages greater accountability, as leaders across business areas can access and understand trends in the data to inform how best to progress diversity across the workforce.



Across the individuals and organisations developing data-driven technology in healthcare, transparency is essential in building trust in their safety and quality. Patients and the public should have a greater role in developing and appraising technologies before and during their application in clinical practice. However, transparency must also go beyond the aspects of the process where patients and the public can be deeply involved. Researchers and developers, then later commissioners, local authorities, and Trusts, can build public trust in the work they are doing by not only celebrating the successes, but informing interested parties when things did not go to plan. Case study 5 outlines an honest reporting example from social care.



Case study 5: AI reporting transparency in social care

In 2019-2020, What Works for Children’s Social Care collaborated with four local authorities in England to develop machine learning models, using data from children’s social care cases, to predict eight outcomes for individual cases(35). As part of the project, they sought to determine whether the technology could predict outcomes equally well for different groups of children. The research team prepared the datasets for the model, validated the data, and created new ways of summarising or categorising the data. Despite their efforts, the final models missed most children at risk of the outcomes, which risks the model discouraging a social worker from supporting a child or young person, potentially resulting in harm to them. These results were published openly, along with the finding that there is currently a low level of acceptance of the use of these techniques among social workers. The overall transparency of reporting and the considered nature of data curation openly expose the weaknesses of using these machine learning models for decision-making in social care. Such reporting of negative findings is helpful to inform future research and practice. Additionally, as part of this project, What Works for Children’s Social Care developed a standardised way of reporting machine learning models in the sector to enable transparent communication about and comparison of models(35).

Bias in data-driven technology risks worsening already unequal health and healthcare in the UK. While the application of these technologies is still developing, multidisciplinary collaboration is essential to address bias if innovations with the power to improve health and healthcare for whole populations are to be developed. Lessons can be learned from positive practices and initiatives in the health sector and beyond. Without attention to addressing current and future challenges, there is a risk of creating new inequalities and exacerbating existing ones in healthcare.



 

Niki is the Policy Fellow in Global Health at the Centre for Health Policy, Institute of Global Health Innovation, Imperial College London, bringing expertise in global health and international development. She works across projects and grants to support the delivery of research and policy encompassing low- and middle-income countries (LMICs) and/or multiple country health systems, and provides health system expertise globally, including in the EMEA, Americas, and Asia Pacific regions. Niki also leads operations and research conducted by the Leading Health Systems Network (LHSN) in the areas of healthcare worker safety and cybersecurity in healthcare.


Niki has worked extensively with low- and middle-income governments and policymakers in sub-Saharan Africa and Southeast Asia to develop technical capacity and expertise across several areas of health and healthcare. She has published on topics such as digital health and cybersecurity, patient safety, Health Technology Assessment and priority setting, global surgery, and donor priorities and financing.



Saira is the Digital Health Lead at the Institute of Global Health Innovation and Security Science Fellow at ISST, Imperial College London, and an honorary consultant in respiratory medicine at St Mary’s Hospital. At Imperial College, Saira has spearheaded the College’s collaboration for healthcare cybersecurity. Saira also leads work on evidence generation for digital health, the value of healthcare data, and AI and machine learning for health in low- and middle-income countries.


Saira is also the co-founder of two start-ups: Psyma (mental health) and Prova Health (evidence generation in digital health). Saira holds an MSc in Health Policy from Imperial and was also a Harkness Fellow in Health Policy and Practice in New York (2017).



 

References

1. Marmot M, Allen J, Boyce T, Goldblatt P, Morrison J. Health Equity in England: The Marmot Review 10 Years On. The Health Foundation; 2020.

2. Public Health England. Disparities in the risk and outcomes of COVID-19. Public Health England; 2020.

3. Suleman M. Gender divide: a post-COVID recovery must address pandemic inequalities. 2021 [16 Sept 2021]. Available from: https://www.health.org.uk/news-and-comment/blogs/gender-divide-a-post-covid-recovery-must-address-pandemic-inequalities.

4. Fogel AL, Kvedar JC. Artificial intelligence powers digital medicine. npj Digital Medicine. 2018;1(1):5.

5. HM Government. GENOME UK: The future of healthcare. 2020.

6. Khoury MJ, Holt KE. The impact of genomics on precision public health: beyond the pandemic. Genome Medicine. 2021;13(1):67.

7. Green ED, Gunter C, Biesecker LG, Di Francesco V, Easter CL, Feingold EA, et al. Strategic vision for improving human health at The Forefront of Genomics. Nature. 2020;586(7831):683-92.

8. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94-8.

9. The Lancet. Artificial intelligence in health care: within touching distance. Lancet. 2017;390(10114):2739.

10. Ghassemi M, Naumann T, Schulam P, Beam AL, Chen IY, Ranganath R. Practical guidance on artificial intelligence for health-care data. The Lancet Digital Health. 2019;1(4):e157-e9.

11. Leslie D, Mazumder A, Peppin A, Wolters MK, Hagerty A. Does “AI” stand for augmenting inequality in the era of covid-19 healthcare? BMJ. 2021;372:n304.

12. Cirillo D, Catuara-Solarz S, Morey C, Guney E, Subirats L, Mellino S, et al. Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. npj Digital Medicine. 2020;3(1):81.

13. Oxtoby K. How unconscious bias can discriminate against patients and affect their care. BMJ. 2020;371:m4152.

14. Hamberg K. Gender Bias in Medicine. Women's Health. 2008;4(3):237-43.

15. Office for National Statistics. Updating ethnic contrasts in deaths involving the coronavirus (COVID-19), England: 24 January 2020 to 31 March 2021. 2021. Available from: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/updatingethniccontrastsindeathsinvolvingthecoronaviruscovid19englandandwales/24january2020to31march2021.

16. Scobie S, Spencer J, Raleigh V. Ethnicity coding in English health service datasets. Nuffield Trust; 2021.

17. Dyson S, Atkin K. Achieve equity in access to sickle cell services. 2013 [26 Nov 2013]. Available from: https://www.hsj.co.uk/commissioning/achieve-equity-in-access-to-sickle-cell-services/5065176.article.

18. Strouse JJ, Lobner K, Lanzkron S, Haywood C. NIH and National Foundation Expenditures For Sickle Cell Disease and Cystic Fibrosis Are Associated With Pubmed Publications and FDA Approvals. Blood. 2013;122(21):1739.

19. Farooq F, Mogayzel PJ, Lanzkron S, Haywood C, Strouse JJ. Comparison of US Federal and Foundation Funding of Research for Sickle Cell Disease and Cystic Fibrosis and Factors Associated With Research Productivity. JAMA Network Open. 2020;3(3):e201737.

20. Seyyed-Kalantari L, Liu G, McDermott M, Chen IY, Ghassemi M. CheXclusion: Fairness gaps in deep chest X-ray classifiers. Pac Symp Biocomput. 2021;26:232-43.

21. Adamson AS, Smith A. Machine Learning and Health Care Disparities in Dermatology. JAMA Dermatology. 2018;154(11):1247-8.

22. Larrazabal AJ, Nieto N, Peterson V, Milone DH, Ferrante E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences of the United States of America. 2020;117:12592-4.

23. National Institute for Health Research. Multimillion investment in new research projects to investigate higher COVID-19 risk among certain ethnic groups. 2020 [29 Jul 2020]. Available from: https://www.nihr.ac.uk/news/multimillion-investment-in-new-research-projects-to-investigate-higher-covid-19-risk-among-certain-ethnic-groups/25333

24. National Institutes of Health. Opportunities: Notice of Special Interest (NOSI): Social, Behavioral, and Economic Impact of COVID-19 in Underserved and Vulnerable Populations. 2021 [20 Aug 2021]. Available from: https://grants.nih.gov/grants/guide/notice-files/NOT-MH-21-330.html.

25. The Health Foundation. Artificial Intelligence and Racial and Ethnic Inequalities. 2021. Available from: https://www.health.org.uk/funding-and-partnerships/programmes/artificial-intelligence-and-racial-and-ethnic-inequalities.

26. Yala A, Mikhael Peter G, Strand F, Lin G, Smith K, Wan Y-L, et al. Toward robust mammography-based models for breast cancer risk. Science Translational Medicine. 2021;13(578):eaba4373.

27. Gordon R. Robust artificial intelligence tools to predict future cancer. 2021 [28 Jan 2021]. Available from: https://news.mit.edu/2021/robust-artificial-intelligence-tools-predict-future-cancer-0128.

28. Zhao M, Yue S, Katabi D, Jaakkola TS, Bianchi MT. Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning; Proceedings of Machine Learning Research: PMLR; 2017. p. 4100-9.

29. UNESCO. I’d blush if I could: Closing gender divides in digital skills through education. UNESCO; 2019.

30. Lee N. Detecting racial bias in algorithms and machine learning. J Inf Commun Ethics Soc. 2018;16:252-60.

31. NHS England. Workforce Race Equality Standard 2020: Data Analysis Report for NHS Trusts and Clinical Commissioning Groups. NHS England; Feb 2021.

32. Heller A. Diversity in the medical workforce: are we making progress? 2020 [03 Feb 2020]. Available from: https://www.kingsfund.org.uk/blog/2020/02/diversity-medical-workforce-progress.

33. Hunt V, Dixon-Fyle S, Prince S, Dolan K. Diversity wins: How inclusion matters. McKinsey & Company; 19 May 2020.

34. Target. Target’s 2020 Workforce Diversity Report Shows Our Progress — and What’s Ahead. 2021 [27 Jul 2021]. Available from: https://corporate.target.com/article/2021/07/workforce-diversity-report.

35. Clayton V, Sanders M, Schoenwald E, Surkis L, Gibbons D. Machine Learning in Children’s Services: Technical Report. What Works for Children’s Social Care; 2020.