
Genomics and the importance of ethnicity coding

Sam Rodger & Owen Chinembiri, NHS Race and Health Observatory

At the frontier of medical innovation, new technologies promise to make healthcare more personalised, more efficient, and safer. Precision medicine methods can take us beyond a one-size-fits-all approach to treatment; artificial intelligence and machine learning can improve our diagnostic capacities as well as our population-level service planning; and genomics can aid our understanding and treatment of diseases and other determinants of health.

There is a risk, however, that these advances could leave behind those communities who already experience ethnic health inequalities. We know that ethnic minority groups are under-represented in medical research, and that access to innovative medical services can be lower among these communities.(1)(2)(3) If we want biomedical research and genomic medicine to work for everyone, there are two key priorities we must engage with now: ensuring that research is conducted in diverse populations with a range of ancestral backgrounds; and ensuring that the medical advances that result from this research are available for all.(4)

We must tread carefully though. Effective genomic medicine relies on an understanding of the complexities of geographical ancestry and genomic variation, and an appreciation of where the two correlate.(5) It would, however, be incorrect and dangerous to use genomic research to make generalised assumptions about ethnic groups. As Bonham et al describe it, ‘imprecise use of race and ethnicity data as population descriptors in genomics research has the potential to miscommunicate the complex relationships among an individual’s social identity, ancestry, socioeconomic status, and health, while also perpetuating misguided notions that discrete genetic groups exist.’(6)

It is vital that we cultivate a research and medical landscape that not only reflects the diversity of the national population, but that also equips clinicians and other professionals with sufficient understanding of the interplay between race, ethnicity and geographical ancestry.

This will not be a simple journey, not least because there are fundamental issues in the NHS regarding the collection and analysis of ethnicity data. If innovative medical services are to genuinely serve everyone, we must first ensure we have robust and representative data at all levels in our health service. In a recent report from the Public Policy Institute, UK Genomics: Genomics Revolution, the authors rightly highlight the potential of combining the UK's expertise and infrastructure in genomics with its population-scale data.(7) This potential is indeed considerable, but we must first turn our attention to the embedded issues around ethnicity recording in the NHS if we don’t want to exacerbate existing inequality.

Data and Ethnicity

Like the rest of the world, the NHS is becoming increasingly data-driven, though arguably at a slower pace and often less evenly. Data is being used to develop new treatments, including in genomics and precision medicine, that are improving health care outcomes. However, not all people are benefitting from these new treatments.(8)

Though ethnic health inequalities have been a fact of life in this country for many years, they have been cast in a brighter light as a result of unequal outcomes experienced globally during the COVID-19 pandemic.(9) Over the past year or so, the NHS has demonstrated a significant amount of will in tackling health inequalities. In the context of the pandemic, we have seen people and organisations who previously had not been involved in addressing inequalities become allies, advocates and activists. Meanwhile, the pandemic has accelerated the development and adoption of new technologies such as the COVID-19 track and trace app, and a necessary proliferation of digital consultations, seen as the safest way to resume clinical appointments without risking disease transmission.(10)

However, one area that hasn’t improved as quickly to support this new digital impetus is the quality of ethnicity coding in health care datasets. The challenges in this space are many. Ethnicity data is not uniformly collected or of a consistent quality; many of the codes still used in the NHS are outdated; ethnicity coding has long been absent from death registrations (from which mortality statistics are derived); and coverage is especially poor in primary care data. Recent research published jointly by the Nuffield Trust and the Observatory found that, even though the proportion of health records containing a valid ethnicity code for the patient was high, there is still significant work to do to improve the quality of ethnicity coding in health care records.(11)

Current state of ethnicity coding

The Nuffield Trust and NHS Race and Health Observatory research looked at ethnicity coding in English health service datasets. The research analysed Hospital Episode Statistics (HES) data (year range 2010/11 to 2019/20).

The research found that:

  • 87% of inpatient spells, 83% of outpatient attendances and 86% of A&E attendances had a valid ethnic group recorded in 2019/20. However, this includes ‘not stated’ and ‘other’ codes, which are not useful for analysis purposes. As an example, for inpatients, 8.5% of records had a code of ‘not stated’ and 8.8% had an ‘other’ ethnic group.

  • London had a high proportion of patients with ethnicity recorded as ‘not stated’ or in one of the ‘other’ categories.

  • The proportion of records with a valid ethnic group varied markedly between providers, from 53% to almost 100%.

  • People from an ethnic minority background are disproportionately likely to be assigned different ethnicity codes on the different occasions when they encounter the NHS.

  • The lack of comprehensive, high quality data on health and mortality by ethnicity is a significant obstacle to understanding ethnic inequalities in health, and therefore how the diverse health needs of different ethnic groups can be addressed.
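The difference between a 'valid' code and an analytically useful one, and the problem of the same person being coded differently on different occasions, can be illustrated with a short sketch. It is not the method used in the research cited above: the record layout, the use of `None` for a missing code, the label strings and the function names `coverage` and `inconsistent_patients` are all assumptions made for illustration, and the records are made up.

```python
from collections import defaultdict

# Assumed labels for codes that are valid but carry no analytic information.
UNINFORMATIVE = {"not stated", "other"}


def coverage(codes):
    """Return (valid_share, usable_share) for a list of ethnicity codes.

    None stands for a missing code (an assumption of this sketch).
    'Valid' means any code is present; 'usable' also excludes the
    uninformative labels above.
    """
    if not codes:
        raise ValueError("no records to assess")
    valid = [c for c in codes if c is not None]
    usable = [c for c in valid if c not in UNINFORMATIVE]
    return len(valid) / len(codes), len(usable) / len(codes)


def inconsistent_patients(encounters):
    """Return sorted ids of patients recorded under more than one code.

    `encounters` is an iterable of (patient_id, code) pairs; missing
    codes (None) are ignored.
    """
    seen = defaultdict(set)
    for patient_id, code in encounters:
        if code is not None:
            seen[patient_id].add(code)
    return sorted(pid for pid, codes in seen.items() if len(codes) > 1)


# Made-up records for illustration only.
records = ["white british", "not stated", None, "other", "indian",
           "black african", "white british", "not stated", "pakistani", None]
valid_share, usable_share = coverage(records)
print(f"valid: {valid_share:.0%}, usable: {usable_share:.0%}")

encounters = [(1, "indian"), (1, "other"), (2, "white british"),
              (2, "white british"), (3, None)]
print(inconsistent_patients(encounters))
```

In this toy example eight of ten records hold a valid code, but only five hold one that can support analysis by ethnic group, and patient 1 appears under two different codes. A headline 'valid' figure can therefore overstate how much of a dataset is usable for studying inequalities.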

Other NHS data sets also show incomplete ethnicity coding. NHS Digital regularly publishes Management Information Ethnic Category Coverage Information, which includes GPES Data for Pandemic Planning and Research (GDPPR). The information shows the recorded ethnicity of patients broken down by Clinical Commissioning Group (CCG). The overall proportion of patients with known ethnic categories in the GDPPR data for 14 October 2021 was 81.7%. At a CCG level, the proportion of patients with known ethnic categories ranged from 17.6% to 91.6%.


As highlighted above, published data and research show that NHS datasets for primary care, secondary care and workforce all have gaps in ethnicity coding. The variations in the completeness of ethnicity coding and the use of different ethnicity classifications in NHS datasets mean that extra care must be taken when using the data for secondary uses. The significant variations between these datasets by region and organisation can result in biases and unintended inaccuracies in research and in the development of new treatments.

Inferences drawn from these and similar data sets often set the stage for biomedical research and genomic medical services. However, as the Observatory has highlighted elsewhere, technologies that are primarily developed for and tested on non-diverse cohorts can lead to health inequalities when they are used.(12) The same can be said for the application of genomic medicine where genomic studies include primarily those with European ancestry.(13) If the data being used in research, or to develop new treatments and technologies, has ethnicity coding that is incomplete, inaccurate or not representative, this can perpetuate existing inequalities and biases.

Health outcomes, access and experience vary significantly between ethnic groups in England and around the world. This needs to be a consideration in the development and continued roll out of the NHS Genomic Medicine Service. This must include considerations of representative research sampling, capacity and capability of medical professionals, and equitable access to services.

We will end by restating the importance of nuanced understanding in this area. A broad literature base tells us that ‘researchers should include underrepresented populations more often and materially in their research, describe diverse cohorts in specific and detailed ways, and engage marginalized communities meaningfully in the research process’ (Bentley et al, 2017). This is not just ethically the right thing to do but can broaden our understanding of disease and other determinants of health for all. It should not, though, be confused with the idea that ethnic groups are genetically discrete populations. We need to learn from variations in ancestry and geography without inadvertently ingraining the idea that ethnic groups are genetically discrete (Bonham et al, 2018). While we pursue the practical steps below, we must not lose sight of this complexity.

Improving our data to enhance our understanding

Data used in genomics, whether to shape studies or contextualise conclusions, should be representative of the UK population, with accurately coded and granular ethnicity information. To do this, those drawing on NHS data while conducting research or designing services should ensure the following:

1) Data being used must use the most recent census ethnicity classifications.

2) The data must have high levels of completeness when it comes to ethnicity coding.

3) Reports must include information on how ethnicity data was collected and quality assured.

4) The data must be representative of the local or national population, in line with the most recent census data or an agreed methodology.

5) If data doesn’t meet these standards, steps must be taken to improve the quality of the data before work goes ahead.
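As a rough illustration of how points 1, 2 and 4 might be checked before analysis begins, the sketch below flags a dataset whose ethnicity coding is incomplete, uses codes outside a reference classification, or departs from reference population shares. Everything specific in it is an assumption: the function name `check_dataset`, the thresholds `min_completeness` and `max_gap`, the group labels and the reference shares are hypothetical placeholders, not census figures or an agreed NHS standard.

```python
def check_dataset(codes, reference_shares, min_completeness=0.9, max_gap=0.05):
    """Return a list of problems found; an empty list means the checks pass.

    codes            -- one ethnicity code per record, None if missing
    reference_shares -- expected share of each group, e.g. from a census
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    problems = []
    known = [c for c in codes if c is not None and c != "not stated"]

    # Point 2: completeness of ethnicity coding.
    completeness = len(known) / len(codes) if codes else 0.0
    if completeness < min_completeness:
        problems.append(f"completeness {completeness:.0%} is below {min_completeness:.0%}")

    # Point 1: every code should belong to the reference classification.
    unexpected = sorted(set(known) - set(reference_shares))
    if unexpected:
        problems.append(f"codes outside the reference classification: {unexpected}")

    # Point 4: observed shares should be close to the reference shares.
    for group, expected in reference_shares.items():
        observed = known.count(group) / len(known) if known else 0.0
        if abs(observed - expected) > max_gap:
            problems.append(f"{group}: observed {observed:.0%}, reference {expected:.0%}")
    return problems


# Made-up reference shares and records, for illustration only.
reference = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}
sample = ["group_a"] * 6 + ["group_b"] * 3 + [None]
for problem in check_dataset(sample, reference):
    print(problem)
```

Here the sample just meets the assumed completeness threshold, but it over-represents one group and contains no records at all for another. An automated check of this kind can only flag such gaps; deciding how to improve the data, as point 5 requires, remains a matter of judgement.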

Better use of data across the NHS means a better overall understanding of the populations and communities the system is serving. Complete data and nuanced understanding also help us to avoid dangerous conflations between ancestry and ethnicity, and to ensure that genomic medicine leaves nobody behind. The Observatory is committed to funding and supporting ground-breaking new research in this area over the next few years.


Sam is responsible for overseeing strategy and policy at the Race and Health Observatory, making connections across the health and care landscape, and using robust evidence to inform national policy. Before joining the team, Sam worked at NHS England and NHS Improvement as policy lead for the Workforce Race Equality Standard Implementation team, where he oversaw an action research programme into how organisational culture is experienced by black and minority ethnic staff groups in the NHS.

Sam has also worked at the General Medical Council on fitness to practise policy; at the Department of Health and Social Care on workforce strategy, with a particular focus on temporary staffing; and at Arts Council England, where he worked on making access to the arts in England more equitable.

Sam also has an MA in Humanitarianism and Conflict Response, where his research focussed on global health, post-colonial international relations, and the intersection of the public and private sectors in global peacekeeping and humanitarian intervention.

Owen is the Senior Implementation Lead in the Race and Health Observatory. He joined the NHS in 2004 as a Mental Health Occupational Therapist (OT). Prior to working in the UK, Owen worked as an OT in his native Zimbabwe and Botswana. He is also an alumnus of the NHS Graduate Scheme (Health Informatics specialism). He has worked in various management/leadership roles for clinical, transformation, performance and informatics teams. Owen is passionate about using data to reduce race inequality and to improve patient care and staff experiences. Away from work, Owen is a keen runner and, when he gets the time, he plays on his Xbox.



1. Smart, Andrew, and Eric Harrison. "The under-representation of minority ethnic groups in UK medical research." Ethnicity & Health 22.1 (2017): 65-82.
2. Razai, Mohammad S., et al. "Covid-19 vaccine hesitancy among ethnic minority groups." (2021).
3. UK Government Scientific Advisory Group for Emergencies. Factors influencing covid-19 vaccine uptake among minority ethnic groups, 17 December 2020.
4. Bentley, Amy R., Shawneequa Callier, and Charles N. Rotimi. "Diversity and inclusion in genomic research: why the uneven progress?" Journal of Community Genetics 8.4 (2017): 255-266.
5. Bonham, Vence L., Shawneequa L. Callier, and Charmaine D. Royal. "Will precision medicine move us beyond race?" The New England Journal of Medicine 374.21 (2016): 2003.
6. Bonham, Vence L., Eric D. Green, and Eliseo J. Perez-Stable. "Examining how race, ethnicity, and ancestry data are used in biomedical research." JAMA 320.15 (2018): 1533-1534.
7.
8.
9.
10.
11.
12.
13. Popejoy, Alice B., and Stephanie M. Fullerton. "Genomics is failing on diversity." Nature News 538.7624 (2016): 161.
