Data and ethnic diversity in epigenetics

Anndior Boateng, University of Brighton


Data science is used widely in healthcare to advance processes and facilities. As a student nurse, I realised the need to have this knowledge, and the impact it would have, in order to educate healthcare practitioners on the use of data science in their fields. From experience, many healthcare practitioners do not find the new technology implemented in hospitals helpful because they are not trained to a confident level, which in turn means they won’t record data to a replicable standard. This is reflected in the data that hospitals then collect, as most of it can be auto-filled from memory rather than being a real reflection of a patient’s presenting symptoms at that time. Data should be diverse, so that we can truly gauge the needs of the people we are serving.


During this time, I was part of a university workshop that gathered the experiences of clinical students on placement, specifically those in the BAME community. It opened my eyes to the knowledge gap that stops students from this specific community of scientists from experiencing, and being the solution to, the problems they faced in a clinical setting. The biggest gap was the lack of community they had within their fields. This was due to multiple reasons, one being that opportunities in data were not offered to them or made accessible.


Pursuing this internship with Health Data Research UK (HDRUK) was challenging because I had to step outside my scope: the role was for people with a data background, which I did not possess. But because this initiative was targeted towards young black people in higher education (science), who would not otherwise have the opportunity to experience these roles until a later stage in their lives, I pursued it regardless. Genomics, especially in cancer, really caught my eye. Its study was starting to become a transformative tool in diagnosis and treatment, which was fascinating to me. For the longest time I have envisioned a healthcare system that focuses on preventative treatment, and being able to see the process that brings research projects like the 100,000 Genomes Project to fruition really excited me.


My time at DATA-CAN, the Health Data Research Hub for Cancer, showed me that data plays a major role in the advancement of treatment for diseases, medical imaging, drug discovery, predictive diagnosis and, most of all, genetics, which was of great interest to me. I was given great insight into, and exposure to, the commercial aspect of how data moves. Something that was prioritised, and that I can speak proudly of, was DATA-CAN's desire to always have patient representatives in meetings and in the decisions made. Cancer plays a central role in genomic research, and I valued the experience of working with people who are creating databases that allow data to be shared sustainably and used in the best interest of people experiencing the disease and its treatment.


Access to high-quality data is being explored as the main key to improving healthcare outcomes and services. In cancer care and treatment, a great deal is being done to create trusted research environments that allow predictable data to be collected and programmed for specific care. At DATA-CAN there was a lot of emphasis on making data easier to interpret and more securely accessible through the NHS. The United Kingdom has one of the longest-running health data systems, but many clinicians and researchers struggle to locate good-quality data. Working in the commercial sector opened my eyes to the legalities of sharing data through data controllers such as universities, charities, and NHS organisations. It was interesting that I would be working to optimise the use of data without ever using any - apart from in my own research.


Broadening diversity in databases must become an integral part of research quality. Clinical and biomedical research has focused closely on the populations it deems accessible. This has resulted in large cohorts of mostly white males being studied. Diversity in data is a constant topic of conversation as research continues to have gaps in it. If we do not broaden the scope to the diverse populations that predominantly face avoidable health inequalities, health disparities will widen between deprived groups. There is a great need to promote diversity in genomic research, firstly as a matter of justice: when we fail to engage different ethnicities, we perpetuate health inequities.


The reason I focused on epigenetics was to try to elucidate emerging research on the socio-economic and environmental effects that directly affect healthcare prognosis and outcomes. The areas with the most of these avoidable outcomes held a higher percentage of ethnic minorities. We know that certain diseases are carried differently across different ethnicities and races, but this led me to explore deeply the ignored contributions of environmental and social factors such as smoking, air pollution and poverty, which results in inadequate access to medical services. These factors contribute to long-term condition outcomes in certain populations.


I explored the biological process of DNA methylation with a large focus on these conditions. This epigenetic modification alters the function of genes through the addition of methyl groups to DNA strands, and is influenced by both environmental and genetic factors. The Marmot Review 10 Years On was another reason I strongly wanted to link socioeconomic disadvantage to worsening health outcomes through a genetic lens. It stated that this link was striking and that, in the ten years since the first Marmot Review, not much had been done to close these disparities. These disparities are still widening, without any public health solution being put forward.


In 2009, analysis of genome-wide association studies (GWAS) questioned the striking lack of diversity in the genomic research that develops precision medicine, and the restrictions this would cause, with 96 percent of participants being of European descent. This highlighted the legal, ethical, and social implications that result in the underrepresentation of the BAME community. There has been a failure in epigenetic research to engage diverse data, especially in the United Kingdom (UK). In the United States of America, by contrast, there is a broader scope of participants in epigenetic research, with efforts to reach ethnic minorities in poorer communities to naturally assess the environments by which people are disproportionately affected. Sourcing diversity in data was one of the challenges I faced when researching. Many papers expressed the need for diversity and noted its lack of representation in data, and most highlighted the great contributions made, and barriers broken, by genomic research in diverse populations. However, little initiative was taken on how to go about obtaining diverse genetic data. This would require us to go into different communities and analyse their lifestyles as they pertain to health, as well as grouping the hereditary trends found in ethnic groups facing deprivation.


Whilst research in epigenetics is developing across the world, research in the UK does not capture its full effects on a large enough scale to make links strong enough to ignite change in society. The UK is lucky enough to have an extensive data ecosystem that can explore the effects of these epigenetic changes. But because of the lack of certainty we have across deprivation, it is hard to know where to start. All data banks should start to prioritise diversifying their data if we want better health outcomes in the near future.




Anndior is a third-year nursing student at the University of Brighton.