Visualisation of data bias within genomic datasets

Stefanie Posavec

I’m a designer, artist, and author, and I mainly create experimental data design projects. My practice focuses on creating non-traditional (physical, danceable, wearable, or experiential) representations of data for all ages and audiences, often using a hand-crafted approach.

For most of my career, my projects have centred on the visualisation of a specific dataset, where the work’s aesthetic and message are created by faithfully translating every data point into graphic elements and visual form. Recently, however, I've started working with data in a different way, exploring how to visualise the processes of acquisition, preparation, storage, and analysis inherent in every dataset.

When visualising these data processes, I don’t work with actual datasets. Instead, I use the visual language and design processes of data visualisation to ‘visualise’ hard-to-define concepts and processes with blurry edges, which lack the precision of ‘hard’, quantitative data.

This way of working with (but not working with) data started during my recent ‘art as inquiry’ residency with the UK research group People Like You, which is looking at the outcomes and consequences of data personalisation. During the residency I worked under Helen Ward at Imperial College, creatively exploring how data is used to personalise medicine.

For my residency, I investigated how the various stakeholders within a biobank perceive the ‘people behind the numbers’ who consent to their biological samples and data being used and stored for research. My focus was on the participants and researchers of the Airwave Health Monitoring Study, a cohort study and biobank based at Imperial College that’s been following the lives of 53,000 members of the police force since 2003.

After interviewing Airwave’s staff and study participants, I created a detailed, hand-drawn map of the winding journey a participant’s biosamples and data take within the study. Alongside it, I created a series of drawings that present the perspectives of study stakeholders from their ‘positions’ within the Airwave system, showing how their ability to ‘see’ the individual participant within the aggregated data changes depending on their location.

Data Murmurations: The System Map

Data Murmurations: The Database Manager

During this residency, I couldn’t access any study data (alas, making an artwork isn’t a valid research reason to use such sensitive data!), so instead of starting with a dataset, I started with drawing. Drawing became my way to understand and ‘figure out’ all the working parts of the Airwave system. Through it I built a unified visual language that, in different combinations, was used to ‘visualise’ every part of the study’s data acquisition process.

With these ‘artistic diagrams’, I made sure that I was faithfully presenting whatever data and information I had discovered, while accepting that the end result might not have the precision inherent in visualising ‘hard’ data. Here, creative licence and subjectivity become tools for communicating these invisible data processes.

I’ve used this approach to create three drawings of possible data gaps for the Genomics England Diverse Data initiative. I started by thinking of visual metaphors (modelling clay, sieves, spotlights, piles of objects) that draw on common experiences and objects in the physical world. Using visual metaphors drawn from the ‘real’ world around us gives the slippery, stealthy, hard-to-define nature of a data gap a concrete, physical form, making it easier to understand and harder to overlook in future studies.

Working with (but not working with) data has convinced me of the value of incorporating subjective art practices within a rigorous research space: through the act of drawing, you can visualise data from a different angle. These drawings can function as conversation starters for researchers as they solidify their formal research and communicate it to a wider audience.

Data bias arising from data that is unable to be disaggregated

This drawing communicates how bias can arise from data that cannot be disaggregated, removing opportunities to gain insights about a minority group whose data has been merged into the aggregate.

In the drawing, many strands of disaggregated, multicoloured data points are being collected by a study and travel down a funnel that slowly compresses them into a solid mass (an aggregated dataset).

The visual metaphor I’ve used to communicate this is multicoloured modelling clay: once different colours are mixed and compressed together, they can’t be separated again.

Data bias arising from data architectures that are not fit for purpose

This drawing communicates how data bias can arise from data architectures that were not designed to accommodate the data collection needs of a specific population, so that some of that population's data might not 'fit' the architecture. In this scenario, data gaps occur when the mismatch between data architecture and population means the data is relegated to a less precise, messier part of the data structure (for example, placed in the catch-all category ‘other’ instead of having a specific location) or is not collected at all.

In the drawing, data points are dropping into a data structure. Data points of a specific shape and size fit neatly through the structure’s holes and are collected in a tidy manner. Data points that are too large for the structure’s holes don’t fit, so they bounce and scatter around the outside of the structure in a chaotic fashion.

The visual metaphor used in this drawing alludes to sieves, colanders, and shape-sorting toys, all of which are designed with holes of a specific size and shape that only allow certain types of objects to pass through.

Data bias arising from the questions a study chooses to ask

This drawing communicates how bias can arise from the questions one chooses to ask and answer through data analysis: if you ask a question that mainly serves the majority population, the study’s results may have less impact for a minority group.

The visual metaphor I’ve used to communicate this is a shining spotlight. The study’s main question functions as a sort of 'spotlight' or 'view' upon a dataset, where the question will 'shed light' on and illuminate insights for a specific part of the population.

In the drawing, a bold spotlight representing the study’s chosen question shines mainly over the majority population, leaving the minority group ‘in the dark’. In the background, spotlights with dotted lines represent other questions that researchers could have chosen to ask or include, which could shed more light on a minority population, either by asking broad questions relevant to the entire population or by asking questions specifically targeted towards the minority population.


Stefanie Posavec is a designer, artist, and author focused on creating playful, accessible, human-scaled approaches to communicating data.

Her data-driven work has been exhibited internationally at major galleries including the V&A, the Design Museum, Somerset House, and the Wellcome Collection (London), the Centre Pompidou (Paris), and MoMA (New York). She was Facebook's first data-artist-in-residence at their Menlo Park campus, and recent art residencies include the National Maritime Museum in Greenwich, London, and People Like You (Warwick / Goldsmiths / Imperial). Her work is also in the permanent collection of MoMA, New York, and was nominated for the London Design Museum’s ‘Designs of the Year’ competition in 2016.

Her latest illustrated book (I am a book. I am a portal to the universe., co-authored with Miriam Quick) was named one of the Financial Times’s ‘Best Books of the Year 2020’ and shortlisted for the Royal Society’s Young People’s Book Prize 2021. She has also co-authored two books that emphasise a handmade, personal approach to data: Dear Data and the journal Observe, Collect, Draw!