Re-animating a Social Science Data Set: A Reflection on Access and Preservation

Sharon Webb

One of the main goals of the Reanimating Data Project (2018-20) was to archive and make publicly available the interviews and field notes of a 1980s research project, the Women, Risk and AIDS Project (WRAP). The collection, created between 1989 and 1990, had for some time existed in various boxes, hard drives, computers and folders in a London attic. The original sociological dataset is now historical: it is primary material that captures a unique moment in British history, of youth sexuality, of sub-cultures, of sex education, of sexual norms and perceptions, as well as teenage anguish and family life.

My role in the Reanimating Data Project was to ensure that we had the correct infrastructures in place to archive the collection for the long term, as well as for immediate access. Our plan was simple. First, use the University of Sussex institutional repository, Figshare, as a means to store the entire original collection and the anonymised versions in the long term, thus ensuring sustained access to this important collection long after the RAD project funding ends. Second, use Omeka, an open source publishing platform, to provide users, researchers and students with an access point that includes curated exhibitions reflecting the concerns, priorities and research of the current project team and wider partners. Following our own logic, we are also archiving the research data generated by the current project, so we are currently working to catalogue the various experiments, project documents, etc., that we have generated. Archiving in this sense is iterative and cyclical: we are archiving with the wormhole in mind, and as we traverse to the future we anticipate the needs of future researchers.

It’s important that we separate these activities and take advantage of the infrastructure available to us. We are fortunate to have the support of the institution, which ensures that the burden of responsibility shifts from the project to a much larger machine: a third-level institution. Knowing that the University, through the Library and ITS, takes seriously the challenge of long-term digital preservation is a comfort, but as a project we have a responsibility to ensure that the way in which we describe the collection is future-proof. There is no point creating collections that no one can find, or whose usefulness users cannot assess at a glance. It is for this reason that we spent a lot of time making sure that the metadata met a certain standard, that our descriptions were useful and that our subject terms were appropriate and standardised. I remember the first conversation I had with Rachel about this project: I cautioned that we should not underestimate the time and energy it takes to write metadata and to archive the collection. This task is made all the more complicated by the nature of the objects: transcripts in various formats and of varying lengths. Each interview had to be read from start to finish in order to write a proper summary for the metadata description, and the level of anonymity was painstakingly reviewed, rediscussed and reviewed again. Even settling on subject key terms is a challenge and a task in and of itself, especially when discipline-specific controlled vocabularies like HASSET seem archaic and outdated.

When we showcased Figshare and Omeka for the first time at our Edinburgh workshop (Nov. 2019), we were asked a question from the audience: but what can you do with the archive? Can I do text analysis? Beyond accessing it, what can I do? I momentarily struggled to answer the question (trying to remember any of the limited functionality that comes with Figshare and what we planned to implement for Omeka), but then I remembered that, as a project, we were tasked with archiving the collection first and foremost. And while, to some, mere access is no longer enough, the task of providing access is massive. Increasingly, we are used to things being readily available, and while additional user functionality is required (and I advocate for it), in some cases “mere” access is a luxury and not always a given. We have made the dataset available, and from that starting point researchers and users can create additional access points through text analysis and data visualisation. As a project team we have experimented with feminist chatbots and sound installations, among others, because we now have access to a previously closed, inaccessible collection.

The use of the two platforms, Figshare and Omeka, also allows us to interact with different audiences. Through Figshare the WRAP collection is available to an international audience: it automatically has a wider reach and is part of a rich international research ecosystem. Omeka, on the other hand, allows us to give project partners, researchers and other users the opportunity to contribute content, to curate exhibitions which bring collection items together and to be part of the RAD project and team (and I always smile when I see the “RAD” team… because it really is rad!).

I’ll probably think more about lessons learned in a few weeks and document our process more fully, but in the meantime, if you have any questions about our archiving approach and method, please feel free to ask. And don’t forget to check out the collections on Figshare (available now) and on Omeka (soon to be published).

Thomson, Rachel (2020): Women, Risk and AIDS Project, Manchester, 1989-1990. figshare. Collection.