Feminism, Authorship Attribution, and the Value of the Hidden Work of Data ‘Cleaning’

When I joined the Thomas Nashe Project three years ago to work on Nashe’s dubia,[1] I was new to attribution studies. The responses I got from colleagues and academic friends over the next few months involved a lot of raised eyebrows, comments such as ‘you’re brave getting involved in that’ or ‘isn’t it a bit…macho?’, and someone muttered something about a tape measure. Not being one for melodrama, I carried on with the labour-intensive marking up of texts for stylometric analysis regardless. As a sixteenth-century scholar working on poetics and the occult tradition, my thesis was dominated by male writers, both in terms of primary texts and parts of the critical field. In that sense, the macho environment of attribution studies didn’t bother me as much as it probably should have done: I have held my own as a female academic and saw no reason why I shouldn’t continue to do so, especially working for a project with such supportive colleagues, men and women. Now that the bulk of my work on Nashe’s dubia is finished, I have been able to reflect a little on the aggressive nature of attribution studies, a trait which is damaging on multiple fronts.

Andy Kesson hit the nail on the head in his Before Shakespeare blog series on attribution (2017) when he observed ‘I cannot imagine how difficult it must be for junior or female scholars or for anyone uncomfortable with the current state of attribution to engage critically with its practices’. His posts elicited a number of responses about attribution studies generally: some curious and hopeful of a constructive dialogue, others more confrontational.

To say that the field of attribution studies is an uncomfortable space is an understatement, and certain parts of it are particularly bad (*cough* Shakespeare *cough*).[2] The criticism of ‘rivals’ in attribution studies can be vituperative and targeted; some of the debate/shaming is carried on in national newspapers. If daring to publish something on attribution effectively means sticking a target on your back and declaring allegiance to one boys’ club or another, why would anyone – least of all a woman and an early career academic – bother?

There is of course nothing wrong with being protective of one’s work or constructively critical of someone else’s work. Perhaps in attribution studies there is more of a sense of a need to provide the answer, to crack the code, to prove something definitively, and it is this that encourages contest. It is easy to be seduced by the final result: the graph that proves that so-and-so may or may not have written a text, the charts that squash unfathomable amounts of data into clear, precise points that can be summed up in a neat visualisation. Perhaps it is this claim to have answers – real answers, not subjective arguments – that produces more aggressive commentary and defensive criticism than is considered acceptable in other academic fields that are more open to complex evidence and shades of meaning.

Take, for example, the methodology used by Brett Greatley-Hirsch for the Nashe Project: Principal Components Analysis. PCA is a data reduction method, and it is used in authorship attribution because ‘when analysing word-frequency counts across a mixed corpus of texts known to be of different authorship, the strongest factor that emerges in the relationship between the texts is generally authorial in nature’.[3] Like any other method, it has its critics.[4] The Nashe Project used PCA to help the editorial team confirm which of the texts uncertainly attributed to Nashe we wanted to include in the forthcoming edition of his works. But this is only one of the several methods the team has used. PCA is a footnote to the swathes of scholarship that have gone before, and to the archival scholarship conducted by members of the team themselves. Tempting as it is to be seduced by a few graphs, no single method can claim to hold the definitive truth.
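For readers who have never seen what such a data reduction looks like, here is a minimal sketch in Python. It is not the Nashe Project's pipeline: the function-word list, the four toy 'texts', and every function name are invented for illustration, and the first principal component is approximated by simple power iteration rather than by a statistics package. It shows only the general idea of turning texts into function-word frequencies and then finding the single direction along which those frequency profiles differ most.

```python
import math
import re
from collections import Counter

# Hypothetical, very short function-word list (an assumption for this sketch);
# a real study would use a far longer, carefully curated one.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that", "with", "but"]


def relative_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return each vocabulary word's share of all word tokens in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocabulary]


def centre_columns(rows):
    """Subtract each column's mean so every feature is centred on zero."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def first_component(rows, iterations=200):
    """Approximate the first principal component by power iteration.

    `rows` needs at least two texts. Returns (direction, scores), where
    scores[i] is the position of text i along that direction.
    """
    centred = centre_columns(rows)
    n, d = len(centred), len(centred[0])
    # Sample covariance matrix of the centred features.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Deterministic, non-uniform starting vector, normalised to length one.
    vec = [1.0 + i for i in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:  # no variance at all: nothing to find
            break
        vec = [x / norm for x in nxt]
    scores = [sum(a * b for a, b in zip(row, vec)) for row in centred]
    return vec, scores


# Invented sample sentences, not quotations from any early modern text.
toy_texts = {
    "A1": "the cat and the dog sat in the house of the king",
    "A2": "the bird and the fish swam in the pond of the queen",
    "B1": "but that man went to town with that horse to sell",
    "B2": "but that woman rode to court with that letter to read",
}
matrix = [relative_frequencies(text) for text in toy_texts.values()]
direction, scores = first_component(matrix)
for label, score in zip(toy_texts, scores):
    print(label, round(score, 3))
```

Even in this toy version, almost all of the decisions that matter (which words to count, how to tokenise, which texts to include) are made before the analysis runs, which is to say in the preparation of the data.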

But, if we remain fixated on answering who did or did not write a text, and bemused by the ferocity of the debate between some attribution scholars, what are we forgetting about?

Most obviously, from the point of view of the researchers at least, we miss the labour it takes to get a result. Data preparation or data cleaning is at least 80% of the work and it is often done by post-docs and students. The workload can be vast and tedious. It is also, unfortunately, often ignored or forgotten about. In their book Data Feminism (2020), Catherine D’Ignazio and Lauren F. Klein note that ‘seen’ labour has value, meaning that the hours of tedious research and data preparation that go into a visualisation often remain unseen and uncredited. Yet, that labour is crucial because it provides more than data to be processed. For a start, it creates experts, the people who can explain why the processed data behaves in particular ways, or why there might be a surprising result. It is easy to become protective of our work because of the blood, sweat, and tears – literally, I tagged all 72,000 words of Gabriel Harvey’s Pierces Supererogation (1593), the sheer length of which even Nashe complains about – that we put into processes like tagging or data cleaning. Without a valid result, the whole enterprise seems pointless. Failure or a non-result is simply not on our radar, whereas our colleagues in the sciences accept these things as a matter of course. In addition, there is often no opportunity for a double-blind study as current funding is not generous enough to cover the cost of running the same tests again to double-check; a lot rides, then, on getting it right first time, and having something to show for it.

I spent hundreds of hours marking up function words in a corpus of texts that I had researched, and I breathed a huge sigh of relief when the data was processed and produced graphs with neat clusters indicating different authors. Even I, the intrepid (and tired) tagger, only saw value in the visualisations of the processed data on the screen, not in the work done to get to that point.

Data-cleaning, data-preparation, and tagging sound like tedious processes. Let’s be honest, they sound dull. Boring. Drying paint suddenly seems interesting by comparison. I won’t lie: tagging has its dull moments (did I mention Pierces Supererogation?). But it is also incredibly exciting and in those hours of seemingly futile labour are moments of crystallisation, original discovery, and surprise, and I’d like to focus on these now.

Engaging so closely with those texts we ‘clean’ and tag to glean particular data can be intellectually challenging and reveal unknown information; it is also a surprisingly emotional experience. Emotional labour is often seen as feminine and somehow unscholarly.

These emotive responses are worthy of comment though. I had read Christs Teares over Jerusalem (1593) as a postgraduate and I knew it contained graphic descriptions of dead bodies, pain, and suffering. However, I was unprepared for the visceral reaction I felt when I tagged the section with Miriam’s cannibalism of her own child. It was nauseating. It was not the actual moment of decapitation and cannibalism which got to me, but the speeches either side that describe Miriam’s desperation through to the smell of the dead ‘rost’ child. I had a similar response when tagging the rape of Heraclide in The Unfortunate Traveller (1594). Again, it was not the brief description of the rape itself that caused a reaction, but the build-up to it and the exploration of the trauma after the event. This is more than gratuitous violence: Nashe makes something of the desperation of the characters seep through the layers of graphic description and revulsion which, when read closely, are truly affecting. The process of tagging focussed my attention differently as a reader, and allowed this desperation to seep into my consciousness, which surprised me. I wonder if reading that is active but not necessarily critically engaged, like tagging, might mean that there is little mental preparation for these graphic episodes; we become conscious in the midst of the violence and graphic description without processing it critically, leading to a more profound experience than sitting down and reading a text. This is something I’m still thinking about.

The experience of tagging texts is very different from other ‘reading’ experiences and constitutes a form of emotional labour – often coded as feminine and neither scholarly nor noteworthy – which is never reflected in a project’s output. When tagging a text, there is no escape from it. Every bit of it, no matter how gruesome, has to be treated in the same way. There is thus no opportunity to skim or block out portions of text even on a subconscious level. The engagement with the text is different: it transforms the reading experience and thus produces new insights and discoveries. The emotional labour that comes with it shouldn’t be ignored. In tagging and reacting to Miriam’s cannibalisation of her son and the rape of Heraclide, I was able to experience, I believe, something of Nashe’s intention as the writer and how he expected his text to affect the reader. I was more alert to the live ‘voices’ in the text that are intended to trouble us. As a consequence, I approach Nashe differently than I did before, which I can now turn into a contribution to studies on Nashe and give scholarly value to that emotional labour. My self-awareness as a tagger, I would like to suggest, has a place in the Nashe Project – it is confirming and shaping the impact we believe Nashe wanted to have.

In shifting the focus from the end-result to the labour that gets the result, attribution studies could open itself up to other conversations and engage with other disciplines and theories. By focussing more on the work that goes into the results, the heat could be taken out of the debate as it would allow those doing the work to contribute their insights. These contributions wouldn’t rely on the infallibility of methods but ask what else can be discovered through the many different processes of exploring authorship. Of course, I am not alone in seeing the value of the labour itself and using it as a research tool, but this approach is not usually part of the conversation in attribution studies. If attribution studies could embrace the messiness of the data behind its neat visualisations through showing its hidden labour – often done by women and ECAs and downplayed – then the field could change for the better. Perhaps some of the discomfort surrounding attribution studies would dissipate and new experiences and discoveries could be shared and celebrated as part of it. Let’s not get distracted by shiny visualisations but look beyond to the work – and the workers – that produced them. That is where the real discoveries lie.

 

With thanks to Andy Kesson, Alan Hogarth, Brett Greatley-Hirsch and the Nashe project team who were kind enough to comment on earlier versions of this piece.

 

[1] These are the texts that Nashe may have written pseudonymously or have had a hand in writing.

[2] Coughs are not coronavirus-related.

[3] Hugh Craig & Brett Greatley-Hirsch, Style, Computers and Early Modern Drama: Beyond Authorship (Cambridge: Cambridge University Press, 2017), 32.

[4] See, for example, Nan Z. Da, ‘The Computational Case Against Computational Literary Studies’, Critical Inquiry, 45:3 (2019), 601-639.