Reflecting on the Need and Apparent Futility of Archiving

Published on Mon Jan 23 2023 by Tarunima Prabhakar

This is an open-ended reflection on the need for, and seeming futility of, archiving on social media.

We started Tattle with the goal of making an open and searchable archive of content circulating on chat apps in India. Why? Because we got tired of seeing the same messages being circulated year after year on WhatsApp, as if the messages had no context, no history. The same photograph of a soldier sleeping in a snow-filled trench was used to refer to soldiers in Ukraine in 2014, and to soldiers in Siachen in 2017.

Describing the history and repeated misuse of an image is the easiest way to debunk it. But cataloging this content also helps us understand how discourse on chat apps and social media is changing our beliefs and actions. The web "wasn't built to preserve its past." But for an archivist, social media interfaces (think Instagram stories) are a nightmare, since they are deliberately designed to forget it.

We knew archiving from chat apps was going to be a daunting task. But we underestimated just how daunting. There are pragmatic concerns and philosophical ones. Storing a few months' worth of trending images and videos on Indian social media filled up 2 TB of storage and brought associated cost challenges. What of the consent of the people who post content? What if they don't want their content to be remembered? And doesn't archiving give a longer life to hateful content online?

When we started Uli, the tool to respond to online gender-based violence (OGBV), we stayed out of the needs assessment phase. Folks from The Center for Internet and Society conducted interviews and focus groups to understand what tools activists and researchers would need to feel safer on social media. Yet again, archiving emerged as an important one. People wanted a record of episodes and instances of harassment for when platforms or the perpetrators took the content down.

It is important to note that harassment is not just the content of the post. It is also the behaviour: a persistent 'hi sexy', or even a 'hello' meant to remind a person of a menacing presence, is harassment. When people get trolled, it is an army of accounts repeating the same trope. How does one archive the phenomenon? When a hundred people are hurling the same abuse at you, does archiving a specific abusive post matter? Does it capture the depth or extent of harm?

There is a technical aspect but also a social aspect to archiving. In many ways archiving is the opposite of fast and fiery social media posting. Archiving is slow, laborious and boring. And even though people love the idea of someone archiving, they rarely do it themselves. Archives are critical when things go bad, say when you have to go to, or are approached by, law enforcement. But for the most part, things don't.

Furthermore, we archive so that someone in the future can make sense of our present: so that they may find narratives distinct from the dominant ones of today, and so that the multiplicity of narratives may complement, contrast and perhaps even conflict. But in the vast ocean of social media, with its flotsam and jetsam of inaccurate, hateful and lazy content, it can be difficult to figure out what to pick and how much to pick. And it can be hard to be motivated by the possibility of an imaginary future.

And yet, there is no option but to try, to risk, to experiment. There are attempts to crawl the mobile web, tools for high-fidelity capture of Twitter threads and tools for storing and tagging TikToks. The Wayback Machine stands out as a bold archiving experiment that has persevered. The organization has demonstrated the long-term need for archiving, especially on an interface designed to forget. But we get into archiving knowing that many experiments will fail because of just how hard it is to keep up with a continually evolving Internet. And that is okay, because the failed experiments are the only pathway towards something working.

Text and illustrations on the website are licensed under Creative Commons 4.0 License. The code is licensed under GPL. For data, please look at respective licenses.