ODU Research Spurs World Interest in Lost Tweets from Egyptian Revolution
Hany SalahEldeen, a Ph.D. student in computer science at Old Dominion University, never would have guessed how much social media traffic he would generate when he posted the article, "Losing My Revolution," Feb. 11 on the Web page of the Web Science and Digital Libraries Research Group led by ODU computer scientist Michael Nelson.
SalahEldeen is Egyptian and the article he posted documents his findings about the fate of social media content produced during last year's Egyptian revolution: more than 10 percent of it has vanished.
There may have been some scrubbing of the record by people trying to protect themselves, but the young researcher believes most of the lost tweets, blog text, photographs and videos disappeared for more mundane reasons that illustrate the ephemeral nature of social media.
The posting has generated brisk Twitter traffic around the world, and on Thursday, Feb. 16, the website of The Atlantic magazine posted an article about SalahEldeen's research. The overline was "Twitter gives us a new version of 'the first rough draft of history.' But tweets are fragile things."
"After only one year, more than 10 percent of the media that we thought we had stored for future generations was gone," SalahEldeen wrote in his initial article. "If the decay continued at the same rate and if we didn't do anything to preserve this digital heritage of the revolution, in less than 10 years there will be no story to tell for the future generations and we will lose these magnificent collections that can show what thousands of books couldn't convey."
Social networks such as Twitter, YouTube and Facebook have been credited with helping Egyptian revolutionaries organize - and gain international support - during the protests and street fighting in January and February of 2011 that led to the fall of the Mubarak government.
ODU's Nelson is a prominent researcher in digital preservation and one of the creators of the Memento architecture that has been called "time travel for the Web." Its development has been supported by the Library of Congress under the National Digital Information Infrastructure and Preservation Program.
In 2010, Memento won the Digital Preservation Award from the Institute for Conservation and Digital Preservation Coalition (DPC) based in London.
Nelson, an associate professor of computer science, said he and SalahEldeen decided, with the first anniversary of the Egyptian revolution coming up, to see how much of the information about the uprising and victory celebrations was still available on the social networks. The work has produced not only interesting results, but also some quandaries.
"Gathering the data was difficult: there is no single clearinghouse for all videos, images, tweets, etc.," Nelson said. "We ended up gathering our data from services that provide collection development, such as Storify, and sharing, such as IamJan25, so the resources that we measured were deemed important enough to be included in these collections by at least one person."
SalahEldeen scrutinized traffic between Jan. 20 and March 1, 2011. To rule out the possibility of transient errors skewing the results, he repeated the experiment three times over three weeks before declaring a resource missing.
"This study has brought to light many questions regarding the role of user-contributed social media documenting historical events," SalahEldeen said. "Thousands of people taking images and videos with their cell phones helps document the event, but that does not mean there are thousands of people who are committed to curating these resources in perpetuity. We'd like to archive these resources to protect against loss from accident or negligence, but it is not clear how archiving should apply to those who wish to withdraw their resources from the public record, especially if they fear reprisal because of the content of their images."
The ODU researchers are also contemplating another difficult issue: the problem of duplicate and near-duplicate resources. If an image is popular, it is likely to be copied many times. If one of those copies is lost or deleted, the image itself has not been lost, since multiple copies still remain. On the other hand, after the loss has occurred it is hard to say whether the lost image was a copy of another image or an original image.
"Near duplicates are even trickier," SalahEldeen pointed out. "If two people stand shoulder to shoulder and take a picture at the same time and include those images in tweets, then there are two images in the system. But they could be considered interchangeable in the event that one of the images is lost." For example, in a Storify entry, instead of a having a tweet with a missing image, archivists may substitute similar images, even if from different angles.
"But how or when you work out these relationships is not clear, nor is it clear where the boundaries of interchangeability are. Answering these questions and further investigation to the problem we have uncovered will be in the focus of our research in the next period."
Nelson said this vein of research "fits in with our overall vision of preserving the Web."
He said social media such as Twitter, YouTube and Facebook are easy to overlook because their content is often of limited value. "Kitten photos and lunch updates" is how he described much of what is found on these networks.
"But the Egyptian revolution had a feedback loop with social media, simultaneously shaping and informing. Social media is interesting because its ease of access allowed this broad documentation of a revolution. But its low barrier of entry also means that it is likely to be poorly curated by the masses who participated. A news agency is likely to diligently preserve its photos and videos, but the same cannot be said for the mass of amateur documenters."
Nelson believes SalahEldeen's research is at the nexus of social media and Web archiving.
"Despite its prevalence in our society, social media is poorly represented in Web archives. This is due to a variety of reasons." He mentions engineering constraints such as the large files of YouTube videos; legal reasons such as those posed by videos that are posted by people who do not own the copyright; and user pushback, as shown at http://noloc.org/, by groups wary of the Library of Congress having an archive of tweets.
Memento is a protocol that allows Web browsers to easily access the holdings of different Web archives. Unfortunately, Nelson added, since a lot of social media does not make it into Web archives in the first place, Memento cannot help with, for example, the recovery of lost images in tweets.
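For readers curious how a Memento lookup works in practice, the sketch below builds (but does not send) a request in Python. Under the Memento protocol, a client asks a "TimeGate" for a past version of a page by sending an Accept-Datetime header. The aggregator address used as the default TimeGate and the helper names are assumptions for illustration; any Memento-compliant TimeGate could be substituted.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.request

# Assumption: a public Memento aggregator; swap in any other TimeGate.
TIMEGATE_BASE = "http://timetravel.mementoweb.org/timegate/"


def accept_datetime(when: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def build_timegate_request(
    original_url: str,
    when: datetime,
    timegate_base: str = TIMEGATE_BASE,
) -> urllib.request.Request:
    """Build a request asking for the archived copy closest to `when`."""
    return urllib.request.Request(
        timegate_base + original_url,
        headers={"Accept-Datetime": accept_datetime(when)},
    )


# Example: ask for a page as it looked on Feb. 11, 2011.
request = build_timegate_request(
    "http://example.com/", datetime(2011, 2, 11, tzinfo=timezone.utc)
)
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would normally be redirected to an archived copy, if one exists. As the article notes, that only works for content an archive captured in the first place.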
"Hany and I are working with a variety of techniques to demonstrate the usefulness of web archiving in the sharing of resources in social media. It is our hope that if we can demonstrate personal utility for archiving this material, we can overcome some of the resistance to Web archiving in general," Nelson said.
"We thought measuring the loss of social media about the Egyptian revolution would be an example that would resonate with many people, hopefully demonstrating that social media is more than cute kitten photos that don't need to be archived."