ODU Researchers Probe Disappearance of Web Resources About Historic Events
Social media traffic generated by people involved in major world events such as the current turmoil in Syria or the swine flu pandemic in 2009 can provide textual and photographic documentation that will be invaluable to historians and others who one day will assess the events.
But researchers at Old Dominion University have looked closely at six such events between 2009 and 2012 and found that interesting nuggets disseminated by social media tend to have a limited Web existence and may not be available to scholars and journalists in years to come.
Michael Nelson, associate professor of computer science at ODU, and one of his graduate students, Hany M. SalahEldeen, probed social media associated with the events and found that about 27 percent of the relevant resources had disappeared from the live Web and public Web archives within two and a half years following an event.
SalahEldeen was at the International Conference on Theory and Practice of Digital Libraries in Paphos, Cyprus, Sept. 23-27 to present a paper he and Nelson wrote about their research.
The title of the paper - "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?" - reflects SalahEldeen's personal interest in the research. He is Egyptian and it was his concern about the resources mediated by Twitter, Facebook and YouTube about the Egyptian revolution in early 2011 that first led him to explore the lifespan of social media traffic.
Early this year the young Ph.D. student posted an article on the Web page of Nelson's research group reporting that about 10 percent of the resources (images, videos or Web pages, for example) conveyed by the Egyptian revolution social media traffic had vanished after one year.
For example, one tweet from Feb. 1, 2011, says "HILARIOUS sign in Tahrir Square" and links to a photo. But the link no longer produces the photo.
SalahEldeen's first report on the research got media attention throughout the world, and also convinced Nelson that more work was needed.
The researchers expanded their survey by adding five more events: the H1N1 virus outbreak, Michael Jackson's death, the Iranian elections and protests, Barack Obama's Nobel Peace Prize and the Syrian uprising.
"Social media content has grown exponentially in the recent years and the role of social media has evolved from just narrating life events to actually shaping them," the authors write in the most recent paper. Since their research about the Egyptian revolution, they add, they have explored "how many resources shared in social media are still available on the live Web or in public Web archives.
"By analyzing six different event-centric datasets of resources shared in social media in the period from June 2009 to March 2012, we found about 11 percent lost and 20 percent archived after just a year and an average of 27 percent lost and 41 percent archived after two and a half years."
Nelson and SalahEldeen also found a "nearly linear relationship between time of sharing of the resource and the percentage lost. ... From this model we conclude that after the first year of publishing, nearly 11 percent of shared resources will be lost and after that we will continue to lose 0.02 percent per day."
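The arithmetic behind that quoted model can be sketched in a few lines of Python. This is only an illustration of the two figures in the quote (nearly 11 percent after the first year, then 0.02 percent per day), not the researchers' own code; the function name, the 365-day year and the cap at 100 percent are assumptions made for the example.

```python
# Rough sketch of the quoted linear model. The constants come from the quote;
# the function name, the 365-day year and the 100 percent cap are assumptions.
DAYS_IN_FIRST_YEAR = 365
PERCENT_LOST_AFTER_FIRST_YEAR = 11.0
PERCENT_LOST_PER_DAY_AFTERWARD = 0.02


def estimated_percent_lost(days_since_sharing: float) -> float:
    """Estimate the percentage of shared resources lost after a number of days.

    The quote only describes what happens from the end of the first year
    onward, so earlier values are rejected rather than guessed.
    """
    if days_since_sharing < DAYS_IN_FIRST_YEAR:
        raise ValueError("the quoted model starts at the end of the first year")
    extra_days = days_since_sharing - DAYS_IN_FIRST_YEAR
    estimate = PERCENT_LOST_AFTER_FIRST_YEAR + PERCENT_LOST_PER_DAY_AFTERWARD * extra_days
    return min(estimate, 100.0)


if __name__ == "__main__":
    for days in (365, 730, 913):
        print(f"{days} days after sharing: about {estimated_percent_lost(days):.1f} percent lost")
```

Read this way, the model gives roughly 22 percent at two and a half years, somewhat below the 27 percent average the researchers measured across their six datasets, so it is best treated as a rough guide rather than a prediction.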
These latest findings have again gotten the attention of world media. For example, a columnist on the BBC website used the research as a starting point for an article that appeared Friday, Sept. 28.
What is most vulnerable, according to the columnist, "is the network of living connections into which social media is a window: the nexus of sources, resources, sounds, images and updates that together constitute the stuff of many millions of people's daily experience. One commercial firm may well be able to sell you every extant public tweet ever sent - and another may do the same for other social media services. As work like SalahEldeen and Nelson's study suggests, however, preserving these individual threads does little by itself to stop the tapestry of present history unraveling."
SalahEldeen said he is aware that, in the case of the Egyptian revolution, some people may have scrubbed the record to protect themselves. But the young researcher believes most of the lost resources from tweets, blog text and photographs/videos disappeared for more mundane reasons that illustrate the ephemeral nature of social media.
ODU's Nelson is a prominent researcher in digital preservation and one of the creators of the Memento architecture that has been called "time travel for the Web." He developed Memento together with colleagues at the Los Alamos National Laboratory with support from the Library of Congress under the National Digital Information Infrastructure and Preservation Program.
In 2010, Memento won the Digital Preservation Award from the Institute for Conservation and the Digital Preservation Coalition, based in London.
This latest work has produced not only interesting results, but also some quandaries, Nelson said. "Gathering the data was difficult: there is no single clearinghouse for all videos, images, tweets, etc. We ended up gathering our data from services that provide collection development, such as Storify, and sharing, such as IamJan25, so the resources that we measured were deemed important enough to be included in these collections by at least one person."
In an interview earlier this year concerning the research about the Egyptian revolution, SalahEldeen said, "This study has brought to light many questions regarding the role of user-contributed social media documenting historical events. Thousands of people taking images and videos with their cell phones helps document the event, but that does not mean there are thousands of people who are committed to curating these resources in perpetuity. We'd like to archive these resources to protect against loss from accident or negligence, but it is not clear how archiving should apply to those who wish to withdraw their resources from the public record, especially if they fear reprisal because of the content of their images."
Nelson believes SalahEldeen's research is at the nexus of social media and Web archiving.
"Despite its prevalence in our society, social media is poorly represented in Web archives," he said. "This is due to a variety of reasons." He cited engineering constraints, such as the large files of YouTube videos; legal reasons, such as videos posted by people who do not own the copyright; and user pushback, as shown at http://noloc.org/, by groups wary of the Library of Congress having an archive of tweets.
Memento is a protocol that allows Web browsers to easily access the holdings of different Web archives. Unfortunately, Nelson added, since a lot of social media content does not make it into Web archives in the first place, Memento cannot help with, for example, the recovery of lost images in tweets.
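For readers curious about what such a request looks like, here is a minimal Python sketch. Under the Memento protocol (RFC 7089), a client sends an Accept-Datetime header to a "TimeGate," which points it to the archived copy closest to that moment. The sketch only builds the request and does not send it. The TimeGate address, the photo URL and the function names are hypothetical placeholders, and appending the original URL to the TimeGate address is a common convention assumed here, not something the article specifies.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate address, used only for illustration.
TIMEGATE = "https://timegate.example.org/timegate/"


def accept_datetime_value(moment: datetime) -> str:
    """Format a timezone-aware datetime the way Accept-Datetime expects (HTTP date in GMT)."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def build_timegate_request(original_url: str, moment: datetime,
                           timegate: str = TIMEGATE) -> Request:
    """Build, but do not send, a request asking for a copy of original_url near moment.

    Appending the original URL to the TimeGate address is an assumed convention.
    """
    headers = {"Accept-Datetime": accept_datetime_value(moment)}
    return Request(timegate + original_url, headers=headers)


if __name__ == "__main__":
    # Hypothetical link of the kind a 2011 tweet might have carried.
    when = datetime(2011, 2, 1, tzinfo=timezone.utc)
    request = build_timegate_request("http://example.com/tahrir-sign.jpg", when)
    print(request.full_url)
    print(accept_datetime_value(when))
```

A request like this can only succeed if some archive captured the resource in the first place, which is the gap Nelson describes.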
"Hany and I are working with a variety of techniques to demonstrate the usefulness of Web archiving in the sharing of resources in social media. It is our hope that if we can demonstrate personal utility for archiving this material, we can overcome some of the resistance to Web archiving in general," Nelson said.