Symposium on Saving the Web at the Library of Congress

On June 16, 2016, the Library of Congress hosted a one-day symposium entitled Saving the Web: The Ethics and Challenges of Preserving What’s on the Internet.

Mementos in the Raw

While analyzing mementos in a recent experiment, we discovered problems processing archived content. Many web archives augment the mementos they serve with additional archive-specific information, including HTML, text, and JavaScript. We were attempting to compare content acro...

WWW 2016 Trip Report

I was fortunate to present a poster at the 25th International World Wide Web Conference, held April 11 to 15, 2016. Though my primary mission was to represent both the WS-DL and the LANL Prototyping Group, I gained a better appreciation for the state of the art...

Persistent URIs Must Be Used To Be Persistent

by Herbert Van de Sompel, Martin Klein, Shawn M. Jones

We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use brittle URIs when persisten...

Acquisition of Mementos and Their Content Is More Challenging Than Expected

Recently, we conducted an experiment using mementos for almost 700,000 web pages from more than 20 web archives. These web pages spanned much of the life of the web (1997-2012). Much has been written about acquiring and extracting text from live web pages, but we believe that ...

Rules of Acquisition for Mementos and Their Content

by Shawn M. Jones, Harihar Shankar

Text extraction from web pages has many applications, including web crawling optimization and document clustering. Though much has been written about the acquisition of content from live web pages, content acquisition of archived web pages, known as mementos, remains a relativ...

Releasing an Open Source Python Project, the Services That Brought py-memento-client to Life

The LANL Library Prototyping Team recently received correspondence from a member of the Wikipedia team requesting Python code that could find the best URI-M for an archived web page based on the date of the page revision. Collaborating with Wikipedia, Harihar Shankar, Herbert ...

From Student To Researcher II

After successfully defending my Master’s Thesis, I accepted a position as a Graduate Research Assistant at Los Alamos National Laboratory (LANL) Library’s Digital Library Research and Prototyping Team. I now work directly for Herbert Van de Sompel, in collaboration with my ad...

Avoiding Spoilers in Fan Wikis of Episodic Fiction

by Shawn M. Jones, Michael L. Nelson

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing, they run the risk of encountering “spoilers” ...

Avoiding Spoilers on Mediawiki Fan Sites Using Memento

by Shawn M. Jones

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing, they run the risk of encountering “spoilers” –...
