Fun with Fictional Web Sites and the Internet Archive

As we celebrate the 20th anniversary of the Internet Archive, I realize that using Memento and the Wayback Machine has become second nature when solving certain problems, not only in my research, but also in my life. Those who have read my Master’s Thesis, Avoiding Spoilers on...

Memento at the W3C

We are pleased to report that the W3C has embraced Memento for versioning its specifications and its wiki. Completing this effort required collaboration between the W3C and the Los Alamos National Laboratory (LANL) Research Library Prototyping Team. Here we inform others of th...

Mementos in the Raw, Take Two

In a previous post, we discussed a way to use the existing Memento protocol combined with link headers to access unaltered (raw) archived web content. Interest in unaltered content has grown as more use cases arise for web archives. Ilya Kremer and David Rosenthal had previous...

Symposium on Saving The Web at the Library of Congress

On June 16, 2016, the Library of Congress hosted a one-day symposium titled Saving the Web: The Ethics and Challenges of Preserving What’s on the Internet.

Mementos in the Raw

While analyzing mementos in a recent experiment, we discovered problems processing archived content. Many web archives augment the mementos they serve with additional archive-specific information, including HTML, text, and JavaScript. We were attempting to compare content acro...

WWW 2016 Trip Report

I was fortunate to present a poster at the 25th International World Wide Web Conference, held April 11-15, 2016. Though my primary mission was to represent both the WS-DL and the LANL Prototyping Group, I gained a better appreciation for the state of the art...

Persistent URIs Must Be Used To Be Persistent

by Herbert Van de Sompel, Martin Klein, Shawn M. Jones

We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use brittle URIs when persisten...

Acquisition of Mementos and Their Content Is More Challenging Than Expected

Recently, we conducted an experiment using mementos for almost 700,000 web pages from more than 20 web archives. These web pages spanned much of the life of the web (1997-2012). Much has been written about acquiring and extracting text from live web pages, but we believe that ...

Rules of Acquisition for Mementos and Their Content

by Shawn M. Jones, Harihar Shankar

Text extraction from web pages has many applications, including web crawling optimization and document clustering. Though much has been written about the acquisition of content from live web pages, content acquisition of archived web pages, known as mementos, remains a relativ...

Releasing an Open Source Python Project, the Services That Brought py-memento-client to Life

The LANL Library Prototyping Team recently received correspondence from a member of the Wikipedia team requesting Python code that could find the best URI-M for an archived web page based on the date of the page revision. Collaborating with Wikipedia, Harihar Shankar, Herbert ...
