Shawn M. Jones | Researcher | Software Engineer

blog-post

Jun 26, 2016

Symposium on Saving The Web at the Library of Congress

On June 16, 2016, the Library of Congress hosted a one-day symposium entitled Saving the Web: The Ethics and Challenges of Preserving What’s on the Internet.

blog-post

Apr 26, 2016

WS-DL Blog

Mementos in the Raw

While analyzing mementos in a recent experiment, we discovered problems processing archived content. Many web archives augment the mementos they serve with additional archive-specific information, including HTML, text, and JavaScript. We were attempting to compare content acro...

blog-post

Apr 23, 2016

WS-DL Blog

WWW 2016 Trip Report

I was fortunate to present a poster at the 25th International World Wide Web Conference, held from April 11 to April 15, 2016. Though my primary mission was to represent both the WS-DL and the LANL Prototyping Group, I gained a better appreciation for the state of the art...

publication poster

Apr 4, 2016

WWW 2016

Persistent URIs Must Be Used To Be Persistent

by Herbert Van de Sompel, Martin Klein, Shawn M. Jones

We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use brittle URIs when persisten...

Web mentions

DSHR's Blog, by David Rosenthal

blog-post

Feb 23, 2016

WS-DL Blog

Acquisition of Mementos and Their Content Is More Challenging Than Expected

Recently, we conducted an experiment using mementos for almost 700,000 web pages from more than 20 web archives. These web pages spanned much of the life of the web (1997-2012). Much has been written about acquiring and extracting text from live web pages, but we believe that ...

publication technical-report

Feb 19, 2016

arXiv

Rules of Acquisition for Mementos and Their Content

by Shawn M. Jones, Harihar Shankar

Text extraction from web pages has many applications, including web crawling optimization and document clustering. Though much has been written about the acquisition of content from live web pages, content acquisition of archived web pages, known as mementos, remains a relativ...

blog-post

Sep 8, 2015

WS-DL Blog

Releasing an Open Source Python Project, the Services That Brought py-memento-client to Life

The LANL Library Prototyping Team recently received correspondence from a member of the Wikipedia team requesting Python code that could find the best URI-M for an archived web page based on the date of the page revision. Collaborating with Wikipedia, Harihar Shankar, Herbert ...

blog-post

Sep 1, 2015

WS-DL Blog

From Student To Researcher II

After successfully defending my Master’s Thesis, I accepted a position as a Graduate Research Assistant at Los Alamos National Laboratory (LANL) Library’s Digital Library Research and Prototyping Team. I now work directly for Herbert Van de Sompel, in collaboration with my ad...

publication preprint

Jun 20, 2015

arXiv

Avoiding Spoilers in Fan Wikis of Episodic Fiction

by Shawn M. Jones, Michael L. Nelson

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” ...

publication masters-thesis

May 30, 2015

Old Dominion University

Avoiding Spoilers on Mediawiki Fan Sites Using Memento

by Shawn M. Jones

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” –...
