Publications of Shawn M. Jones

DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding

publication journal-article

Nov 7, 2023

by Kehinde Ajayi, Xin Wei, Martin Gryder, Winston Shields, Jian Wu, Shawn M. Jones, Michal Kucer, and Diane Oyen

Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data on practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks, such...

Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives

publication conference-paper

Sep 26, 2023

TPDL 2023

by Shawn M. Jones, Himarsha Jayanetti, Martin Klein, Michele C. Weigle, and Michael L. Nelson

Web archives are sources of big data. When presenting human visitors with archived web pages, or mementos, web archives often apply user interface augmentations to assist them. Unfortunately, these augmentations present challenges for natural language processing, computer visi...

Summarizing Web Archive Corpora Via Social Media Storytelling By Automatically Selecting and Visualizing Exemplars

publication journal-article

Jul 3, 2023

ACM TWEB

by Shawn M. Jones, Martin Klein, Michele C. Weigle, and Michael L. Nelson

People often create themed collections to make sense of an ever-increasing number of archived web pages. Some of these collections contain hundreds of thousands of documents. Thousands of collections exist, many covering the same topic. Few collections include standardized met...

Discovering Image Usage Online: A Case Study With "Flatten the Curve"

publication poster

Jun 26, 2023

ACM/IEEE JCDL 2023

by Shawn M. Jones and Diane Oyen

Understanding the spread of images across the web helps us understand the reuse of scientific visualizations and their relationship with the public. The “Flatten the Curve” graphic was heavily used during the COVID-19 pandemic to convey a complex concept in a simple form. It d...

Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

publication conference-paper

Oct 23, 2022

DIRA 2022

by Shawn M. Jones, Diane Oyen

Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer visio...

Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists

publication conference-paper

Sep 20, 2022

TPDL 2022

by Himarsha R. Jayanetti, Shawn M. Jones, Martin Klein, Alex Osborne, Paul Koerbin, Michael L. Nelson, and Michele C. Weigle

As web archives’ holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms. We note a plethora of different approaches to web archive collection struct...

Robustifying Links With Zotero

publication conference-paper

Sep 12, 2022

iPres 2022

by Martin Klein, Shawn M. Jones

Referencing resources on the web has become an integral part of our digital scholarship. However, the long-term availability and accessibility of these resources have rarely been the focus of significant research and development efforts. In this paper, we introduce the Zotero Ro...

The DSA Toolkit Shines Light Into Dark and Stormy Archives

publication journal-article

May 9, 2022

code{4}lib Journal

by Shawn M. Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Martin Klein, Michele C. Weigle, and Michael L. Nelson

The Dark and Stormy Archives (DSA) Project applies social media storytelling to a subset of a collection to facilitate collection understanding at a glance. As part of this work, we developed the DSA Toolkit, which helps archivists and visitors leverage this capability. As par...

Hypercane: Toolkit for Summarizing Large Collections of Archived Webpages

publication newsletter

Oct 12, 2021

SIGWEB Newsletter

by Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

In the Dark and Stormy Archives (DSA) project, we focus on storytelling techniques to summarize collections of archived web pages. Since collections can have hundreds or even thousands of seeds (initial URLs) and each seed can be recrawled many times, with each version separat...

Hypercane: Intelligent Sampling for Web Archive Collections

publication poster

Sep 29, 2021

ACM/IEEE JCDL 2021

by Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

Humans can choose individual documents from a web archive collection, but doing so is difficult if they are unfamiliar with the collection. The issue is scale. Most web archive collections consist of thousands of documents. Hypercane is a tool that automates the selection of d...

It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth

publication conference-paper

Sep 28, 2021

ACM/IEEE JCDL 2021

by Shawn M. Jones, Valentina Neblitt-Jones, Michele C. Weigle, Martin Klein, and Michael L. Nelson

In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying ...

Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives

publication phd-dissertation

Aug 26, 2021

Old Dominion University

by Shawn M. Jones

Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections be...

Interoperability for Accessing Versions of Web Resources with the Memento Protocol

publication book-chapter

Jul 1, 2021

The Past Web: Exploring Web Archives

by Shawn M. Jones, Martin Klein, Herbert Van de Sompel, Michael L. Nelson, and Michele C. Weigle

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search eng...

Automatically Selecting Striking Images for Social Cards

publication conference-paper

Jun 21, 2021

ACM Web Science 2021

by Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

To allow previewing a web page, social media platforms have developed social cards: visualizations consisting of vital information about the underlying resource. At a minimum, social cards often include features such as the web resource’s title, text summary, striking image, a...

Robustifying Links To Combat Reference Rot

publication journal-article

Feb 10, 2021

code{4}lib Journal

by Shawn M. Jones, Martin Klein, and Herbert Van de Sompel

Links to web resources frequently break, and linked content can change at unpredictable rates. These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information. In this paper, we highlight the significance of reference rot, ...

Web mentions

InfoDocket

SHARI -- An Integration of Tools to Visualize the Story of the Day

publication workshop-presentation

Aug 4, 2020

Web Archiving and Digital Libraries 2020

by Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson

Tools such as Google News and Flipboard exist to convey daily news, but what about the past? In this paper, we describe how to combine several existing tools with web archive holdings to perform news analysis and visualization of the “biggest story” for a given date. StoryGrap...

MementoEmbed and Raintale for Web Archive Storytelling

publication workshop-presentation

Aug 4, 2020

Web Archiving and Digital Libraries 2020

by Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson

For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display...

Web mentions

InfoDocket

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

publication conference-paper

Nov 3, 2019

ACM CIKM 2019

by Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search eng...

Improving Collection Understanding in Web Archives

publication newsletter

Jan 1, 2019

TCDL Bulletin

by Shawn M. Jones

Ever since the Internet Archive started large-scale web archiving in 1996, historians, sociologists, and journalists have found web archives to be an important source of information for their work. Archive-It, a service focused on creating collections, allows curators to gene...

The Off-Topic Memento Toolkit

publication conference-paper

Sep 20, 2018

iPres 2018

by Shawn M. Jones, Michele C. Weigle, and Michael L. Nelson

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configu...

Web mentions

InfoDocket

The Many Shapes of Archive-It

publication conference-paper

Sep 20, 2018

iPres 2018

by Shawn M. Jones, Alexander Nwala, Michele C. Weigle, and Michael L. Nelson

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources...

Avoiding spoilers: wiki time travel with Sheldon Cooper

publication journal-article

Mar 1, 2018

International Journal on Digital Libraries

by Shawn M. Jones, Michael L. Nelson, and Herbert Van de Sompel

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if fans are behind in their viewing they run the risk of encountering “spoilers”—inf...

Uniform Access to Raw Mementos

publication poster

Jun 16, 2017

IIPC Web Archiving Conference 2017

by Herbert Van de Sompel, Michael L. Nelson, Lyudmila Balakireva, Martin Klein, Shawn M. Jones, and Harihar Shankar

Most web archives augment Mementos when presenting them to the user, often for usability or legal purposes. Research efforts and software projects need access to the original captured “raw” Mementos. So that users and software do not need to resort to archive-specific solutions, ...

Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content

publication journal-article

Dec 2, 2016

PLOS One

by Shawn M. Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, Richard Tobin, and Claire Grover

Increasingly, scholarly articles contain URI references to “web at large” resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they r...

Persistent URIs Must Be Used To Be Persistent

publication poster

Apr 4, 2016

WWW 2016

by Herbert Van de Sompel, Martin Klein, Shawn M. Jones

We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use brittle URIs when persisten...

Web mentions

DSHR's Blog, by David Rosenthal

Rules of Acquisition for Mementos and Their Content

publication technical-report

Feb 19, 2016

arXiv

by Shawn M. Jones, Harihar Shankar

Text extraction from web pages has many applications, including web crawling optimization and document clustering. Though much has been written about the acquisition of content from live web pages, content acquisition of archived web pages, known as mementos, remains a relativ...

Avoiding Spoilers in Fan Wikis of Episodic Fiction

publication preprint

Jun 20, 2015

arXiv

by Shawn M. Jones, Michael L. Nelson

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” ...

Avoiding Spoilers on Mediawiki Fan Sites Using Memento

publication masters-thesis

May 30, 2015

Old Dominion University

by Shawn M. Jones

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” –...

Bringing Web Time Travel to MediaWiki: An Assessment of the Memento MediaWiki Extension

publication technical-report

Jun 16, 2014

arXiv

by Shawn M. Jones, Michael L. Nelson, Harihar Shankar, Herbert Van de Sompel

We have implemented the Memento MediaWiki Extension Version 2.0, which brings the Memento Protocol to MediaWiki, used by Wikipedia and the Wikimedia Foundation. Test results show that the extension has a negligible impact on performance. Two 302 status code datetime negotiatio...