
Communications of the ACM

ACM TechNews

A Glimpse of the Archives of the Future


A partial preservation view of the U.S. Geological Survey Record Group, showing files organized in several arrangements, with colors indicating each file's preservation risk.

Credit: Maria Esteva, Weijia Xu, Suyog Dutt Jain, and Varun Jain / TACC

The National Archives and Records Administration enlisted the Texas Advanced Computing Center (TACC) to find innovative, scalable ways to handle large-scale collections of electronic records. Archivists need to determine the organization, contents, and characteristics of a collection so they can describe it for public access. To support that work, the TACC researchers developed a multi-pronged approach that combines several data analysis methods within a single visualization framework.

The TACC team adapted a treemap visualization technique to render additional dimensions of information, such as technical metadata, file format correlations, and preservation risk levels. The renderings are designed specifically for the archivist's need to compare different groups of electronic records. The researchers also developed an analysis method that combines string alignment algorithms with natural language processing techniques to help archivists determine how a group of records is organized.
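The article does not say which treemap layout the TACC team adapted. As a rough illustration of the general idea, the sketch below uses the classic slice-and-dice layout: each directory's rectangle is divided among its children in proportion to their total file size, alternating vertical and horizontal cuts at each level, and each leaf carries a preservation-risk label that is mapped to a color. The `Node` and `Rect` classes, the risk labels, the `RISK_COLOR` table, and the toy record group are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical mapping from preservation-risk level to display color.
RISK_COLOR = {"low": "green", "medium": "orange", "high": "red"}

@dataclass
class Node:
    name: str
    size: int = 0        # file size in bytes (leaves only)
    risk: str = "low"    # hypothetical risk label (leaves only)
    children: list[Node] = field(default_factory=list)

    def total(self) -> int:
        # A directory's weight is the sum of its descendants' sizes.
        if not self.children:
            return self.size
        return sum(c.total() for c in self.children)

@dataclass
class Rect:
    name: str
    x: float
    y: float
    w: float
    h: float
    color: str

def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: list[Rect] | None = None) -> list[Rect]:
    """Place every leaf of `node` inside the rectangle (x, y, w, h)."""
    if out is None:
        out = []
    if not node.children:
        out.append(Rect(node.name, x, y, w, h, RISK_COLOR[node.risk]))
        return out
    total = node.total()
    offset = 0.0
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: cut the rectangle into vertical strips (split along x).
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            # Odd depth: cut into horizontal bands (split along y).
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out

# Toy record group: two series holding files of different formats and risks.
root = Node("RecordGroup", children=[
    Node("reports", children=[Node("a.pdf", 300, "low"), Node("b.wpd", 100, "high")]),
    Node("maps", children=[Node("m.tif", 600, "medium")]),
])

for r in slice_and_dice(root, 0.0, 0.0, 1.0, 1.0):
    print(f"{r.name:6} x={r.x:.2f} y={r.y:.2f} w={r.w:.2f} h={r.h:.2f} {r.color}")
```

In the unit square, the "reports" series takes the left 40% and "maps" the right 60%; within "reports", the high-risk `b.wpd` file becomes a red band covering the bottom quarter of that column. Slice-and-dice is chosen here only for its simplicity; squarified layouts usually give more readable aspect ratios.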

The researchers are developing another analysis method that computes paragraph-to-paragraph similarity to discover stories from large collections of email messages.
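The article does not describe how that similarity is computed. One common baseline, shown here purely as a sketch, represents each paragraph as a bag of words, scores every pair with cosine similarity, and links paragraphs whose score meets a threshold into candidate "story" threads using union-find. The function names, the 0.5 threshold, and the sample paragraphs are hypothetical; a practical system would likely add stop-word removal and TF-IDF weighting.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_pairs(paragraphs: list[str], threshold: float = 0.5) -> list[tuple[int, int, float]]:
    # Compare every paragraph with every later one; keep pairs at or above threshold.
    vecs = [Counter(tokenize(p)) for p in paragraphs]
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            s = cosine(vecs[i], vecs[j])
            if s >= threshold:
                pairs.append((i, j, round(s, 3)))
    return pairs

def group_into_threads(n: int, pairs: list[tuple[int, int, float]]) -> list[list[int]]:
    # Union-find: paragraphs connected by any similar pair land in one thread.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in pairs:
        parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())

paragraphs = [
    "The budget meeting is moved to Friday afternoon.",
    "Please note the budget meeting moved to Friday.",
    "Field samples from the river survey arrived today.",
    "The river survey field samples need labeling.",
]
pairs = similar_pairs(paragraphs)
threads = group_into_threads(len(paragraphs), pairs)
print(pairs)
print(threads)
```

Here the two budget paragraphs score 0.75 and the two river-survey paragraphs about 0.67, while cross-topic pairs share only the word "the" and fall well below the threshold, so the paragraphs split into the threads `[[0, 1], [2, 3]]`.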

From National Science Foundation


Abstracts Copyright © 2011 Information Inc., Bethesda, Maryland, USA

