Data Visualization: One Hour of the UK Web Archive Crawl in One Minute
From the UK Web Archive Blog:
Each year we attempt to collect as much of the UK web space as we can. This typically involves millions of websites and billions of individual assets (images, PDFs, CSS files, etc.). We send our robots out across the interwebs looking for websites that we can archive. The bots follow links to pages that contain more links to follow, and the process keeps going until we have archived (almost) everything. But what does it look like to ‘crawl’ the web? Here we have condensed an hour of live web crawling into a one-minute video:
Every circle is a different website, and every line represents a link that was followed between websites. The size of the circle represents how many pages we visited from that site, and the width of the line represents the number of links we followed.
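To make that encoding concrete, here is a minimal sketch of how a log of followed links could be rolled up into the data such a graph needs. It assumes, hypothetically, that the crawl output is a list of `(from_url, to_url)` pairs, one per link followed; the function names and log format are illustrative and are not the UK Web Archive's actual pipeline.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Treat the hostname as the 'website' (hypothetical grouping rule)."""
    return urlsplit(url).hostname or ""


def aggregate_crawl(followed_links):
    """Roll (from_url, to_url) pairs up into circle sizes and line widths.

    Returns:
      node_sizes: site -> number of distinct pages visited on that site
      edge_widths: (from_site, to_site) -> number of links followed between them
    """
    pages_per_site = {}
    edge_widths = Counter()
    for src, dst in followed_links:
        # Both ends of a followed link were fetched, so both count as visited.
        for url in (src, dst):
            pages_per_site.setdefault(site_of(url), set()).add(url)
        src_site, dst_site = site_of(src), site_of(dst)
        # Only links that cross from one website to another become lines.
        if src_site != dst_site:
            edge_widths[(src_site, dst_site)] += 1
    node_sizes = {site: len(pages) for site, pages in pages_per_site.items()}
    return node_sizes, edge_widths


if __name__ == "__main__":
    # Hypothetical sample log using placeholder domains.
    log = [
        ("https://example.org/", "https://example.org/about"),
        ("https://example.org/about", "https://example.co.uk/"),
        ("https://example.org/", "https://example.co.uk/"),
        ("https://example.co.uk/", "https://example.co.uk/news"),
    ]
    sizes, widths = aggregate_crawl(log)
    print(sizes)
    print(widths)
```

With this sample, each site ends up with two distinct pages (equal circle sizes), and the single line from `example.org` to `example.co.uk` has a width of 2 because two links were followed across it. A real crawl graph would likely also need URL normalization and time windowing, which this sketch omits.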
The blog post also notes another visualization that provides a real-time view of what the UK Web Archive is crawling (available only when the crawler is active).
Read the Complete Blog Post
See Also: Direct to UK Web Archive
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.