New from JSTOR: Early Journal Content Metadata+OCR Data Bundle Now Available
Via a JSTOR Tweet
From JSTOR’s Data for Research Web Site:
We are happy to also make a data bundle for the Early Journal Content freely available to those who would like to conduct data mining or other research across the content.
The data bundle for EJC includes full-text OCR and article and title-level metadata. The Read Me file explains the data in more detail. The currently available data bundle includes all the EJC as of September 7, 2011.
Please note that use of the Early Journal Content bundle is subject to the Early Journal Content Specific Terms and Conditions of Use.
To access the data bundle, please create an account using the very brief registration form, or login if you already have a Data for Research account. We plan to update the bundle on a semi-regular basis and to alert registrants when the bundle has been updated.
The format of the data bundle is a .tar.gz archive containing a readme file explaining the format of the data files, and an XML file for each article in the Early Journal Content bundle.
Once logged in, you can download the Early Journal Content bundle here
The size of the bundle is approx. 2.3 GB compressed, and 7.2 GB inflated.
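Since the announcement describes the bundle as a .tar.gz archive with one XML file per article, here is a minimal Python sketch of how one might walk through it without inflating all 7.2 GB to disk first. The file name passed on the command line and the function name `iter_articles` are assumptions for illustration only. The XML element names are not given in the announcement (the Read Me file covers them), so the sketch only reports each file's root tag.

```python
import sys
import tarfile
import xml.etree.ElementTree as ET


def iter_articles(bundle_path):
    """Yield (member_name, root_element) for each parseable XML file in a .tar.gz archive."""
    # "r:gz" opens a gzip-compressed tar; members are read one at a time
    with tarfile.open(bundle_path, "r:gz") as archive:
        for member in archive:
            # Skip directories, the readme, and anything that is not an .xml file
            if not member.isfile() or not member.name.lower().endswith(".xml"):
                continue
            fileobj = archive.extractfile(member)
            if fileobj is None:
                continue
            try:
                root = ET.parse(fileobj).getroot()
            except ET.ParseError:
                # Skip files that are not well-formed XML rather than stopping the run
                continue
            yield member.name, root


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python read_bundle.py ejc-bundle.tar.gz
    for name, root in iter_articles(sys.argv[1]):
        print(name, root.tag)
```

Reading members directly from the archive keeps disk use near the 2.3 GB compressed size; consult the Read Me file for the actual metadata and OCR element names before extracting fields.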
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.