Event

Apr 18, 2023
Researching Web Archives and the Materiality of Born-Networked Texts

This talk considers how the Web ARChive (WARC) file format has come to play a central role in the development and standardization of interoperable tools and methods used by the web archiving community. In the context of emerging big data approaches, I consider the sociotechnical relationships between the material construction of data and information infrastructures for collecting and research. Through examples drawn from fieldwork observations studying two data-centered research projects, I consider how the materiality of the WARC format influences research methods and approaches to data extraction, selection, and transformation. Findings identify three modalities researchers use to configure WARC data for their needs: using indexes to support search queries, constructing derivative formats designed for certain types of analysis, and generating custom-designed datasets tailored for specific research purposes. Findings additionally reveal similarities in how these distinct methods approach automated data extraction by relying upon the WARC’s standardized metadata elements.

Our speaker is Emily Maemura, assistant professor in the School of Information Sciences at the University of Illinois Urbana-Champaign. Her research focuses on data practices and the activities of curation, description, characterization, and re-use of historical web data. She is interested in approaches and methods for working with archived web data in the form of large-scale research collections, considering diverse perspectives of the internet as a historical object and site of study.

Address
Max Planck Institute for the History of Science, Boltzmannstraße 22, 14195 Berlin, Germany
Room
Zoom/Online Meeting Platform
Contact and Registration

For further information and registration, please contact Kim Pham: kpham@mpiwg-berlin.mpg.de

About This Series

Brown Bag Lunch is a meeting of researchers at the MPIWG who use or want to learn more about digital research methods, broadly encompassed by the term Digital Humanities. In the Brown Bag Lunch meetings, researchers can discuss tools, share ideas and experiences (good and bad), and learn from each other. Each session explores a new topic; workshops are usually interactive, and we often invite external speakers.

Time
15:30 to 16:30 (Europe/Berlin)