The Infrastructures of Sequence Data in Biology

Hallam Stevens

This paper will provide two examples of the very specific ways in which data is tied to the technologies and practices of computing and information technology. The examples suggest that big data cannot exist outside of these computational infrastructures. First, it will describe Walter Goad's work at Los Alamos, which led ultimately to the development of the GenBank database. Here the use of data arose from specific kinds of computational practices that were originally developed for weapons work. Second, it will examine some of the work of making and updating the Ensembl database run by the European Bioinformatics Institute. In this case, the generation, structuring, and use of the data are tied to the technologies and practices of the World Wide Web. The examples highlight some of the novelties of big data and big data practices: not only its size, but also how it is manipulated and used through numerical methods, relational database structures, machine learning, hyperlinking, and topological analysis. I will suggest that this novelty warrants some new social science approaches to studying data that allow us to follow it inside machines, software, and databases – that is, we need to supplement material culture approaches with data culture approaches.