The evaluation of medical tests is challenging. Traditionally, medical test evaluation has focused on diagnostic accuracy, under the assumption that the value of a test to medical practice will flow from its accuracy. Similarly, the sensitivity of a test (the percentage of positive results in patients with the disease tested for) and its specificity (the percentage of negative results in patients without the disease tested for) have traditionally been understood as fixed properties of the test, which remain the same across clinical contexts. Both of these assumptions have come under sustained critique in recent years, and medical researchers are reevaluating their methodological assumptions and practices. This project provides an integrated historical and philosophical exploration of medical test evaluation methodology since the early twentieth century, in order to assist medical researchers in their work. It highlights historical debates about how medical tests should be evaluated, from their origins with serological tests for infectious diseases in the early twentieth century, through mid-century debates amongst oncologists about how to improve diagnostic testing for cancers, to debates about how to interpret an electrocardiogram towards the end of the century. The assumption that patients with the same disease status are remarkably homogeneous is shown to play a central role in these debates, with the “problem of heterogeneity” being downplayed throughout the twentieth century. The increasing emphasis on the heterogeneity of patients in the early twenty-first century has inspired the reevaluation of traditional medical test evaluation methods.