Posts by Collection

publications

Gautheron L.

“Des chiffres pour appréhender l'anti-Mélenchonisme de la presse (Figures for Understanding the Press's Anti-Mélenchonism)” in Marianne, 2017

Politics, Data mining

Abstract: In this paper, I applied sentiment analysis and emotion detection to press articles and illustrations to explore differences in the journalistic treatment of various political figures.

Gautheron L.

“Référendum ADP : les médias au service du pouvoir (Silencing Democracy: Media Blackout on the ADP Privatization Referendum)” in Le Média, 2019

Politics, Data mining

Abstract: In this article, I analyzed how signatures on a nationwide constitutional petition correlated with various socioeconomic/political variables in French cities. Low education turned out to be a strongly negative factor. I explored the possibility that this could be the result of poor media coverage. I then measured media coverage of the petition by applying a speech-to-text model to public television news archives. The article has been cited in a book, in research papers, and in an appeal to the French Constitutional Court to increase media coverage of these petitions.

Kouamouo T., Gautheron L.

“Élections européennes : un vote de classe avant tout (European Elections: It's Always About Class!)” in Le Média, 2019

Politics, Data mining, Statistical and Bayesian Inference

Abstract: We analyzed voter trajectories between the French presidential and European elections using a Bayesian ecological inference model. We assessed how these trajectories were influenced by socio-economic factors. This revealed, among other things, the rallying of the right-wing bourgeoisie behind Macron.

Gautheron L., Gence C.

“[DATA] Les morts invisibles du coronavirus : la vérité derrière les chiffres officiels (The Hidden Toll of Coronavirus: Revealing the Truth Beyond Official Numbers)” in Le Média, 2020

Epidemics, Data mining, Statistical and Bayesian Inference

Abstract: In this paper, we compared mortality data with the official death toll attributed to Covid. We showed that the number of deaths attributed to Covid significantly underestimated the actual number of deaths. We then showed that the government later reduced the discrepancy by accounting for Covid-related deaths occurring in nursing homes, but that there remained an unaccounted-for excess mortality among deaths occurring at home that could be attributed to Covid. This article has been cited in research papers.

Gautheron L., Gence C.

“[DATA] Lutte contre le COVID-19 : oui, la lenteur de l’État français a tué (Fatal Sluggishness: How France's COVID-19 Response Cost Lives)” in Le Média, 2020

Epidemics, Data mining, Statistical and Bayesian Inference

Abstract: In this article, we used data from the Oxford Government Response Tracker to show that France took containment measures against Covid relatively late compared to other countries, given the timing of the epidemics. We also estimated how many deaths could have been avoided if certain measures had been taken a few days earlier, by adapting a simulation from Imperial College.

Gautheron L., Lavechin M., Riad R., Scaff C., Cristia A. “Longform recordings: Opportunities and challenges” in Proceedings of LIFT 2020 - 2èmes journées scientifiques du Groupement de Recherche "Linguistique informatique, formelle et de terrain", 2020 (Writing - Original Draft)

Gautheron L., Cristia A. “A Python package for long-form recordings and their annotations” in Bergelson Lab, Duke University, NC, United States [online], 2021

Lavechin M., Seyssel M., Gautheron L., Dupoux E., Cristia A.

“Reverse Engineering Language Acquisition with Child-Centered Long-Form Recordings” in Annual Review of Linguistics, 2022 (Writing - Review & Editing)

Language acquisition, Literature review

Abstract: Language use in everyday life can be studied using lightweight, wearable recorders that collect long-form recordings—that is, audio (including speech) over whole days. The hardware and software underlying this technique are increasingly accessible and inexpensive, and these data are revolutionizing the language acquisition field. We first place this technique into the broader context of the current ways of studying both the input being received by children and children's own language production, laying out the main advantages and drawbacks of long-form recordings. We then go on to argue that a unique advantage of long-form recordings is that they can fuel realistic models of early language acquisition that use speech to represent children's input and/or to establish production benchmarks. To enable the field to make the most of this unique empirical and conceptual contribution, we outline what this reverse engineering approach from long-form recordings entails, why it is useful, and how to evaluate success.

Gautheron L., Rochat N., Cristia A.

“Managing, storing, and sharing long-form recordings and their annotations” in Language Resources and Evaluation, 2022 (Conceptualization, Software, Writing - Original Draft)

Language acquisition, Data management, Software

Abstract: The technique of long-form recordings via wearables is gaining momentum in different fields of research, notably linguistics and neurology. This technique, however, poses several technical challenges, some of which are amplified by the peculiarities of the data, including their sensitivity and their volume. In this paper, we begin by outlining key problems related to the management, storage, and sharing of the corpora that emerge when using this technique. We continue by proposing a multi-component solution to these problems, specifically in the case of daylong recordings of children. As part of this solution, we release ChildProject, a Python package for performing the operations typically required by such datasets and for evaluating the reliability of annotations using a number of measures commonly used in speech processing and linguistics. This package builds upon an annotation management system, which allows the importation of annotations from a wide range of existing formats, as well as upon data validation procedures, which assert the conformity of the data, or, alternatively, produce detailed and explicit error reports. Our proposal could be generalized to populations other than children and beyond linguistics.

Gautheron L. “Who trusts supersymmetry? Probing quantitative methods for investigating research orientations in High-Energy Physics” in 4th International Spring School of the Epistemology of the Large Hadron Collider: The History, Philosophy and Sociology of Large Scale Experiments, Wuppertal, Germany, 2022

Gautheron L. “The many faces of supersymmetry: Supersymmetry across subcultures of High-Energy Physics, 1971–2019” in 2022 History of Science Society Annual Meeting: group session on Historical Epistemology of Particle Physics and Quantum Gravity, Chicago, IL, United States, 2022

Cristia A., Gautheron L., Colleran H.

“Vocal input and output among infants in a multilingual context: Evidence from long-form recordings in Vanuatu” in Developmental Science, 2023 (Data Curation, Formal analysis, Writing - Review & Editing)

Language acquisition, Statistical and Bayesian Inference

Abstract: What are the vocal experiences of children growing up on Malakula island, Vanuatu, where multilingualism is the norm? Long-form audio-recordings captured spontaneous speech behavior by, and around, 38 children (5–33 months, 23 girls) from 11 villages. Automated analyses revealed most children's vocal input came from female adults and other children's voices, with small contributions from male adult voices. The greatest changes with age involved an increase in the input vocalizations from other children. Total input (collapsing across child-directed and overheard speech, and across languages) was ∼11 min per hour, which was at least 5 min (31%) lower than that found in other populations studied using comparable methods in previous literature, as well as in archival American data analyzed with the same algorithm. In contrast, children's own vocalization counts were two to four times higher than previous reports for North-American English-learning monolingual infants at matched ages, and comparable to estimates from archival American data, consistent with a resilient language-learning cognitive system for this aspect of vocal development. The strongest association between input and output was with vocalizations by other children, rather than those by adults, which is consistent with research in anthropology but less so with current theoretical trends in developmental psychology. These results invite further research in populations that are under-represented in developmental science.

Gautheron L. “La désunité de la physique des hautes-énergies (The Disunity of High-Energy Physics)” in XIVe Congrès de la Société française d'histoire des sciences et des techniques: symposium "La physique de l'après Seconde guerre mondiale, entre ruptures et continuités", Bordeaux, France, 2023

Gautheron L. “Probing Socio-Epistemic Dynamics in High-Energy Physics Using the Inspire HEP Database” in Conference "Big Data & History and Philosophy of Science", 2023

Gautheron L. “Too beautiful to be false, or too beautiful to be true: supersymmetry and the future of high-energy physics” in 2JM seminar, Sciences Po, Paris, France, 2023

Gautheron L. “Impérialisme scientifique en physique des hautes-énergies: faut-il écouter les théoriciens? (Scientific Imperialism in High-Energy Physics: Should We Listen to Theorists?)” in Congrès de la Société française de Philosophie des Sciences, Nanterre, France, 2023

Gautheron L. “From Colliders to Cosmos: Dynamics of Cooperation and Collaboration in High-Energy Physics” in Summer school "Collaboration and Interdisciplinarity in Science and Technology", Wuppertal, Germany, 2023

Gautheron L. “Balancing Specialization and Adaptation in a Transforming Scientific Landscape: Modelling scientists' behavior with Natural Language Processing” in NLP Seminar, LATTICE, Montrouge, France, 2023

Gautheron L., Omodei E.

“How research programs come apart: The example of supersymmetry and the disunity of physics” in Quantitative Science Studies, 2023 (Conceptualization, Methodology, Software, Formal analysis, Data Curation, Writing - Original Draft, Visualization)

Science and Collective Intelligence, Natural language processing, Networks

Abstract: According to Peter Galison, the coordination of different “subcultures” within a scientific field happens through local exchanges within “trading zones.” In his view, the workability of such trading zones is not guaranteed, and science is not necessarily driven towards further integration. In this paper, we develop and apply quantitative methods (using semantic, authorship, and citation data from scientific literature), inspired by Galison’s framework, to the case of the disunity of high-energy physics. We give prominence to supersymmetry, a concept that has given rise to several major but distinct research programs in the field, such as the formulation of a consistent theory of quantum gravity or the search for new particles. We show that “theory” and “phenomenology” in high-energy physics should be regarded as distinct theoretical subcultures, between which supersymmetry has helped sustain scientific “trades.” However, as we demonstrate using a topic model, the phenomenological component of supersymmetry research has lost traction and the ability of supersymmetry to tie these subcultures together is now compromised. Our work supports that even fields with an initially strong sentiment of unity may eventually generate diverging research programs and demonstrates the fruitfulness of the notion of trading zones for informing quantitative approaches to scientific pluralism.

Gautheron L. “A dialogue between philosophy of science and computational studies of science illuminates the crisis of fundamental physics” in Workshop "Philosophy of Science meets Quantitative Science Studies", Turin, Italy, 2024

Gautheron L. “Algorithmic bias in correlational analyses of the effects of caregivers' speech behaviour on children's speech production” in Daylong Audio Recordings of Children's Linguistic Environments (DARCLE), online, 2024

Gautheron L. “Correlational analyses of the effects of caregivers’ speech behaviour on children’s speech production” in Department of Linguistics, UCLA, Los Angeles, CA, United States, 2024

Gautheron L. “Balancing Specialization and Adaptation in a Transforming Scientific Landscape” in 10th International Conference on Computational Social Science (IC2S2), Philadelphia, PA, United States, 2024

Gautheron L. “Inferring conventions' function and origin from behavioral data: the example of the metric signature in high-energy physics” in Santa Fe Institute Graduate Workshop in Computational Social Science Modeling and Complexity, 2024

Gautheron L.

“Balancing Specialization and Adaptation in a Transforming Scientific Landscape” in arXiv, 2024

Science and Collective Intelligence, Natural language processing, Networks, Statistical and Bayesian Inference, Inverse problems

Abstract: How do scientists navigate between the need to capitalize on their prior knowledge through specialization, and the urge to adapt to evolving research opportunities? Drawing from diverse perspectives on adaptation, this paper proposes an unsupervised Bayesian approach motivated by Optimal Transport of the evolution of scientists' research portfolios in response to transformations in their field. The model relies on 186,162 scientific abstracts and authorship data to evaluate the influence of intellectual, social, and institutional resources on scientists' trajectories within a cohort of 2,195 high-energy physicists between 2000 and 2019. Using Inverse Optimal Transport, the reallocation of research efforts is shown to be shaped by learning costs, thus enhancing the utility of the scientific capital disseminated among scientists. Two dimensions of social capital, namely "diversity" and "power", have opposite associations with the magnitude of change in scientists' research interests: while "diversity" disrupts and expands research interests, "power" is associated with more stable research agendas. Social capital plays a more crucial role in shifts between cognitively distant research areas. More generally, this work suggests new approaches for understanding, measuring and modeling collective adaptation using Optimal Transport.

Cristia A., Gautheron L., Zhang Z., Schuller B., Scaff C., Rowland C., Räsänen O., Peurey L., Lavechin M., Havard W., Fausey C., Cychosz M., Bergelson E., Anderson H., Al N., Soderstrom M.

“Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline” in Behavior Research Methods, 2024 (Data Curation, Software, Writing - Review & Editing)

Language acquisition, Statistical and Bayesian Inference

Abstract: Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children’s language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-Spanish, and Quechua-Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, intraclass correlation coefficient attributed to the child identity (Child ICC), was <50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.

Gautheron L. “Social dilemmas in high-energy physics” in Workshop "Methodological Transformations in Fundamental Physics", Wuppertal, Germany, 2024

Lucas Gautheron
