Detailed information about the course


Digging into Data: Methods for the Collection and Mining of Various Types of Linguistic Data


5-6 September 2023


Mr. Mark Iten, UNIL

Ms. Beatriz Duarte Wirth, UNIL

Ms. Andrea Grütter, UZH


Prof. Merja Kytö, Uppsala University

Prof. Barbara Köpke, Université de Toulouse

Dr. Pablo Diaz, UNIL

Prof. Reinhild Vandekerckhove, University of Antwerp





This workshop will focus on different methods for data collection and data mining in linguistic research, including their challenges, advantages, shortcomings, and complementary value, as well as their ethical implications. Selected questions that will be addressed during this workshop are as follows:

1. How have recent developments in digital humanities (e.g., large-scale digitization of hitherto unknown manuscript material (diachronic) or social media data (synchronic)) affected the way in which we approach data from a linguistic perspective? More specifically, do these developments introduce 'bad data' challenges and/or resolve existing 'bad data' problems?

2. What impact do copyright, ethics, privacy, anonymization, reusability, server storage, terms and conditions, etc. have on the planning and execution of data collection and mining?

3. How have linguistic interviews, experiments, questionnaires, and ethical considerations evolved, and how have they changed the approaches to data collection in linguistic research, as a result of the recent (mandatory) move towards digitization and hybrid/online research?

This workshop is primarily aimed at junior researchers (MA, PhD, Post-Doc) working in the field of linguistics, with a particular focus on psycholinguistics, historical (socio-)linguistics, and computer-mediated communication in any language, but it will also be of interest to more senior scholars.

Call for papers: Doctoral students are also invited to give a 20-minute talk on an aspect of their research, which can range from a fully-fledged paper to a work-in-progress. If interested in giving a talk, please send an email to [email protected] with a title and a brief abstract (c. 150 words) by Wednesday, August 16th.

Note that the organizers (Andrea, Beatriz, and Mark) are part of CLARIN-CH, which is the national consortium for the "pan-European research infrastructure aiming to render accessible all digital language resources and tools from all over Europe through a single sign-on online environment".






Deadline for registration: 05.09.2023