The formula to “validate, validate, validate” findings from quantitative corpus analysis (Grimmer/Stewart 2013) has become a staple in discussions of the potential and perils of quantitative approaches to text, and it is also central to the SSHOC project (Social Sciences & Humanities Open Cloud). Technically, however, the barriers to implementing validation remain quite high: dedicated resources for setting up and maintaining a server-based environment with a graphical user interface are usually still required. Lowering the costs of integrating quantitative and qualitative steps in workflows that use corpora in social science research designs is a major objective of research infrastructures such as CLARIN, and of polmineR, an open-source R package available from the Comprehensive R Archive Network (CRAN).
In this workshop, we introduce the polmineR package and explore three basic scenarios using it:
We will discuss whether to combine the scenarios with semi-supervised learning, and how to leverage machine learning (ML) approaches. As the combination of dataset and tool, we will use the polmineR R package together with a multilingual corpus of the UN General Assembly.
The workshop is intended for political and social scientists who are interested in using large text collections in their research. No programming skills are needed, but a general familiarity with basic statistical operations on texts will be helpful. Please bring your own laptop for the hands-on session.
This workshop addresses the challenges that specific user communities experience when contributing to SSHOC, the availability of procedures, tools and services to address these challenges, and the extent to which these procedures, tools and services are applicable for specific user communities. Addressing these questions is one of the major goals of the SSHOC project, tackled within WP9.
University of Leipzig
The Paulinum – Assembly Hall and University Church of St. Paul, Augustusplatz 10