Master’s thesis on the thematic classification of participation contributions with Active Learning

As part of his Master’s thesis in the MA Computer Science at Heinrich Heine University Düsseldorf, Boris Thome dealt with the classification of participation contributions according to the topics they contain. This thesis continues the work of Julia Romberg and Tobias Escher by examining a finer classification of contributions according to subcategories.

Summary

Political authorities in democratic countries regularly consult the public on specific issues. Evaluating the resulting contributions, however, requires substantial human resources, which often leads to inefficiencies and delays in the decision-making process. One proposed solution is to support human analysts by grouping the contributions thematically through automated means.

While supervised machine learning would naturally lend itself to the task of classifying citizens’ proposals according to predefined topics, the amount of training data required is often prohibitive given the idiosyncratic nature of most public participation processes. One potential way to reduce the amount of training data needed is active learning. In our previous work, we were able to show that active learning can significantly reduce the manual annotation effort for coding top-level categories. In this work, we investigated whether this advantage still holds when the top-level categories are subdivided into subcategories. A particular challenge arises from the fact that some of the subcategories can be very rare and therefore cover only a few contributions.
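The core idea of active learning can be illustrated with a minimal sketch. It assumes uncertainty sampling with a least-confidence criterion, which is one common query strategy and not necessarily the one used in the thesis. The function name, the contribution ids and the probabilities are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence


def least_confident(probabilities: Dict[str, Sequence[float]], batch_size: int) -> List[str]:
    """Return the ids of the items whose most likely class has the lowest probability."""
    # Confidence of a prediction = probability of the most likely class.
    confidence = {item_id: max(probs) for item_id, probs in probabilities.items()}
    # Sort ascending by confidence; ties are broken by id to keep the result deterministic.
    ranked = sorted(confidence, key=lambda item_id: (confidence[item_id], item_id))
    return ranked[:batch_size]


# Hypothetical model output: class probabilities for four unlabeled contributions.
pool = {
    "c1": [0.90, 0.05, 0.05],
    "c2": [0.40, 0.35, 0.25],
    "c3": [0.55, 0.30, 0.15],
    "c4": [0.34, 0.33, 0.33],
}

# The two contributions the model is least sure about are sent to the human annotator.
to_annotate = least_confident(pool, batch_size=2)
print(to_annotate)
```

In a full loop, the newly annotated contributions would be added to the training set, the model retrained, and the selection repeated until the annotation budget is used up.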

Various methods were evaluated on data from online participation processes in three German cities. The results show that the automatic classification of subcategories is significantly more difficult than the classification of the main categories. This is due to the high number of possible subcategories (30 in the dataset under consideration), which are very unevenly distributed. In conclusion, further research is required to find a practical solution for the flexible assignment of subcategories using machine learning.

Publication

Thome, Boris (2022): Thematische Klassifikation von Partizipationsverfahren mit Active Learning. Masterarbeit am Institut für Informatik, Lehrstuhl für Datenbanken und Informationssysteme, der Heinrich-Heine-Universität Düsseldorf. (Download)

Master’s thesis on the automated classification of arguments in participation contributions

As part of her Master’s thesis in the MA Computer Science at Heinrich Heine University Düsseldorf, Suzan Padjman dealt with the classification of argumentation components in participation contributions. This thesis continues our team’s previous work by looking at cases in which a single argumentative sentence can contain both a premise and a conclusion.

Summary

Public participation processes allow citizens to engage in municipal decision-making by expressing their opinions on specific issues. Municipalities often have only limited resources to analyze a potentially large number of textual contributions, which need to be evaluated in a timely and detailed manner. Automated support for the evaluation, e.g. for analyzing arguments, is therefore essential.

When classifying argumentative sentences according to type (here: premise or conclusion), it can happen that one sentence contains several components of an argument. In this case, there is a need for multi-label classification, in which more than one category can be assigned.
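A minimal sketch of how such multi-label targets can be represented: each sentence receives one binary indicator per category instead of a single class. The label order, the helper name and the example sentences are assumptions made for illustration and are not taken from the thesis.

```python
from typing import Iterable, List

# Assumed label inventory, following the two types named in the text.
LABELS = ("premise", "conclusion")


def to_indicator(labels: Iterable[str]) -> List[int]:
    """Encode a set of labels as a binary vector in the order given by LABELS."""
    present = set(labels)
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in LABELS]


# Hypothetical annotated sentences; the last one carries both components.
annotated = [
    ("The crossing has no traffic light.", ["premise"]),
    ("A traffic light should be installed.", ["conclusion"]),
    ("A traffic light is needed because the crossing is dangerous.", ["premise", "conclusion"]),
]

targets = [to_indicator(labels) for _, labels in annotated]
print(targets)
```

A multi-label classifier then predicts each position of the vector independently, so a sentence can be assigned none, one or both categories.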

To solve this problem, different methods for multi-label classification of argumentation components were compared (SVM, XGBoost, BERT and DistilBERT). The results showed that BERT models can achieve a macro F1 score of up to 0.92. The models exhibit robust performance across different datasets – an important indication of the practical usability of such methods.
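For readers unfamiliar with the metric, the following sketch shows how a macro F1 score is computed in the multi-label setting: F1 is calculated per label and then averaged without weighting. The data are made up for illustration and are unrelated to the results reported in the thesis.

```python
from typing import List


def f1_for_label(y_true: List[List[int]], y_pred: List[List[int]], label: int) -> float:
    """F1 score for one label column of binary indicator matrices."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 0 and p[label] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 0)
    denominator = 2 * tp + fp + fn
    # By convention in this sketch, a label with no positives at all scores 0.
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    num_labels = len(y_true[0])
    return sum(f1_for_label(y_true, y_pred, k) for k in range(num_labels)) / num_labels


# Hypothetical gold labels and predictions; columns are (premise, conclusion).
gold = [[1, 0], [1, 1], [0, 1], [1, 0]]
predicted = [[1, 0], [1, 0], [0, 1], [1, 1]]

score = macro_f1(gold, predicted)
print(score)
```

Because every label contributes equally to the average, weak performance on a rare label lowers the macro F1 as much as weak performance on a frequent one.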

Publication

Padjman, Suzan (2022): Mining Argument Components in Public Participation Processes. Masterarbeit am Institut für Informatik, Lehrstuhl für Datenbanken und Informationssysteme, der Heinrich-Heine-Universität Düsseldorf. (Download)

Project work on the automated recognition of locations in participation contributions

As part of her project work in the MA Computer Science at Heinrich Heine University Düsseldorf, Suzan Padjman worked on the development of methods for the automated recognition of textually described location information in participation procedures.

Summary

In the context of the mobility transition, consultative processes are a popular tool for giving citizens the opportunity to represent and contribute their interests and concerns. Especially in the case of mobility-related issues, an important analysis aspect of the collected contributions is which locations (e.g. roads, intersections, cycle paths or footpaths) are problematic and in need of improvement in order to promote sustainable mobility. Automated identification of such locations has the potential to support the resource-intensive manual evaluation.

The aim of this work was therefore to find an automated solution for identifying locations using methods from natural language processing (NLP). For this purpose, a location was defined as the description of a specific place that a proposal refers to and that could be marked on a map. Examples of locations are street names, city districts and clearly assignable places, such as “in the city center” or “at the exit of the main train station”. Pure descriptions without reference to a specific place were not considered locations. Methodologically, the task was treated as a sequence labeling task, since locations often consist of several consecutive tokens (word sequences).
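The sequence labeling view can be illustrated with a small sketch in which every token receives a tag. It assumes the widely used BIO scheme (B = beginning of a location, I = inside, O = outside); the tagging scheme actually used in the project work is not stated here, and the tag name LOC, the helper functions and the example sentence are hypothetical.

```python
from typing import List, Tuple


def bio_tags(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Turn location spans (start inclusive, end exclusive) into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        tags[start] = "B-LOC"
        for i in range(start + 1, end):
            tags[i] = "I-LOC"
    return tags


def extract_locations(tokens: List[str], tags: List[str]) -> List[str]:
    """Recover the location phrases from a tagged token sequence."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-LOC":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC" and current:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical contribution; tokens 2 to 9 form the location description.
tokens = "Cycle path at the exit of the main train station is too narrow".split()
tags = bio_tags(len(tokens), [(2, 10)])
locations = extract_locations(tokens, tags)
print(list(zip(tokens, tags)))
print(locations)
```

A sequence labeling model is trained to predict such a tag for each token, and the predicted tags are then merged back into location phrases as in the second function.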

A comparison of different models (spaCy NER, GermanBERT, GBERT, dbmdz BERT, GELECTRA, multilingual BERT, multilingual XLM-RoBERTa) on two German-language participation datasets on cycling infrastructure in Bonn and Cologne Ehrenfeld showed that GermanBERT achieves the best results. This model can recognize tokens that are part of a textual location description with a promising macro F1 score of 0.945. In future work, the recognized text phrases are to be converted into geocoordinates so that the locations referred to in citizens’ proposals can be shown on a map.

Publication

Padjman, Suzan (2021): Unterstützung der Auswertung von verkehrsbezogenen Bürger*innenbeteiligungsverfahren durch die automatisierte Erkennung von Verortungen. Projektarbeit am Institut für Informatik, Lehrstuhl für Datenbanken und Informationssysteme, der Heinrich-Heine-Universität Düsseldorf. (Download)

MA thesis on pupil participation during the Covid-19 pandemic

In her thesis for the MA Social Sciences: Social Structures and Democratic Governance at Heinrich Heine University Düsseldorf, Maria Antonia Dausner investigated the possibilities of pupil participation during the Covid-19-related school closures, focusing on an analysis of selected elementary schools in North Rhine-Westphalia.

More information is available in German.

Interdisciplinary course on exploring social status and language

This term we are offering a Master’s course in which we use proposals from online consultation processes in conjunction with individual-level survey data to analyse whether the social status of participants is reflected in the language they use in their written proposals. To this end, we use AI-based methods from Natural Language Processing.

More information is available in German.