Annotation and Provision of Datasets

As part of our project, we manually annotated a large number of datasets to support the development of AI methods for evaluating contributions to public participation processes.

Supervised machine learning requires training datasets from which to learn the patterns associated with the respective codes. In the field of citizen participation, however, comprehensively coded German-language datasets are lacking. To meet this need, we annotated German-language participation processes on mobility along four dimensions:

  • First, we thematically classified contributions according to modes of transport, other demands on public space, and defects requiring immediate repair.
  • Second, we identified argumentative sentences and classified them as premises or conclusions.
  • Third, we rated argumentative units of meaning by their degree of concreteness.
  • Fourth, we annotated textual descriptions of locations.
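To make the four dimensions concrete, the following Python sketch shows one way a single annotated contribution could be represented. The class and field names (`AnnotatedContribution`, `Span`, `topics`, and so on) and the example labels are hypothetical and do not reflect the actual format of the published datasets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """Character offsets [start, end) of a location mention in the contribution text."""
    start: int
    end: int


@dataclass
class AnnotatedContribution:
    """Hypothetical record for one contribution, one field per annotation dimension."""
    text: str
    # Dimension 1: thematic categories (a contribution may have several).
    topics: set[str] = field(default_factory=set)
    # Dimension 2: argument components per sentence index ("premise", "conclusion", or both).
    argument_components: dict[int, set[str]] = field(default_factory=dict)
    # Dimension 3: concreteness per argumentative unit, e.g. "low", "medium", "high".
    concreteness: dict[int, str] = field(default_factory=dict)
    # Dimension 4: textual location mentions as character spans.
    locations: list[Span] = field(default_factory=list)

    def location_strings(self) -> list[str]:
        """Return the surface text of every annotated location."""
        return [self.text[s.start:s.end] for s in self.locations]
```

Storing locations as character offsets rather than copied strings keeps the annotation tied to its exact position in the text, which matters when the same street name occurs more than once.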

A more detailed description of the datasets (as of June 2022) can be found in our publication: Romberg, Julia; Mark, Laura; Escher, Tobias (2022, June). A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification. Since then, we have continued the thematic coding of the datasets and revised our classification scheme for modes of transport.

The following table shows the current status of annotation and is updated on an ongoing basis (in German):

In line with our open-source policy, the annotated datasets are made publicly available under a Creative Commons CC BY-SA license wherever possible.

Several publications based on these datasets are listed at https://www.cimt-hhu.de/gruppe/romberg/romberg-veroeffentlichungen/.

Master’s thesis on the thematic classification of participation contributions with Active Learning

As part of his Master’s thesis in the MA Computer Science at Heinrich Heine University Düsseldorf, Boris Thome worked on classifying participation contributions according to the topics they address. The thesis continues the work of Julia Romberg and Tobias Escher by examining a finer-grained classification of contributions into subcategories.

Summary

Political authorities in democratic countries regularly consult the public on specific issues, but subsequently evaluating the contributions requires substantial human resources, often leading to inefficiencies and delays in the decision-making process. One proposed solution is to support human analysts by grouping the contributions thematically through automated means.

Supervised machine learning would naturally lend itself to classifying citizens’ proposals according to predefined topics, but the amount of training data required is often prohibitive given the idiosyncratic nature of most public participation processes. One potential way to minimize the amount of training data is active learning. In our previous work, we showed that active learning can significantly reduce the manual annotation effort for coding top-level categories. In this work, we investigated whether this advantage persists when the top-level categories are divided into subcategories. A particular challenge is that some subcategories are very rare and therefore cover only a few contributions.

The methods were evaluated on data from online participation processes in three German cities. The results show that automatically classifying subcategories is significantly more difficult than classifying the main categories. This is due to the large number of possible subcategories (30 in the dataset under consideration) and their highly uneven distribution. Further research is therefore needed to find a practical machine-learning solution for flexibly assigning subcategories.
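Active learning reduces annotation effort by letting the model choose which unlabelled contributions a human should code next, typically those it is least certain about. The Python sketch below shows two common uncertainty measures, least confidence and margin sampling, applied to predicted class probabilities. These are generic strategies chosen for illustration; the thesis may have compared different ones, and the function names are hypothetical.

```python
from typing import Callable, Hashable, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - max(probs)


def margin_uncertainty(probs: Sequence[float]) -> float:
    """Uncertainty = 1 - gap between the two most likely classes (needs >= 2 classes)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def select_batch(
    pool_ids: Sequence[Hashable],
    pool_probs: Sequence[Sequence[float]],
    batch_size: int,
    uncertainty: Callable[[Sequence[float]], float] = least_confidence,
) -> list:
    """Return the ids of the `batch_size` most uncertain unlabelled items."""
    scored = [(uncertainty(p), pid) for pid, p in zip(pool_ids, pool_probs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most uncertain first
    return [pid for _, pid in scored[:batch_size]]
```

With rare subcategories, the model's probabilities for those classes are often poorly calibrated, which is one reason uncertainty-based selection can struggle in the fine-grained setting.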

Publication

Thome, Boris (2022): Thematische Klassifikation von Partizipationsverfahren mit Active Learning. Master’s thesis, Institute of Computer Science, Chair of Databases and Information Systems, Heinrich Heine University Düsseldorf.

Master’s thesis on the automated classification of arguments in participation contributions

As part of her master’s thesis in the MA Computer Science at Heinrich Heine University Düsseldorf, Suzan Padjman worked on classifying argument components in participation contributions. The thesis continues our team’s previous work by examining cases in which a single argumentative sentence contains both a premise and a conclusion.

Summary

Public participation processes allow citizens to engage in municipal decision-making by expressing their opinions on specific issues. Municipalities often have only limited resources to analyze a potentially large number of textual contributions, which need to be evaluated in a timely and detailed manner. Automated support for the evaluation, for example for analyzing arguments, is therefore essential.

When classifying argumentative sentences by type (here: premise or conclusion), a single sentence may contain several components of an argument. This calls for multi-label classification, in which more than one category can be assigned to the same sentence.
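The difference between single-label and multi-label decisions can be illustrated with a minimal Python sketch. It assumes a model that outputs an independent score per label (for example a sigmoid output) and a hypothetical threshold of 0.5. Each label is then decided separately, so a sentence can be tagged as both premise and conclusion, whereas taking the argmax always yields exactly one label.

```python
LABELS = ("premise", "conclusion")


def encode(label_set: set) -> list:
    """Turn a set of gold labels into a binary indicator vector, one slot per label."""
    return [1 if label in label_set else 0 for label in LABELS]


def decode_multi_label(scores: list, threshold: float = 0.5) -> set:
    """Multi-label decision: every label whose score reaches the threshold is assigned."""
    return {label for label, s in zip(LABELS, scores) if s >= threshold}


def decode_single_label(scores: list) -> str:
    """Single-label decision for comparison: only the highest-scoring label survives."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best]


if __name__ == "__main__":
    scores = [0.81, 0.67]  # hypothetical per-label scores for one sentence
    print(decode_multi_label(scores))   # both labels are assigned
    print(decode_single_label(scores))  # only "premise"
    print(encode({"conclusion"}))       # [0, 1]
```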

To address this problem, several methods for the multi-label classification of argument components were compared (SVM, XGBoost, BERT and DistilBERT). BERT models achieved a macro F1 score of up to 0.92. The models also performed robustly across different datasets, an important indication that such methods are usable in practice.
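Macro F1 is the unweighted mean of the per-label F1 scores, so a label the model handles poorly pulls the average down regardless of how often it occurs. A minimal Python sketch of the computation for multi-label predictions encoded as binary vectors (the toy data is invented for illustration, and scoring an absent, never-predicted label as 0.0 is a convention chosen here):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 here if the label is neither present nor predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: list, y_pred: list):
    """Return (macro F1, per-label F1) for equally long lists of binary label vectors."""
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        per_label.append(f1_score(tp, fp, fn))
    return sum(per_label) / n_labels, per_label


# Columns: [premise, conclusion]
y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_pred = [[1, 0], [1, 0], [0, 1], [1, 1]]
score, per_label = macro_f1(y_true, y_pred)  # premise 1.0, conclusion 0.5, macro 0.75
```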

Publication

Padjman, Suzan (2022): Mining Argument Components in Public Participation Processes. Master’s thesis, Institute of Computer Science, Chair of Databases and Information Systems, Heinrich Heine University Düsseldorf.

Project work on the automated recognition of locations in participation contributions

As part of her project work in the MA Computer Science at Heinrich Heine University Düsseldorf, Suzan Padjman developed methods for automatically recognizing textual descriptions of locations in participation processes.

Summary

In the context of the mobility transition, consultative processes are a popular tool for giving citizens the opportunity to voice their interests and concerns. Especially for mobility-related issues, an important aspect of analyzing the collected contributions is identifying which locations (e.g. roads, intersections, cycle paths or footpaths) are problematic and need improvement in order to promote sustainable mobility. Automatically identifying such locations could support the resource-intensive manual evaluation.

The aim of this work was therefore to identify locations automatically using methods from natural language processing (NLP). For this purpose, a location was defined as the description of the specific place a proposal refers to, which could be marked on a map. Examples include street names, city districts and clearly identifiable places such as “in the city center” or “at the exit of the main train station”. Descriptions without reference to a specific place were not considered locations. Methodologically, the task was framed as sequence labeling, because locations often span several consecutive tokens (word sequences).
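Framing location recognition as sequence labeling means assigning a tag to every token. The Python sketch below assumes the widely used BIO scheme (B = beginning of a location, I = inside, O = outside) with a hypothetical tag name `LOC`; the project work may have used a different scheme. It converts annotated token spans into tags and decodes predicted tags back into phrases, the step that would precede any geocoding of recognized locations.

```python
from __future__ import annotations


def spans_to_bio(tokens: list[str], spans: list[tuple[int, int]], tag: str = "LOC") -> list[str]:
    """Convert location spans (token start, exclusive end) into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = f"B-{tag}"
        for i in range(start + 1, end):
            tags[i] = f"I-{tag}"
    return tags


def bio_to_phrases(tokens: list[str], tags: list[str], tag: str = "LOC") -> list[str]:
    """Decode BIO tags back into location phrases; a stray I- tag opens a new phrase."""
    phrases: list[str] = []
    current: list[str] = []
    for token, t in zip(tokens, tags):
        if t == f"B-{tag}" or (t == f"I-{tag}" and not current):
            if current:  # a new B- tag closes the previous phrase
                phrases.append(" ".join(current))
            current = [token]
        elif t == f"I-{tag}":
            current.append(token)
        else:  # "O" ends any open phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = "Please widen the path at the exit of the main train station .".split()
tags = spans_to_bio(tokens, [(4, 12)])
print(bio_to_phrases(tokens, tags))  # ['at the exit of the main train station']
```

Separate B- and I- tags are what allow two adjacent locations to be told apart, which a plain inside/outside scheme cannot do.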

A comparison of different models (spaCy NER, GermanBERT, GBERT, dbmdz BERT, GELECTRA, multilingual BERT, multilingual XLM-RoBERTa) on two German-language participation datasets on cycling infrastructure in Bonn and Cologne Ehrenfeld showed that GermanBERT achieves the best results: it recognizes tokens that belong to a textual location description with a promising macro F1 score of 0.945. Future work is planned to convert the recognized phrases into geocoordinates so that the locations referred to in citizens’ proposals can be displayed on a map.

Publication

Padjman, Suzan (2021): Unterstützung der Auswertung von verkehrsbezogenen Bürger*innenbeteiligungsverfahren durch die automatisierte Erkennung von Verortungen. Project work, Institute of Computer Science, Chair of Databases and Information Systems, Heinrich Heine University Düsseldorf.

Effects of online citizen participation on legitimacy beliefs

In this article in the journal Policy & Internet, Tobias Escher and Bastian Rottinghaus examine how participation in local consultation processes (on planning cycling infrastructure) affects attitudes towards local politics. To this end, in 2018 they studied three participation processes in which the cities of Bonn, Cologne (district of Ehrenfeld) and Moers consulted their citizens on local cycling infrastructure. In each case, citizens could submit, comment on and evaluate proposals on an online platform for five weeks. In total, more than 3,000 proposals were collected, which were to be incorporated into subsequent cycling planning (see further information on the Cycling Dialogues project).

Abstract

In order to generate legitimacy for policies and political institutions, governments regularly involve citizens in the decision-making process, increasingly so via the Internet. This research investigates whether online participation does indeed impact positively on legitimacy beliefs of those citizens engaging with the process, and which particular aspects of the participation process, the individual participants and the local context contribute to these changes. Our surveys of participants in almost identical online consultations in three German municipalities show that the participation process and its expected results have a sizeable effect on satisfaction with local political authorities and local regime performance. While most participants report at least slightly more positive perceptions that are mainly output-oriented, for some, engagement with the process leads not to more, but in fact to less legitimacy. We find this to be the case both for those participants who remain silent and for those who participate intensively. Our results also confirm the important role of existing individual resources and context-related attitudes such as trust in and satisfaction with local (not national) politics. Finally, our analysis shows that online participation is able to enable constructive discussion, deliver useful results and attract people who would not have participated offline to engage.

Key findings

  • The participation processes we studied and to which citizens were invited by their respective councils do indeed have an influence on the attitudes of those who participate in such consultations.
  • For many of the participants, the positive effect that was hoped for does indeed occur: they are more positive about the local institutions (mayor, administration) and local politics as a whole. The decisive factor for the assessment is whether one expects local politics to take the citizens’ proposals seriously and act upon them. In other words, the result of the process is more important for attitudes than the process itself.
  • Notably, this also holds true for those who have rather negative views of local politics to begin with. However, previous experience with local politics also plays a role: those who already have a higher level of satisfaction with and trust in the municipality become more positive through participation.
  • At the same time, participation can also lead to less satisfaction. We were able to show this, on the one hand, for those who were intensively involved in the participation process and submitted many proposals. On average, this group was less satisfied at the end, probably because their expectations of the impact of their efforts were disappointed. On the other hand, those who did not participate actively but only visited the online platform without making suggestions themselves were also more dissatisfied. These people were apparently mainly concerned that the process took place exclusively online.
  • Overall, however, our results show that such online participation processes not only enable constructive participation, but that they also reach additional groups: Almost half of the respondents would not have participated if the process had only been conducted with on-site formats requiring physical presence.

Publication

Escher, Tobias; Rottinghaus, Bastian (2023): Effects of online citizen participation on legitimacy beliefs in local government. Evidence from a comparative study of online participation platforms in three German municipalities. In: Policy & Internet, Article poi3.371. DOI: 10.1002/poi3.371.

MA thesis on pupil participation during the COVID-19 pandemic

In her thesis for the MA Social Sciences: Social Structures and Democratic Governance at Heinrich Heine University Düsseldorf, Maria Antonia Dausner investigated the opportunities for pupil participation during the COVID-19-related school closures, focusing on selected elementary schools in North Rhine-Westphalia.

More information is available in German.

Overview of Methods for Computational Text Analysis to Support the Evaluation of Contributions in Public Participation

In this publication in Digital Government: Research and Practice, Julia Romberg and Tobias Escher review the computational techniques that have been used to support the evaluation of contributions in public participation processes. Based on a systematic literature review, they assess the performance of these techniques and outline directions for future research.

Abstract

Public sector institutions that consult citizens to inform decision-making face the challenge of evaluating the contributions made by citizens. This evaluation has important democratic implications but, at the same time, consumes substantial human resources. However, until now the use of artificial intelligence such as computer-supported text analysis has remained an under-studied solution to this problem. We identify three generic tasks in the evaluation process that could benefit from natural language processing (NLP). Based on a systematic literature search in two databases on computational linguistics and digital government, we provide a detailed review of existing methods and their performance. While some promising approaches exist, for instance to group data thematically and to detect arguments and opinions, we show that there remain important challenges before these could offer any reliable support in practice. These include the quality of results, the applicability to non-English language corpora and making algorithmic models available to practitioners through software. We discuss a number of avenues that future research should pursue that can ultimately lead to solutions for practice. The most promising of these bring in the expertise of human evaluators, for example through active learning approaches or interactive topic modelling.

Key findings

  • There are a number of tasks in the evaluation process that could be supported through Natural Language Processing (NLP). Broadly speaking, these are i) detecting (near) duplicates, ii) grouping contributions by topic and iii) analyzing individual contributions in depth. Most of the literature in this review focused on the automated recognition and analysis of arguments, one particular aspect of the in-depth analysis of contributions.
  • We provide a comprehensive overview of the datasets used and the algorithms employed, and assess their performance. Generally, despite promising results so far, the significant advances in NLP techniques in recent years have barely been exploited in this domain.
  • A particular gap is that few applications exist that would enable practitioners to easily apply NLP to their data and reap the benefits of these methods.
  • The manual labelling effort required to train machine learning models risks cancelling out any efficiency gains from automation.
  • We suggest a number of fruitful future research avenues, many of which draw upon the expertise of humans, for example through active learning or interactive topic modelling.
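Detecting (near) duplicates, the first of these tasks, is often approached with simple set-overlap measures. The Python sketch below compares contributions by the Jaccard similarity of their word trigrams. It is one common baseline, not a method taken from the review; the shingle size and the 0.5 threshold are arbitrary assumptions, punctuation is not stripped, and the pairwise comparison is quadratic, so large collections would typically need an approximate method such as MinHash with locality-sensitive hashing.

```python
def shingles(text: str, n: int = 3) -> set:
    """Set of lower-cased word n-grams; short texts become a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(texts: list, threshold: float = 0.5, n: int = 3) -> list:
    """Return index pairs (i, j), i < j, whose shingle sets are at least `threshold` similar."""
    sh = [shingles(t, n) for t in texts]
    pairs = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(sh[i], sh[j]) >= threshold:
                pairs.append((i, j))
    return pairs


texts = [
    "Please build a protected cycle lane on Main Street",
    "Please build a protected cycle lane on Main Street now",
    "Fix the pothole at the station",
]
print(near_duplicates(texts))  # [(0, 1)]
```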

Publication

Romberg, Julia; Escher, Tobias (2023): Making Sense of Citizens’ Input through Artificial Intelligence. In: Digital Government: Research and Practice, Article 3603254. DOI: 10.1145/3603254.

Expert evidence: State of research on opportunities, challenges and limitations of digital participation

As set out in the German Site Selection Act (StandAG), the Federal Office for the Safety of Nuclear Waste Management (BASE) is responsible for comprehensively informing and involving the public in the procedure for searching for and selecting a repository site for the final disposal of high-level radioactive waste. In this context, in February 2022 BASE commissioned an expert report on the “Possibilities and limits of digital participation tools for public participation in the repository site selection procedure (DigiBeSt)” from the Düsseldorf Institute for Internet and Democracy (DIID) at Heinrich Heine University Düsseldorf in cooperation with the nexus Institute Berlin. For this purpose, a team led by Tobias Escher reviewed the state of research and current developments (work package 2); the results are summarised in a detailed report that will be publicly available from late 2023 onwards.

Selected findings from the report are:

  • Social inequalities in digital participation are mainly based on the second-level digital divide, i.e. differences in the media- and content-related skills required for independent and constructive use of the internet for political participation.
  • Knowledge about the effectiveness of activation factors is still often incomplete and anecdotal, making it difficult for initiators to estimate the costs and benefits of individual measures.
  • Personal invitations have proven suitable for (target-group-specific) mobilisation, but the established mass media also continue to play an important role.
  • Broad and inclusive participation requires a combination of different digital and analogue participation formats.
  • Participation formats at the national level face particular challenges due to the complexity of the issues at stake and the size of the target group. Therefore, these require the implementation of cascaded procedures (interlocking formats of participation at different political levels) as well as the creation of new institutions.

Enriching Machine Prediction with Subjectivity Using the Example of Argument Concreteness in Public Participation

In this publication at the Workshop on Argument Mining, Julia Romberg develops a method for incorporating human perspectivism into machine prediction. The method is tested on the task of classifying argument concreteness in public participation contributions.

Abstract

Although argumentation can be highly subjective, the common practice with supervised machine learning is to construct and learn from an aggregated ground truth formed from individual judgments by majority voting, averaging, or adjudication. This approach leads to a neglect of individual, but potentially important perspectives and in many cases cannot do justice to the subjective character of the tasks. Multi-perspective approaches offer one solution to this shortcoming, but they have received very little attention in the field of argument mining so far.

In this work we present PerspectifyMe, a method to incorporate perspectivism by enriching a task with subjectivity information from the data annotation process. We exemplify our approach with the use case of classifying argument concreteness, and provide first promising results for the recently published CIMT PartEval Argument Concreteness Corpus.

Key findings

  • Machine learning often assumes a single ground truth to learn from, but this does not hold for subjective tasks.
  • PerspectifyMe is a simple method to incorporate perspectivism in existing machine learning workflows by complementing an aggregated label with a subjectivity score.
  • An example of a subjective task is the classification of the concreteness of an argument (low, medium, high), a task whose solution can also benefit the machine-assisted evaluation of public participation processes.
  • First approaches to classifying the concreteness of arguments (aggregated label) achieve an accuracy of 0.80 and an F1 score of 0.67.
  • The subjectivity of concreteness perception (objective vs. subjective) can be predicted with an accuracy of 0.72 and an F1 score of 0.74.
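The idea of complementing an aggregated label with subjectivity information can be sketched in a few lines of Python. This is not the published PerspectifyMe implementation: it assumes, for illustration only, a majority vote over individual concreteness judgements and treats an item as “subjective” whenever the annotators disagree. Ties are left unresolved (`None`) to mark cases that would need adjudication.

```python
from collections import Counter


def aggregate_with_subjectivity(judgements: list):
    """Return (majority label or None on a tie, "objective" | "subjective") for one item.

    ASSUMPTION: an item counts as subjective whenever annotators do not all agree.
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    counts = Counter(judgements)
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        label = None  # tie: would need adjudication
    else:
        label = ranked[0][0]
    subjectivity = "objective" if len(counts) == 1 else "subjective"
    return label, subjectivity


# Each item thus yields two training targets: the aggregated concreteness label
# and the subjectivity flag, which can be learned by separate classifiers.
print(aggregate_with_subjectivity(["high", "high", "medium"]))  # ('high', 'subjective')
```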

Publication

Romberg, Julia (2022, October). Is Your Perspective Also My Perspective? Enriching Prediction with Subjectivity. In Proceedings of the 9th Workshop on Argument Mining (pp. 115–125), Gyeongju, Republic of Korea. Association for Computational Linguistics. https://aclanthology.org/2022.argmining-1.11

Automated Topic Categorization of Citizens’ Contributions: Reducing Manual Labeling Efforts Through Active Learning

In this publication in Electronic Government, Julia Romberg and Tobias Escher investigate the potential of active learning for reducing the manual labeling efforts in categorizing public participation contributions thematically.

Abstract

Political authorities in democratic countries regularly consult the public on specific issues, but subsequently evaluating the contributions requires substantial human resources, often leading to inefficiencies and delays in the decision-making process. One proposed solution is to support human analysts by grouping the contributions thematically through automated means.

Supervised machine learning would naturally lend itself to classifying citizens’ proposals according to predefined topics, but the amount of training data required is often prohibitive given the idiosyncratic nature of most public participation processes. One potential way to minimize the amount of training data is active learning. While this semi-supervised procedure has proliferated in recent years, these promising approaches had never been applied to the evaluation of participation contributions.

Therefore, we use data from online participation processes in three German cities, provide classification baselines and subsequently assess how different active learning strategies can reduce manual labeling efforts while maintaining good model performance. Our results show not only that supervised machine learning models can reliably classify topic categories for public participation contributions, but also that active learning significantly reduces the amount of training data required. This has important implications for the practice of public participation because it dramatically cuts the time required for evaluation, which particularly benefits processes with a large number of contributions.

Key findings

  • We compare a variety of state-of-the-art approaches for text classification and active learning on a case study of three nearly identical participation processes for cycling infrastructure in the German municipalities of Bonn, Ehrenfeld (a district of Cologne) and Moers.
  • We find that BERT can predict the correct topic(s) for about 77% of the cases.
  • Active learning significantly reduces manual labeling efforts: it was sufficient to manually label 20% to 50% of the datasets to maintain the level of accuracy. Efficiency improvements grow with the size of the dataset.
  • At the same time, the models operate within an efficient runtime.
  • We therefore hypothesize that active learning should significantly reduce human efforts in most use cases.
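The labelling savings above come from a pool-based loop in which a model is retrained after each batch of newly labelled contributions and labelling stops at a fixed budget. A minimal Python sketch follows. `oracle`, `train` and `predict_proba` are hypothetical caller-supplied callables standing in for the human annotator and the classifier, the budget is expressed as a fraction of the pool (cf. the 20% to 50% above), and least-confidence sampling serves as an example query strategy, not necessarily one evaluated in the paper.

```python
import math
from typing import Callable, Hashable, Sequence


def active_learning_loop(
    pool: Sequence[Hashable],
    oracle: Callable[[Hashable], object],
    train: Callable[[dict], object],
    predict_proba: Callable[[object, Hashable], Sequence[float]],
    budget_fraction: float = 0.2,
    batch_size: int = 10,
    seed_size: int = 10,
):
    """Label at most ceil(budget_fraction * len(pool)) items, chosen by least confidence."""
    budget = math.ceil(budget_fraction * len(pool))
    unlabeled = list(pool)
    # Bootstrap: label an initial seed set (in practice often drawn at random).
    seed = unlabeled[: min(seed_size, budget)]
    labeled = {x: oracle(x) for x in seed}
    unlabeled = unlabeled[len(seed):]

    while len(labeled) < budget and unlabeled:
        model = train(labeled)
        # Least confidence: 1 - highest predicted class probability.
        scores = [1.0 - max(predict_proba(model, x)) for x in unlabeled]
        k = min(batch_size, budget - len(labeled))
        ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
        chosen = set(ranked[:k])
        for i in chosen:
            labeled[unlabeled[i]] = oracle(unlabeled[i])  # human annotation step
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in chosen]

    return train(labeled), labeled
```

Retraining after every batch is what lets later queries reflect what the model has already learned; larger batches reduce training runs at the cost of less targeted selection.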

Publication

Romberg, Julia; Escher, Tobias (2022): Automated topic categorisation of citizens’ contributions: Reducing manual labelling efforts through active learning. In: Janssen, M.; Csáki, C.; Lindgren, I.; Loukis, E.; Melin, U.; Viale Pereira, G.; Rodríguez Bolívar, M. P.; Tambouris, E. (eds.): Electronic Government, pp. 369–385. Cham: Springer International Publishing. ISBN 978-3-031-15086-9.