In this publication in Electronic Government, Julia Romberg and Tobias Escher investigate the potential of active learning for reducing the manual labeling efforts in categorizing public participation contributions thematically.
Political authorities in democratic countries regularly consult the public on specific issues, but subsequently evaluating the contributions requires substantial human resources, often leading to inefficiencies and delays in the decision-making process. One proposed solution is to support human analysts by grouping the contributions thematically through automated means.
While supervised machine learning would naturally lend itself to the task of classifying citizens’ proposals according to certain predefined topics, the amount of training data required is often prohibitive given the idiosyncratic nature of most public participation processes. One potential solution to minimize the amount of training data is the use of active learning. While this semi-supervised procedure has proliferated in recent years, it has never been applied to the evaluation of participation contributions.
Therefore, we use data from online participation processes in three German cities, provide classification baselines, and subsequently assess how different active learning strategies can reduce manual labeling efforts while maintaining good model performance. Our results show not only that supervised machine learning models can reliably classify topic categories for public participation contributions, but also that active learning significantly reduces the amount of training data required. This has important implications for the practice of public participation because it dramatically cuts the time required for evaluation, which particularly benefits processes with a larger number of contributions.
- We compare a variety of state-of-the-art approaches for text classification and active learning on a case study of three nearly identical participation processes for cycling infrastructure in the German municipalities of Bonn, Ehrenfeld (a district of Cologne) and Moers.
- We find that BERT can predict the correct topic(s) for about 77% of the cases.
- Active learning significantly reduces manual labeling efforts: it was sufficient to manually label 20% to 50% of the datasets to maintain the level of accuracy. Efficiency improvements grow with the size of the dataset.
- At the same time, the models operate within an efficient runtime.
- We therefore hypothesize that active learning should significantly reduce human efforts in most use cases.
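
To illustrate the general idea behind the findings above, the following is a minimal sketch of pool-based active learning with uncertainty sampling, one common query strategy. It is not the setup used in the publication: the nearest-centroid model, the one-dimensional toy data, and all function names (`fit`, `predict`, `margin`, `active_learning`) are assumptions made for illustration, and the `oracle` callback merely stands in for the human analyst who assigns labels.

```python
def fit(labeled):
    """Toy nearest-centroid model: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


def margin(model, x):
    """Gap between the two nearest centroids; a small gap means the model is unsure."""
    d = sorted(abs(x - c) for c in model.values())
    return d[1] - d[0]


def active_learning(pool, oracle, seed_idx, budget, batch_size=5):
    """Label items in rounds, always asking for the most uncertain ones next.

    seed_idx must contain at least one item of every class.
    """
    labeled_idx = list(seed_idx)
    unlabeled = set(range(len(pool))) - set(labeled_idx)
    while len(labeled_idx) < budget and unlabeled:
        # Retrain on everything labeled so far.
        model = fit([(pool[i], oracle(i)) for i in labeled_idx])
        # Query the k items with the smallest margin (highest uncertainty).
        k = min(batch_size, budget - len(labeled_idx))
        queries = sorted(unlabeled, key=lambda i: margin(model, pool[i]))[:k]
        labeled_idx.extend(queries)
        unlabeled.difference_update(queries)
    return fit([(pool[i], oracle(i)) for i in labeled_idx]), labeled_idx


# Hypothetical toy data: 101 numbers in [0, 1]; the class is 1 from 0.5 upwards.
pool = [i / 100 for i in range(101)]
truth = [int(x >= 0.5) for x in pool]

# The oracle simulates the human annotator by looking up the hidden label.
model, labeled_idx = active_learning(
    pool, lambda i: truth[i], seed_idx=[0, 100], budget=12
)
accuracy = sum(predict(model, x) == y for x, y in zip(pool, truth)) / len(pool)
print(f"labeled {len(labeled_idx)} of {len(pool)} items, accuracy {accuracy:.2f}")
```

In this sketch the queries concentrate around the class boundary, which is why a small labeling budget can already yield a usable model. In a realistic setting the toy model would be replaced by a text classifier, and the margin by an uncertainty measure derived from its predicted class probabilities.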
J. Romberg and T. Escher. Automated topic categorisation of citizens’ contributions: Reducing manual labelling efforts through active learning. In M. Janssen, C. Csáki, I. Lindgren, E. Loukis, U. Melin, G. Viale Pereira, M. P. Rodríguez Bolívar, and E. Tambouris, editors, Electronic Government, pages 369–385, Cham, 2022. Springer International Publishing. ISBN 978-3-031-15086-9