Annotation and Provision of Datasets

As part of our project, we manually annotated a large number of datasets to support the development of AI methods for evaluating public participation contributions.

Supervised machine learning requires annotated training datasets from which it can learn the patterns associated with each coding. In the area of citizen participation, comprehensively coded German-language datasets are lacking. To meet this need, we have annotated German-language participation processes from the field of mobility along four dimensions:

  • First, we have classified contributions thematically by mode of transportation, other requirements for public space, and defects that need to be fixed immediately.
  • Second, we have coded the argumentative sentences in the processes and divided them into premises and conclusions.
  • Third, we have rated argumentative units of meaning by how concrete they are.
  • Fourth, we have coded textual location information.
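
To illustrate how the four dimensions could sit side by side on a single contribution, here is a minimal sketch in Python. It is not the format of the published datasets: the class names, field names, the sample text, and the concreteness value "high" are all hypothetical and chosen only for illustration. Only the premise/conclusion distinction and the three thematic groups come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Roles named in the text above; everything else in this sketch is hypothetical.
ALLOWED_ROLES = {"premise", "conclusion"}


@dataclass
class ArgumentUnit:
    """One argumentative unit of meaning (dimensions 2 and 3)."""
    text: str
    role: str                            # "premise" or "conclusion"
    concreteness: Optional[str] = None   # placeholder label, scale not specified here

    def __post_init__(self) -> None:
        if self.role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")


@dataclass
class AnnotatedContribution:
    """A citizen contribution with all four annotation dimensions."""
    text: str
    topics: List[str] = field(default_factory=list)          # dimension 1: thematic classes
    argument_units: List[ArgumentUnit] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)       # dimension 4: location mentions


# Invented sample contribution, for illustration only.
example = AnnotatedContribution(
    text="The cycle lane on Main Street is too narrow. It should be widened.",
    topics=["modes of transportation"],
    argument_units=[
        ArgumentUnit("The cycle lane on Main Street is too narrow.", "premise", "high"),
        ArgumentUnit("It should be widened.", "conclusion", "high"),
    ],
    locations=["Main Street"],
)

# Collect the text of all premises in the contribution.
premises = [u.text for u in example.argument_units if u.role == "premise"]
```

Keeping the argumentative units as a nested list reflects that concreteness is assigned per unit, while topics and locations attach to the contribution as a whole; a real dataset may well organize this differently.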

A more detailed description of the datasets (as of June 2022) can be found in our publication: Romberg, Julia; Mark, Laura; Escher, Tobias (2022, June). A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification. Since then, we have continued the thematic coding of the datasets and revised our coding scheme for modes of transport.

The following table shows the current status of annotation and is updated on an ongoing basis (in German):

In accordance with our open source policy, the annotated datasets are made available to the public under a Creative Commons CC BY-SA license whenever possible.

A number of publications have been produced based on these datasets. They are listed at https://www.cimt-hhu.de/gruppe/romberg/romberg-veroeffentlichungen/.