How Deep Learning Helps Identify Trees: A Contribution by Ankit Kariryaa

Because of the coronavirus pandemic and the associated safety measures, it was not only the exhibition Survival of the Fittest that had to close temporarily. The interdisciplinary conference Survival of the Fittest? Zum zukünftigen Verhältnis von Natur und Hightech (On the Future Relationship between Nature and High Tech), planned for mid-March 2020, unfortunately could not take place either.
But not all of the preparations were in vain. The contributions of two participants are now available to us in written form, and we would like to share them with you here.
In their essays, the AI researcher Ankit Kariryaa and the art historian and media theorist Ingeborg Reichle discuss the changes we make to ecosystems or to biological evolution by means of technology. Particular attention is paid to the role of the latest computer technology and artificial intelligence, as well as to advancing digitalization. We are presenting both contributions one after the other on our blog.

Ankit Kariryaa is a doctoral candidate at the University of Bremen. After completing his bachelor's degree in computer science in India at NIT Hamirpur, he came to Germany for a master's degree in "Intelligent Systems" at Bielefeld University. As part of his doctorate at the University of Bremen, he is currently working on deep learning models for the segmentation and identification of trees. With his latest models he has succeeded in carrying out analyses at a global scale. In his contribution MaskIt: Masking for efficient utilization of incomplete public datasets for training deep learning models, the AI researcher explains in concrete terms how he trained a model on aerial imagery to recognize trees in the streetscape of Hamburg. For this purpose, datasets on Hamburg's street trees and aerial images of Hamburg were used in the deep learning procedure and masked with OpenStreetMap. With his model, the researcher wants to contribute to differentiating tree species with the help of AI in the near future and to highlighting their importance for global biodiversity. The central concern of Ankit Kariryaa's research is making AI usable in the fight against climate change.

 

Ankit Kariryaa MaskIt: Masking for efficient utilization of incomplete public datasets for training deep learning models

ABSTRACT
A major challenge in training deep learning models is the lack of high-quality, complete datasets. In this paper, we present a masking approach for training deep learning models on a publicly available but incomplete dataset. To train a U-Net model, we use the street tree dataset and aerial images (0.2 m/pixel resolution) of the city of Hamburg, Germany. The mask is created from the road network downloaded from OpenStreetMap and marks the areas where training data is available. The mask is passed to the model as one of the inputs and is also applied to the output. Our model learns to predict trees only in the masked region, with 78.4% per-pixel accuracy.

MAIN
Biodiversity and wild populations of plants and animals are rapidly decreasing throughout the world [4]. Recent articles have suggested that the sixth mass extinction on Earth is underway [2]. This phenomenon is not limited to large animals; similar reports have emerged for insects [6] and plant species. While human overpopulation and overconsumption are primarily blamed for this crisis, many different factors affect the various wild populations. For example, the industrial-scale use of pesticides and insecticides in farming is often described as disastrous for insects. One recent study reports a decline of more than 75% in total flying insect biomass in protected areas over 27 years [6]. Since insects are prey for birds and other wild animals, the decrease in their population is bound to affect various food chains.

As a primary step in monitoring biodiversity at a global scale, a system is required to determine the tree and plant species present in various environments. In the past, satellite imagery has been successfully applied to monitor forest dynamics [7]. Recent advancements in deep learning, along with the increased availability of high-resolution satellite imagery (<1 m/pixel), have opened up the possibility of detecting individual tree, plant or crop species with relatively high accuracy [10]. However, a common problem for such a system is the lack of high-quality training data. Public agencies such as city officials and forest services often maintain valuable records of public objects such as road signs, parking areas and trees. These datasets, however, were not designed with the training of deep learning models in mind. They are often limited to public areas such as roads and public parks, so training deep learning models on them can be challenging. Here we propose a masking approach with which one can efficiently train a deep learning model from incomplete datasets.

Dataset

Figure 1. Left panel: Aerial image of an area in Hamburg with street trees marked in red. Right panel: Road network of the same area. Only the main roads were used for creating the mask.

We use the aerial images and the street tree dataset from Hamburg, Germany for training our model. The aerial images have 3 channels (RGB) with 0.2 m/pixel resolution and were downloaded from Geoportal.de¹. As seen in the left panel of Figure 1, individual features such as trees and cars parked on the streets are visible to the human eye.

The Authority for Environment and Energy of the city of Hamburg maintains a list of all street trees². The dataset contains various attributes of individual trees such as location, height, width, species, age, and condition. However, as the name suggests, this information is limited to the trees on the streets of Hamburg and does not contain any information about trees in private areas and parks. The left panel of Figure 1 shows a sample of the information available in this dataset. The trees are marked with red ovals.

To make use of this dataset, we create a mask based on the street network of Hamburg. We use the OSMnx Python package [3] to download the street network from OpenStreetMap. The downloaded road network is then transformed and drawn on the aerial image with the help of the Rasterio library [5]. We only use the main drivable roads for creating the road mask and add a buffer of 5 m on both sides to cover the areas next to the roads. We manually annotated 1371 tree crowns along the road network. The tree crowns can also be created from the street tree data, albeit with some noise. The goal is to predict these annotations using the aerial images and the road network mask as input.
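
The buffering step can be illustrated with a small sketch. It does not use OSMnx or Rasterio; instead it assumes, purely for illustration, that the road centre lines are already available as straight segments in pixel coordinates, and it marks every pixel whose centre lies within 5 m of a segment. The function names `distance_to_segment` and `road_mask` and the example tile are hypothetical and not taken from the published code.

```python
import math

PIXEL_SIZE_M = 0.2   # aerial image resolution stated in the text
BUFFER_M = 5.0       # buffer on each side of a road, as stated in the text


def distance_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment from a to b."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def road_mask(height, width, segments, buffer_m=BUFFER_M, pixel_size_m=PIXEL_SIZE_M):
    """Return a height x width grid of 0/1, with 1 inside the buffered roads.

    `segments` holds road centre lines as (x0, y0, x1, y1) in pixel coordinates
    (an assumption of this sketch; real road data would need reprojection first).
    """
    buffer_px = buffer_m / pixel_size_m  # 5 m at 0.2 m per pixel is 25 pixels
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Test the pixel centre against every road segment.
            cx, cy = col + 0.5, row + 0.5
            if any(distance_to_segment(cx, cy, *seg) <= buffer_px for seg in segments):
                mask[row][col] = 1
    return mask


# Hypothetical example: one horizontal road through a 100 x 100 pixel tile.
mask = road_mask(100, 100, [(0.0, 50.0, 100.0, 50.0)])
print(sum(map(sum, mask)))  # number of pixels inside the buffered road
```

At 0.2 m/pixel the 5 m buffer corresponds to 25 pixels on each side of the centre line, so a single road produces a band roughly 50 pixels wide. The brute-force loop is only meant for small tiles; a real pipeline would rasterize buffered geometries with a GIS library.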

Figure 2: The U-Net architecture. As input we pass the road mask and the aerial image. The model is trained to predict trees only in the masked region.

Mapping trees
We use a state-of-the-art deep learning model to detect tree crowns in the input images. Our model is based on the U-Net architecture [9], which was developed for medical image segmentation and is one of the most widely used architectures for semantic segmentation tasks. It is argued that, thanks to the lateral (skip) connections from the early to the late layers, the network preserves the syntactic information in the image well. It is also known to generalize well on relatively small datasets [8]. Figure 2 shows the architecture of the U-Net model. As input to the model, we pass patches of the aerial image together with the street network mask, giving a total of 4 channels (RGB + mask).
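
As a minimal sketch of the 4-channel input, the snippet below appends the mask value to every RGB pixel of a patch. The channels-last layout, the nested-list representation and the name `stack_channels` are assumptions made for this illustration; the published code may organise its tensors differently.

```python
def stack_channels(rgb_patch, mask_patch):
    """Append the mask as a fourth channel to every RGB pixel.

    rgb_patch:  rows of (r, g, b) tuples
    mask_patch: rows of 0/1 values with the same height and width
    """
    if len(rgb_patch) != len(mask_patch):
        raise ValueError("image and mask heights differ")
    stacked = []
    for rgb_row, mask_row in zip(rgb_patch, mask_patch):
        if len(rgb_row) != len(mask_row):
            raise ValueError("image and mask widths differ")
        stacked.append([(r, g, b, m) for (r, g, b), m in zip(rgb_row, mask_row)])
    return stacked


# Hypothetical 2 x 2 patch: two greenish pixels, two grey pixels.
rgb = [[(30, 120, 40), (90, 90, 90)],
       [(35, 110, 45), (80, 80, 80)]]
mask = [[1, 1],
        [0, 1]]
model_input = stack_channels(rgb, mask)
print(model_input[1][0])  # the pixel at row 1, column 0, now with 4 values
```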

Training
For training the models, we extracted 2888 patches of 256 × 256 pixels with 4 channels from an annotated area of 1 km × 2 km. The patches were extracted sequentially with a step of 128 pixels in both directions. They were then randomly divided into 60% training, 20% validation and 20% test patches. Patches that did not fall completely within the annotated area were zero-padded.
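
The patch extraction and the random split can be sketched as follows. The rule for which border patches are kept, the rounding used for the split, the fixed seed and all function names are assumptions of this sketch, and a single-channel image stands in for the 4-channel input to keep it short.

```python
import random

PATCH = 256  # patch side length in pixels, from the text
STEP = 128   # sliding-window step in pixels, from the text


def patch_origins(height, width, step=STEP):
    """Top-left corners of patches taken sequentially with a fixed step.

    Assumption: a patch is kept whenever its origin lies inside the area,
    so patches on the lower and right borders need padding.
    """
    return [(r, c) for r in range(0, height, step) for c in range(0, width, step)]


def extract_patch(image, row, col, patch=PATCH):
    """Cut a patch x patch window, filling pixels outside the image with 0."""
    height, width = len(image), len(image[0])
    return [
        [image[r][c] if r < height and c < width else 0
         for c in range(col, col + patch)]
        for r in range(row, row + patch)
    ]


def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle 0..n-1 and split into train, validation and test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return indices[:n_train], indices[n_train:n_train + n_val], indices[n_train + n_val:]


# Hypothetical 300 x 400 single-channel image filled with ones.
image = [[1] * 400 for _ in range(300)]
origins = patch_origins(300, 400)
patches = [extract_patch(image, r, c) for r, c in origins]
train, val, test = split_indices(len(patches))
print(len(origins), len(train), len(val), len(test))
```

Because the step is half the patch size, neighbouring patches overlap by 50%. The number of patches depends on the exact extent of the annotated area and on how border patches are handled, so this sketch does not attempt to reproduce the figure of 2888. With 2888 patches, the rounding used here would give 1733, 578 and 577 patches; the exact split sizes used by the author are not stated.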

Results
Figure 3 shows the result of our approach. The model achieved a per-pixel accuracy of 78.4% and successfully detects trees in the aerial imagery. The mask can be seen as an AND operator: the model learns to predict only those pixels where the corresponding value in the mask is equal to one. During deployment, the model can be used to predict trees in areas beyond the road network by simply passing a mask of ones. The model accuracy can be further improved by using a higher-quality dataset and data augmentation. Clumped trees can be separated using a weighted loss on the edges [9].
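
The AND behaviour of the mask and a per-pixel accuracy can be sketched in a few lines. Multiplying the output by the mask is one straightforward reading of the description above. The text does not state whether the reported accuracy counts all pixels or only those inside the mask, so the sketch counts masked pixels only, as an assumption. The 0.5 threshold, the function names and the toy values are likewise hypothetical.

```python
def apply_mask(probabilities, mask):
    """Multiply each predicted tree probability by the 0/1 mask value."""
    return [[p * m for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(probabilities, mask)]


def pixel_accuracy(predicted, expected, mask, threshold=0.5):
    """Fraction of masked pixels whose thresholded prediction matches the label."""
    correct = total = 0
    for p_row, e_row, m_row in zip(predicted, expected, mask):
        for p, e, m in zip(p_row, e_row, m_row):
            if m == 1:
                total += 1
                correct += int((p >= threshold) == (e == 1))
    return correct / total if total else 0.0


# Hypothetical 2 x 3 example: raw model output, road mask and annotation.
raw = [[0.9, 0.8, 0.2],
       [0.7, 0.1, 0.6]]
road = [[1, 0, 1],
        [1, 1, 0]]
label = [[1, 0, 0],
         [0, 0, 0]]

masked = apply_mask(raw, road)  # predictions outside the mask become 0
everywhere = apply_mask(raw, [[1] * 3 for _ in range(2)])  # deployment: mask of ones
print(masked)
print(pixel_accuracy(masked, label, road))
```

With a mask of ones the output passes through unchanged, which corresponds to the deployment case described above.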

Discussion
Here we show that the masking technique can be used to train effectively on incomplete datasets. However, this approach also has some shortcomings. For example, in our case the model is only trained with trees along the streets and did not observe trees in other conditions, such as in a grass field, along a water body or in a park. Thus, its accuracy may be lower in conditions where the background is something other than a paved road or a building. Indeed, we observe that the model has trouble distinguishing grass patches from trees (see the last row in Figure 3). A solution to this problem could be to use an additional training dataset covering these conditions. The additional dataset can be oversampled or given extra weight for effective training.
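
One possible way to oversample an additional dataset is weighted sampling with replacement, sketched below. The dataset sizes, the weight of 3 and the function names are hypothetical; the text does not specify how the oversampling or weighting would be done.

```python
import random


def sampling_weights(n_street, n_extra, extra_weight=3.0):
    """Per-sample weights: 1 for street patches, extra_weight for extra patches."""
    return [1.0] * n_street + [extra_weight] * n_extra


def draw_batch(weights, batch_size, rng):
    """Draw patch indices with replacement, proportionally to their weights."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


# Hypothetical sizes: 900 street patches and 100 patches from parks or fields.
weights = sampling_weights(900, 100, extra_weight=3.0)
batch = draw_batch(weights, 10_000, random.Random(0))
share_extra = sum(i >= 900 for i in batch) / len(batch)
print(round(share_extra, 2))  # share of draws that come from the extra patches
```

With these numbers the extra patches make up 10% of the data but carry 300 of the 1200 total weight units, so in expectation about a quarter of the drawn samples come from them. The alternative mentioned above, giving extra weight in training, would apply a similar factor to the loss of those samples instead of changing how often they are drawn.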

Figure 3: Predicting trees. The first two columns show the model input, the third column shows the expected output and the final column shows the predicted output. The model predicted trees with 78.4% accuracy. Yellow marks the masked area along the road network (second column). The tree class in the expected and predicted output is also shown in yellow. If the mask is empty, the model does not predict in that area, as seen in the fourth row. Since the model did not observe trees with grass in the background, its performance is lower in such conditions (for example, the last row).

We believe that this approach can contribute to the overall effort of mapping individual trees. The greatest opportunity that deep learning and satellite imagery might offer is for large-scale citizen science platforms. One can imagine a global platform for monitoring biodiversity, where the data is populated by deep learning pipelines and further refined by citizen scientists. In the future, such platforms may play a crucial role in tackling global challenges including climate change, the extinction of species, and continuously shrinking biodiversity. This notion is shared by several researchers in the fields of remote sensing and sustainability, where the number of calls for research in this direction has been growing over the years [10].

Code availability
The tree detection framework based on U-Net is made publicly available at https://gitlab.com/Kariryaa/maskit. Please contact the author for support and more information.

Acknowledgements
I would like to thank Sanjeev Sharma for discussions on this idea, Gian-Luca Savino for help with the OSMnx library, Daniel Diethei for reviewing the manuscript, Tetiana Gren for support during the project and Johannes Schöning for general guidance. This research was supported in part by the Volkswagen Foundation through a Lichtenberg Professorship.

Footnotes:

1. https://www.geoportal.de/portal/main/
2. http://suche.transparenz.hamburg.de/dataset/strassenbaumkataster-hamburg7

 

References:
1. Michele Acuto, Susan Parnell, and Karen C. Seto. 2018. Building a global urban science. Nature Sustainability 1, 1: 2–4.
2. Anthony D. Barnosky, Nicholas Matzke, Susumu Tomiya, Guinevere OU Wogan, Brian Swartz, Tiago B. Quental, Charles Marshall, Jenny L. McGuire, Emily L. Lindsey, and Kaitlin C. Maguire. 2011. Has the Earth’s sixth mass extinction already arrived? Nature 471, 7336: 51–57.
3. Geoff Boeing. 2017. OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems 65: 126–139.
4. Gerardo Ceballos, Paul R. Ehrlich, and Rodolfo Dirzo. 2017. Biological annihilation via the ongoing sixth mass extinction signaled by vertebrate population losses and declines. Proceedings of the National Academy of Sciences 114, 30: E6089–E6096.
5. Sean Gillies, B. Ward, and A. S. Petersen. 2013. Rasterio: geospatial raster I/O for Python programmers. URL https://github.com/mapbox/rasterio.
6. Caspar A. Hallmann, Martin Sorg, Eelke Jongejans, Henk Siepel, Nick Hofland, Heinz Schwan, Werner Stenmans, Andreas Müller, Hubert Sumser, and Thomas Hörren. 2017. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS ONE 12, 10: e0185809.
7. Chunbo Huang, Zhixiang Zhou, Di Wang, and Yuanyong Dian. 2016. Monitoring forest dynamics with multi-scale and time series imagery. Environmental Monitoring and Assessment 188, 5: 273.
8. Thorbjørn Louring Koch, Mathis Perslev, Christian Igel, and Sami Sebastian Brandt. 2019. Accurate segmentation of dental panoramic radiographs with U-NETS. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), 15–19.
9. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, 234–241.
10. Jan D. Wegner, Steven Branson, David Hall, Konrad Schindler, and Pietro Perona. 2016. Cataloging Public Objects Using Aerial and Street-Level Images – Urban Trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6014–6023.

Featured image: Alexandra Daisy Ginsberg, Designing for the Sixth Extinction, 2015, detail, courtesy the artist, installation view Kunstpalais, photo: Kilian Reil
