How machine learning can help coffee farmers fight disease

Coffee is one of the most popular and valuable beverages in the world, with an estimated global market value of over 100 billion US dollars. However, coffee production faces many challenges, such as climate change, pests, diseases, and market fluctuations.

One of the most serious threats to coffee crops is the coffee berry disease (CBD), a fungal infection that causes the berries to rot and fall off the plant. CBD can reduce the yield and quality of coffee by up to 80%, and affects millions of smallholder farmers in Africa, Asia, and Latin America.

To prevent and control CBD, farmers need to monitor their plants regularly and apply fungicides or biological agents. However, this can be costly, time-consuming, and environmentally harmful. Moreover, farmers may not have access to reliable information or tools to diagnose CBD accurately and in time. This is where machine learning, a branch of artificial intelligence that enables computers to learn from data and make predictions, can offer a solution.

In a research project at RISE, we have developed an efficient machine-learning-based solution for detecting CBD in images of coffee plants. The project aims to provide a low-cost, scalable, and accurate tool that farmers can use to monitor their coffee plants and take appropriate action to prevent or treat CBD.

Dataset

The image dataset was collected by our collaborating partner Mpendakazi Agribusiness in Tanzania. Annotation is one of the most challenging and time-consuming tasks in machine learning: to detect and count objects in images, human experts must manually draw a bounding box around every object in each image. To reduce this burden, the project uses a novel method that combines weak and strong labels. Weak labels are point labels that mark the center of an object, while strong labels are box labels that enclose the entire object. The project uses open-set detectors, machine learning models that can detect arbitrary objects without task-specific training, to generate proposals for ground truth bounding boxes in each image. The human annotators then only need to add point labels for the remaining objects in each image, which is much faster and easier than drawing boxes.
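The mixed labelling described above can be illustrated with a small sketch. It assumes boxes are stored as pixel corner coordinates and points as pixel centers; the names `MixedLabels`, `box_contains`, and `build_mixed_labels` are hypothetical and not taken from the project's code. The sketch keeps every detector proposal as a strong label and keeps a point label only when no proposal already covers it.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y) center of an object, in pixels
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max), in pixels


def box_contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


@dataclass
class MixedLabels:
    boxes: List[Box]      # strong labels: proposals from the open-set detector
    points: List[Point]   # weak labels: objects the detector did not cover


def build_mixed_labels(proposals: List[Box], clicks: List[Point]) -> MixedLabels:
    """Combine detector proposals with annotator clicks into one label set.

    Clicks that fall inside an existing proposal are dropped, since that
    object already has a strong label (an assumption of this sketch).
    """
    remaining = [
        p for p in clicks
        if not any(box_contains(b, p) for b in proposals)
    ]
    return MixedLabels(boxes=list(proposals), points=remaining)


# Example: one proposal, two clicks; only the uncovered click is kept.
labels = build_mixed_labels([(10, 10, 50, 50)], [(30, 30), (80, 80)])
```

In this example the click at (30, 30) falls inside the proposal and is discarded, so the image ends up with one box label and one point label.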

Modeling

The project implemented machine learning models that learn from the annotated images and predict the bounding boxes and labels of CBD in new images. The work builds on the YOLOv8 framework, a state-of-the-art model for object detection, which was modified to work with the mixed weak and strong labels. Two models were investigated for this task: point-guided loss suppression (PLS) and mixed Point-Teaching (MPT).

The PLS model is a simple adaptation of YOLOv8, which uses a loss function that penalizes the model for predicting boxes that do not match the point labels. The MPT framework consists of two models, one that generates boxes that the other uses as pseudo labels during training. The pseudo labels are synthetic labels that are generated by a teacher model and used to guide the learning of a student model. The MPT framework aims to leverage the complementary strengths of the two models and improve their performance.
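The two ideas above can be sketched in a few lines. This is only one possible reading of the description, not the project's actual loss or training code: the function names, the `-log(1 - confidence)` penalty, and the rule of keeping the most confident teacher box per point are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y) point label
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Prediction = Tuple[Box, float]            # (box, confidence strictly between 0 and 1)


def box_contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def point_guided_penalty(preds: List[Prediction], points: List[Point]) -> float:
    """PLS-style idea: penalize only predicted boxes that contain no point label.

    Predictions that contain a point are left out of the penalty, since a
    point label suggests there really is an object at that location.
    """
    loss = 0.0
    for box, conf in preds:
        if not any(box_contains(box, p) for p in points):
            loss += -math.log(1.0 - conf)
    return loss


def teacher_pseudo_labels(teacher_preds: List[Prediction],
                          points: List[Point]) -> List[Box]:
    """MPT-style idea: turn each point label into a pseudo box label.

    For every point, keep the most confident teacher box that contains it.
    Points that no teacher box contains produce no pseudo label.
    """
    pseudo = []
    for p in points:
        candidates = [(box, conf) for box, conf in teacher_preds
                      if box_contains(box, p)]
        if candidates:
            best_box, _ = max(candidates, key=lambda c: c[1])
            pseudo.append(best_box)
    return pseudo


# Example: three predictions and a single point label at (5, 5).
preds = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.5), ((0, 0, 12, 12), 0.6)]
points = [(5, 5)]
penalty = point_guided_penalty(preds, points)
pseudo_boxes = teacher_pseudo_labels(preds, points)
```

Here only the box at (20, 20, 30, 30) contributes to the penalty, and the point at (5, 5) is converted into the more confident of the two boxes that contain it. In a student-teacher setup, boxes such as `pseudo_boxes` would then be used as training targets for the student.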

Evaluation

The performance of the developed models was evaluated on a test split of the collected CBD dataset. The models were compared with the baseline YOLOv8 model and the semi-supervised YOLOv8 model, which uses only box labels for training.
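The text does not state which metrics were used. As a hedged illustration of how object detectors are commonly scored, the sketch below computes intersection over union (IoU) and then precision and recall with a greedy one-to-one matching. The 0.5 threshold and the function names are assumptions, and predictions are assumed to be sorted by descending confidence.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Greedily match each prediction to its best unmatched ground truth box.

    A prediction counts as a true positive if its best remaining match
    reaches the IoU threshold; each ground truth box is matched at most once.
    """
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Example: one prediction matches exactly, the other hits nothing.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
precision, recall = precision_recall(preds, truths)
```

Running the same function on the outputs of each model over the test split would give numbers that can be compared across the baseline, semi-supervised, PLS, and MPT variants.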

The evaluation results show that the PLS model gives a slight improvement in performance on the CBD dataset, compared to the semi-supervised model. The MPT framework generally performs worse, only performing above the baseline in a few cases. The exact efficiency of using point labels is difficult to determine, but the results indicate that there are potential use cases where annotating points is more efficient than boxes, especially with further development of the models.

Discussion

The research project at RISE has developed an efficient machine learning solution for detecting CBD in images of coffee plants, and compared different methods of annotating datasets with weak and strong labels. Even at this early stage, the project has made several contributions to machine learning and agriculture, such as:

Providing a low-cost, scalable, and accurate tool for farmers to monitor their coffee plants and take appropriate actions to prevent or treat CBD, which can improve their livelihoods and the sustainability of coffee production.
Exploring the use of open-set detectors to generate proposals for ground truth bounding boxes, which can reduce the human effort and time required for annotating datasets.
Proposing novel models that can learn from mixed weak and strong labels, which can increase the flexibility and efficiency of machine learning pipelines.

The project has so far identified a number of activities which will be investigated in future work:

Generalizing the models to other crops and diseases, which may require different data collection and annotation methods, and different model architectures and parameters.
Ensuring the reliability and robustness of the models in real-world scenarios, which may involve varying lighting, weather, and camera conditions, and different types of noise and errors.
Deploying the models to the end-users, which may require user-friendly interfaces, accessible devices, and secure and stable networks.
Expanding the data collection by involving more farmers and regions, and adding more metadata and features, such as the plant variety, the growth stage, the environmental factors, and time of year.
Exploring other applications of machine learning for agriculture, such as crop classification, yield estimation, and pest management.

The research project at RISE demonstrates how machine learning can help coffee farmers deal with coffee berry disease, and how this technology can be further developed to benefit agriculture in general. The project hopes to inspire more research and innovation in this field, and to contribute to the global goals of food security, poverty reduction, and environmental protection.

Summary

Project name

Machine learning for coffee berry disease

Status

Active

RISE role in project

Coordinator

Project start

2023-01-01

Duration

2 years

Partner

Mpendakazi Agribusiness

Project members

Olof Mogren, Aleksis Pirinen

Supports the UN sustainability goals

1. No poverty

2. Zero hunger

12. Responsible consumption and production

13. Climate action

Contact person

Olof Mogren

Senior Researcher

+46 73 023 56 09