Clustering and random forest classification of Canadian egg farms based on life cycle environmental impacts

Main Presenter: Ian Turner 

Co-Authors: Sayyed Ahmad Khadem, Jeffrey L. Andrews, Nathan Pelletier

The life cycle environmental impacts of egg production systems have decreased relative to historical levels (Pelletier 2018). Further reductions depend on understanding how impacts differ between farms and which variables drive those differences. Variables drawn from non-traditional LCI data (i.e., data describing farm characteristics and management practices rather than input/output data) are of particular interest, since they may be predictors of environmental impacts. To this end, this work seeks to identify clusters of egg farms with similar environmental impacts and to develop a classification model capable of assigning farms to clusters based on non-traditional LCI data.

A total of 159 individual farm-level LCI models of Canadian egg farms were created from previously collected data (Turner et al. 2022) using the olca-ipc Python package (GreenDelta 2021), and LCIA calculations were performed for each farm. Principal component analysis (PCA) was performed to address correlation among the LCIA results, and three principal components, together representing 97% of the variability in impacts, were retained for clustering. K-means clustering (k = 3) was then performed on these components. High silhouette scores (average 0.74) indicated that cluster membership could be used as a response variable for classification modelling. Two random forest classification models were built: the first included variables related to housing and manure management systems, while the second considered only variables unrelated to housing and manure management systems. Models were developed using a 75/25 train/test split, and model performance was assessed based on misclassification rate. PCA, clustering, and modelling were done using the sklearn (scikit-learn) Python package (Pedregosa et al. 2011).
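The PCA and clustering steps can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: it assumes the LCIA results have been exported to a farms × impact-categories array, and the standardization before PCA, the number of impact categories (10), `n_init`, the random seeds, and the helper name `cluster_farms` are all assumptions. The input data are synthetic placeholders, so the printed values say nothing about the study's results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_farms(lcia, n_components=3, n_clusters=3, seed=0):
    """Cluster farms on principal components of their LCIA results (hypothetical helper).

    lcia: array of shape (n_farms, n_impact_categories).
    Returns cluster labels, PC scores, cumulative explained variance,
    and the mean silhouette score.
    """
    # Assumption: impact categories are standardized first because they are
    # reported in different units (e.g. kg CO2-eq vs. kg PO4-eq).
    scaled = StandardScaler().fit_transform(lcia)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)
    explained = pca.explained_variance_ratio_.sum()

    # k-means on the retained components; several restarts (n_init) guard
    # against a poor choice of initial centroids.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(scores)
    return labels, scores, explained, silhouette_score(scores, labels)


# Synthetic stand-in data: 159 farms x 10 impact categories drawn around
# three hypothetical group centres. These are NOT the study's data.
rng = np.random.default_rng(42)
group = rng.integers(0, 3, size=159)
centres = rng.normal(0.0, 5.0, size=(3, 10))
lcia = centres[group] + rng.normal(0.0, 1.0, size=(159, 10))

labels, scores, explained, sil = cluster_farms(lcia)
print(f"variance explained by 3 PCs: {explained:.2f}, mean silhouette: {sil:.2f}")
```

Fixing the number of components at three mirrors the abstract's description; passing a fraction such as `PCA(n_components=0.97, svd_solver="full")` would instead let scikit-learn choose the smallest number of components reaching that share of variance.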

All organic farms in the sample were sorted into a low-impact cluster, while the high-impact cluster contained only conventional farms with liquid manure management systems. All other farms were sorted into an average-impact cluster. Both classification models performed well: the first model classified the testing data perfectly, while the second classified the testing data with 82.5% accuracy. Housing and manure management systems were the most important variables for predicting cluster membership, followed by the percentage of eggs discarded on farm and the mortality rate.
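The two-model classification setup can be sketched with scikit-learn's `RandomForestClassifier`. This is an illustrative sketch under stated assumptions, not the study's model: the feature names, their integer codings, the labelling rule, `n_estimators=500`, the stratified split, and the helper `fit_cluster_classifier` are all hypothetical, and the abstract does not say which variable-importance measure was used (impurity-based importance is shown here). The data are synthetic, so any printed numbers are not the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical non-traditional LCI variables (names and codings are illustrative).
FEATURES = ["housing_system", "manure_system", "pct_eggs_discarded", "mortality_rate"]


def fit_cluster_classifier(X, y, seed=0):
    """Fit a random forest on a 75/25 train/test split (hypothetical helper).

    Returns the fitted model and its misclassification rate on the test set.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=500, random_state=seed)
    model.fit(X_train, y_train)
    misclassification = 1.0 - accuracy_score(y_test, model.predict(X_test))
    return model, misclassification


# Synthetic stand-in data for 159 farms; NOT the study's data.
rng = np.random.default_rng(7)
n = 159
housing = rng.integers(0, 4, size=n)        # integer codes; 3 = "organic" here
manure = rng.integers(0, 2, size=n)         # integer codes; 1 = "liquid" here
discarded = rng.uniform(0.0, 3.0, size=n)   # % of eggs discarded on farm
mortality = rng.uniform(2.0, 10.0, size=n)  # % hen mortality
X = np.column_stack([housing, manure, discarded, mortality])

# Hypothetical cluster labels (0 = low, 1 = average, 2 = high impact) generated
# by a simple rule so the forest has a pattern to learn.
y = np.where(housing == 3, 0, np.where(manure == 1, 2, 1))

# Model 1: all variables, including housing and manure management.
model, err = fit_cluster_classifier(X, y)
# Model 2: only variables unrelated to housing and manure management.
model2, err2 = fit_cluster_classifier(X[:, 2:], y)

# Impurity-based (mean decrease in impurity) importances, largest first.
for name, imp in sorted(zip(FEATURES, model.feature_importances_), key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
print(f"misclassification rate: model 1 = {err:.3f}, model 2 = {err2:.3f}")
```

Integer-coding the nominal housing and manure variables keeps each as a single column, so its importance is not split across dummy columns; one-hot encoding is a reasonable alternative. Because impurity-based importances can favour continuous variables with many possible split points, `sklearn.inspection.permutation_importance` on the test set is a useful cross-check.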

This study is the first application of random forest classification based on life cycle environmental impacts. The developed models could give egg farmers an easy way to estimate the life cycle environmental impacts of their farms relative to those of their peers using non-traditional LCI data. Future research must continue to develop methods that ensure the use of methodological best practices from both LCA and machine learning perspectives.


©2022 Forum for Sustainability through Life Cycle Innovation e.V. | Contact Us | Legal Info

