Clustering and random forest classification of Canadian egg farms based on life cycle environmental impacts
Main Presenter: Ian Turner
Co-Authors: Sayyed Ahmad Khadem, Jeffrey L. Andrews, Nathan Pelletier
The life cycle environmental impacts of egg production systems have decreased compared to historical levels (Pelletier 2018). Key to further reductions is an understanding of how impacts differ between farms and which variables drive these differences. Of particular interest are non-traditional LCI data (i.e., data describing farm characteristics and management practices, rather than input/output data), which may be predictors of environmental impacts. To this end, this work seeks to identify clusters of egg farms with similar environmental impacts and to develop a classification model capable of assigning farms to clusters based on non-traditional LCI data.
A total of 159 farm-level LCI models of Canadian egg farms were created from previously collected data (Turner et al. 2022) using the olca-ipc package (GreenDelta 2021), and LCIA calculations were performed for each farm. Principal component analysis (PCA) was performed to address correlation in the LCIA results. Three principal components, representing 97% of the variability in impacts, were retained for clustering. K-means clustering (k = 3) was performed on these components. A high average silhouette score (0.74) indicated well-separated clusters, supporting the use of cluster membership as a response variable for classification modelling. Two random forest classification models were built. The first included variables related to housing and manure management systems, while the second considered variables unrelated to housing and manure management systems. Models were developed using a 75%/25% train/test split, with model performance assessed based on misclassification rate. PCA, clustering, and modelling were done using the sklearn Python package (Pedregosa et al. 2011).
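The PCA and clustering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the LCIA matrix is synthetic, the number of impact categories (8) is arbitrary, and standardising before PCA is assumed because the abstract does not describe preprocessing.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for LCIA results (NOT the study data):
# 159 farms x 8 impact categories, all driven by one hidden impact level
# per farm so that the columns are strongly correlated.
n_farms, n_impacts = 159, 8
hidden_group = rng.integers(0, 3, size=n_farms)
impact_level = np.array([1.0, 2.0, 4.0])[hidden_group]
category_scale = rng.uniform(0.5, 2.0, size=n_impacts)
lcia = impact_level[:, None] * category_scale[None, :]
lcia = lcia + rng.normal(0.0, 0.1, size=lcia.shape)

# Assumption: impacts are standardised before PCA (not stated in the abstract).
scaled = StandardScaler().fit_transform(lcia)

# Retain three principal components, as in the abstract.
pca = PCA(n_components=3)
components = pca.fit_transform(scaled)
explained = pca.explained_variance_ratio_.sum()

# K-means with k = 3 on the retained components.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(components)

# Average silhouette score over all farms.
avg_silhouette = silhouette_score(components, labels)
print(f"variance explained by 3 components: {explained:.2f}")
print(f"average silhouette score: {avg_silhouette:.2f}")
```

The resulting `labels` array plays the role of the response variable for the classification models; the values printed here reflect the synthetic data only and are not comparable to the reported 97% and 0.74.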
All organic farms in the sample were sorted into a low-impact cluster, while the high-impact cluster contained only conventional farms with liquid manure management systems. All other farms were sorted into the average-impact cluster. Both classification models performed well: the first model classified the testing data perfectly, while the second classified the testing data with 82.5% accuracy. Housing and manure management systems were the most important variables for predicting cluster membership, followed by the percentage of eggs discarded on farm and mortality rate.
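A random forest classifier with a 75%/25% split, a misclassification rate, and a variable importance ranking can be sketched as below. This is an illustrative sketch on synthetic data, not the models from the study: the feature names, their distributions, and the rule that generates cluster labels are hypothetical and only loosely mimic the cluster pattern reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_farms = 159

# Hypothetical non-traditional LCI variables (synthetic, NOT the study data).
organic = rng.random(n_farms) < 0.15
liquid_manure = rng.random(n_farms) < 0.30
mortality_rate = rng.uniform(1.0, 8.0, size=n_farms)
eggs_discarded_pct = rng.uniform(0.5, 5.0, size=n_farms)

feature_names = ["organic", "liquid_manure", "mortality_rate", "eggs_discarded_pct"]
X = np.column_stack(
    [organic, liquid_manure, mortality_rate, eggs_discarded_pct]
).astype(float)

# Hypothetical cluster labels: 0 = low, 1 = average, 2 = high impact.
cluster = np.ones(n_farms, dtype=int)
cluster[organic] = 0
cluster[~organic & liquid_manure] = 2

# 75%/25% train/test split, as in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, cluster, test_size=0.25, random_state=0
)

# n_estimators is an arbitrary choice for this sketch.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Misclassification rate = 1 - accuracy on the held-out farms.
misclassification_rate = 1.0 - accuracy_score(y_test, model.predict(X_test))
print(f"misclassification rate: {misclassification_rate:.3f}")

# Impurity-based importances, sorted from most to least important.
importances = model.feature_importances_
for name, value in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {value:.3f}")
```

With 159 farms, a 25% test split holds out 40 farms, which is consistent with the reported 82.5% accuracy corresponding to 33 of 40 correct classifications.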
This study is the first application of random forest classification based on life cycle environmental impacts. The developed models could give egg farmers an easy way to estimate the life cycle environmental impacts of their farms relative to their peers using non-traditional LCI data. Future research must continue to develop methods that ensure the use of methodological best practices from both an LCA and a machine learning perspective.