This project created a modelling approach that predicts block-level yields based on weather. Models are trained with historical vineyard data, provided by growers, matched with historical weather data. To make predictions, current-season vineyard and weather data are passed to the trained models. The strongest predictors of yield were the average yield of the block and weather at flowering. The accuracy of the models depended heavily on the sources of data. The most accurate predictions for a single grower had errors below 10%; the least accurate had errors above 20%.
Although prediction errors were higher than the project goal of 5%, the approach of using machine learning to predict vineyard yields based on weather data is promising. The main obstacle to higher prediction accuracy is the quality of historical grower data for training models. In addition, reformatting grower data to be machine-readable and consistent is prohibitively time-consuming in many cases. In the future, data formatting problems could be resolved by growers using standard data formats, or at least consistent, well-designed formats within each organisation.
The project demonstrated the feasibility of modelling block-level yields using historical vineyard data and weather data. It produced four types of models, two of which produce predictions with errors of less than 15% on average, and methods that automate the screening process for selecting the best weather variable combinations to use in models.
The two best-performing models used only three weather variables as predictors, which illustrates the effectiveness of using weather to predict yield.
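The automated screening of weather variable combinations can be illustrated with a minimal sketch. This is not the project's actual method, which is not described here: it assumes an exhaustive search over combinations of up to three variables, a plain linear model, and leave-one-out cross-validation scored by mean absolute percentage error. The variable names (flowering_temp, spring_rain, budburst_temp) and all data values are made up for illustration.

```python
from itertools import combinations

def fit_linear(X, y):
    """Least-squares coefficients via the normal equations.
    Each row of X must already include a leading 1.0 for the intercept."""
    n = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    # Assumes X'X is non-singular (no perfectly collinear predictors).
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def loo_percent_error(seasons, yields, cols):
    """Leave-one-out mean absolute percentage error for one variable set."""
    errors = []
    for i, held_out in enumerate(seasons):
        train_X = [[1.0] + [s[c] for c in cols]
                   for j, s in enumerate(seasons) if j != i]
        train_y = [y for j, y in enumerate(yields) if j != i]
        coef = fit_linear(train_X, train_y)
        x = [1.0] + [held_out[c] for c in cols]
        pred = sum(b * v for b, v in zip(coef, x))
        errors.append(abs(pred - yields[i]) / yields[i])
    return 100.0 * sum(errors) / len(errors)

def screen(seasons, yields, names, max_vars=3):
    """Score every combination of 1..max_vars variables, lowest error first."""
    results = []
    for k in range(1, max_vars + 1):
        for cols in combinations(names, k):
            results.append((loo_percent_error(seasons, yields, cols), cols))
    return sorted(results)

# Made-up seasons for a single hypothetical block.
names = ["flowering_temp", "spring_rain", "budburst_temp"]
values = [(18, 40, 12), (20, 55, 11), (22, 30, 14), (19, 62, 13),
          (21, 45, 10), (23, 38, 15), (17, 70, 12), (24, 50, 13)]
seasons = [dict(zip(names, v)) for v in values]
# Synthetic yields that depend only on flowering temperature.
yields = [2 + 0.5 * s["flowering_temp"] for s in seasons]

ranked = screen(seasons, yields, names)
for error, cols in ranked[:3]:
    print(f"{error:8.4f}%  {', '.join(cols)}")
```

Because the synthetic yields depend only on flowering_temp, the top-ranked combinations all contain that variable. With real grower data the ranking would be noisier, and an exhaustive search becomes expensive as the number of candidate variables grows.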
It was expected that prediction accuracy would increase with the inclusion of additional weather variables, made possible by having more data than was available for our preliminary models. Instead, smaller amounts of higher-quality data used with simple models outperformed larger amounts of data of variable quality used with more complex models.
The quality of grower data was likely affected by several factors that interfered with model accuracy. For example, some blocks were combined or split at harvest, so the same block was harvested differently in different years. Similarly, block names that were inconsistent from year to year were cleaned manually, and potentially incorrectly.
Unaccounted-for losses from fruit damage caused by events such as frost and disease also affected data quality. We excluded years when growers noted substantial damage, but some growers did not note damage. Lastly, there were inconsistencies across growers in how dates of flowering and budburst were defined and recorded.
An additional potential source of grower-level model accuracy variation is the accuracy of weather data. We used weather data from the BOM station nearest each block. The best predictions were from flat regions and the worst were from hilly regions, which might have been coincidental, but could have reflected larger differences between actual weather data and BOM data in hilly regions.
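Matching each block to its nearest weather station can be sketched as follows. This is an illustration only: it assumes "nearest" means shortest great-circle (haversine) distance, and the station names and all coordinates are hypothetical. Reporting the distance alongside the match gives a simple flag for blocks whose station may be less representative, although distance alone does not capture terrain effects in hilly regions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat = p2 - p1
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_station(block, stations):
    """Return (station, distance_km) for the station closest to the block."""
    def distance(station):
        return haversine_km(block["lat"], block["lon"],
                            station["lat"], station["lon"])
    best = min(stations, key=distance)
    return best, distance(best)

# Hypothetical stations and block; the coordinates are made up.
stations = [
    {"name": "station_A", "lat": -34.50, "lon": 138.90},
    {"name": "station_B", "lat": -34.80, "lon": 139.20},
]
block = {"name": "block_1", "lat": -34.55, "lon": 138.95}

station, km = nearest_station(block, stations)
print(f"{block['name']} -> {station['name']} ({km:.1f} km)")
```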