Aginic provided Cricket Australia with insights from a pilot data science project that aimed to identify the characteristics of junior cricketers that are related to elite ODI batting performance. To build this data-driven decision-making into their selection processes, we recommended building a management dashboard that surfaces the relevant predictive metrics for each junior player and so supports selection decisions for representative squads. To improve the machine learning predictions, we also recommended identifying a more homogeneous group of players whose playing conditions are more comparable to those faced by elite players.
Critical to the project’s success was early engagement with the cricket subject matter experts (SMEs) to agree on the business problems, and with the technical SMEs to understand the data and its limitations. Deep, early engagement with Cricket Australia’s data scientists was also beneficial: understanding their previous work, including their feature selection and feature engineering, let us build on it rather than repeat it.
2 We used random forest models for “feature selection”. Random forests are a powerful type of ensemble model that combines the results of a number of decision-tree models, each trained on an independent random sample of the data drawn with replacement (bootstrapping).
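To make the idea concrete, the following is a minimal sketch of bootstrapped tree ensembles used to rank features. It is not the model built for Cricket Australia: the data is synthetic, the three features are hypothetical, and each "tree" is a single-split stump so that the example fits in a few lines of standard-library Python. In practice a library implementation (for example scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute) would be the usual choice.

```python
import random


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_stump(rows, targets, feature_ids):
    """Return (feature, threshold, gain) for the single split that most reduces SSE."""
    base = sse(targets)
    best = (None, None, 0.0)
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put data on both sides
            gain = base - sse(left) - sse(right)
            if gain > best[2]:
                best = (f, threshold, gain)
    return best


def forest_importances(rows, targets, n_trees=50, n_sub=2, seed=0):
    """Rank features by the total SSE reduction they earn across bootstrapped stumps."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    totals = [0.0] * n_features
    for _ in range(n_trees):
        # Bootstrapping: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Each tree only sees a random subset of the features.
        feats = rng.sample(range(n_features), n_sub)
        f, _, gain = best_stump(
            [rows[i] for i in idx], [targets[i] for i in idx], feats
        )
        if f is not None:
            totals[f] += gain
    total = sum(totals)
    return [t / total for t in totals] if total else totals


# Synthetic data (an assumption for illustration): feature 0 drives the
# target, while features 1 and 2 are pure noise.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(80)]
targets = [10 * r[0] + data_rng.gauss(0, 0.5) for r in rows]

importances = forest_importances(rows, targets)
for i, score in enumerate(importances):
    print(f"feature_{i}: {score:.3f}")
```

Features with consistently low importance across the ensemble are candidates to drop, which is the sense in which a random forest can serve as a feature selection step.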