IMPACT OF TRAINING WINDOW FORMATION STRATEGY ON XGBOOST MODEL RETRAINING QUALITY UNDER CONCEPT DRIFT IN B2B TRANSACTIONS
DOI:
https://doi.org/10.32782/3041-2080/2026-7-4
Keywords:
machine learning, XGBoost, concept drift, training window, model retraining, B2B forecasting, prequential evaluation, ERP systems, order classification, non-stationary environment, catastrophic retention
Abstract
This paper presents a systematic comparative study of eleven training window strategies for the XGBoost algorithm in a binary classification task: predicting B2B order success under concept drift. The study is conducted on a depersonalized dataset of 86,786 transactions from a real industrial ERP system, spanning July 2017 to November 2024. Evaluation follows a prequential protocol with monthly model retraining over a 76-month streaming horizon. Four strategy categories are compared: cumulative (full history), fractional (1/2, 1/3, 1/4 of history), temporal (12, 6, 4, 3 months), and count-based (10,000; 5,000; 3,000 samples). The “6-month time window” strategy achieved the highest global ROC-AUC (0.9440) and the lowest LogLoss (0.2732), while also exhibiting the lowest coefficient of variation (CV = 0.0269). The “12-month time window” strategy showed the worst result among the temporal configurations (ROC-AUC = 0.9302). The cumulative strategy, despite access to over 80,000 records in later iterations, consistently underperformed the better-sized windows. The authors attribute this to the proposed “Catastrophic Retention” effect: the excessive preservation of irrelevant, outdated patterns, a counterpart to catastrophic forgetting in neural networks. The statistical significance of the differences is confirmed by the Friedman test (χ² = 99.89; p < 10⁻¹⁶; Kendall's W = 0.131) and by pairwise Wilcoxon signed-rank tests with Bonferroni correction (17 of 55 pairs significant at α = 0.05). Replacing the cumulative strategy with the “6-month time window” strategy halves the computational cost of retraining, in line with Green AI principles for industrial MLOps pipelines.
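The four strategy categories named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `months_back` and `select_window`, the record layout `(order_date, features, label)`, and the reading of "fractional" as the most recent 1/k of the available history are all assumptions made for illustration. Under a prequential protocol, such a function would be called once per month with `cutoff` set to the first day of the month being predicted, so that only earlier records reach the training set.

```python
from datetime import date


def months_back(d: date, months: int) -> date:
    """Return the first day of the month lying `months` months before d's month."""
    idx = d.year * 12 + (d.month - 1) - months
    return date(idx // 12, idx % 12 + 1, 1)


def select_window(history, cutoff, kind, param=None):
    """Pick training records for one retraining step (hypothetical helper).

    history: list of (order_date, features, label), sorted by order_date ascending.
    cutoff:  first day of the month to be predicted; later records are never used.
    kind:    "cumulative", "fractional", "temporal", or "count".
    param:   denominator k (fractional), months (temporal), or sample count (count).
    """
    past = [r for r in history if r[0] < cutoff]
    if kind == "cumulative":
        # Full history up to the cutoff.
        return past
    if kind == "fractional":
        # Assumption: the most recent 1/param share of the available history.
        n = len(past) // param
        return past[-n:] if n > 0 else []
    if kind == "temporal":
        # Records from the last `param` calendar months before the cutoff.
        start = months_back(cutoff, param)
        return [r for r in past if r[0] >= start]
    if kind == "count":
        # The most recent `param` records (all of them if fewer exist).
        return past[-param:] if param > 0 else []
    raise ValueError(f"unknown strategy: {kind}")
```

With this layout, the "6-month time window" strategy corresponds to `select_window(history, cutoff, "temporal", 6)`, and the cumulative baseline to `select_window(history, cutoff, "cumulative")`; the model would then be refit on the returned records and scored on the month starting at `cutoff`.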