XGBOOST HYPERPARAMETER OPTIMIZATION FOR INTELLIGENT B2B ORDER FORECASTING SYSTEMS

Serhii Miroshnychenko

doi:10.32782/3041-2080/2025-4-13

Authors

Serhii Miroshnychenko, METINVEST POLYTECHNIC TECHNICAL UNIVERSITY LLC, https://orcid.org/0009-0007-4868-3006

Keywords:

optimization, machine learning, XGBoost, hyperparameters, B2B orders, AUC-PR, Optuna TPE, time constraints

Abstract

This paper develops a methodology for automated XGBoost hyperparameter optimization in intelligent systems that predict B2B order success under time constraints. The research is based on a comprehensive comparison of six optimization algorithms: Random Search, Optuna TPE, Hyperopt TPE, a genetic algorithm, particle swarm optimization, and sequential optimization. Experimental validation was conducted on two historical datasets of 86,794 records each, using the AUC-PR metric as the optimization objective and five-fold stratified cross-validation. The results show that the Optuna TPE algorithm is the most efficient, reaching maximum AUC-PR values of 0.9661 and 0.9780 on the two datasets, respectively. The optimal running time for the algorithm was found to be 240–360 seconds; beyond this interval, further optimization yields no improvement and may degrade model quality. Applying the optimized hyperparameters reduced classification errors by 5.3–6.2% compared with the default XGBoost settings. The study includes a detailed analysis of the hyperparameter search space and the development of a time-constraint control system. The methodology incorporates robust preprocessing techniques: median imputation of missing values, interquartile-range outlier detection with winsorization, and robust scaling of numerical features. The developed methodology is of practical value for building automated decision support systems in the B2B sector and can be integrated into computer-integrated enterprise management technologies.
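The three preprocessing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the paper's implementation: the function name `preprocess_column`, the IQR multiplier of 1.5, the linear-interpolation quartile method, and the sample values are all assumptions made for illustration.

```python
from statistics import median, quantiles


def preprocess_column(values, whisker=1.5):
    """Impute, winsorize, and robust-scale one numeric column.

    `values` is a list of floats in which None marks a missing entry.
    `whisker` is an assumed IQR multiplier (1.5), not a value from the paper.
    """
    observed = [v for v in values if v is not None]

    # 1. Median imputation: replace each missing entry with the column median.
    med = median(observed)
    filled = [med if v is None else v for v in values]

    # 2. IQR outlier detection with winsorization: clip values to the fences
    #    Q1 - whisker * IQR and Q3 + whisker * IQR instead of dropping them.
    q1, _, q3 = quantiles(filled, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    clipped = [min(max(v, low), high) for v in filled]

    # 3. Robust scaling: centre on the median and divide by the IQR.
    centre = median(clipped)
    c1, _, c3 = quantiles(clipped, n=4, method="inclusive")
    scale = (c3 - c1) or 1.0  # guard against a constant column
    return [(v - centre) / scale for v in clipped]


if __name__ == "__main__":
    # Hypothetical order amounts with one missing value and one outlier.
    order_amounts = [10.0, 12.0, None, 11.0, 13.0, 500.0, 9.0, 12.0]
    print(preprocess_column(order_amounts))
```

In this sketch the outlier is pulled back to the upper fence rather than removed, so the number of records is preserved, and the scaling statistics are computed after clipping. In a cross-validated setting the median, fences, and scale would normally be estimated on the training folds only and then applied to the validation fold.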
