Project portfolio management in dynamic and uncertain environments increasingly requires methods that support rapid decision-making, continuous adaptation, and resilience to external volatility. Recent advances in machine learning provide a foundation for integrating algorithmic intelligence into portfolio-level processes, enabling organisations to select, prioritise, and adjust project configurations in real time. The purpose of this article was to develop and formalise an intelligent framework for adaptive project portfolio management based on the mathematical foundations of dynamic reinforcement learning algorithms. To achieve this goal, a set of methods was applied: mathematical modelling of decision-making processes using Multi-Armed Bandits, synthesis of the Upper Confidence Bound algorithm family, and scenario-based simulation for a comparative analysis of the effectiveness of the proposed approaches. The central result of the study was the justification of the advantages of the Dynamic Confidence Bound algorithm, which, through an exponential discounting mechanism, allowed the system to disregard outdated data and focus on current performance indicators. Experimental validation established that the use of machine learning increased cumulative reward by 18-22% compared to heuristic methods in stable environments, while in non-stationary conditions Dynamic Confidence Bound outperformed classical approaches by 14-17%. Simulation results confirmed that the proposed model detected project performance degradation or shifts 2 to 4 times faster than standard mechanisms, minimising the influence of cognitive biases, particularly anchoring. Adaptive discounting also ensured 48-60% faster portfolio recovery after sharp external shocks compared to base Upper Confidence Bound algorithms.
The study also demonstrated high model sensitivity to hyperparameter tuning, allowing for a flexible balance between the exploration of new opportunities and the exploitation of proven solutions depending on the organisation’s strategic context. The practical significance of the work lies in the creation of a ready-to-use computational pipeline that can be integrated into corporate project management systems to automate prioritisation and dynamic resource reallocation in real time.
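To make the exponential discounting idea concrete, the following is a minimal Python sketch of a generic discounted Upper Confidence Bound rule. It is an illustration under stated assumptions, not the article's Dynamic Confidence Bound implementation: the class name `DiscountedUCB`, the function `run_simulation`, the discount factor `gamma`, the exploration weight `c`, and the two-project reward probabilities are all hypothetical and chosen only for demonstration.

```python
import math
import random


class DiscountedUCB:
    """Generic discounted UCB over a fixed set of projects (arms)."""

    def __init__(self, n_arms, gamma=0.95, c=0.5):
        self.gamma = gamma              # discount factor in (0, 1]; assumed value
        self.c = c                      # exploration weight; assumed value
        self.counts = [0.0] * n_arms    # discounted number of selections per arm
        self.sums = [0.0] * n_arms      # discounted sum of rewards per arm

    def select(self):
        # Select any arm whose discounted count is zero (never tried).
        for arm, n in enumerate(self.counts):
            if n == 0.0:
                return arm
        # Clamp so the logarithm is never negative.
        total = max(sum(self.counts), 1.0)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = self.c * math.sqrt(math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        # Exponentially discount all past evidence, then add the new observation.
        for i in range(len(self.counts)):
            self.counts[i] *= self.gamma
            self.sums[i] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def run_simulation(steps=1000, shift_at=500, seed=0):
    """Two hypothetical projects whose success probabilities swap at shift_at."""
    rng = random.Random(seed)
    bandit = DiscountedUCB(n_arms=2)
    choices = []
    for t in range(steps):
        means = (0.8, 0.2) if t < shift_at else (0.2, 0.8)
        arm = bandit.select()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        bandit.update(arm, reward)
        choices.append(arm)
    return choices
```

Because old observations are multiplied by `gamma` at every step, the effective memory is roughly `1 / (1 - gamma)` decisions, so a smaller `gamma` reacts faster to shocks at the cost of noisier estimates. Calling `run_simulation()` returns the sequence of selected projects, which can be inspected before and after the simulated shift.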
adaptive decision-making; Multi-Armed Bandit; Upper Confidence Bound; dynamic environments
Received 10.10.2025, Revised 29.01.2026, Accepted 24.02.2026, Published 21.05.2026
Retrieved from Vol. 13, No. 1, 2026
https://doi.org/10.56318/eem2026.01.076
Pages 76-84