DOI: https://doie.org/10.10399/JBSE.2025716014
Malatesh Kamatar, Dr. P Bindhu Madhavi
Reactive Fault Tolerance; Modified Long Short-Term Memory; Hybrid Black-winged Kite with Lyrebird Optimization; Dynamic Cognitive Reinforcement Learning and Failure Recovery.
Cloud computing is becoming a dependable technology for large-scale applications, but its dynamic nature makes it prone to faults and inefficiency. This paper introduces a novel framework for reactive fault tolerance in cloud computing that combines deep learning and hybrid optimization to improve reliability and efficiency. The Hybrid Black-winged Kite with Lyrebird Optimization (HBKLO) algorithm optimizes task scheduling and failure recovery by balancing multiple objectives such as cost, makespan, and energy consumption. A Dynamic Cognitive Reinforcement Learning (DCRL)-based load balancer achieves dynamic resource usage by reassigning tasks across virtual machines (VMs) in real time. A Modified Long Short-Term Memory (MLSTM) model predicts faults in a timely manner from historical data, which enables proactive system management. When a failure does occur, the framework applies a reactive Failure Recovery mechanism, also driven by HBKLO, that dynamically reassigns tasks to available VMs so that disruption is minimal and recovery is fast, balancing resource utilization and enhancing system reliability. The validation results demonstrate the effectiveness of this integrated approach in delivering scalable, cost-effective, and energy-efficient management of dynamic workloads in modern cloud computing environments.
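To make the recovery idea concrete, the following is a minimal Python sketch of reassigning the tasks of a failed VM under a weighted multi-objective score. It is not the HBKLO algorithm: a simple greedy search stands in for the metaheuristic, and the weighted-sum form of the objective, the `VM` fields (`speed`, `cost_rate`, `power`), and the function names `fitness` and `recover` are all illustrative assumptions rather than definitions taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    # All fields are hypothetical, chosen only for illustration.
    name: str
    speed: float       # work units processed per second
    cost_rate: float   # monetary cost per second of busy time
    power: float       # watts drawn while busy
    tasks: list = field(default_factory=list)  # task lengths in work units

    def busy_time(self) -> float:
        return sum(self.tasks) / self.speed


def fitness(vms, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of makespan, cost, and energy; lower is better (assumed form)."""
    w_makespan, w_cost, w_energy = weights
    makespan = max(vm.busy_time() for vm in vms)
    cost = sum(vm.busy_time() * vm.cost_rate for vm in vms)
    energy = sum(vm.busy_time() * vm.power for vm in vms)
    return w_makespan * makespan + w_cost * cost + w_energy * energy


def recover(vms, failed_name, weights=(1.0, 1.0, 1.0)):
    """Greedily move each task of the failed VM to the healthy VM that
    yields the lowest fitness. A stand-in for the HBKLO search."""
    failed = next(vm for vm in vms if vm.name == failed_name)
    healthy = [vm for vm in vms if vm.name != failed_name]
    # Place the longest tasks first so they are not squeezed in last.
    for task in sorted(failed.tasks, reverse=True):
        best_vm, best_score = None, float("inf")
        for vm in healthy:
            vm.tasks.append(task)              # tentatively place the task
            score = fitness(healthy, weights)
            vm.tasks.pop()                     # undo the tentative placement
            if score < best_score:
                best_vm, best_score = vm, score
        best_vm.tasks.append(task)
    failed.tasks.clear()
    return healthy
```

Calling `recover(vms, "vm0")` on a list of `VM` objects empties the failed VM and returns the healthy VMs with the migrated tasks. The weights control the trade-off among the three objectives; a population-based optimizer such as HBKLO would explore the assignment space far more broadly than this one-pass greedy placement.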