York University  

Project 1151 - Automated Incident Prediction and Resolution for Cloud-native Applications

Running from 2022 to present

Many cloud outages are caused by resource exhaustion, deployment misconfigurations, and software bugs that spread beyond the faulty application or microservice. Often, these failures are caused by interference, the unforeseen influence of applications on each other. Tasks such as anomaly detection, fault prevention and remediation, and workload prediction and mitigation have become more difficult in these large-scale shared environments. The long-term goal of this project is to investigate application-agnostic techniques to design, develop, verify, and manage flexible AIOps for cloud-native software systems. The short-term goals are to a) build AI/ML models for interference anomaly detection and b) build runtime remediation models for interfering applications. The expected outcomes are: new scientific and technological advancements materialized in patents, prototypes, products, and publications; exchange of knowledge through meetings, workshops, and presentations; and a new generation of talent capable of addressing the challenges of nascent economies.

Explore the product that harvests these research results

Research team:

  • PI: Prof. Marin Litoiu, York University
  • Student: Yar Rouf, York University
  • Student: Harit Ahuja, York University
  • Student: Raphael Rouf, York University
  • Student: Zakeya Namrud, York University
  • IBM Project Lead (RCL): Ian Watts, IBM
  • IBM Manager (RCM): Ian Watts, IBM
  • IBM Contributor (RCC): Eugen Postea, IBM
  • IBM Contributor (RCC): Radu Mateescu, IBM

Institution:

York University   
