Thursday, September 17, 2009

Detailed Diagnosis in Enterprise Networks

S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, "Detailed diagnosis in enterprise networks," in SIGCOMM '09, New York, NY, USA: ACM, 2009, pp. 243-254.

This is a very recent work in the area of fault diagnosis in enterprise networks. Current diagnosis tools either target failures at the machine level rather than the application/process level, or require extensive knowledge of the applications, which makes the problem intractable. NetMedic enables detailed diagnosis at the process level with little application-specific knowledge by framing diagnosis as an inference problem and estimating the statistical failure relationships between entities from past observations.

To improve inference performance, a rich set of variables is used for the process state instead of a single health state variable. The goal is to obtain a detailed diagnosis while keeping the system application agnostic. NetMedic proceeds by detecting abnormality, computing edge weights that indicate the historical correlation of failures, and then ranking the likely causes of failure. In the authors' experiments, the correct source was always among the top 5 ranked sources and was the top-ranked one 80% of the time.
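
To make the three steps concrete, here is a rough sketch in Python. It is not NetMedic's actual algorithm: the z-score abnormality test, the co-abnormality edge weight, the single metric per component, and all names (abnormal_flags, edge_weight, rank_causes) are my own simplifying assumptions, and it only looks at direct neighbors of the affected component.

```python
from statistics import mean, pstdev


def abnormal_flags(series, threshold=2.0):
    """Flag samples whose z-score against the whole series exceeds threshold.

    Assumption: one numeric metric per component; the paper uses many variables.
    """
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [False] * len(series)
    return [abs(x - mu) / sigma > threshold for x in series]


def edge_weight(src_flags, dst_flags):
    """Fraction of time steps where dst was abnormal, given src was abnormal.

    Assumption: a crude stand-in for "historical correlation of failures".
    """
    dst_when_src_bad = [d for s, d in zip(src_flags, dst_flags) if s]
    if not dst_when_src_bad:
        return 0.0
    return sum(dst_when_src_bad) / len(dst_when_src_bad)


def rank_causes(history, edges, victim):
    """Rank currently abnormal direct neighbors of victim by edge weight.

    history: dict mapping component name -> list of metric samples over time
    edges:   list of (src, dst) dependency pairs, meaning src may affect dst
    """
    flags = {name: abnormal_flags(series) for name, series in history.items()}
    scores = {}
    for src, dst in edges:
        # Only consider sources that are abnormal at the latest time step.
        if dst == victim and flags[src][-1]:
            scores[src] = edge_weight(flags[src], flags[dst])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling rank_causes with a per-component metric history, a list of dependency edges, and the name of the affected process returns candidate sources ordered from most to least likely under these assumptions.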

NetMedic is an interesting application of statistical inference to real networks, and it could be quite useful if its scalability issues are addressed.

1 comment:

  1. Do you think it would generalize to other environments? Is it too dependent on assuming correlation implies causality? Or perhaps for the kind of apps and templates they focus on, everything works out fine?
