Currently designing and developing reference modules for a schematized diagnostic authoring experience to unify semantics across ~50 Azure services, to enable deeper ML-based learning and more insightful next-step recommendations towards incident resolution using cognitive workflows.

Earlier (2015-18), in the Azure SRE Observability Development team, I led a stagewise transition of the monitoring infrastructure that reduced incident noise by ~65% and Time to Detect (TTD) by ~12 minutes, in addition to uncovering monitoring gaps and delivering other observability and troubleshooting enhancements. In the process, I also mentored the team and built engineering discipline within it, and drove parallel initiatives to validate monitoring pipelines, automate Service Health Dashboards, track the actionability of incidents, and partly automate troubleshooting. Presented as a talk, “Validation by Continuous Comparison”, at SRE[MS]Con in Fall 2018.