Design, implement and support operational and reliability aspects of large-scale Observability & Telemetry collection... Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale...
into our systems' performance and health.

Your Impact
As a Senior Staff SRE with the Cortex Observability team, you will: Cloud... and a strong motivation for high reliability at the service level.
Observability Tools: High proficiency with Thanos, Prometheus, Grafana...