Who You’ll Work With

SREs at Arista combine strong software and systems engineering with a passion for operating production systems at scale. As an SRE, you’ll be part of the team responsible for our global service fleet.

What You’ll Do
As an SRE, you’ll be responsible for our global CloudVision service fleet. This includes:

- Building the CI/CD lifecycle for services, from inception and design to deployment and scaling
- Improving operational processes through automation
- Identifying key service indicators to be used in capacity planning
- Owning disaster recovery and management
- Driving infrastructure and cloud-based application security design
- Leading sustainable incident response and blameless postmortems
- Being an active member of our globally distributed on-call team

Arista’s CloudVision is an enterprise network management and streaming telemetry SaaS offering. CloudVision is deployed on Kubernetes across global regions, using Spinnaker for our CI/CD pipeline. Our tech stack runs on GKE, using HBase/Hadoop as the main distributed database and storage layer, Elasticsearch for search, ClickHouse for fast real-time queries of flow data, our own Kafka-based distributed real-time stream processing layer for analytics, and TensorFlow for ML analysis. Our monitoring system is built on top of Prometheus, Grafana, Loki, and other OSS tools.