Vacancy expired!
Company Description
Epsilon is the leader in outcome-based marketing. We enable marketing that's built on proof, not promises. Through Epsilon PeopleCloud, the marketing platform for personalizing consumer journeys with performance transparency, Epsilon helps marketers anticipate, activate and prove measurable business outcomes. Powered by CORE ID®, the most accurate and stable identity management platform representing 200+ million people, Epsilon's award-winning data and technology is rooted in privacy by design and underpinned by powerful AI. With more than 50 years of experience in personalization and performance working with the world's top brands, agencies and publishers, Epsilon is a trusted partner leading CRM, digital media, loyalty and email programs. Positioned at the core of Publicis Groupe, Epsilon is a global company with over 8,000 employees in over 40 offices around the world. For more information, visit epsilon.com. Follow us on Twitter at @EpsilonMktg.

Job Description
As a Site Reliability Engineer, you'd proactively monitor and improve end-to-end system performance, identifying deficiencies and potential failures throughout our infrastructure. You will build deep, end-to-end knowledge of the complexity of our platform and continuously create improvements and automation to enhance durability, performance, and maintainability of the platform. You are central to the automation of everything at Epsilon.

Responsibilities:
- Using Full Stack methodologies, develop and maintain scalable alerting, ticketing, and logging tools for debugging and monitoring
- Proactively monitor events, investigate issues, analyze solutions, and drive problems through to resolution, using a wide variety of Ops tools and monitoring platforms to build understanding and enable persistent monitoring of system availability, performance, and capacity
- Maintain our monitoring systems and develop new metrics/monitoring dashboards as additional coverage events become necessary
- Provide support to maintain a high availability environment

Qualifications:
- Bachelor's degree in Computer Science or related field
- Experience with Python, JavaScript, React, Java
- Good understanding of Linux, Bash and shell scripting
- Knowledge of and experience with network stack, protocols, network management and monitoring tools
- Experience with automation tools: Puppet, Chef, Docker, Jenkins and/or Ansible
- Knowledge of Docker for container orchestration
- Experience with SQL
- Ability to work collaboratively in a fast-paced environment
- Experience working with Agile methodologies, preferably Scrum
- Excited by Big Data technologies and interested in integrating statistics and analytics to make our systems perform even better