Pay Range: $178,200.00-$293,900.00

Company Overview
The people of Memorial Sloan Kettering Cancer Center (MSK) are united by a singular mission: ending cancer for life. Our specialized care teams provide personalized, compassionate, expert care to patients of all ages. Informed by basic research done at our Sloan Kettering Institute, scientists across MSK collaborate to conduct innovative translational and clinical research that is driving a revolution in our understanding of cancer as a disease and improving the ability to prevent, diagnose, and treat it. MSK is dedicated to training the next generation of scientists and clinicians, who go on to pursue our mission at MSK and around the globe.

Job Description
Associate Director, Observability and Monitoring

We have an exciting opportunity: Memorial Sloan Kettering Cancer Center is seeking a strategic and technically proficient Associate Director of Observability and Monitoring. This leader will drive the design, engineering, and operation of a comprehensive observability framework to enhance system reliability, provide predictive insights, and support both system-level and application-level monitoring. This role is pivotal in advancing our capabilities in observability, predictive analytics, and log management to ensure optimal performance and uptime.

Role Overview:
Observability Strategy & Architecture: Define and implement MSKCC’s observability strategy, encompassing architecture, engineering, and operations. Build a robust framework that provides real-time insights into system and application health, user experience, and service performance.
System & Application Monitoring: Oversee the design and deployment of monitoring tools and technologies that capture key metrics at both the system and application levels. Ensure comprehensive visibility into infrastructure and application performance.
Predictive Analytics & Event Correlation: Develop and implement predictive analytics capabilities to identify and address potential issues before they impact performance. Build event correlation solutions that streamline incident detection, root cause analysis, and automated remediation.
Log Management: Lead the architecture and implementation of a scalable log management solution to enhance troubleshooting, security, and compliance. Integrate log data with monitoring systems to provide a holistic view of system health and performance.
Operational Excellence: Oversee the monitoring, alerting, and incident response processes, ensuring that critical systems are observed in real time and that issues are addressed proactively.
Automation & Tooling: Lead the development and deployment of automation tools to streamline repetitive tasks, reduce errors, and enhance system reliability.
Oversee the integration and customization of tools for monitoring, logging, and alerting (e.g., TrueSight, Splunk, Datadog). Foster an "Infrastructure as Code" (IaC) mindset across tooling such as Terraform, Ansible, or Chef to ensure consistent, replicable infrastructure provisioning.
Innovation & Improvement: Continuously evaluate emerging technologies and methodologies to advance MSKCC’s observability capabilities. Implement improvements that reduce time-to-resolution, enhance visibility, and improve system reliability.
Key Qualifications:
Minimum of 10 years in observability and monitoring roles, with at least 5 years in a leadership capacity.
Proven experience in architecting and managing large-scale observability solutions in complex environments.
Strong background in automation and incident management.
Proficiency in observability tools (e.g., Datadog, Splunk, Prometheus, Grafana) and log management solutions.
Strong understanding of predictive analytics, event correlation, and application performance monitoring.
Experience in cloud and hybrid environments, with knowledge of AWS, GCP, or Azure.
Nice to Have:
Experience in a healthcare or research environment with a focus on high-availability and secure systems.
Familiarity with AI and machine learning approaches to enhance predictive analytics.
Core Skills:
Excellent communication skills with the ability to lead cross-functional teams, present to executive leadership, and foster collaboration among IT, development, and operations teams.
Demonstrated ability to solve complex technical problems, think strategically, and drive continuous improvement in observability practices.
Additional Information:
Schedule: Flexibility to be onsite as needed
Location: 633 Third Avenue, NY
Reporting to: Director, Infrastructure
Pay Range: $178,200 - $293,900
Helpful Links:
MSK Compensation Philosophy (https://careers.mskcc.org/frequently-asked-questions/)
Review Our Great Benefits Offerings
Closing
MSK is an equal opportunity and affirmative action employer committed to diversity and inclusion in all aspects of recruiting and employment. All qualified individuals are encouraged to apply and will receive consideration without regard to race, color, gender, gender identity or expression, sexual orientation, national origin, age, religion, creed, disability, veteran status, or any other factor that cannot lawfully be used as a basis for an employment decision. Federal law requires employers to provide reasonable accommodation to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job or to perform your job. Examples of reasonable accommodation include making a change to the application process or work procedures, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment.

Stay in touch! Register now to join Memorial Sloan Kettering's Talent Community (https://externaltalent-mskcc.icims.com/connect?back=intro&iniframe=1&hashed=-435744324) to receive inside information on our organization and new job opportunities.

Job Locations: USA-NY-New York
Posted Date: 2/9/2025 7:44 PM
Requisition ID: 2025-83686
Category: Digital Informatics and Technology