Vacancy expired!
Req ID: RQ177252
Type of Requisition: Regular
Clearance Level Must Be Able to Obtain: None
Public Trust/Other Required: NACI (T1)
Job Family: Systems Engineering
Skills: Computing, Data Science, GPU Computing, HPC Cluster, IT System Architecture
Experience: 10+ years of related experience
Job Description:
GDIT is seeking a Senior HPC Architect to join our Scientific Infrastructure Team, providing High Performance Computing (HPC) services for a large biomedical research community with the National Institute of Allergy and Infectious Diseases (NIAID). Our Scientific Infrastructure Team is responsible for enabling and managing HPC and its associated infrastructure and interconnects across multiple locations, hundreds of COTS and open-source scientific applications, and 40 PB of data storage, including data archive, lifecycle policy management, and data sharing services. This team serves as a customer-facing presence for the NIAID research community, providing a single point of support for new initiatives, ongoing projects, and scientific infrastructure needs.
In your role as a Senior HPC Architect, you will be a subject matter expert architecting, implementing, and managing multiple high performance compute clusters and their associated infrastructure for a large biomedical research community.
Work Visa sponsorship will not be provided for this position.
HOW A SENIOR HPC ARCHITECT WILL MAKE AN IMPACT:
Provide hands-on administration and support for two HPC clusters: a 4,000+ core HPC cluster that is GPU-focused and a 1,500+ core HPC cluster, including monitoring performance and health of both clusters
Install and support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI/ML
Architect and design HPC clusters to include designing new clusters or expanding existing components such as storage, InfiniBand, and compute
Monitor and report on cluster performance and generate data to show usage and trends
Perform troubleshooting and problem-solving for complex HPC operational and performance issues
Collaborate with researchers to guide them in effective use of the HPC resources, such as job scheduler submission, data formats, and building data workflows to effectively move data from scientific instruments to the HPC clusters for analysis.
Provide input to the Scientific Infrastructure team leader for setting priorities for cluster operations, scheduling policies, resources needed, etc.
Develop and maintain documentation and diagrams for the HPC clusters, review GitHub pull requests, and update content and training materials on the user wiki portal.
Teach and mentor team members on system design, best practices, and troubleshooting techniques.
WHAT YOU’LL NEED TO SUCCEED:
Education: BS/BA (or equivalent)
Required Experience: Minimum of 10 years related experience
Required Technical Skills:
Minimum of 5 years’ experience as an engineer or architect with HPC technologies
Hands-on architecture design experience with HPC to include storage, file system, InfiniBand, security, authentication, and compute architectures
Experience with Slurm job scheduling, including troubleshooting job status and optimizing submission scripts
Experience using Git to manage shared software configuration code bases
Hands-on experience with cloud-based services (e.g. Azure, AWS, GCP)
Minimum of five years’ experience in Linux systems administration
Good understanding of storage administration and optimization, such as performing upgrades and defining RAID configurations
Good understanding of fundamental networking concepts and their practical applications
Experience with the Spack or EasyBuild package managers, including making packages from PyPI, R, and GitHub
Knowledge and experience in one or more scripting languages applicable to Linux (e.g. Bash, Perl, Python)
Security Clearance Level: Must be able to obtain an NIH Public Trust
Preferred Skills:
Experience administering Red Hat/CentOS-based systems
Experience working in a life-sciences oriented environment
Experience configuring and using monitoring systems to monitor HPC clusters
Ability to determine meaningful metrics and usage data for monthly status reports and health dashboards
Experience with DevOps or DevSecOps methodologies, such as automation and configuration management
Strong troubleshooting skills
Location: This position is primarily remote with travel as needed for special projects to the client sites in MD or MT, estimated 2 times a year. There can also be opportunity to work from the customer site in Rockville, MD more regularly for those local to the area.
GDIT IS YOUR PLACE:
401K with company match
Comprehensive health and wellness packages
Internal mobility team dedicated to helping you own your career
Professional growth opportunities including paid education and certifications
Cutting-edge technology you can learn from
Rest and recharge with paid vacation and holidays
The likely salary range for this position is $136,340 - $184,460. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.
Our benefits package for all US-based employees includes a variety of medical plan options, some with Health Savings Accounts, dental plan options, a vision plan, and a 401(k) plan offering the ability to contribute both pre- and post-tax dollars up to the IRS annual limits and receive a company match.
To encourage work/life balance, GDIT offers employees full flex work weeks where possible and a variety of paid time off plans, including vacation, sick and personal time, holidays, paid parental, military, bereavement and jury duty leave. GDIT typically provides new employees with 15 days of paid leave per calendar year to be used for vacations, personal business, and illness and an additional 10 paid holidays per year. Paid leave and paid holidays are prorated based on the employee’s date of hire. The GDIT Paid Family Leave program provides a total of up to 160 hours of paid leave in a rolling 12-month period for eligible employees.
To ensure our employees are able to protect their income, other offerings such as short and long-term disability benefits, life, accidental death and dismemberment, personal accident, critical illness and business travel and accident insurance are provided or available.
We regularly review our Total Rewards package to ensure our offerings are competitive and reflect what our employees have told us they value most.
We are GDIT. A global technology and professional services company that delivers consulting, technology and mission services to every major agency across the U.S. government, defense and intelligence community. Our 30,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate across 30 countries worldwide, offering leading capabilities in digital modernization, AI/ML, Cloud, Cyber and application development. Together with our clients, we strive to create a safer, smarter world by harnessing the power of deep expertise and advanced technology.
We connect people with the most impactful client missions, creating an unparalleled work experience that allows them to see their impact every day. We create opportunities for our people to lead and learn simultaneously. From securing our nation’s most sensitive systems, to enabling digital transformation and cloud adoption, our people are the ones who make change real.
GDIT is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status, or any other protected class.