Vacancy expired!
The Network.AI group is a new team within Facebook Infrastructure. The group's charter spans the design and operation of the AI networking infrastructure, including the network switches and the host-side systems, as well as forward-looking projects such as transport evolution. Network Engineers at Facebook are hybrid software/network engineers who design, build and operate our worldwide data center network. This team owns the complete lifecycle of the AI network in the data center, from planning, design and product definition through QA, deployment and monitoring. Simple and scalable network design, automation and data analytics are the keys to meeting our demands. In this role, you will be responsible for conceiving, developing and deploying network software, systems and tools that keep the AI data center network operating at maximum reliability, scalability and efficiency.

Do you like developing innovative solutions to some of the most complex scaling and reliability challenges out there? Do you want to build and operate the hyper-scale data center network that powers the world's largest social network? Do you want to ship code in production that positively impacts the experience of billions of users worldwide? Then this is the role for you.
- (Re)Design, deploy, manage and maintain the Facebook datacenter networks for AI infrastructure worldwide
- Develop software that improves the reliability, efficiency and velocity of building and operating the AI datacenter network
- Participate in the network on-call rotation and be an escalation contact for site events. Analyze data and identify the root causes of network issues. Build monitoring systems and software robots that can debug and remediate network issues at scale
- Test new network platforms before they are deployed in production
- Build automation that improves the safety and reliability of our network software CI/CD pipeline
- Partner with the best engineers in the industry on the coolest stuff around: the code and systems you work on will be in production and used by billions of users all around the world
- 2+ years of experience in one or more higher-level programming languages (Python, C, C++, Go, etc.)
- Understanding of TCP/IP
- 7+ years of experience with RoCE, InfiniBand, RDMA - understanding of typical configurations and performance
- 7+ years of experience in configuration and maintenance of network devices and NMS systems, or applications such as web servers, load balancers, relational databases, storage systems and messaging systems
- Experience in developing and understanding network device configuration for at least one vendor (Arista, Juniper, Cisco, Brocade, Ciena, Infinera, Linux, etc.)
- Experience in understanding and mitigating network hardware and topology failures
- BS or MS in Computer Science or Computer Engineering or Electrical Engineering
- Experience in a service provider or hyper-scale network in engineering or design roles
- Knowledge of TCP/IP congestion control algorithms (DCTCP/Cubic)
- Knowledge of Network QoS and Scheduling algorithms (WRR/SP)
- Understanding of the internals of a Router/Switch hardware, NPU/data planes and Optics
- Understanding of the design principles and troubleshooting of distributed systems