Site Reliability Engineering Lead Manager (600k+)

Global Investment Manager #007


Build and mentor a team of SREs.
Help develop robust organizational practices around monitoring, alerting, testing, release, and incident response.
Identify key uptime and performance metrics for our production systems and help service owners define and track SLOs for each.
Participate in or lead system design reviews, release plan reviews, and incident post-mortems.
Identify key risks to our production infrastructure and help us plan and prioritize both technical and procedural mitigations.
Help run our production trading systems day-to-day. Software Reliability Engineers at PDT are not first-level responders, but we expect them to be involved in incident response so that they are exposed to the maintenance costs of the system and can help reduce them over time.


5+ years of experience in a Software Engineering, DevOps, or Site Reliability Engineering role
5+ years of experience leading and managing DevOps or Site Reliability Engineering teams
5+ years of experience working with a public cloud offering (preferably AWS)
Mastery of at least one compiled programming language
Mastery of at least one production configuration management tool and one cloud-based infrastructure-as-code tool
Excellent written and verbal communication skills
Experience with Kubernetes is a plus

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field from a rigorous academic program

To apply for this job email your details to
