Quant Firm #004
This role requires deep Linux operating system and application administration skills, proficiency in Python, and solid experience with configuration management and infrastructure as code (IaC). Successful candidates will also have exceptional organizational, communication, and project management skills, as well as the ability to troubleshoot complex technical issues.
Responsibilities
- Manage on-premise containerized web services
- Automate and troubleshoot a broad range of technical infrastructure
- Design and operate secure, reliable systems
- Develop and implement monitoring solutions that keep systems highly available and reliable; use tooling to detect and resolve issues proactively
- Document system architecture, processes, and best practices
- Break down complexity, iterate, and communicate progress to a wide variety of leads and stakeholders
- Assist with the administration of DHCP and DNS for both on-premise and external systems and applications
Qualifications
- 5+ years of experience in site reliability engineering or related disciplines
- Proficiency with Python
- Experience managing and monitoring containerized infrastructure
- Experience working with CI/CD tools such as Jenkins, GitHub Actions, or ArgoCD
- Expert-level experience with IaC and configuration management tools such as Terraform, SaltStack, Chef, Puppet, or Ansible
- Nice-to-haves:
  - Experience building and operating systems on cloud platforms (e.g. AWS, Azure, GCP)
  - OpenLDAP or other directory services management expertise
  - Atlassian Data Center administration experience (on-prem)
  - Web development experience