Site Reliability Engineer

Quant Firm #004

This role requires deep Linux operating system and application administration skills, proficiency in Python, and solid experience with configuration management and infrastructure as code (IaC). Successful candidates will also have exceptional organizational, communication, and project management skills, as well as the ability to troubleshoot complex technical issues.

Responsibilities

  • Manage on-premises containerized web services
  • Automate and troubleshoot a broad range of technical infrastructure
  • Design and operate secure, reliable systems
  • Develop and implement monitoring solutions to ensure high system uptime and reliability; use tools to detect and resolve issues proactively
  • Document system architecture, processes, and best practices
  • Break down complexity, iterate, and communicate progress to a wide variety of leads and stakeholders
  • Assist with the administration of DHCP and DNS for both on-premises and external systems and applications

Qualifications

  • 5+ years of experience in site reliability engineering or related disciplines
  • Proficiency with Python
  • Experience managing and monitoring containerized infrastructure
  • Experience working with CI/CD tools such as Jenkins, GitHub Actions, or ArgoCD
  • Expert-level experience with IaC and configuration management tools such as Terraform, SaltStack, Chef, Puppet, or Ansible
  • Nice-to-haves:
    • Experience building and operating systems on cloud platforms (e.g. AWS, Azure, GCP)
    • OpenLDAP or other directory services management expertise
    • Atlassian Data Center administration experience (on-prem)
    • Web development experience

To apply for this job, email your details to Graham.Gates@TechExecOnline.com
