Site Reliability Engineer / DevOps Engineer

Investment Management #010

SRE Engineer

v The Linux SRE team is responsible for implementing, deploying, and troubleshooting the cloud and on-premises Linux platforms that run the bulk of the firm's compute workloads.

v They work with various teams to enable engineers, quants, researchers, and developers to best leverage our Linux and related platforms within the firm.

v The Senior Linux SRE should have experience driving standardization through robust automation, testing, and in-depth monitoring.

v The role requires in-depth knowledge of Linux platforms, including performance optimization, security, and management at scale.

v Experience automating the deployment of services that span physical infrastructure, configuration management, and the application layer is required.

v The delivery of a platform must incorporate automated testing and reviews to ensure consistency as it moves through SDLC stages.

v This role will drive the development standards and prioritization of engineering efforts towards a common goal.

v In addition to general compute platforms, the team will be responsible for cloud platforms and low-latency trading platforms co-located with exchanges.

v This candidate will ideally have experience with bare metal, hypervisors, containers, and public cloud platforms.

v Experience working with containers and container services (e.g., Docker, Podman, Kubernetes) is also a significant plus, as is familiarity with tools for testing build automation.

v Solid programming and scripting skills are required, including comfort and experience with Python.

v Deep understanding of Linux authentication and authorization mechanisms (e.g., PAM, SSSD, NIS, Kerberos, SSH, sudo)

v Experience in tuning systems for high throughput and low latency

v Understanding of networking and protocols, including TCP, UDP, HTTP, DNS, DHCP, NFS, NTP, and PTP, and the ability to use packet captures for troubleshooting

v Recent tasks that the team is handling:

§ Troubleshooting filesystems that are out of space; troubleshooting Docker

§ Troubleshooting access failures caused by a user missing from an AD or security group

§ Troubleshooting network communication problems between salt master and minion(s)

§ Troubleshooting VDI problems (e.g., growing a filesystem)

§ Troubleshooting package installation from the apt repository (we cache the packages)

o We internally cache/proxy the apt packages that developers use to install software on Linux rather than going out over the internet.

§ Troubleshooting access to NAS filesystems (permissions)

§ Troubleshooting RSS feed down

§ Troubleshooting access to Azure storage

§ Modifying VMware settings for performance

§ Update certifications

To apply for this job email your details to
