Investment Management #010
v The Linux SRE team is responsible for implementing, deploying, and troubleshooting the cloud and on-premises Linux platforms that run the bulk of the firm's compute workloads.
v They work with various teams to enable engineers, quants, researchers, and developers to best leverage our Linux and related platforms within the firm.
v The Senior Linux SRE should have experience driving standardization through robust automation, testing, and in-depth monitoring.
v The role requires in-depth knowledge of Linux platforms, including performance optimization, security, and management at scale.
v Experience automating the deployment of services that span physical infrastructure, configuration management, and the application layer is required.
v The delivery of a platform must incorporate automated testing and reviews to ensure consistency as it moves through the SDLC stages.
v This role will drive the development standards and prioritization of engineering efforts towards a common goal.
v In addition to general compute platforms, the team will be responsible for cloud platforms and low-latency trading platforms co-located with exchanges.
v This candidate will ideally have experience with bare metal, hypervisors, containers, and public cloud platforms.
v Experience working with containers and container services (e.g. Docker, Podman, Kubernetes) is also a significant plus, as is familiarity with tools for testing and build automation.
v Solid programming and scripting skills required. Comfort and experience with Python,
v Deep understanding of Linux authentication and authorization mechanisms (e.g. PAM, SSSD, NIS, Kerberos, SSH, sudo)
v Experience in tuning systems for high throughput and low latency
v Understanding of networking and protocols, including TCP, UDP, HTTP, DNS, DHCP, NFS, NTP, and PTP, and the ability to use packet captures for troubleshooting
v Recent tasks that the team is handling:
§ Troubleshooting filesystems that are out of space; troubleshooting Docker
§ Troubleshooting access failures caused by a user missing from an AD or security group
§ Troubleshooting network communication problems between the Salt master and minion(s)
§ Troubleshooting VDI problems (e.g. growing a filesystem)
§ Troubleshooting package installation from the apt repo (we cache the packages)
o We internally cache/proxy the apt packages that developers use to install software on Linux rather than going out over the internet.
§ Troubleshooting access to NAS filesystems (permissions)
§ Troubleshooting an RSS feed that is down
§ Troubleshooting access to Azure storage
§ Modify VMware settings for performance
§ Update certifications
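As an illustration of the first task above (a filesystem running out of space), here is a minimal Python sketch of the kind of check an engineer might script. It is not the team's actual tooling: the function names, the 90% threshold, and the "/" mount point are assumptions made for the example. The percentage is computed from shutil.disk_usage and may differ slightly from what df reports, since df accounts for reserved blocks differently.

```python
import shutil


def percent_used(used: int, total: int) -> float:
    """Return used space as a percentage of total (0.0 if total is not positive)."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def filesystems_over_threshold(paths, threshold=90.0):
    """Return (path, percent) pairs for paths whose filesystem is at or above threshold."""
    flagged = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = percent_used(usage.used, usage.total)
        if pct >= threshold:
            flagged.append((path, round(pct, 1)))
    return flagged


if __name__ == "__main__":
    # Hypothetical mount point and threshold; adjust to the local environment.
    for path, pct in filesystems_over_threshold(["/"], threshold=90.0):
        print(f"{path}: {pct}% used")
```

Run as a script, it prints one line per flagged mount point and nothing when every path is below the threshold.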