Hedge Fund #003

The System Operations Manager will lead a global team spanning multiple regions (including Chicago, Hong Kong, London, New York, Singapore, and Austin), build out key processes to support a follow-the-sun model, and support incident management, disaster recovery, and change management across the firm. The role primarily involves working with engineers, developers, and support teams to build world-class systems and the tools needed to maintain and continually evolve them. The position calls for someone who proactively monitors, innovates, automates, and uses data and statistics to drive improvements, and who will champion an SRE mindset.

Responsibilities:

• Manage a team of engineers to deliver the best possible experience for the organization
• Work with the team to ensure the reliability, availability, and performance of infrastructure and applications through active monitoring, response, and follow-through on incidents of varying severity
• Help architect and improve alert management capabilities, drive projects to reduce event “noise,” and reduce the need for human event correlation over time
• Tune infrastructure, tools, and applications to send meaningful, actionable alerts
• Assist in automating repetitive tasks
• Drive an SRE mindset: encourage ownership, invest in automation, build for scale, and be proactive and prepared
• Resolve issues quickly for users, escalating to third parties or other groups as needed
• Work effectively in a cooperative, collaborative global team environment
• Engage in technical collaboration with other infrastructure groups and business teams

Primary Qualifications:

• Bachelor’s or advanced degree in Computer Science, Information Technology, Engineering, Business Administration, or a related field
• Experience managing global technical teams as a player/coach who works in the trenches
• Ability to effectively recognize and resolve complex technical issues
• Deep industry experience with trading platforms, market data delivery, and critical real-time operations and environments
• Extensive experience in incident management, problem resolution, and driving process improvement
• Extensive experience with disaster recovery and high-availability planning and execution
• Experience designing and maintaining enterprise monitoring platforms and tools
• Strong understanding of the SDLC process, release management, and enterprise change management
• Good understanding of the Linux OS and networking (TCP/IP, multicast, etc.)
• Experience with enterprise automation systems (Rundeck, Tidal, etc.)