Hedge Fund #002

We are looking for a Systems Engineer to join our Aligned Infrastructure team. The team is composed of multidisciplinary individuals with unrestricted access across a large environment. We believe that one cannot build a truly great service without the ability to make changes across the stack. We focus on solving real business problems, reducing operational overhead and working together as a team.

This team is responsible for both engineering and operations in the following areas:

1. Data modelling, database tuning & query optimization

2. HPC job scheduling

3. Workflow management and batch processing

4. Container orchestration

5. Service discovery

6. POSIX and object storage systems

On-Premises:

Bare metal compute (Linux)
System tuning
Configuration and drift management
Performance tuning
Network configuration management
Evaluation and purchasing of compute, storage and network systems

Cloud:

Environment provisioning and management

Qualifications/Skills Required:

We are looking for individuals with experience in two or more of the following areas:

HPC job scheduling

Experience with environments at scale (e.g. billions of jobs per week or month)
Understanding of cost metrics, preemption, job types, queuing, scheduling and scheduler optimizations
Experience with products like HTCondor, Slurm, Spectrum LSF, Nomad and AWS Batch

Container Orchestration (Kubernetes)

Experience with PSPs, Helm, admission/mutating controllers, PVs/PVCs, kube-router and BGP; generally, a demonstrated ability to dig deep into the Kubernetes projects to solve hard problems
Experience with Docker and container registries (e.g. Harbor, Artifactory, GCP Container Registry, AWS Elastic Container Registry)
Mature approach to dealing with the operational complexities and gaps of the Kubernetes platform

Storage Systems

Experience deploying and managing petabyte-scale systems supporting varied workloads
Mature approach to assessing price/performance, tiering and backup requirements
Experience with products like GPFS, NetApp, Pure, Lightbits, Ceph, GCP PDs or other NVMe-specific products
Familiarity with NVMe over Fabrics, POSIX, object storage and various models of data permissioning

Linux

Experience using configuration management systems (e.g. SaltStack, Ansible)
Understanding of Linux kernel components (e.g. VFS, scheduler, memory management, networking)
Solid troubleshooting experience using gdb and OS- and application-level tracing/profiling mechanisms
Experience with some of Docker, LXD/LXC, Kerberos, eBPF and virtualization technologies

Workflow management and batch processing

Experience with the challenges of workflow management in heavily multi-tenant environments
Mature approach to avoiding and handling task and system failures
Experience with products like Airflow, NiFi, GNUbatch, GCP Cloud Composer and AWS SageMaker

Software Engineering

Proficiency in object-oriented development (we use Python), Git and CI/CD concepts
Comfortable contributing to a large code-base with varied technologies

In addition to the above, the following qualifications always apply:

Ability to review and/or extend open source platforms to satisfy business requirements
A passion for technology and automation, a deep sense of curiosity and a willingness to always ask questions
A passion for understanding technology in depth and for building large-scale systems
Excellent verbal and written communication skills

To apply for this job, email your details to Graham.Gates@TechExecOnline.com

Job Overview

Career Level
- Senior
Industry
- Information Technology
Qualification
- Bachelor Degree
