Hedge Fund #002
We are looking for a Systems Engineer to join our Aligned Infrastructure team. The team is composed of multidisciplinary individuals with unrestricted access across a large environment. We believe that one cannot build a truly great service without the ability to make changes across the stack. We focus on solving real business problems, reducing operational overhead, and working together as a team.
This team is responsible for the following areas, covering both engineering and operations:
1. Data modelling, database tuning & query optimization
2. HPC job scheduling
3. Workflow management and batch processing
4. Container orchestration
5. Service discovery
6. POSIX and object storage systems
On-Premises:
- Bare metal compute (Linux)
- System tuning
- Configuration management and drift management
- Performance tuning
- Network configuration management
- Compute, storage, network system purchases / evaluations
Cloud:
- Environment provisioning and management
Qualifications/Skills Required:
We are looking for individuals with experience in two or more of the following areas:
HPC job scheduling
- Experience in environments at scale (e.g. billions of jobs per week/month)
- Understanding of cost metrics, preemption, job types, queuing, and scheduler optimizations
- Experience with products like HTCondor, Slurm, Spectrum LSF, Nomad, AWS Batch
Container Orchestration (Kubernetes)
- Experience with PSPs, Helm, admission/mutation controllers, PVs/PVCs, kube-router, and BGP; more generally, a demonstrated ability to dig deep into the k8s projects to solve hard problems
- Experience with Docker and registries (e.g. Harbor, Artifactory, GCP container registry, AWS container registry)
- Mature approach to dealing with operational complexities and gaps of the Kubernetes platform
Storage Systems
- Experience deploying and managing petabyte scale systems supporting varied workloads
- Mature approach to assessing price/performance, tiering and backup requirements
- Experience with products like GPFS, NetApp, Pure, Lightbits, Ceph, GCP PDs or other NVMe-specific products
- Familiarity with NVMe over Fabrics, POSIX, object storage and various modes of permissioning data
Linux
- Experience using configuration management systems (e.g. SaltStack, Ansible)
- Understanding of Linux kernel components (e.g. VFS, scheduler, memory management, networking)
- Solid troubleshooting experience using gdb, OS & application tracing/profiling mechanisms
- Experience with some of Docker, LXD/LXC, Kerberos, eBPF and virtualization technologies
Workflow management and batch processing
- Experience with the challenges of workflow management in heavily multi-tenant environments
- Mature approach to dealing with/avoiding task failure and system failure
- Experience with products like Airflow, NiFi, GNUbatch, GCP Cloud Composer, AWS SageMaker
Software Engineering
- Proficient in OO development (we use Python), Git and CI/CD concepts
- Comfortable contributing to a large codebase with varied technologies
In addition to the above, the following qualifications always apply:
- Ability to review and/or extend open source platforms to satisfy business requirements
- A passion for technology and automation, a deep sense of curiosity, and a willingness to always question
- A passion for understanding technology in depth and for building large-scale systems
- Excellent verbal and written communication skills