Cryptocurrency Firm #007
· Support the release of new services, data pipelines, and machine learning models through capacity planning, rollout planning, and release management.
· In collaboration with engineers and data scientists, define and implement monitoring strategies, SLAs, and error budgets.
· Build and deploy automation tooling for supported services, data pipelines, and machine learning models.
· Troubleshoot and remediate issues with the services, data pipelines, and machine learning models you manage.
· Manage and run critical infrastructure and platform services.
· Track and execute continuous improvements.
· Strong understanding of Linux.
· Strong proclivity for automation and DevOps practices and tools such as Git, Ansible, and Terraform.
· Strong experience with monitoring and logging tools such as Prometheus or Datadog, the ELK stack, and Grafana.
· Good programming experience in at least one of Bash, Python, C++, or Java.
· Understanding of general networking protocols such as TCP/IP, DNS, and TLS.
· Familiarity with container orchestration platforms such as Nomad, ECS, or Kubernetes.
· Experience with data technologies such as Redis, Kafka, Snowflake, MongoDB, PostgreSQL, and MySQL.
· Experience with data engineering tools such as Airflow, dbt, etc.
· Experience with machine learning tools (MLflow, Kubeflow, BentoML) or platforms (SageMaker, Databricks).
· Broad exposure to at least one cloud platform: AWS, Google Cloud, or Azure.
· Familiarity with the open-source software community.
· Strong communication and writing skills.
· Minimum of a BS degree in CS or a related field; MS or PhD preferred.