Senior Data Engineer – Data Lake & AI

Hedge Fund #EP33

o Location: New York City (Midtown) or London, UK

o Work Schedule: Hybrid – In-office Tuesday, Wednesday, Thursday

o Employment Type: Full-Time

o Compensation: Up to $250K base | Total compensation up to $400K

Overview

Hedge Fund Capital is seeking a Senior Data Engineer to join its growing Data Platforms team. This individual will play a pivotal role in designing and scaling the firm’s data lake infrastructure and ingestion pipelines, enabling AI-driven analytics and robust data delivery across its investment platform.

You’ll work closely with the Cloud Platform Engineer, analytics leads, and security master teams to build production-grade systems that support real-time and batch data processing, structured for both traditional financial modeling and modern AI/LLM-based workflows.

Key Responsibilities

Pipeline Development

· Build and maintain scalable batch and streaming ingestion pipelines across asset classes using Core Java and SQL-based frameworks

· Extend ingestion frameworks to onboard new vendors and normalize structured financial datasets

· Apply QA and validation layers across both raw and curated zones in the data lake

AI-Ready Data Engineering

· Structure and deliver data outputs optimized for:

o LLMs

o Vector embeddings

o AI agent access

· Expose lakehouse tables and metadata for AI summarization, query generation, and insight surfacing

· Prototype data-enabled AI workflows using tools like LangChain, Amazon Bedrock, or open-source models

Platform Integration & Performance

· Collaborate with the Cloud Platform Engineer to optimize S3 partitioning, compression, and schema evolution

· Improve latency, parallelization, and throughput using compute engines such as Snowflake, Spark, or EMR

· Ensure data lineage, observability, and metadata integration for critical pipelines

Collaboration & Data Delivery

· Deliver clean, production-grade datasets to downstream teams including:

o Analytics Enablement

o Security Master

· Document assumptions, structures, and logic to ensure transparency and reproducibility

· Identify and resolve data quality issues, schema drift, or vendor feed anomalies

Qualifications

· 5–10 years of experience in core data engineering, ideally within financial services or regulated environments

· Proficiency in:

o Core Java

o SQL

o Ingestion frameworks (AWS Glue, OpenFlow, custom ETL)

· Hands-on experience with:

o Data lakes (S3, Delta, Iceberg)

o Cloud compute platforms (e.g., Snowflake, EMR)

o Production experience building pipelines across data lakes (required)

· Strong understanding of:

o Data modeling, schema design, and tradeoffs between normalization/denormalization

· Exposure to:

o LLM/AI infrastructure (LangChain, Bedrock, Vector DBs)

§ AI experience is essential, as this role will enable AI-empowered analytics

o Data cataloging and lineage tools (Amundsen, DataHub, Great Expectations)

· Experience tuning batch and micro-batch workloads

· Familiarity with financial market/reference data (fixed income, equities, derivatives) is a plus

To apply for this job, email your details to Graham.Gates@TechExecOnline.com
