
Senior Data Engineer | Hedge Fund #EP33
o Location: New York City (Midtown) or London, UK
o Work Schedule: Hybrid – In-office Tuesday, Wednesday, Thursday
o Employment Type: Full-Time
o Compensation: Up to $250K base | Total compensation up to $400K
—
Overview
Hedge Fund Capital is seeking a Senior Data Engineer to join its growing Data Platforms team. This individual will play a pivotal role in designing and scaling the firm’s data lake infrastructure and ingestion pipelines, enabling AI-driven analytics and robust data delivery across its investment platform.
You’ll work closely with the Cloud Platform Engineer, analytics leads, and security master teams to build production-grade systems that support real-time and batch data processing, structured for both traditional financial modeling and modern AI/LLM-based workflows.
—
Key Responsibilities
Pipeline Development
· Build and maintain scalable batch and streaming ingestion pipelines across asset classes using Core Java and SQL-based frameworks
· Extend ingestion frameworks to onboard new vendors and normalize structured financial datasets
· Apply QA and validation layers across both raw and curated zones in the data lake
AI-Ready Data Engineering
· Structure and deliver data outputs optimized for:
o LLMs
o Vector embeddings
o AI agent access
· Expose lakehouse tables and metadata for AI summarization, query generation, and insight surfacing
· Prototype data-enabled AI workflows using tools like LangChain, Amazon Bedrock, or open-source models
Platform Integration & Performance
· Collaborate with the Cloud Platform Engineer to optimize S3 partitioning, compression, and schema evolution
· Improve latency, parallelization, and throughput using compute engines such as Snowflake, Spark, or EMR
· Ensure data lineage, observability, and metadata integration for critical pipelines
Collaboration & Data Delivery
· Deliver clean, production-grade datasets to downstream teams including:
o Analytics Enablement
o Security Master
· Document assumptions, structures, and logic to ensure transparency and reproducibility
· Identify and resolve data quality issues, schema drift, and vendor feed anomalies
—
Qualifications
· 5–10 years of experience in core data engineering, ideally within financial services or regulated environments
· Proficiency in:
o Core Java
o SQL
o Ingestion frameworks (AWS Glue, OpenFlow, custom ETL)
· Hands-on experience with:
o Data lakes (S3, Delta, Iceberg)
o Cloud compute platforms (e.g., Snowflake, EMR)
o Production experience building pipelines across data lakes (required)
· Strong understanding of data modeling, schema design, and the tradeoffs between normalization and denormalization
· Exposure to:
o LLM/AI infrastructure (LangChain, Bedrock, Vector DBs)
§ AI experience is essential, as this role will enable AI-empowered analytics
o Data cataloging, lineage, and quality tools (Amundsen, DataHub, Great Expectations)
· Experience tuning batch and micro-batch workloads
· Familiarity with financial market/reference data (fixed income, equities, derivatives) is a plus