Cloud Data Engineer
Compensation: Up to $350K
The successful candidate will be a Data Engineer with experience designing and building data analytics platforms on cloud-based infrastructure. The role will focus on building a strategic data pipeline that transforms and persists data for various analytics use cases while ensuring the completeness, consistency, and security of the data. The data sets will include structured and unstructured data, with a strong emphasis on text analytics and search.
Experience designing and implementing a large-scale data analytics platform in a cloud-based environment for ongoing production analytics is required. The candidate should have experience with both cloud-native data pipeline and transformation tools, such as AWS Kinesis, Redshift, Lambda, and EMR, and open source tools such as NiFi, Kafka, Flume, Hadoop, Spark, and Hive.
The candidate must have experience developing production-ready code to perform data transformations and basic analytics in one or more programming languages, including Python. Experience working with numerical, scientific, and machine learning libraries is desired.
Experience with text-based analytics, including basic NLP techniques (tokenization, stemming, NER, etc.), is a strong plus. Experience with Lucene-based search engines is required, with a preference for Elasticsearch.
The candidate should have experience persisting data in multiple forms for different types of analysis. Experience transforming and persisting data to relational, NoSQL, and graph data stores is strongly desired, as is experience working with unstructured data.
Experience with machine learning, including building platforms to support the systematic training and testing of machine learning models, is a strong plus.
Experience working collaboratively with a team and ensuring code review, testing, and automation are implemented is required, with a focus on continuously integrating and deploying updates to the platform without impacting its resiliency or consistency.
- Extensive background in designing large-scale data analytics platforms
- Experience in designing data pipelines in a cloud environment, preferably AWS
- Build solutions to process structured and unstructured data from multiple sources
- Experience with OLTP and OLAP systems
- Experience performing ETL at scale with open source and cloud native tools
- Strong logical and physical schema design and implementation skills
- Experience with Lucene-based search engines, with a preference for Elasticsearch
- Work closely with the data science team
- Design and implement highly available, scalable, and encrypted storage solutions
- Strong coding skills in Python or Java using analytic packages
- Management and operations of all environments
- Extensive experience managing data engineering for production environments
- Experience designing, building, and automating AWS or other cloud environments
- Data persistence on AWS platforms such as S3, RDS, EMR, DynamoDB, and Redshift
- AWS pipeline and transformation tools such as Kinesis, Lambda, and EMR
- AWS APIs and automation including Boto3
- CloudFormation or other cloud environment build automation tools
- Building elastically scalable environments that leverage both horizontal and vertical scaling
- Developing cost-saving strategies using both reserved and spot instances
- Experience developing collaboratively, including infrastructure as code, in Python with Git
- Excellent written and verbal communications with clients, vendors, and teammates, with an ability to summarize and translate between business and technical contexts
- Excellent troubleshooting and analytical skills
- Self-starter able to execute independently, with light supervision
- Bachelor’s degree (or higher) preferred in a STEM subject
Job Category: Full Time