Financial Services Company #021
· Design, build, and maintain large-scale, high-performing, secure Kubernetes and other application platform infrastructure on
Azure, AWS, Oracle Cloud, GCP, etc.
· Own the infrastructure and work with DevOps teams to build, release, monitor, and run the services to improve service
· Devise the config management / orchestration suite, know where it is broken, work towards fixing it, and explore new
· Handle cross-team performance issues, from identifying the cause and determining areas for improvement to driving
those actions to closure
· Performance and maturity baselines of DevOps process, tools maturity & coverage, metrics, technology and engineering
· Define, measure, and improve reliability metrics, observability (monitoring, logging, and tracing solutions), and Ops processes
(Incident, Problem Management); streamline and automate release management
· Be a subject matter expert, able to upskill and cross-skill engineering teams on SRE principles, tools, and execution
· DevOps and debugging skills; experience with logging and monitoring solutions such as Elasticsearch, Kibana, Prometheus,
AWS CloudWatch/Cloud Metrics, etc.
· Produce and maintain documented operational procedures and diagrams for all environments and related applications
· Provide support for developer and business user requests and queries
· Conduct performance monitoring/tuning, and capacity monitoring/planning for all environments
· Produce scripts to perform and automate regular administration and housekeeping tasks
· Liaise with infrastructure teams to implement and deliver the container direction
· Perform additional duties as assigned by line management
Skills and experience:
· Work experience running distributed systems, and experience with automated provisioning and
management of AWS/Azure infrastructure and services
· Experience with scripting and orchestration including Terraform and/or CloudFormation or similar
· Experience with monitoring tools such as Dynatrace, AppDynamics, ELK, Grafana, Prometheus, or equivalent
· Experience with public cloud (Azure, AWS, GCP)
· Experience working with Jira, Jenkins, JFrog, Xray, ECR, ACR, Git, Prometheus, etc.
· Experience automating the software dev/test/deployment lifecycle with continuous integration and continuous deployment
· Experience with scaling, monitoring, and troubleshooting actively running systems
· Good understanding of, and implementation experience with, the 12-factor app principles
· Experience in building monitoring/metrics and alerting tools (APM tools) and custom dashboards for each application stack against
· Experience with container networking & security, image scanning for vulnerabilities using tools,
· Experience with Source code management and Implementation of Security best practices, other DevSecOps principles and
· Excellent hands-on experience with Red Hat Unix/Linux OS internals and administration
· Good understanding of how to uplift maturity across app engineering practices and Ops
· Understanding of software delivery lifecycles, particularly Agile/Lean & DevOps