The Product Reliability Engineering group is looking for a bright, passionate, and dedicated employee to join our Operations 2nd Level Applications team as a Staff Site Reliability Engineer. In this role you will be responsible for supporting various Digital projects as well as our production transaction processing systems. These systems are truly the backbone of our business and process millions of transactions daily.
Essential Functions:
Support critical applications and ensure their stability through proactive maintenance activities.
Engage in automation activities.
Support applications and infrastructure built on newer technologies such as Kubernetes, containers, Kafka, Grafana, Prometheus, and Elastic.
Perform root cause analysis and remediation.
Good knowledge of cloud and VMware infrastructure.
Good knowledge of F5 load balancers and TCP-layer architecture.
Good experience with Kubernetes and Docker (preferably vendor products such as OpenShift or MKE).
Basic knowledge of Ansible and YAML.
Working knowledge of production support processes such as incident, change, and problem management, call triage, and escalation procedures.
Ability to write and maintain scripts that monitor system activity, including application smoke tests before and after production implementations.
Monitor application performance (e.g. memory, logging, latency).
Write SQL queries for data analytics.
Release code into test and production environments using industry-standard deployment tools.
Support application deployments using Chef and Jenkins.
Support client-escalated application issues (e.g., increased latency, transactional issues, or features not working as expected).
Implement and maintain performance monitoring dashboards using industry-standard tools (e.g., Splunk, ThousandEyes, Keynote, Runscope, Ghost Inspector, Evolven, Graphite).
This is a hybrid position. The expected number of in-office days will be confirmed by your hiring manager.

