Hadoop/Big-Data:

Sound knowledge of managing large-scale Hadoop platforms, including monitoring the platform, debugging issues, and tuning cluster performance.
In-depth knowledge of the Hadoop ecosystem, including ZooKeeper, HDFS, YARN, Hive, Spark, Trino, and Kafka.
Proven experience debugging issues in both the Hadoop platform and the applications running on it.
Familiarity with security tools such as Kerberos and Ranger, and with Active Directory integration.
Experience with cloud technologies, preferably AWS EMR.
Knowledge of Kubernetes, AI, and MLOps is an advantage.

Collaboration and Teamwork:

Collaborate closely with L-3 teams to review new use cases and implement cluster hardening techniques, ensuring the development of robust and reliable platforms.
Foster cross-team collaboration, building and maintaining strong relationships with customer teams, user communities, architects, and engineering teams.
Work jointly on key deliverables to ensure production scalability and stability.

Automation: Hands-on experience automating tasks with Ansible, shell scripting, Python, or another programming language. The ability to automate manual tasks is key in this role.

Observability: Knowledge of observability tools such as Grafana, opera, Prometheus, and Splunk.

Linux: Understanding of Linux, networking, CPU, memory, and storage.

Programming Languages: Knowledge of, and ability to code in, Python, Java, or another widely used programming language.

Communication: Excellent interpersonal skills, along with superior verbal and written communication abilities.

This position is not ideal for a Hadoop developer.

This is a hybrid position. The expected number of days in the office will be confirmed by your hiring manager.

Staff Site Reliability Engineer - PRE