Site Reliability Engineer

Details of the offer

Kazang Micro Merchant Division Senior Site Reliability Engineer

A vacancy exists for a Senior SRE within the Kazang - Micro Merchant Division, in Cape Town, South Africa (Hybrid).

We are seeking a Site Reliability Engineer (SRE) with expertise in Linux-based, open-source environments to ensure the reliability, scalability, and performance of our systems. In this role, you will design and implement automated solutions for monitoring and system optimisation while managing and maintaining critical infrastructure. You will work closely with the DevOps team to support deployments and CI/CD pipelines, leveraging open-source tools to address operational challenges and enhance system resilience.

Key Responsibilities include, but are not limited to:

- Design, implement, and maintain reliable systems in a Linux and open-source environment to meet uptime and performance objectives.
- Support the DevOps team with CI/CD pipelines, ensuring seamless and reliable deployments.
- Manage and optimize AWS-based infrastructure for scalability, cost efficiency, and performance.
- Develop and maintain monitoring and alerting systems to ensure observability and proactively address system issues.
- Build and maintain robust solutions for metric collection, dashboarding, and alerting to provide actionable insights and real-time system visibility.
- Conduct root cause analysis for incidents, implementing preventive measures to improve system resilience.
- Perform regular system maintenance, including updates, patches, and optimizations.
- Prepare and deliver comprehensive reporting on system performance, incidents, and reliability metrics.
- Identify and mitigate risks to system reliability, scalability, and security.
- Ensure compliance with organizational and regulatory standards in system design and operations.
- Participate in a rotational on-call schedule to ensure the reliability and availability of critical systems.

In order to be considered for this position, the following requirements must be met:

Years of Experience: A minimum of 5 years of professional experience in Site Reliability Engineering, DevOps, or a related field, with demonstrated expertise in Linux-based, open-source environments, and cloud infrastructure (AWS).

Education: A Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field is required. Equivalent practical experience in lieu of a formal degree will be considered for highly qualified candidates.

Technical Competencies:

- Fault Finding and Debugging: Expertise in diagnosing and resolving complex system issues, including performance bottlenecks, service outages, and application errors, using debugging tools, logs, and monitoring data.
- Scripting and Programming: Proficiency in at least one programming or scripting language (e.g., Python, Bash, Go), with the ability to write automation scripts, develop tools, and optimize system performance.
- Cloud Infrastructure Management (AWS): Hands-on experience with AWS services (e.g., EC2, S3, RDS, VPC), with the ability to design, manage, and optimize cloud-based infrastructure for scalability, reliability, and cost-efficiency.
- Monitoring and Observability: Skilled in implementing monitoring solutions (e.g., Prometheus, Grafana, ELK stack) and designing systems for metrics collection, dashboarding, and alerting to ensure system health and performance.
- Automation and Infrastructure as Code (IaC): Proficiency with tools like Ansible, Terraform, or similar frameworks to automate system management, deployments, and configurations, reducing manual effort and ensuring consistency.

Behavioural Competencies:

- Problem-Solving and Critical Thinking: Demonstrates a proactive and analytical approach to identifying issues, diagnosing root causes, and implementing effective solutions in complex technical environments.
- Collaboration and Teamwork: Works effectively with cross-functional teams, including DevOps, development, and operations, fostering a culture of shared ownership and open communication to achieve reliability goals.
- Adaptability and Continuous Learning: Embraces change, learns new technologies quickly, and adjusts strategies to meet evolving system and organizational needs, particularly in fast-paced, dynamic environments.


Nominal Salary: To be agreed

Source: Whatjobs_Ppc


