Senior DevOps Support Engineer / Site Reliability Engineer (SRE)
Company: Concurrent Systems


Details of the offer

About Our Company:


We are a dynamic, fully remote international company at the forefront of the future of work. Our distributed team spans multiple time zones and countries, allowing us to leverage a global talent pool and diverse perspectives.
Our mission is to revolutionise prepaid mobile services in Africa and Asia through innovative, omnichannel solutions. We empower Mobile Network Operators to optimise their entire value chain, balancing self-care options with agent-assisted services to reach all customers effectively in diverse economic environments.
We deliver consistent, hyper-personalised offers across all points of sale, transforming how operators create, manage, and distribute services. Our seamless integration of USSD and smart app technologies ensures accessibility and uniform experiences for all users. We help operators bundle and sell benefits effectively to subscribers, enhancing sales, distribution, and customer interactions.
Our goal is to spearhead telecommunications advancement in emerging markets, bridging technology gaps and fostering digital inclusion while adapting to regional challenges. Our integrated portfolio includes innovative solutions such as the OSG USSD gateway, CoaleSCE menu service environment, SmartShop bundle management system, and Crediverse EVD solution.


Job Overview:


We're seeking an experienced and adaptable Senior DevOps Support Engineer to join our dynamic remote team. This role is critical in maintaining and optimizing our complex, high-volume transaction-based software systems. The ideal candidate will excel in fast-paced environments, possess strong problem-solving skills, and have the ability to navigate and improve evolving system architectures.
You'll play a key role in bridging traditional support engineering with modern DevOps practices, contributing significantly to our mission of revolutionizing prepaid mobile services in emerging markets. This position offers unique challenges and opportunities for those who thrive on solving complex problems, driving system improvements, and collaborating closely with development teams.


Key Responsibilities:



Respond swiftly to incidents by connecting to customer sites via VPN, providing rapid troubleshooting and resolution
Deploy and upgrade our complex, high-volume transaction-based software in diverse customer environments
Implement and maintain robust monitoring and observability solutions to proactively identify potential issues
Develop and improve automated deployment processes using GitHub Actions and Ansible, enhancing system reliability and efficiency
Spearhead our GitOps-based configuration management and infrastructure as code initiatives
Participate in on-call rotations for critical incident response
Lead efforts in improving system documentation and knowledge sharing within the team
Lead the development and maintenance of our customer-facing knowledge base and self-service portal, ensuring comprehensive documentation and intuitive navigation for optimal customer experience
Conduct in-depth analysis of transaction data records, log files, and database tables to identify and resolve complex issues
Develop and execute complex SQL queries to investigate data anomalies, troubleshoot system behavior, and generate comprehensive reports
Create and maintain data analysis scripts and tools to automate routine investigations and improve efficiency in problem resolution
Design and implement solutions to address intricate challenges related to transaction processing and subscriber management
Perform root cause analysis on critical system issues and develop long-term solutions
Produce comprehensive incident reports for internal teams and external stakeholders, including detailed incident descriptions, thorough root cause analyses, and actionable recommendations for system improvements
Manage clustered environments and implement redundancy measures to ensure high availability
Develop and maintain disaster recovery (DR) sites and procedures, conducting regular DR drills
Collaborate closely with our distributed team to drive continuous improvement in our systems and processes
Mentor junior team members and contribute to building a culture of engineering excellence
Collaborate closely with developers to troubleshoot and resolve incidents, ensuring quick resolution and knowledge sharing between teams
Foster a DevOps culture by bridging the gap between development and operations, promoting shared responsibility for system reliability



Required skill/experience:



Proven experience (5+ years) in DevOps or Site Reliability Engineering roles
Strong knowledge of Unix operating systems and hardware
Extensive experience with telecom systems and protocols
Advanced proficiency in database management, particularly MariaDB
Strong proficiency in SQL and experience with advanced database querying techniques
Expertise in log analysis and the ability to extract meaningful insights from large volumes of log data
Experience with data visualization tools to effectively communicate findings from data analysis
Familiarity with scripting languages (e.g., Python, Bash) for automating data analysis tasks
Expertise in DevOps tools and practices, including GitHub Actions and Ansible
Demonstrated ability to work with and improve complex system architectures
Strong analytical skills for processing and interpreting large volumes of transaction data
Excellent troubleshooting and problem-solving skills, especially in high-pressure situations
Ability to work effectively in a remote environment with minimal supervision
Strong communication skills for coordinating with team members and customers
Proficiency in creating and maintaining system architecture documentation
Solid understanding of high-availability concepts and implementation in telecom environments
Experience with clustering technologies and redundancy strategies
Knowledge of disaster recovery planning and implementation
Strong collaborative skills, with experience working closely with development teams to resolve complex issues
Understanding of software development processes and ability to read and understand code for troubleshooting purposes
Experience in fostering a DevOps culture and promoting cross-team collaboration



Preferred skill/experience:



Experience working with systems handling millions of daily transactions
Background in fintech or telecom industries
Familiarity with USSD and smart app technologies
Knowledge of software deployment best practices in diverse environments
Experience with automated testing and continuous integration/deployment (CI/CD) pipelines
Background in customer support or technical account management for critical systems
Certifications related to high-availability systems or disaster recovery (e.g., CDCP, CBCP)
Experience in roles that bridged development and operations teams
Experience with big data technologies and distributed system analysis
ITIL v3 or v4 certification



What We Offer:



Opportunity to work on challenging projects that directly impact millions of users in emerging markets
Remote work environment that values work-life balance and independent problem-solving
Chance to be a key player in a dynamic team, with significant opportunity for individual impact and growth
Competitive compensation package, tailored to your location and experience
Exposure to cutting-edge technologies in the mobile services industry
Potential for rapid career advancement as you help drive our company's growth and evolution
Flexible working hours to accommodate different time zones and operational needs
Regular team-building activities and virtual events to foster connections among remote team members



Our Values:



Customer-Centric Approach: We passionately serve our customers by delivering innovative solutions that address complex challenges and create lasting value.
Engineering Elegance: We believe in purposeful design, intuitive usability, refined simplicity, and maintainability in all our solutions, even in the face of complex system landscapes.
Continuous Improvement: We're committed to constantly enhancing our systems, processes, and skills to stay at the forefront of our industry.



How to Apply:


If you're excited by the challenge of optimizing critical systems and driving technological advancement in emerging markets, we want to hear from you. Please submit your resume, a brief cover letter explaining your interest in the role and how you've tackled complex system challenges in the past, and any relevant portfolio or project examples to ******.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.


Built at: 2024-09-20T15:29:06.110Z