Join us to create the giants in the industry
At Derivco, we believe that our people are not just employees – they are the heart and soul of our business. It's their skills, expertise, and passion that define who we are and drive us towards excellence every day.
We empower our people to think creatively, push boundaries, and take ownership of their work. We don't just want employees; we want innovators and difference-makers who are ready to make an impact.
Culture is at the core of everything we do. We create an environment where our people can thrive, grow, and unleash their full potential. We provide the right tools, support, and opportunities for personal and professional development.
Role Overview:
Lead the observability and site reliability engineering processes to ensure high availability and reliability of live systems while coordinating with development, operations, and support teams to enhance system performance and address operational issues. Drive innovation and continuous improvement by implementing best practices for system optimization, leveraging technological advancements, and managing budget and projects to align with organizational goals and achieve KPIs.
Key Responsibilities:
Develop and execute a comprehensive site reliability strategy aligned with the organization's overall business objectives.
Implement and manage best practices for system performance optimization, capacity planning, and disaster recovery.
Coordinate with development, operations, and support teams to improve system reliability and address operational issues.
Facilitate communication between regional teams and senior management to align SRE efforts with organizational goals.
Oversee global observability efforts, ensuring comprehensive monitoring and visibility into system health and performance.
Monitor and evaluate reliability and monitoring practices, identifying areas for improvement and implementing solutions that address inefficiencies and mitigate risk.
Drive innovation by leveraging technological advancements and industry trends to maintain industry-leading uptime.
Manage the team budget to align with strategy, ensuring financial feasibility across projects. Contribute to the development of the departmental budget.
Manage projects to ensure they adhere to estimated costs and provide input into reducing costs and optimizing software.
Lead and drive a management team with a focus on strategic delivery, setting clear direction and transparent expectations, so that the team exhibits the culture and values of the business and achieves or exceeds KPIs.
Drive continuous improvement initiatives to enhance IT service quality, including incident management, problem resolution, and change management processes.
Key Qualifications and Experience:
8-12 years of experience, including 8 years in site reliability engineering or observability/monitoring and at least 3 years in a leadership role.
Advanced understanding of job-related concepts and specialization in some areas, and/or a Bachelor's degree in Computer Science, Information Technology, Business Management, or a related field. Relevant certifications (e.g., Google Cloud Professional SRE, AWS Certified Solutions Architect) preferred.
Software Development Life Cycle (SDLC); Software Engineering Methodologies and Frameworks; Business Process Improvement; Change Management; Excellent Problem-Solving Skills; Technology Support; Service Level Management.
Thorough understanding of business domain, business strategy and long-term objectives; Customer Portfolio; Product Portfolio; Customer Agreements; SLA; OLA.
Derivco is an equal opportunities employer. We value people as individuals with diverse opinions, cultures, lifestyles, and circumstances, and we are committed to equality of opportunity and to providing a productive working environment free from unfair and unlawful discrimination.