JOB PURPOSE
The Senior Incident Response Manager is responsible for overseeing the coordination and execution of the organization's incident response processes, ensuring the timely and efficient resolution of major incidents. This role involves developing strategies for handling incidents, coordinating with key stakeholders, and maintaining operational stability. The individual will be expected to lead the response to high-impact situations, improve incident management processes, and work with cross-functional teams to resolve complex technical issues.
RESPONSIBILITIES
Incident Management
Lead the incident response team in identifying, diagnosing, and resolving high-severity incidents in a timely manner
Manage the entire lifecycle of incidents from detection to post-incident review, ensuring appropriate measures are taken to prevent future occurrences
Coordinate with internal departments, third-party vendors, and external partners during major incidents
Escalate unresolved issues to senior management and drive the resolution process
Ensure incidents are identified promptly through proactive monitoring or incident reports
Assign priority levels to incidents based on business impact, urgency, and severity
Lead the incident resolution process from detection through closure, maintaining oversight and ensuring swift action
Implement corrective actions to prevent recurrence and minimize downtime
Deliver timely resolution of incidents within established Service Level Agreements (SLAs)
Ensure all incidents are properly logged, tracked, and documented in the ticketing system
Ensure status updates are communicated to stakeholders throughout the incident lifecycle
Incident Reporting & Documentation
Ensure accurate and comprehensive documentation of all incidents, including root cause analysis, impact analysis, and post-incident reviews
Develop detailed incident reports for management that capture the full lifecycle of each incident, from detection to resolution, including resolution timelines, mitigation actions, and lessons learned
Perform in-depth root cause analysis for major incidents, identifying both technical and process-related factors
Present incident reports to senior leadership, including potential risks and suggestions for process improvements
Establish metrics (e.g., MTTR and incident volume) to track the performance of the incident response process
Ensure timely and accurate reporting on incidents for internal and external stakeholders
Deliver detailed reports that can be used to improve system resilience and incident management
Maintain transparency with senior leadership on incident impacts and outcomes
Response Coordination
Serve as the central point of contact for all incident response activities, coordinating between various teams (e.g., IT, DevOps, Security, Vendors)
Lead regular incident response meetings, war rooms, or conference calls to facilitate real-time problem-solving
Ensure that the right stakeholders are involved based on the nature of the incident (e.g., business continuity, legal, communications)
Escalate incidents to senior management when critical thresholds are met or breached
Ensure seamless coordination across teams, minimizing delays in incident resolution
Ensure the correct escalation paths are followed based on the severity of the incident
Keep stakeholders informed with real-time updates, ensuring they understand the potential impact and mitigation efforts
Emergency Response
Serve as the primary leader during high-severity incidents or business-impacting crises
Mobilize and direct resources rapidly to respond to incidents, minimizing downtime and impact on operations
Ensure contingency plans are activated and business continuity protocols are followed in extreme cases
Communicate incident status and resolution plans with clarity, particularly when operations are significantly impacted
Ensure that response to critical incidents is swift, minimizing impact on customers and operations
Ensure that emergency protocols are followed accurately, minimizing operational risks
Maintain a high state of readiness within the incident response team for major incidents or crises
Continuous Improvement
Analyse past incidents to identify trends, root causes, and recurring issues
Lead post-incident reviews (PIRs) and after-action meetings to gather insights and lessons learned
Propose and implement improvements to incident response processes, tools, and methodologies based on feedback from PIRs
Stay updated on new incident management frameworks, tools, and practices and introduce relevant innovations into the team's workflows
Ensure a measurable reduction in the recurrence of incidents over time
Deliver clear, actionable recommendations from post-incident reviews to prevent similar incidents
Keep the incident response process current with industry best practices and technological advancements
Cross-functional Collaboration
Work closely with IT operations, network teams, security teams, and software development to ensure cohesive incident response
Collaborate with external vendors and service providers as needed for incident resolution
Maintain close relationships with business leaders to align incident response priorities with overall business objectives
Facilitate communication between technical and non-technical stakeholders, ensuring clear understanding of incidents
Align incident response activities with business priorities to minimize disruption to core operations
Maintain effective communication and collaboration between internal teams and external partners
Manage stakeholder expectations and provide clear, concise updates
Policy & Procedure Development
Develop and maintain incident management policies, procedures, and runbooks based on industry best practices (e.g., ITIL, NIST)
Ensure that all documentation is up to date, covering standard operating procedures for various incident types
Work with compliance and security teams to ensure incident management practices align with regulatory and legal requirements
Regularly review and update policies to incorporate new risks, tools, and business requirements
Ensure that incident response documentation is clear, actionable, and followed during incidents
Maintain compliance with regulatory requirements related to incident management
Ensure that all team members are familiar with and adhere to incident management policies
Process Improvement & Strategy Development
Continuously review and improve the incident management processes, frameworks, and protocols to enhance operational efficiency
Develop and maintain incident response plans, ensuring the organization is prepared to address critical incidents quickly and effectively
Collaborate with service delivery and IT operations teams to ensure alignment between incident management and overall business objectives
Team Leadership & Mentorship
Lead and mentor a team of Incident Response Specialists, ensuring professional development and technical proficiency
Provide guidance and training to team members, promoting best practices in incident management and technical troubleshooting
Ensure team performance aligns with business objectives and targets
Maintain high levels of team morale and engagement, fostering a collaborative and accountable work environment
Identify skills gaps within the team and facilitate upskilling initiatives
Stakeholder Engagement
Serve as the primary point of contact for incident escalation and resolution, communicating with internal and external stakeholders to ensure they are informed throughout the incident lifecycle
Maintain strong working relationships with cross-functional teams including Service Delivery, IT Infrastructure, Application Support, and third-party vendors
Technology & Tools Management
Ensure the appropriate tools, systems, and resources are in place for effective incident detection, tracking, and resolution
Stay updated on emerging technologies and tools relevant to incident management and recommend improvements to the organization's incident response capabilities
BEHAVIOURAL COMPETENCIES
Tech Savvy
Customer-focused
Evaluating problems
Investigating issues
Information seeking
Processing details and information
Communicating information
Showing resilience
Adjusting to change
Learning ability
Teamwork
Business knowledge and approach
Instils Trust
Plans and Aligns
EDUCATION
Matric
Bachelor's degree in Computer Science, Information Technology, or a related field. Advanced certifications in IT Service Management (e.g., ITIL) or information security (e.g., CISSP) are a plus.
Strong knowledge of Microsoft Office productivity tools
EXPERIENCE
Minimum of 10 years of experience in a similar role
Solid understanding of technology systems and processes
Experience designing incident management processes that operate within SLAs
Extensive experience in a service management function
Strong stakeholder management experience