Description
Client based in Sandton is hiring! We are in search of a Senior Machine Learning Operations Engineer (ML Ops Engineer) to join the Private Bank Technical Business Intelligence Team. The successful candidate will be responsible for deploying, maintaining, and monitoring machine learning models. We are looking for someone with a background in cloud infrastructure, Kubernetes, deployment pipelines, and a deep understanding of machine learning.
Responsibilities and Skills Include:
Deliver strategic goals and business objectives
Maintaining platform stability
Design and build solutions focused on efficiency
Strong team dynamics, people skills and relationship/network building
Ensuring that strategy and teamwork align with the MLOps and engineering principles and practices defined by group engineering and industry best practice
Solid grasp of DevOps/SRE methodologies and practices
Provide technical guidance and support throughout the release process, including strong troubleshooting abilities across the platform and channel
Strong design and solutioning experience across multiple technologies, and an understanding of cloud DevOps services and hosting
Understanding of Git and CI/CD
Understanding of cloud-native, hybrid-cloud, and on-premises design principles
Developing and maintaining deployment pipelines for machine learning models on Microsoft Azure
Monitoring and optimizing the performance of machine learning models in production
Collaborating with data scientists for seamless deployment of models
Ensuring high availability and reliability of the machine learning infrastructure on Microsoft Azure
Providing technical support for machine learning models in production
Conducting regular security assessments and ensuring compliance with industry standards and best practices
Keeping up to date with new Azure ML offerings and technologies to continuously improve our MLOps processes
Requirements:
Minimum BSc in Computer Science, Engineering, or a related field
At least 5 years of experience in ML Operations or a similar role
Extensive experience with Microsoft Azure, including Azure Pipelines, Azure Functions, and Azure ML offerings
Knowledge of containerization technologies (Docker, Kubernetes, Rancher)
Strong programming skills in Python and SQL, with experience using FastAPI and Redis
Strong understanding of Software Engineering concepts
Strong experience writing unit tests
Knowledge of machine learning frameworks such as TensorFlow and PyTorch
Experience with monitoring and logging tools (e.g. Grafana, Kibana)
Excellent problem-solving skills and attention to detail
Knowledge of design patterns