As a Data Engineer, you'll be at the heart of transforming complex, unstructured data into actionable insights. You'll be responsible for implementing process improvements, enhancing data systems, and ensuring the integrity and scalability of our data infrastructure. If you're excited to work with large data sets, optimize ETL processes, and ensure data quality, this is the perfect opportunity for you.

Responsibilities:
- Process Optimization: Suggest and implement efficiencies to automate manual processes and optimize data delivery for better scalability.
- System Enhancements: Develop and implement new features and enhancements across our data systems, improving the CI/CD process for optimal data pipelines.
- Data Pipeline Development: Assemble and manage complex data sets to meet both functional and non-functional business requirements.
- ETL Processes: Lead the design, development, and optimization of efficient ETL pipelines.
- Testing & Consistency: Develop and execute unit tests to ensure data consistency and reliability across pipelines.
- Monitoring & Reporting: Implement and maintain automated monitoring solutions to support reporting and analytics infrastructure.
- Data Governance & Quality: Ensure high standards of data governance, quality assurance, and the maintenance of data infrastructure systems (AWS, databases, security).
- Metadata & Documentation: Maintain data warehouse and data lake metadata, data catalogs, and user documentation for internal business users.
- Best Practices: Enforce best practices across our database systems (e.g., collations, indices, database engines) to ensure performance and reliability.

Technologies:
- Programming Languages: Python, SQL
- Data Engineering: PySpark, Airflow, Terraform
- Cloud Platforms: AWS (primary), Azure, GCP
- Containerization: Docker
- Version Control & CI/CD: GitHub, Jenkins, etc.

Qualifications and Experience:
- Educational Background: BSc Degree in Computer Science, Information Systems, Engineering, or a related technical field (or equivalent work experience).
- Experience: 5 years of related work experience, including at least 2 years building and optimizing 'big data' data pipelines and maintaining large data sets.
- Technical Proficiency: Strong experience with Python and SQL (PostgreSQL, MS SQL). Hands-on experience with cloud services, particularly AWS (experience with Azure or GCP is a plus). Proficiency in ETL processes and data lifecycle management. Experience with Glue and PySpark is highly desirable. Skilled in version control, CI/CD pipelines, and GitHub.
- Data Management Expertise: Deep understanding of data quality assurance, data governance, and best practices for managing data transformation processes, metadata, and dependencies.
- Analytical Skills: Strong problem-solving abilities and experience working with unstructured datasets.
- Big Data: Familiarity with message queuing, stream processing, and working with highly scalable 'big data' stores.
- Attention to Detail: A meticulous eye for detail and a commitment to maintaining high-quality data systems.
- Communication: Strong communication skills and the ability to work collaboratively in a team.

Apply now