Salary not specified
We are looking for an experienced and highly skilled Data Engineer with a strong background in building and managing robust data pipelines, working with large-scale datasets, and ensuring data availability for analytical and operational purposes.
Responsibilities:
work with cross-functional teams to understand data needs and deliver efficient data solutions;
implement and optimize ETL (Extract, Transform, Load) processes for data integration from various sources;
build and maintain data architectures, ensuring that data is accessible, accurate, and up-to-date for the business and analytics teams;
develop data models and support data warehousing solutions to ensure efficient data storage and retrieval;
collaborate with Data Scientists, Analysts, and other teams to facilitate data analysis and decision-making processes;
continuously monitor, test, and improve data systems to ensure they meet business requirements and perform at scale;
ensure data quality and security, implementing best practices for data governance;
document data systems and workflows for transparency and operational continuity.
Requirements:
5+ years of professional experience in Data Engineering or similar roles;
strong proficiency in SQL, Python, and other data engineering tools;
experience with cloud platforms (Azure, AWS, GCP) and relevant tools for data processing and storage;
extensive experience with building and maintaining ETL pipelines using tools like Apache Airflow, Talend, or similar;
strong understanding of data modeling, data warehousing, and big data technologies (e.g., Hadoop, Spark, Databricks);
experience with data lake architectures and storage solutions;
familiarity with containerization tools (e.g., Docker, Kubernetes);
experience working in agile environments with CI/CD processes;
ability to troubleshoot and optimize data systems for maximum performance;
strong problem-solving skills and attention to detail;
English proficiency at B2 level or higher.
Tech Stack:
Languages: Python (pandas, NumPy, PySpark), SQL
Platforms: Azure, AWS, GCP, Databricks, Hadoop
Tools: Apache Airflow, Talend, Spark, Kafka, Docker, Kubernetes
Methods: ETL, Data Warehousing, Data Lake architecture
Processes: CI/CD, Agile/Scrum
Nice to have:
Experience with real-time data processing and streaming technologies (e.g., Kafka, Flink).
Experience with Machine Learning infrastructure or MLOps pipelines.
Background in distributed systems or microservices architecture.
Conditions:
Official employment under the Labor Code of Kazakhstan
Remote format with an option to work from the office
International projects and team
Address
Astana, Dinmukhamed Kunayev Street, 2
Contact information
IBA Astana Dev
Website: not specified
Email: not specified
Vacancy published on 18.05.2025 in Moscow.