Vyacheslav Kapitonov

Data Engineer
Yerevan, Armenia (Open to Relocation) | vyach.kapitonov@gmail.com
LinkedIn | tg: @vyacheslav_kapitonov

Summary

Data Engineer with 4+ years of experience building scalable ETL pipelines, data quality frameworks, and modern DWH architectures. Proficient in Python, SQL, and dbt, with hands-on experience in Apache Airflow, Spark, and DevOps practices (CI/CD, Kubernetes). Experienced in designing reliable data architectures and optimizing query performance for high-load environments.

Work Experience

Samokat | Data Engineer — Quick Commerce & Retail Tech (Top-1 player)
05.2024 – Present
  • Joined a newly formed Data Quality team and architected a configuration-driven ecosystem to ensure platform-wide data integrity:
    • Developed a Python/SQL framework for business logic validation (replacing Great Expectations).
    • Built a scalable Spark-based reconciliation service for detecting discrepancies between event streams and S3 storage.
    • Engineered a dynamic DAG generator in Airflow: users submit SQL and YAML configs that are parsed into ClickHouse, which Airflow continuously polls to auto-create, update, and schedule pipelines; scaled to orchestrate 2,000+ active data quality checks across the platform.
  • Managed OpenMetadata infrastructure on Kubernetes (deployed via ArgoCD), customizing Helm charts and automating metric ingestion.
  • Established CI/CD pipelines using GitLab CI, Tox, and Pre-commit hooks; implemented database version control via Liquibase.
  • Pioneered GenAI initiatives by developing the first AI agent prototype using LangGraph.
QIWI | Data Engineer — FinTech & Payment Systems
05.2023 – 05.2024
  • Developed multi-gigabyte data marts for a team of 4 Data Scientists on a Spark-based internal ETL platform within the Hadoop ecosystem, accelerating training and feature engineering for ~10 ML models.
  • Extended ETL capabilities by writing custom UDFs in Scala and optimizing complex SQL logic for high-load processing.
  • Maintained the backend of a real-time scoring service (PostgreSQL) and orchestrated regular data workflows using Apache Airflow.
CIT (Gosuslugi) | Data Engineer — GovTech & Digital Services
11.2021 – 04.2023
  • Led the data migration of 4 regional services to the SMEV4 standard, managing metadata registries and ensuring continuous data integrity.
  • Designed an end-to-end reporting system: defined API requirements, built Python/SQL ETL pipelines, and created an inter-departmental dashboard tracking the percentage of overdue citizen applications.
  • Deployed and administered a self-hosted Apache Airflow instance on a dedicated server.

Skills

Programming: Python, SQL.
Big Data & ETL: Apache Airflow, Apache Spark, Hadoop (HDFS), dbt, Pandas.
DevOps & Infrastructure: Docker, Kubernetes, ArgoCD, Helm, GitLab CI/CD, Linux, Liquibase.
Other: OpenMetadata, LangGraph (GenAI), Data Quality methodologies.

Education

Tula State University | Bachelor’s Degree
09.2015 – 07.2019
  • Faculty: Control Systems and Navigation