Summary
Data Engineer with 4+ years of experience building scalable ETL pipelines, data quality frameworks, and modern data warehouse (DWH) architectures. Proficient in Python, SQL, and dbt, with hands-on experience in Apache Airflow, Spark, and DevOps practices (CI/CD, Kubernetes). Experienced in designing reliable data architectures and optimizing query performance for high-load environments.
Work Experience
- Joined a newly formed Data Quality team and architected a configuration-driven ecosystem to ensure platform-wide data integrity:
  - Developed a Python/SQL framework for business-logic validation (replacing Great Expectations).
  - Built a scalable Spark-based reconciliation service for detecting discrepancies between event streams and S3 storage.
  - Engineered a dynamic DAG generator in Airflow: users supply SQL and YAML configs, which are parsed and stored in ClickHouse; Airflow continuously polls ClickHouse to auto-create, update, and schedule pipelines (sketched below). Scaled to orchestrate 2,000+ active data quality checks across the platform.
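A minimal sketch of the config-driven DAG generation pattern described above, assuming a hypothetical ClickHouse table dq.checks that holds the parsed YAML/SQL configs and a single-task check DAG; the real framework's schema, operators, and alerting are more involved.

```python
# Sketch of dynamic DAG generation: the Airflow scheduler re-parses this file,
# reads active check definitions from ClickHouse, and registers one DAG per check.
# Table/column names and connection settings are illustrative, not the real schema.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from clickhouse_driver import Client  # assumed ClickHouse client library


def run_check(sql: str, **_):
    """Execute one data quality check; fail the task if the SQL returns violations."""
    rows = Client(host="clickhouse").execute(sql)
    if rows:  # convention assumed here: the check SQL returns offending rows only
        raise ValueError(f"Data quality check failed: {len(rows)} bad rows")


def load_check_configs():
    """Fetch active check configs previously parsed from YAML into ClickHouse."""
    return Client(host="clickhouse").execute(
        "SELECT check_id, schedule, sql FROM dq.checks WHERE is_active = 1"
    )


for check_id, schedule, sql in load_check_configs():
    dag = DAG(
        dag_id=f"dq_check_{check_id}",
        start_date=datetime(2024, 1, 1),
        schedule_interval=schedule,
        catchup=False,
    )
    PythonOperator(
        task_id="run_check",
        python_callable=run_check,
        op_kwargs={"sql": sql},
        dag=dag,
    )
    globals()[dag.dag_id] = dag  # expose each generated DAG to the scheduler
```

Registering generated DAGs in globals() is the standard Airflow idiom for dynamic DAGs, so new or updated checks appear on the scheduler's next parse without redeploying code.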
- Managed OpenMetadata infrastructure on Kubernetes (deployed via ArgoCD), customizing Helm charts and automating metric ingestion.
- Established CI/CD pipelines using GitLab CI, Tox, and pre-commit hooks; implemented database version control via Liquibase.
- Pioneered GenAI initiatives by developing the first AI agent prototype using LangGraph.
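As an illustration of the kind of LangGraph prototype referenced above, here is a minimal two-node plan/act graph; the state fields and node logic are placeholders rather than the actual agent.

```python
# Minimal LangGraph sketch: a two-step "plan -> act" agent graph.
# State fields and node bodies are illustrative placeholders only.
from typing import TypedDict

from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    question: str
    plan: str
    answer: str


def plan_step(state: AgentState) -> dict:
    # A real agent would call an LLM here to produce a plan.
    return {"plan": f"Look up the data needed to answer: {state['question']}"}


def act_step(state: AgentState) -> dict:
    # A real agent would execute tools / queries here.
    return {"answer": f"Executed plan: {state['plan']}"}


graph = StateGraph(AgentState)
graph.add_node("plan", plan_step)
graph.add_node("act", act_step)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_edge("act", END)

agent = graph.compile()
print(agent.invoke({"question": "How many data quality checks failed today?"}))
```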
- Developed multi-gigabyte data marts for a team of 4 Data Scientists using a Spark-based internal ETL platform within the Hadoop ecosystem, accelerating training and feature engineering for ~10 ML models.
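A hedged sketch of the kind of Spark job behind such data marts; the table names, features, and output target are illustrative assumptions, not the actual pipeline.

```python
# Illustrative PySpark job: aggregate raw events into a feature mart for DS training.
# Table names, columns, and the output table are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("user_feature_mart").getOrCreate()

events = spark.table("raw.user_events")  # assumed Hive table of raw events

features = (
    events
    .where(F.col("event_date") >= "2024-01-01")
    .groupBy("user_id")
    .agg(
        F.count("*").alias("events_cnt"),
        F.countDistinct("session_id").alias("sessions_cnt"),
        F.sum("purchase_amount").alias("gmv"),
    )
)

# Persist as a partitioned mart that downstream ML training jobs can read.
(features
 .withColumn("snapshot_date", F.current_date())
 .write.mode("overwrite")
 .partitionBy("snapshot_date")
 .saveAsTable("dm.user_features"))
```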
- Extended ETL capabilities by writing custom UDFs in Scala and optimizing complex SQL logic for high-load processing.
- Maintained the backend of a real-time scoring service (PostgreSQL) and orchestrated recurring data workflows using Apache Airflow.
- Led the data migration of 4 regional services to the SMEV4 standard, managing metadata registries and ensuring continuous data integrity.
- Designed an end-to-end reporting system: defined API requirements, built Python/SQL ETL pipelines, and created an inter-departmental dashboard tracking the percentage of overdue citizen applications.
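The core of the overdue-applications metric could look like the following pandas sketch; the API endpoint, field names, and the 30-day SLA are assumptions for illustration.

```python
# Sketch of the overdue-applications metric behind the dashboard.
# API URL, field names, and the 30-day SLA threshold are illustrative assumptions.
import pandas as pd
import requests

resp = requests.get("https://example.gov/api/applications", timeout=60)
apps = pd.DataFrame(resp.json())

apps["submitted_at"] = pd.to_datetime(apps["submitted_at"])
apps["resolved_at"] = pd.to_datetime(apps["resolved_at"])

sla = pd.Timedelta(days=30)
age = apps["resolved_at"].fillna(pd.Timestamp.now()) - apps["submitted_at"]
apps["is_overdue"] = age > sla

# Share of overdue applications per department, as tracked on the dashboard.
overdue_pct = apps.groupby("department")["is_overdue"].mean().mul(100).round(1)
print(overdue_pct)
```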
- Deployed and administered a self-hosted Apache Airflow instance on a dedicated server.
Skills
Programming: Python, SQL.
Big Data & ETL: Apache Airflow, Apache Spark, Hadoop (HDFS), dbt, Pandas.
DevOps & Infrastructure: Docker, Kubernetes, ArgoCD, Helm, GitLab CI/CD, Linux, Liquibase.
Other: OpenMetadata, LangGraph (GenAI), Data Quality methodologies.
Education
- Faculty: Control Systems and Navigation