Data Engineering

Building a Data Migration Framework with Python and Airflow

January 7, 2026 · 10 min read · Customer City Engineering

Salesforce org consolidations are notoriously complex. You’re not just moving records—you’re reconciling data models, deduplicating across systems, maintaining relationships, and doing it all while the business operates.

We recently completed a migration for a major food delivery platform consolidating four Salesforce orgs. Here’s the framework we built.

The Architecture

Our migration framework follows a classic ETL pattern, but with some key enhancements for reliability and auditability.

1. Extraction Layer

We use the Salesforce Bulk API 2.0 via simple-salesforce. The key insight: treat extraction as streaming, not batch.

import io

import pandas as pd
from simple_salesforce import Salesforce

def extract_object(sf, object_name, query):
    """Extract with pagination and chunked writes"""
    bulk = getattr(sf.bulk2, object_name)

    # Bulk API 2.0 streams results back as CSV chunks; write each one out as it arrives
    for chunk_id, csv_chunk in enumerate(bulk.query(query)):
        df = pd.read_csv(io.StringIO(csv_chunk))
        write_to_s3(df, f"raw/{object_name}/{chunk_id}.parquet")
        yield len(df)
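
For example, the generator can be consumed directly to track how many rows each object yields (the credentials and query below are placeholders):

sf = Salesforce(username="...", password="...", security_token="...")
rows = sum(extract_object(sf, "Account", "SELECT Id, Name FROM Account"))
print(f"Extracted {rows} Account rows")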

2. Transformation Layer

Transformations run in Airflow DAGs, as sketched below.
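
A rough sketch of this layer, assuming an Airflow 2.x TaskFlow setup; the object, field mapping, dedupe keys, and file paths are illustrative rather than the actual pipeline:

from pathlib import Path

import pandas as pd
import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2026, 1, 1), catchup=False)
def transform_accounts():

    @task
    def merge_chunks() -> str:
        # Combine the raw extraction chunks into a single staging file
        frames = [pd.read_parquet(p) for p in Path("raw/Account").glob("*.parquet")]
        Path("staging/Account").mkdir(parents=True, exist_ok=True)
        out = "staging/Account/merged.parquet"
        pd.concat(frames, ignore_index=True).to_parquet(out)
        return out

    @task
    def map_and_dedupe(path: str) -> str:
        # Rename source fields to the target org's schema, then drop cross-org duplicates
        df = pd.read_parquet(path).rename(columns={"Account_Tier__c": "Tier__c"})
        df = df.drop_duplicates(subset=["Name", "BillingPostalCode"])
        out = "staging/Account/transformed.parquet"
        df.to_parquet(out)
        return out

    map_and_dedupe(merge_chunks())

transform_accounts()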

3. Load Layer

We use a staged approach: first to a staging org for validation, then to production with rollback capability.
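
A simplified sketch of that staged-load pattern, not the full implementation: the success checks are minimal and the rollback log path is illustrative.

import json

from simple_salesforce import Salesforce

def staged_load(staging_sf: Salesforce, prod_sf: Salesforce, object_name: str, records: list[dict]):
    """Load to the staging org first, then to production, keeping created IDs for rollback."""
    # 1. Rehearse against the staging org and validate before touching production
    staging_results = getattr(staging_sf.bulk, object_name).insert(records)
    if any(not r["success"] for r in staging_results):
        raise ValueError(f"{object_name}: staging load failed, aborting production load")

    # 2. Load to production, logging every created ID so the batch can be rolled back
    prod_results = getattr(prod_sf.bulk, object_name).insert(records)
    created_ids = [r["id"] for r in prod_results if r["success"]]
    with open(f"rollback/{object_name}.json", "w") as f:
        json.dump(created_ids, f)
    return created_ids

def rollback(prod_sf: Salesforce, object_name: str):
    """Delete everything the last load created, using the saved ID log."""
    with open(f"rollback/{object_name}.json") as f:
        ids = json.load(f)
    getattr(prod_sf.bulk, object_name).delete([{"Id": i} for i in ids])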

Key Patterns

Change Data Capture (CDC)

For ongoing sync during the migration window, we implemented CDC using Salesforce Platform Events:

from airflow.decorators import task

@task
def process_cdc_events():
    # sf is the authenticated simple-salesforce client; update_staging_record applies the change
    events = sf.query("SELECT ... FROM AccountChangeEvent")
    for event in events["records"]:
        update_staging_record(event)

Data Quality Profiling

Before migration, we profile every object; a minimal version of that pass is sketched below.
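
A profiling pass like this might capture row counts, null rates, duplicate keys, and record-type coverage; the column names, key fields, and file path here are illustrative:

import pandas as pd

def profile_object(path: str, key_fields: list[str]) -> dict:
    """Summarize an extracted object before migration (illustrative checks only)."""
    df = pd.read_parquet(path)
    return {
        "row_count": len(df),
        # Share of missing values per column, to flag fields that need backfilling
        "null_rates": df.isna().mean().round(3).to_dict(),
        # Rows sharing the dedupe key, to size the cross-org merge work
        "duplicate_keys": int(df.duplicated(subset=key_fields).sum()),
        # Distinct record types, to confirm the field mapping covers them all
        "record_types": int(df["RecordTypeId"].nunique()) if "RecordTypeId" in df else 0,
    }

profile = profile_object("raw/Account/0.parquet", key_fields=["Name", "BillingPostalCode"])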

Results


Planning a Salesforce consolidation? Let’s talk about your requirements.
