Why this matters
Real platforms span many domains: sales, support, marketing, billing, warehouse, devices, and more. As a Data Architect, you model how these domains connect so that analytics, applications, and machine learning can use consistent, trustworthy data.
- Unify customers, products, and events across systems to power a customer 360 and reliable KPIs.
- Define canonical entities and data contracts so domains can evolve independently without breaking others.
- Reduce duplication by modeling shared concepts once, with clear ownership and mappings.
- Enable governance (lineage, privacy, consent) by capturing provenance and agreements between domains.
Concept explained simply
Cross-domain integration modeling defines shared concepts (like Customer, Product, Order) so different domains can exchange and combine data without confusion. At the conceptual level, you focus on business meaning and relationships. At the logical level, you define attributes, keys, mappings, constraints, and change-tracking rules.
- Canonical entities: neutral definitions understood by all domains.
- Identifier strategy: global surrogate keys plus mappings to each domain's local IDs.
- Reference data harmonization: mapping lists like status, country, currency, units.
- Event vs state: model both transactional events and current snapshots with timestamps.
- Lineage and provenance: record source, version, and transformation context.
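The identifier strategy above can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the `IdRegistry` class, the domain names, and the choice of UUIDs as global surrogate keys are hypothetical and not prescribed by this lesson.

```python
import uuid
from typing import Dict, Optional, Tuple


class IdRegistry:
    """Maps (domain, local_id) pairs to a global surrogate key (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_local: Dict[Tuple[str, str], str] = {}

    def resolve(self, domain: str, local_id: str) -> Optional[str]:
        """Return the global key for a local ID, or None if it is unmapped."""
        return self._by_local.get((domain, local_id))

    def register(self, domain: str, local_id: str, global_id: Optional[str] = None) -> str:
        """Link a local ID to an existing global key, or mint a new one."""
        key = (domain, local_id)
        if key in self._by_local:
            return self._by_local[key]  # already mapped; keep the first assignment
        global_id = global_id or str(uuid.uuid4())
        self._by_local[key] = global_id
        return global_id


registry = IdRegistry()
gid = registry.register("sales", "C-1001")           # mint a new global key
registry.register("support", "U-77", global_id=gid)  # attach a second local ID
```

Because the lookup key includes the domain, two domains can reuse the same local ID string without colliding in the hub.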
Mental model: a hub of shared concepts with spokes to domains
Imagine a small hub of shared concepts (Customer, Product, Interaction) with spokes going to each domain (Sales, Support, Marketing). The hub is versioned and changes rarely. Each spoke maps a domain's local fields and codes to the hub, so when a domain changes, only its spoke mapping needs updating.
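One way to picture a spoke is as a per-domain field mapping applied before records reach the hub. The sketch below assumes hypothetical hub field names (`local_id`, `primary_email`, `source_domain`) and a `SPOKES` dictionary invented for illustration.

```python
from typing import Any, Dict

# Hypothetical spoke definitions: local field name -> hub field name.
SPOKES: Dict[str, Dict[str, str]] = {
    "sales": {"contact_id": "local_id", "email": "primary_email"},
    "support": {"user_id": "local_id", "email": "primary_email"},
}


def to_hub(domain: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a domain record into hub field names, dropping unmapped fields."""
    spoke = SPOKES[domain]
    hub_record = {hub: record[local] for local, hub in spoke.items() if local in record}
    hub_record["source_domain"] = domain  # provenance travels with the record
    return hub_record


sales_row = to_hub("sales", {"contact_id": "C-1001", "email": "a@example.com", "account_id": "A-9"})
support_row = to_hub("support", {"user_id": "U-77", "email": "a@example.com", "phone": "555-0100"})
```

If Support renames `user_id`, only the `"support"` entry in `SPOKES` changes; the hub shape and the other spokes stay as they are.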
Worked examples
Example 1: Customer 360 across Sales, Support, Marketing
Conceptual view:
- Customer — interacts via channels.
- Interaction — call, email, chat, campaign touch.
- Consent — per purpose and jurisdiction.
Logical highlights:
- Customer(global_customer_id, full_name, primary_email, created_at, source_of_truth)
- CustomerIdMap(global_customer_id, crm_contact_id, support_user_id, mkt_subscriber_id, first_seen_at)
- Interaction(interaction_id, global_customer_id, channel, type, occurred_at, source_domain)
- Consent(global_customer_id, purpose, status, effective_from, effective_to)
Mapping choices
- Global key: generate global_customer_id; keep all local IDs in CustomerIdMap.
- Email conflicts: define a precedence order (e.g., Support over Marketing) and record the winning domain in source_of_truth.
- Channel/type enums: maintain mapping table, e.g., mkt:click -> Interaction.type=CampaignClick.
- History: SCD2 on Customer for privacy-sensitive fields; Consent has effective ranges.
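The email precedence rule can be sketched as a short resolver that returns both the chosen value and the domain it came from. The `PRECEDENCE` list is an assumption: the example only states Support over Marketing, so the position of Sales here is illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Assumed precedence order, highest first. Only "Support over Marketing" comes
# from the example; where Sales sits is an illustrative choice.
PRECEDENCE: List[str] = ["support", "sales", "marketing"]


def pick_email(candidates: Dict[str, Optional[str]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (primary_email, source_of_truth) from per-domain email values."""
    for domain in PRECEDENCE:
        email = candidates.get(domain)
        if email:  # skip domains with a missing or empty email
            return email, domain
    return None, None


email, source = pick_email({"marketing": "ann@old.example", "support": "ann@new.example"})
fallback = pick_email({"marketing": "ann@old.example", "support": None})
```

Returning the source alongside the value is what lets the canonical Customer populate source_of_truth without a second lookup.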
Example 2: Product and Inventory across Commerce and Warehouse
Conceptual view:
- Product — has Variants — held at Locations with quantities.
Logical highlights:
- Product(product_id, name, brand, category)
- ProductVariant(variant_id, product_id, sku, attributes, status)
- InventoryPosition(position_id, variant_id, location_id, uom, quantity, as_of)
- CodeMap(variant_id, sku, warehouse_item_code, effective_from, effective_to)
Mapping choices
- SKU vs item_code: tie both to variant_id through CodeMap.
- UoM: normalize to a canonical UoM; keep original_uom for audit.
- Price: model separately with effective ranges; do not embed in Product.
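UoM normalization might look like the following sketch. The conversion table, the canonical unit `each`, and the extra `original_quantity` field are assumptions for illustration; the example itself only asks to keep original_uom for audit.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict

# Hypothetical conversion factors to the canonical unit "each".
TO_EACH: Dict[str, Fraction] = {
    "each": Fraction(1),
    "dozen": Fraction(12),
    "case_24": Fraction(24),
}


@dataclass(frozen=True)
class InventoryPosition:
    variant_id: str
    location_id: str
    quantity: Fraction           # expressed in the canonical UoM
    uom: str                     # always the canonical UoM
    original_uom: str            # kept for audit
    original_quantity: Fraction  # assumption: also kept so the conversion can be replayed


def normalize(variant_id: str, location_id: str, quantity: int, uom: str) -> InventoryPosition:
    """Convert a warehouse quantity to the canonical unit, failing on unknown units."""
    if uom not in TO_EACH:
        raise ValueError(f"no conversion defined for unit {uom!r}")
    qty = Fraction(quantity)
    return InventoryPosition(variant_id, location_id, qty * TO_EACH[uom], "each", uom, qty)


pos = normalize("V-1", "WH-A", 3, "dozen")
```

Raising on an unknown unit, rather than passing the value through, keeps silently mis-scaled quantities out of the canonical model.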
Example 3: Device telemetry and Maintenance work orders
Conceptual view:
- Device — emits Telemetry — triggers WorkOrder.
Logical highlights:
- Device(device_id, serial_number, model, owner)
- Telemetry(event_id, device_id, metric_name, metric_value, unit, recorded_at)
- WorkOrder(work_order_id, device_id, opened_at, closed_at, cause_code, resolution)
- EnumMap(domain, code, canonical_code, description)
Mapping choices
- Device identity: serial_number may not be unique; assign device_id and map serial_number with history.
- Causal linkage: store the event_id of the Telemetry record that triggered a WorkOrder as a reference on the WorkOrder; do not denormalize raw metrics into WorkOrder.
- Enums: harmonize cause_code across vendors via EnumMap.
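A lookup shaped like EnumMap(domain, code, canonical_code, description) can be sketched as a dictionary keyed by domain and code. The vendor names, codes, and the `UNMAPPED` sentinel below are made up for illustration.

```python
from typing import Dict, Tuple

# Rows shaped like EnumMap(domain, code, canonical_code, description); values are invented.
ENUM_MAP: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("vendor_a", "OH"): ("OVERHEAT", "Device exceeded temperature limit"),
    ("vendor_b", "E-17"): ("OVERHEAT", "Device exceeded temperature limit"),
    ("vendor_b", "E-02"): ("POWER_LOSS", "Unexpected power interruption"),
}

UNMAPPED = "UNMAPPED"  # hypothetical sentinel for codes with no mapping yet


def canonical_cause(domain: str, code: str) -> str:
    """Look up the canonical cause code; flag unknown codes instead of guessing."""
    entry = ENUM_MAP.get((domain, code))
    return entry[0] if entry else UNMAPPED
```

Keying on the pair (domain, code) matters because the same raw code can mean different things for different vendors.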
Step-by-step approach
- Define scope and outcomes. Which questions must be answered across domains? E.g., LTV by acquisition channel requires Customer, Orders, and Marketing Touches.
- List shared concepts and owners. Identify canonical entities and which domain is source-of-truth for each attribute.
- Design identifier strategy. Choose global surrogate keys; create mapping tables to local IDs with timestamps.
- Model relationships and cardinalities. Resolve many-to-many via bridge entities; decide on optionality.
- Harmonize reference data. Build mapping tables for codes, units, currencies; define default and fallback rules.
- Plan change tracking. Decide SCD type per attribute, event timestamps, and effective dating.
- Specify a data contract. Define schemas, allowed values, nullability, SLAs, versioning, and deprecation policy.
- Validate with examples. Run sample records end-to-end; check for collisions and lost semantics.
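For the change-tracking step, an SCD Type 2 history can be sketched as a list of versions with effective ranges. This is a minimal illustration: the `CustomerVersion` class is hypothetical, and the code assumes the history is non-empty and ordered by effective_from, with a half-open range convention (effective_to is exclusive).

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    global_customer_id: str
    primary_email: str
    effective_from: datetime
    effective_to: Optional[datetime] = None  # None marks the current version


def apply_change(history: List[CustomerVersion], new_email: str,
                 changed_at: datetime) -> List[CustomerVersion]:
    """Close the current version and append a new one (SCD Type 2)."""
    current = history[-1]  # assumes a non-empty, ordered history
    if current.primary_email == new_email:
        return history  # no change, so no new version
    closed = replace(current, effective_to=changed_at)
    opened = replace(current, primary_email=new_email,
                     effective_from=changed_at, effective_to=None)
    return history[:-1] + [closed, opened]


def as_of(history: List[CustomerVersion], when: datetime) -> Optional[CustomerVersion]:
    """Return the version in effect at `when`, or None if none applies."""
    for version in history:
        if version.effective_from <= when and (
            version.effective_to is None or when < version.effective_to
        ):
            return version
    return None


history = [CustomerVersion("G-1", "old@example.com", datetime(2024, 1, 1))]
history = apply_change(history, "new@example.com", datetime(2024, 6, 1))
```

The `as_of` lookup is the payoff: any report can be rerun against the values that were in effect on a given date.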
Mini task: write a 90-second data contract note
Draft 3 bullets: schema name and version, delivery SLA, and a list of breaking vs non-breaking change examples.
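To make the breaking vs non-breaking distinction concrete, here is a small sketch that compares two schema versions. The rules it encodes (removing a field, changing a type, or making a required field nullable is breaking; adding a field is not) are a common convention assumed here, not a rule stated in this lesson, and the schema representation is hypothetical.

```python
from typing import Dict, List, Tuple

# A schema here is a mapping of field name -> (type name, nullable).
Schema = Dict[str, Tuple[str, bool]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes that would break an existing consumer of `old`."""
    problems: List[str] = []
    for name, (old_type, old_nullable) in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
            continue
        new_type, new_nullable = new[name]
        if new_type != old_type:
            problems.append(f"type changed: {name} {old_type} -> {new_type}")
        if new_nullable and not old_nullable:
            problems.append(f"became nullable: {name}")
    return problems  # fields that exist only in `new` are treated as non-breaking


v1: Schema = {"global_customer_id": ("string", False), "primary_email": ("string", True)}
v2: Schema = {**v1, "created_at": ("timestamp", True)}  # additive change
v3: Schema = {"global_customer_id": ("int", False)}      # type change plus a removal
```

A check like this could run when a domain proposes a new schema version, so that breaking changes trigger a major version bump and a deprecation window.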
Practice exercises
Try these in a text editor or on a whiteboard. Compare with the solutions only after attempting.
Exercise 1: Harmonize Customer across Sales, Support, Marketing
Given three domain snippets:
- Sales.Customer(contact_id, account_id, email, created_at)
- Support.User(user_id, email, phone, gdpr_flag, country)
- Marketing.Subscriber(subscriber_id, email, consent_status, source)
Task: Propose a conceptual and logical model that creates a canonical Customer, maps IDs, aligns consent, and handles conflicting emails. Include SCD strategy and precedence rules.
- Checklist:
  - Global key defined and mapped to all local IDs
  - Email precedence with provenance
  - Consent modeled with purpose and effective dates
  - Interaction types enumerated or mapped
Exercise 2: Integrate Product and Inventory
Given:
- Commerce.Product(sku, variant_id, title, price_amount, price_currency, category)
- Warehouse.Item(item_code, uom, quantity_on_hand, location_id)
Task: Design a canonical Product/Variant and InventoryPosition with code mappings, UoM harmonization, and time variance for price and inventory snapshots.
- Checklist:
  - Variant-level identity chosen
  - Mapping table between sku and item_code
  - Canonical UoM decision and conversion note
  - Price and inventory modeled as time-variant
Common mistakes and self-checks
- Collapsing many-to-many into one-to-many and losing valid relationships.
- Reusing a domain's local ID as a global ID, causing collisions.
- Forgetting provenance: not storing which source set a value.
- Ignoring change history for attributes that evolve (e.g., consent, address).
- Overloading canonical entities with domain-specific fields instead of keeping only the shared meaning.
Self-check prompts
- Can I explain each relationship's cardinality using a real scenario?
- Do I know which domain is the source-of-truth per attribute?
- If a domain changes an enum, do I know what breaks and how it is versioned?
- Can I trace any canonical record back to exact source IDs and timestamps?
Practical projects
- Customer 360 MVP:
  - Deliver a canonical Customer and Interaction model with ID mapping and consent
  - Demonstrate dedup of at least 3 conflicting records with provenance
- Product and Price Hub:
  - Provide Variant, Price (effective-dated), and InventoryPosition
  - Show UoM and currency harmonization with conversion notes
- Telemetry-to-WorkOrder Linkage:
  - Correlate telemetry events to maintenance work orders with reason codes
  - Document enum mappings and data contract
Who this is for
- Data Architects designing shared models
- Data Engineers implementing pipelines across domains
- Analytics and Platform Leads aligning metrics and entities
Prerequisites
- Basic ER modeling (entities, relationships, cardinalities)
- Understanding of keys (natural, surrogate) and SCD patterns
- Familiarity with domain-driven concepts (bounded contexts)
Learning path
- Clarify business outcomes and shared concepts
- Draft conceptual canonical model
- Add logical detail: keys, attributes, constraints
- Define mappings: IDs, enums, reference data
- Specify data contracts and versioning rules
- Validate with worked examples and tests
Next steps
Take the quick test to confirm your understanding.
Quick Test
Answer the questions below. If unsure, revisit the worked examples and exercises before submitting.
Mini challenge
Design a minimal canonical model that lets Finance calculate gross margin by marketing campaign and product variant over time. List the entities, keys, and at least two mappings you must have.
Possible direction
Entities: Customer, ProductVariant, OrderLine, CampaignTouch, Price, Cost. Keys: global_customer_id, variant_id, order_line_id. Mappings: campaign codes across tools; SKU to variant_id. Ensure effective-dated Price and Cost to time-align margin.
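The time alignment of Price and Cost can be sketched as a lookup over effective ranges. The amounts and dates below are invented, the ranges are assumed half-open (effective_to exclusive), and the example covers a single variant without the campaign dimension to keep it short.

```python
from datetime import date
from decimal import Decimal
from typing import List, Optional, Tuple

# (effective_from, effective_to exclusive or None, amount); values are illustrative.
Range = Tuple[date, Optional[date], Decimal]

PRICE: List[Range] = [
    (date(2024, 1, 1), date(2024, 7, 1), Decimal("20.00")),
    (date(2024, 7, 1), None, Decimal("25.00")),
]
COST: List[Range] = [(date(2024, 1, 1), None, Decimal("12.00"))]


def amount_on(ranges: List[Range], day: date) -> Decimal:
    """Return the amount whose effective range contains `day`."""
    for start, end, amount in ranges:
        if start <= day and (end is None or day < end):
            return amount
    raise LookupError(f"no effective row for {day}")


def gross_margin(order_date: date, quantity: int) -> Decimal:
    """Margin for one order line, using the price and cost in effect on the order date."""
    return quantity * (amount_on(PRICE, order_date) - amount_on(COST, order_date))
```

In a fuller model, the ranges would be keyed by variant_id and the order line would also carry the campaign it is attributed to.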