SSIS Schema Alignment Pipeline - Construction Guide

Last updated on Jun 25, 2025

1. SSIS Functionality Overview

What is SSIS?

SQL Server Integration Services (SSIS) is Microsoft's platform for building enterprise-level data integration and data transformation solutions. SSIS provides:

Data Flow Tasks: Move and transform data between sources and destinations
Control Flow: Orchestrate the execution of tasks and containers
Script Components: Custom .NET code for complex transformations
Built-in Transformations: Standard operations like sorting, aggregating, and data conversion

Script Component Types

SSIS Script Components can function as:

Source: Generate data from custom logic or external sources
Transformation: Modify data as it flows through the pipeline
Destination: Send data to custom targets

Our implementation uses a Script Source Component to extract data from SQL with complex business logic and type-safe conversions.

2. Business Goal and The Underscore Challenge

Business Objective

Create a robust ETL pipeline that achieves schema alignment between two different data processing approaches:

Script Pipeline: Custom C# logic with type-safe conversions
SQL Pipeline: Direct SQL query execution

Target: Achieve >95% data type compatibility and 100% column name alignment.

The Underscore Problem

Core Issue: SQL Server allows underscores in column names (l_Loan_Amount, p_County) but C# property names in SSIS Script Components cannot contain underscores. This is a well-documented limitation in SSIS Script Components.¹

Business Impact:

Breaks dynamic column generation
Requires manual configuration for all underscored columns
Creates naming mismatches between pipeline stages
Complicates maintenance when schema changes

Example of the Problem:

SQL Column Name:        l_Loan_Number_Id
C# Property Name:       lLoanNumberId  (underscores automatically removed)
Destination Expected:   l_Loan_Number_Id  (must restore underscores)

This creates a 3-stage mapping challenge that requires careful orchestration.

3. Our Pipeline Architecture

High-Level Data Flow

SQL File Query → Script Component → Derived Column Transform → OLE DB Destination
     ↓                ↓                    ↓                        ↓
Underscored       Non-underscored     Underscored            Database Table
Column Names      Property Names      Column Names           with Underscores

Component Responsibilities

Script Component (Source)

Input: SQL query from external file (pcl_update.sql)
Processing: Type-safe data conversion with strictness levels
Output: Non-underscored column names (C# compatible)
Handles: NULL values, data type conversions, business logic

Derived Column Transformation

Input: Non-underscored columns from Script Component
Processing: Manual column renaming to restore underscores
Output: Original SQL column names with underscores
Purpose: Bridge the naming gap between C# and SQL

OLE DB Destination

Input: Properly named columns with underscores
Processing: Bulk insert with optimal performance settings
Output: Target database table (e.g., pclinfo_nerevu)
Goal: Match schema of reference table (e.g., pclinfo)

Why This Architecture?

Alternative Approaches Considered:

Dynamic Column Generation: Failed due to underscore naming conflicts
Direct SQL-to-SQL: Couldn't implement complex business logic and type safety
Multiple Transformations: More complex than our 3-component solution

Our Solution Benefits:

Predictable: Manual configuration is tedious but reliable
Type-Safe: Custom C# logic ensures data quality
Maintainable: Clear separation of concerns
Performant: Optimized for bulk operations

4. Step-by-Step Pipeline Construction

Phase 1: Script Component Configuration

Step 1.1: Add Script Component

In SSIS Data Flow, drag Script Component from toolbox
Select "Source" when prompted for component type
Name it descriptively (e.g., "PCL Data Extraction Source")

Step 1.2: Configure Connection Manager

In Script Transformation Editor → Connection Managers tab
Click Add → Select your OLE DB Connection Manager
Name: Connection (this becomes the property name in C#)

Step 1.3: Manual Output Column Configuration

Critical Step: Must be done manually for every underscored SQL column.

In Script Transformation Editor → Inputs and Outputs → Output 0 → Output Columns:

For each SQL column containing underscores:

Click Add Column
Name: Non-underscored version (e.g., lLoanNumberId for l_Loan_Number_Id)
Data Type: Match SQL source exactly
Length: Use conservative sizing (512 for strings, appropriate precision for numerics)

Example Configuration Table:

SQL Column Name	Output Column Name	Data Type	Length	Notes
`l_Loan_Number_Id`	`lLoanNumberId`	DT_STR	512	String ID field
`program_program_ID`	`programprogramID`	DT_STR	512	String ID field
`p_County`	`pCounty`	DT_STR	512	Geographic field
`l_Loan_Amount`	`lLoanAmount`	DT_STR	50	Money as string
`DatesPivot_ApplicationDate`	`DatesPivotApplicationDate`	DT_DBTIMESTAMP	-	DateTime field

Important: This manual configuration is why dynamic column generation fails - you can't predict the exact property names that C# will generate.

Phase 2: Derived Column Transformation Setup

Step 2.1: Add Derived Column Component

Drag Derived Column Transformation from toolbox
Connect blue arrow from Script Component to Derived Column
Name it "Restore Underscore Names"

Step 2.2: Configure Column Mappings

In Derived Column Transformation Editor:

For each non-underscored column that needs underscore restoration:

Expression: Drag the non-underscored column from Available Input Columns
Derived Column Name: Type the underscored SQL version
Derived Column: Select "Replace 'ColumnName'"

Example Configuration:

Expression: lLoanNumberId          → Derived Column Name: l_Loan_Number_Id
Expression: programprogramID       → Derived Column Name: program_program_ID  
Expression: pCounty               → Derived Column Name: p_County
Expression: lLoanAmount           → Derived Column Name: l_Loan_Amount

Fast Configuration Technique:

Type first few characters, SSIS will auto-complete
Use Ctrl+C/Ctrl+V to copy expressions between rows
Work systematically through all underscored columns

Phase 3: OLE DB Destination Configuration

Step 3.1: Add OLE DB Destination

Drag OLE DB Destination from toolbox
Connect blue arrow from Derived Column to OLE DB Destination
Configure connection manager for target database

Step 3.2: Table Generation (Recommended)

SSDT 2017 Feature: Auto-generate destination table with perfect schema:

In OLE DB Destination Editor → Connection Manager tab
Ensure correct database is selected
Click "New..." next to "Name of the table or the view"
SSDT generates CREATE TABLE statement matching your data flow
Recommended Settings:
- ✅ Keep identity
- ✅ Keep nulls
- ✅ Check constraints
- Data access mode: Table or view - fast load
- Maximum insert commit size: 2147483647

Step 3.3: Manual Column Mapping

In OLE DB Destination Editor → Mappings tab:

The Problem: Auto-mapping fails because of the column name transformation process.

Fast Manual Mapping Technique:

Click first <ignore> row in Input Column
Press Down Arrow key (moves to next unmapped row)
Press Enter key (maps to first available destination column)
Repeat: Down Arrow → Enter → Down Arrow → Enter...

Critical: Don't miss any sequences or you'll need to restart from where you missed an input!

Alternative Method:

Click dropdown for each <ignore> entry
Select matching destination column (usually first option)

Phase 4: Validation and Testing

Step 4.1: Execute Data Flow

Run the package in Development mode
Monitor row counts and any error messages
Check for data truncation warnings
Verify all columns are mapped correctly

Step 4.2: Compare Output Schema

The script output table (e.g., pclinfo_nerevu) should match the types and names of the reference table (e.g., pclinfo).

5. Schema Comparison Analysis Tools

SQL Analysis Query

Use this parameterized query to validate schema alignment:

-- SCHEMA COMPARISON QUERY - FIXED AND OPTIMIZED
-- Compares two tables using variables for their names and schema
DECLARE @Table1 NVARCHAR(128) = 'pclinfo';           -- Reference table
DECLARE @Table2 NVARCHAR(128) = 'pclinfo_nerevu';    -- Script output table
DECLARE @SchemaName NVARCHAR(128) = 'dbo';

-- Calculate total column count for percentage calculations
DECLARE @TotalColumnCount BIGINT;

WITH script_columns_count AS (
    SELECT 
        LOWER(COLUMN_NAME) as normalized_name,
        ROW_NUMBER() OVER (PARTITION BY LOWER(COLUMN_NAME) ORDER BY ORDINAL_POSITION) as rn
    FROM TEMP_ForPCL.INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @Table1 AND TABLE_SCHEMA = @SchemaName
),
sql_columns_count AS (
    SELECT 
        LOWER(COLUMN_NAME) as normalized_name,
        ROW_NUMBER() OVER (PARTITION BY LOWER(COLUMN_NAME) ORDER BY ORDINAL_POSITION) as rn
    FROM TEMP_ForPCL.INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @Table2 AND TABLE_SCHEMA = @SchemaName
)
SELECT @TotalColumnCount = COUNT(ISNULL(sc.normalized_name, sqlc.normalized_name))
FROM script_columns_count sc
FULL OUTER JOIN sql_columns_count sqlc ON sc.normalized_name = sqlc.normalized_name AND sc.rn = sqlc.rn;

-- Main comparison with difference strength scoring
WITH script_columns AS (
    SELECT 
        COLUMN_NAME as script_column_name,
        DATA_TYPE as script_data_type,
        CHARACTER_MAXIMUM_LENGTH as script_max_length, 
        NUMERIC_PRECISION as script_precision,
        LOWER(COLUMN_NAME) as normalized_name,
        ORDINAL_POSITION as script_ordinal,
        ROW_NUMBER() OVER (PARTITION BY LOWER(COLUMN_NAME) ORDER BY ORDINAL_POSITION) as rn
    FROM TEMP_ForPCL.INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @Table1 AND TABLE_SCHEMA = @SchemaName
),
sql_columns AS (
    SELECT 
        COLUMN_NAME as sql_column_name,
        DATA_TYPE as sql_data_type,
        CHARACTER_MAXIMUM_LENGTH as sql_max_length, 
        NUMERIC_PRECISION as sql_precision,
        LOWER(COLUMN_NAME) as normalized_name,
        ORDINAL_POSITION as sql_ordinal,
        ROW_NUMBER() OVER (PARTITION BY LOWER(COLUMN_NAME) ORDER BY ORDINAL_POSITION) as rn
    FROM TEMP_ForPCL.INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @Table2 AND TABLE_SCHEMA = @SchemaName
)
SELECT 
    -- Difference strength calculation (0 = perfect match, higher = more different)
    CASE 
        WHEN sc.script_column_name IS NULL OR sqlc.sql_column_name IS NULL THEN 99.0
        ELSE CAST(
            ((CASE WHEN sc.script_column_name <> sqlc.sql_column_name THEN 1.0 ELSE 0.0 END) +
             (CASE WHEN sc.script_data_type <> sqlc.sql_data_type 
                        AND NOT (sc.script_data_type IN ('float', 'decimal', 'numeric', 'money') 
                                 AND sqlc.sql_data_type IN ('float', 'decimal', 'numeric', 'money'))
                   THEN 2.0 ELSE 0.0 END) +
             (CASE WHEN ISNULL(sc.script_max_length, sc.script_precision) <> 
                        ISNULL(sqlc.sql_max_length, sqlc.sql_precision) 
                   THEN 0.5 ELSE 0.0 END)) 
             AS DECIMAL(5,3)) / 3.5
    END as Diff_Strength,
    
    -- Column details
    ISNULL(sc.script_column_name, '** MISSING **') as Reference_Column,
    ISNULL(sqlc.sql_column_name, '** MISSING **') as Script_Output_Column,
    
    -- Status indicators
    CASE 
        WHEN sc.script_column_name IS NULL THEN 'MISSING IN REFERENCE'
        WHEN sqlc.sql_column_name IS NULL THEN 'MISSING IN SCRIPT OUTPUT'
        WHEN sc.script_column_name = sqlc.sql_column_name THEN 'EXACT MATCH'
        ELSE 'NAME DIFFERENT'
    END as Name_Status,
    
    ISNULL(sc.script_data_type, 'N/A') as Reference_Data_Type,
    ISNULL(sqlc.sql_data_type, 'N/A') as Script_Output_Data_Type,
    
    CASE 
        WHEN sc.script_data_type = sqlc.sql_data_type THEN 'EXACT MATCH'
        WHEN (sc.script_data_type IN ('float', 'decimal', 'numeric', 'money') 
              AND sqlc.sql_data_type IN ('float', 'decimal', 'numeric', 'money')) THEN 'COMPATIBLE'
        ELSE 'TYPE DIFFERENT'
    END as Type_Status
    
FROM script_columns sc
FULL OUTER JOIN sql_columns sqlc ON sc.normalized_name = sqlc.normalized_name AND sc.rn = sqlc.rn
ORDER BY Diff_Strength DESC, ISNULL(sc.script_column_name, sqlc.sql_column_name);

-- Summary statistics
WITH comparison_summary AS (
    -- [Same CTEs as above for summary calculation]
    SELECT 
        CASE 
            WHEN sc.script_column_name = sqlc.sql_column_name THEN 'NAME MATCH'
            ELSE 'NAME DIFFERENT'
        END as name_status,
        CASE 
            WHEN sc.script_data_type = sqlc.sql_data_type THEN 'TYPE MATCH'
            WHEN (sc.script_data_type IN ('float', 'decimal', 'numeric', 'money') 
                  AND sqlc.sql_data_type IN ('float', 'decimal', 'numeric', 'money')) THEN 'TYPE COMPATIBLE'
            ELSE 'TYPE DIFFERENT'
        END as type_status
    FROM script_columns sc
    FULL OUTER JOIN sql_columns sqlc ON sc.normalized_name = sqlc.normalized_name AND sc.rn = sqlc.rn
)
SELECT 
    'PIPELINE VALIDATION SUMMARY' as Report_Section,
    SUM(CASE WHEN name_status = 'NAME MATCH' THEN 1 ELSE 0 END) as Name_Matches,
    SUM(CASE WHEN type_status IN ('TYPE MATCH', 'TYPE COMPATIBLE') THEN 1 ELSE 0 END) as Type_Compatible_Matches,
    COUNT(*) as Total_Columns,
    CONVERT(DECIMAL(10,2), 
        CAST(SUM(CASE WHEN name_status = 'NAME MATCH' THEN 1 ELSE 0 END) AS DECIMAL(38,10)) / 
        CAST(@TotalColumnCount AS DECIMAL(38,10)) * 100
    ) as Name_Match_Percent,
    CONVERT(DECIMAL(10,2), 
        CAST(SUM(CASE WHEN type_status IN ('TYPE MATCH', 'TYPE COMPATIBLE') THEN 1 ELSE 0 END) AS DECIMAL(38,10)) / 
        CAST(@TotalColumnCount AS DECIMAL(38,10)) * 100
    ) as Type_Compatibility_Percent
FROM comparison_summary;

Key Analysis Metrics

Target Metrics (based on our success):

Name Match Percentage: 100% (all column names should match exactly)
Type Compatibility Percentage: >97% (intentional differences allowed for business logic)
Missing Columns: 0 (full coverage required)

Difference Strength Scoring:

0.0: Perfect match
0.143: Length differences only
0.571: Type differences only
0.857: Name + type differences
99.0: Missing columns (highest priority to fix)

6. Results and Validation

Our Achievement

After implementing this pipeline approach, we achieved:

**✅ Schema Alignment Success:

100% column name compatibility (187 perfect matches)
97.3% data type compatibility (182 of 187 columns)
Complete underscore resolution across all fields

**✅ Critical Issues Resolved:

Money field standardization: Converted to 50-character strings for schema compatibility
Underscore naming conflicts: Eliminated through derived column transformations
Type conflict reduction: 95.6% reduction in data type conflicts (9 → 5 columns)

**✅ Production Readiness:

Robust error handling with detailed logging
Performance optimization with configurable limits
Conservative field sizing prevents data truncation
All transformations integrated without breaking functionality

Remaining Intentional Differences

5 Columns (2.7%) - Intentional Business Logic:

l_Customer_Credit_Score: int vs float - Business requires integer values
l_Loan_Number_Id: nvarchar(512) vs int - String ID format for compatibility
l_Loan_Term: int vs float - Business requires integer values
loan_Number_Id: nvarchar(512) vs int - String ID format for compatibility
program_program_ID: nvarchar(512) vs int - String ID format for compatibility

130 Columns (69.5%) - Conservative Sizing Strategy:

Script maintains 512-character default vs. SQL's optimized field lengths
Trade-off: Storage overhead vs. data safety (prevents truncation)
Status: Acceptable for current operations

Production Status

✅ Production Ready - All critical schema conflicts resolved. The pipeline operates with robust data consistency while maintaining flexibility for edge cases.

References and Citations

[1] Microsoft. "General Naming Conventions - .NET Design Guidelines." Microsoft Learn, https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/general-naming-conventions

".NET Framework naming guidelines specify that identifiers should not contain underscores. This applies to property names, method names, and other code elements in C#."

[2] "The Problems an Underscore Created in an SSIS Script Component." Awesome SQL, July 11, 2019, https://awesomesql.wordpress.com/2019/07/11/the-problems-an-underscore-created-in-an-ssis-script-component/

"When SSIS creates the C# code for a Script Component, it automatically strips underscores from column names to comply with C# naming conventions, creating a disconnect between SQL column names and the generated property names."

[3] Stack Overflow. "SSIS Script Component Stripping Underscores from Column Names." Stack Overflow, https://stackoverflow.com/questions/6352940/ssis-script-component-stripping-underscores-from-column-names

"This is a known behavior in SSIS Script Components where underscores are automatically removed from column names when generating C# properties, requiring manual workarounds for proper column mapping."

Table of Contents