Friday, May 21, 2010

Load Data using Change Data Capture (CDC)...

What is Change Data Capture (CDC)

Change Data Capture is a feature of the Database Engine which is available only in SQL Server 2008 Enterprise, Developer, and Evaluation Editions. CDC feature captures insert, update and delete operations applied to SQL Server tables and provides the details of the changes available in an easily consumed, relational format. CDC uses it's own tables to store data related to tracking tables with the same columns as in the tracking table and some additional columns to store important meta data.

CDC with SQL Server Integration Services (SSIS)

There may be a situation where we will need to store the data loaded from ETL Package to another table. That means in a situation where we maintain two tables to keep the same data. In such a situation we can use CDC.
Using CDC, we can load data to one table where CDC is enabled and execute a Stored Procedure to load data to the other table by capturing changed data using CDC.

Enabling CDC

Before enabling CDC for a table we need to enable CDC for the Database.
To that we can execute the below query

USE MyDatabase
Exec sys.sp_cdc_enable_db

Then using below query we can verify whether the CDC is enabled for MyDatabase.

SELECT is_cdc_enabled FROM sys.databases WHERE name = 'MyDatabase'

We can see the result of the above query as 1, which is the value for a CDC enabled database. Otherwise it will give the result as 0.

After that we have to enable CDC for each table. To do it we can use the below query to enable CDC for Table1.


USE MyDatabase
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Table1',
@role_name = NULL
GO

When we enable CDC for a table we can see a table is created under System Tables of the database as cdc.dbo_Table1_CT. Also there are two Table Valued Functions are created as
fn_cdc_get_all_changes_dbo_Table1 and fn_cdc_get_net_changes_dbo_Table1.


Function to Retrieve Change Data

We can write a SQL Function as given below to retrieve change data for a particular period for a particular table.


CREATE function cdc.uf_ChangeData_Table1 (
@start_time datetime
)
returns @Table1 table (
Column1 smallint
,Column2 smallint
,Column3 money
,CDC_OPERATION varchar(1)



) as
begin
declare @from_lsn binary(10), @to_lsn binary(10)

select @from_lsn = sys.fn_cdc_increment_lsn(sys.fn_cdc_map_time_to_lsn('largest less than or equal',@start_time))

select @to_lsn = sys.fn_cdc_get_max_lsn()


insert into @Table1
select
Column1,
Column2,
Column3,
case __$operation
when 1 then 'D'
when 2 then 'I'
when 4 then 'U'
else null
end as CDC_OPERATION
from
cdc.fn_cdc_get_net_changes_dbo_Table1(@from_lsn, @to_lsn, 'all')

return
end
go


In this function only the start time is given and get all the change data after that.



Stored Procedure to Execute the Function and Load the Data to the Table

Stored procedure was written to execute the earlier function to get the change data for a particular period and load those data to the destination tab. Stored Procedure is as below and it takes the input parameter as the start time and the required table.

CREATE PROCEDURE dbo.LoadChangedData (
@start_time DATETIME
,@table VARCHAR(50)
)
AS
BEGIN

IF (@fact_table='Table1')
BEGIN
MERGE INTO dbo.Table2 AS dest
USING (SELECT Column1,
Column2,
Column3 FROM cdc.uf_ChangeData_Table1(@start_time)) AS src

ON (src.Column1=dest.Coulmn1 AND src.Column2=dest.Coulmn2 )
WHEN MATCHED AND (src.CDC_OPERATION='U')
THEN
UPDATE SET dest.Column3=src.Column3

WHEN MATCHED AND (src.CDC_OPERATION='D') THEN DELETE

WHEN NOT MATCHED AND (src.CDC_OPERATION='I' OR src.CDC_OPERATION='U') THEN
INSERT (Column1,
Column2,
Column3 ) VALUES (src.Column1,
src.Column2,
src.Column3);

END
END


This stored procedure will do the insert, update or delete operations to the destination table considering the operation type.


Combining With the ETL

We can add a Execute SQL Task to Data Flow after OLEDB Destination component in the Data Flow and Execute the LoadChangedData stored procedure by providing required two parameters. We need to have the start time as the time before inserting the values to the first table. That means before the OLEDB Destination component.

This procedure will load the inserted data to an another table.


No comments:

Post a Comment