Question:
I have a simple data flow that copies a subset of data from a source table on an internal database to a table on a web-facing database.
If there is a problem, the error is written to an errors table.
That’s all fine.
In the source table there is a bit column, SSIS_TRANSFERRED, that I wish to set to 1 when the copy process completes. However, I’m unsure how to approach this.
My instinct is to craft an SQL statement that runs against each unique ID for every row successfully transferred as part of that package. Is there a simple approach to this (i.e. as part of the data flow), or do I need to create a new Control Flow step with an OLE DB Command that queries the web-facing table and marks the corresponding internal rows as ‘transferred’ accordingly?
Answer:
If you want to keep everything within the current Data Flow Task, you could add a Multicast with one output going to the destination and the other to an OLE DB Command that updates the source records based on the rows transferred to the destination. However, the Multicast transformation is synchronous, so the records go to both outputs simultaneously, and this could lead to blocking or deadlock issues.

A simpler approach may be to add an Execute SQL Task after the Data Flow Task that updates the source table based on the transferred records in the destination table. For the update, you’ll want a set-based statement such as the one below. To restrict the update to rows transferred within that package execution, a Multicast could be used in the Data Flow Task to write only the unique IDs to a staging table, and the subsequent Execute SQL Task would then update the source based on the matching IDs in the staging table. Just make sure to add a step at the beginning of the package that truncates the staging table, to clear data from the prior execution.
UPDATE SRC
SET SRC.SSIS_TRANSFERRED = 1
FROM dbo.SourceTable SRC
INNER JOIN dbo.DestinationTable DEST
ON SRC.ID = DEST.ID
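If you go the staging-table route described above, the two pieces might look something like this (dbo.TransferredIDs and the ID column are assumed names for illustration; substitute your own schema):
-- At the start of the package, clear out the prior execution's IDs
TRUNCATE TABLE dbo.TransferredIDs;

-- After the Data Flow Task has multicast the unique IDs into the
-- staging table, flag only the rows moved by this execution
UPDATE SRC
SET SRC.SSIS_TRANSFERRED = 1
FROM dbo.SourceTable SRC
INNER JOIN dbo.TransferredIDs STG
ON SRC.ID = STG.ID;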
With T-SQL there’s an OUTPUT INSERTED clause which would be really helpful in this scenario.
I’m not aware of a way to do that with SSIS, so your alternate approach will work: use a second data flow after your first (you can connect the second data flow to the first so your dependency will be honored), query the web database, and update the internal one as appropriate. This works as long as your workload can tolerate the additional latency involved in waiting to update the SSIS_TRANSFERRED column.
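For reference, if you went with a per-row OLE DB Command in that second data flow instead, its command text would be a parameterized update along these lines, with the ? placeholder mapped to the ID column flowing through the data flow (table and column names are assumptions):
UPDATE dbo.SourceTable
SET SSIS_TRANSFERRED = 1
WHERE ID = ?;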
As @Mark Iannucci says, the OUTPUT clause will be your friend here.
The following query demonstrates the approach.
-- Just setup junk
DECLARE @SRC TABLE
(
    SSIS_TRANSFERRED bit
,   col2 int
);

INSERT INTO @SRC
(
    SSIS_TRANSFERRED
,   col2
)
SELECT TOP 100
    0 AS SSIS_TRANSFERRED
,   CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) AS col2
FROM sys.all_columns AS AC;
-- This is what you will use in your Source component
-- Copy this logic into your component
UPDATE SRC
SET SRC.SSIS_TRANSFERRED = 1
OUTPUT Deleted.*
FROM @SRC SRC
WHERE SRC.SSIS_TRANSFERRED = 0;
How this works: the statement updates your source data, and that update is atomic; it either happens or it doesn’t. Each row that is updated is emitted to the SSIS buffers through the OUTPUT clause and flows to your destination. If something bad happens, the transaction blows up and no data is added to the buffer.
Note that I use the Deleted virtual table, so the emitted rows carry the original value of 0 for SSIS_TRANSFERRED. Had I selected Inserted, I’d have rows with a value of 1. Which behaviour your target expects is up to you.
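For completeness, if your destination expects the post-update value, the same statement with the Inserted virtual table emits rows where SSIS_TRANSFERRED is already 1:
UPDATE SRC
SET SRC.SSIS_TRANSFERRED = 1
OUTPUT Inserted.*
FROM @SRC SRC
WHERE SRC.SSIS_TRANSFERRED = 0;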