Question :
How can I validate a SSN column that is encrypted in a table using AE without passing one by one value as a parameter? I have to validate if the SSN is formatted properly and numbers and ‘-‘ etc.
I tried inserting the SSN column values into a #table to perform the validations on the decrypted values but that fails.
This is being implemented in a stored procedure which does all validations on the encrypted table in bulk and fails for SSN. Any suggestions that can help me perform this task?
I understand that validation has to be done by app. The requirement is that a third party tool loads a file to the DB which has SSN in it and they don’t do any validation for the file upfront before loading it into the table. We have encrypted the column on the table and then after we want to validate the ssn to check if it’s 10 digits or not etc. How do we achieve this? It’s not defeating the purpose as anyone who has the master key can perform required actions as per the need.
Answer :
You are fighting against the technology that you’re using. The point of Always Encrypted is that data is only decrypted on the client. A few quotes from the documentation:
To successfully update the column [ed: as opposed to updating in
T-SQL], do the following:SELECT the data out of the SSN column, and store it as a result set in the application. This will allow for the application (client
driver) to decrypt the column.
INSERT the data from the result set into SQL Server.
…
Queries can perform equality comparison on columns encrypted using
deterministic encryption, but no other operations (for example,
greater/less than, pattern matching using the LIKE operator, or
arithmetical operations).
…
Always Encrypted is not supported for the columns with the below
characteristics (for example, the Encrypted WITH clause cannot be used
in CREATE TABLE/ALTER TABLE for a column, if any of the following
conditions apply to the column):
- Columns referenced by computed columns (when the expression does unsupported operations for Always Encrypted)
- Columns referenced by check constraints
Put more simply, you can’t do things like pattern matching or inserting data from any column, even the non-encrypted ones, into a temp table. If you could find a way to do what you want within a stored procedure with no other involved tables then you’ve broken the security of Always Encrypted and should report that issue to Microsoft. Whoever is asking you to do this is asking for you to do something unreasonable. Do the validation on the client because that’s where you have access to unencrypted data.
The below is a bad idea, and you should not do this.
Meeting your requirements is possible if your SSN column was encrypted using deterministic encryption. From the documentation:
Use deterministic encryption for columns that will be used as search
or grouping parameters, for example a government ID number.
This is important because it allows joins. The trick is to load all valid SSNs into a different table and to encrypt that column in the same way. You can then join the two tables to find any SSNS that aren’t valid. Loading a billion valid SSNs into a table is pretty easy and takes about five minutes on my machine. You can reduce the result set further if you want to do more robust validation. The problem is applying the encryption to a billion row table can take a long time. To go through a proof of concept with just one million rows:
Here’s one valid row and one invalid row loaded into the table that needs to be validated:
DROP TABLE IF EXISTS dbo.SSNS;
CREATE TABLE dbo.SSNS (
ID BIGINT NOT NULL,
SSN VARCHAR(25) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = SSNS__SSN,
ENCRYPTION_TYPE = DETERMINISTIC ,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256')
);
GO
DECLARE @ssn VARCHAR(25) = '111-11-1111';
INSERT INTO dbo.SSNS VALUES (1, @ssn);
GO
DECLARE @ssn VARCHAR(25) = 'SEAN GALLARDY WAS HERE';
INSERT INTO dbo.SSNS VALUES (2, @ssn);
GO
Next populate the table of all allowed SSNs. In the code below I only insert 0.1% of the data.
DROP TABLE IF EXISTS #t_100;
DROP TABLE IF EXISTS #t_1000;
DROP TABLE IF EXISTS #t_10000;
SELECT TOP (100)
RIGHT('0' + CAST(t.RN AS VARCHAR(10)), 2) NUM
INTO #t_100
FROM
(
SELECT -1 + ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t;
-- only insert 0.1% of data as a proof of concept
SELECT '111' NUM
INTO #t_1000;
--SELECT TOP (1000)
-- RIGHT('00' + CAST(t.RN AS VARCHAR(10)), 3) NUM
--INTO #t_1000
--FROM
--(
-- SELECT -1 + ROW_NUMBER()
-- OVER (ORDER BY (SELECT NULL)) RN
-- FROM master..spt_values t1
-- CROSS JOIN master..spt_values t2
--) t;
SELECT TOP (10000)
RIGHT('000' + CAST(t.RN AS VARCHAR(10)), 4) NUM
INTO #t_10000
FROM
(
SELECT -1 + ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t;
DROP TABLE IF EXISTS dbo.ALL_VALID_SSNS_POC;
CREATE TABLE dbo.ALL_VALID_SSNS_POC (
SSN VARCHAR(25) COLLATE Latin1_General_BIN2
);
-- about 5 minutes for all data, significantly faster for 1 million rows
INSERT INTO dbo.ALL_VALID_SSNS_POC WITH (TABLOCK)
SELECT
t1000.NUM
+ '-' + t100.NUM
+ '-' + t10000.NUM AS SSN
FROM (SELECT TOP (1000) * FROM #t_1000) t1000
CROSS JOIN #t_100 t100
CROSS JOIN #t_10000 t10000
OPTION (MAXDOP 8, NO_PERFORMANCE_SPOOL, FORCE ORDER);
CREATE CLUSTERED INDEX CI ON dbo.ALL_VALID_SSNS_POC (SSN);
Now the data needs to be encrypted which is the problem. The process took about an hour and grew the transaction log by 250 GB before I killed it. Among other things, updating a billion rows like this will be expensive:
UPDATE [dbo].[ALL_VALID_SSNS] SET
[dbo].[ALL_VALID_SSNS].[tceGuidCol1] = NEWID()
It seems possible to do if you’re willing to wait long enough and you have enough log space. Be sure to increase your timeout. I also had better results generating powershell code through the “Encrypt Columns…” SSMS database task and running the code that way.
Once the encryption is done you can get all of the rows with valid SSNs with a query like this:
SELECT s.*
FROM dbo.SSNS s
WHERE EXISTS (
SELECT 1
FROM dbo.ALL_VALID_SSNS_POC vs
WHERE s.SSN = vs.SSN
);
and you can get all of the rows with invalid SSNs with a query like this:
SELECT s.*
FROM dbo.SSNS s
WHERE NOT EXISTS (
SELECT 1
FROM dbo.ALL_VALID_SSNS_POC vs
WHERE s.SSN = vs.SSN
);