
Data Masking using parallel jobs

Posted: Wed Jul 16, 2008 1:19 pm
by DSR
Hi,
Could anyone please briefly explain how masking is done using parallel jobs in DataStage 7.5?

What should a typical masking job look like?

Thanks,
DS.

Posted: Wed Jul 16, 2008 4:12 pm
by ray.wurlod
Welcome aboard.

Can you be more specific about what you mean by "masking"?

Posted: Wed Jul 16, 2008 5:04 pm
by vmcburney
A data masking solution for DataStage is due in late 2009! IBM Professional Services have already built one using routines, but they are not sharing it unless you engage them directly. The main question is whether you really need data masking. Most sites make the ETL server secure and don't need to mask the data on it.

Posted: Wed Jul 16, 2008 5:37 pm
by ray.wurlod
Others also have data obfuscation routines available. But is that what you mean by "masking"?

Masking

Posted: Thu Jul 17, 2008 2:00 pm
by DSR
ray.wurlod wrote:Welcome aboard.

Can you be more specific about what you mean by "masking"? ...
Masking is like encoding or encrypting the original records in the table into other values, which may be selected from other files.
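
To make that description concrete, here is a minimal sketch of substitution masking in Python rather than DataStage. It assumes the replacement values have already been read from a lookup file; the names `substitute`, `FAKE_SURNAMES` and the sample rows are hypothetical and only illustrate the idea.

```python
import hashlib
from typing import Sequence


def substitute(value: str, replacements: Sequence[str]) -> str:
    """Map a value to one entry of the replacement list, always the same one."""
    if not replacements:
        raise ValueError("replacement list must not be empty")
    # Hash the original value and use the hash to pick a replacement,
    # so equal inputs always receive the same masked output.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replacements)
    return replacements[index]


# Hypothetical replacement values, standing in for rows read from another file.
FAKE_SURNAMES = ["Adams", "Baker", "Clark", "Davis", "Evans"]

rows = [
    {"id": 1, "surname": "Smith"},
    {"id": 2, "surname": "Jones"},
    {"id": 3, "surname": "Smith"},
]

# Replace only the sensitive column; every other column passes through unchanged.
masked = [{**row, "surname": substitute(row["surname"], FAKE_SURNAMES)} for row in rows]
```

Because the mapping is deterministic, joins on the masked column still line up across tables. The trade-off is that an unkeyed hash can be replayed by anyone who knows the replacement list and can guess the original values, so this is obfuscation rather than encryption.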

Posted: Thu Jul 17, 2008 5:18 pm
by vmcburney
You need to find out what needs to be masked, why it needs to be masked, and how robust that masking needs to be. Credit card numbers, for example, need 106 bit encryption or masking that DataStage cannot do, while customer names could just be scrambled with an external C++ routine. There are applications outside DataStage that encrypt data; there are masking functions for the Oracle database that you could apply on load; there is a robust masking solution from IBM via the Optim product that protects test data; there are the Encode and Decode stages; and there are Java calls and web service calls you can make to third-party encryption and hashing products. If you mask too early you can't transform the data. If unmasked data lands on the DataStage server it is already exposed, and no amount of masking will undo that.
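
As an illustration of the "scramble customer names" option and of the ordering point (transform first, mask afterwards), here is a rough sketch in Python rather than a C++ routine. It uses a keyed hash (HMAC-SHA256) from the standard library; `SECRET_KEY`, `normalize`, `pseudonymize` and `mask_name` are hypothetical names, and this is not a substitute for proper encryption of card numbers.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secured store, not source code.
SECRET_KEY = b"replace-with-a-real-secret"


def normalize(name: str) -> str:
    """Example transformation that has to happen before masking."""
    return name.strip().casefold()


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """Return a fixed-length token derived from the value with a keyed hash."""
    mac = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:length]


def mask_name(name: str) -> str:
    # Transform first, then mask: once the value is a token,
    # cleansing rules such as trimming or case folding can no longer be applied.
    return pseudonymize(normalize(name))


token_a = mask_name("  Smith ")
token_b = mask_name("smith")
token_c = mask_name("Jones")
```

Here `token_a` and `token_b` are equal because both inputs normalize to the same string before masking. Had the raw values been masked first, the two spellings would have produced unrelated tokens and could no longer be matched.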

Posted: Fri Jul 18, 2008 7:23 am
by DSR
vmcburney wrote:You need to find out what needs to be masked, why it needs to be masked, and how robust that masking needs to be. Credit card numbers, for example, need 106 bit encryption or masking that DataStage cannot do, while customer names could just be scrambled with an external C++ routine. There are applications outside DataStage that encrypt data; there are masking functions for the Oracle database that you could apply on load; there is a robust masking solution from IBM via the Optim product that protects test data; there are the Encode and Decode stages; and there are Java calls and web service calls you can make to third-party encryption and hashing products. If you mask too early you can't transform the data. If unmasked data lands on the DataStage server it is already exposed, and no amount of masking will undo that.

Thank you,
I understand the idea. I am working on developing masking jobs and hope to find more info.

Data Obfuscation

Posted: Mon Sep 29, 2008 2:05 pm
by datastagenewbie
ray.wurlod wrote:Others also have data obfuscation routines available. But is that what you mean by "masking"? ...

Where can I find the Data Obfuscation routines?

Posted: Mon Sep 29, 2008 2:08 pm
by chulett
vmcburney wrote:A data masking solution for DataStage is due in late 2009! IBM Professional Services have already built one using routines, but they are not sharing it unless you engage them directly.

Posted: Fri Dec 12, 2008 6:33 am
by kapil_333
I am also supposed to do data obfuscation on files with 200+ columns.
There must be some automated way to do it in DataStage.
The other way would be to take the columns individually and sort each one on an unrelated column.

I would like to hear from any DS user who has done this: how did you do it?
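
The "take columns individually and reorder them" idea can be sketched as follows. This is an illustration in Python, not a DataStage job; `shuffle_columns` and the sample data are hypothetical. Each listed column is shuffled independently, so values stay realistic but no longer belong to their original row.

```python
import random


def shuffle_columns(rows, columns, seed=None):
    """Return copies of the rows with each listed column shuffled on its own."""
    rng = random.Random(seed)
    masked = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for column in columns:
        values = [row[column] for row in rows]
        rng.shuffle(values)  # a separate permutation for every column
        for row, value in zip(masked, values):
            row[column] = value
    return masked


rows = [
    {"id": 1, "name": "Smith", "city": "Leeds"},
    {"id": 2, "name": "Jones", "city": "York"},
    {"id": 3, "name": "Patel", "city": "Bath"},
    {"id": 4, "name": "Brown", "city": "Hull"},
]

# With 200+ columns the list can be derived from the record layout
# instead of being typed by hand; here everything except the key is shuffled.
sensitive = [column for column in rows[0] if column != "id"]
masked = shuffle_columns(rows, sensitive, seed=42)
```

Note that shuffling keeps every real value in the file, so rare or unique values can still identify a person; it only breaks the link between columns.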

Posted: Fri Dec 12, 2008 12:30 pm
by ray.wurlod
The roadmap presentation at the IOD 2008 conference mentioned that this is functionality "they" (IBM) are looking at including in a future release, possibly as early as 8.2, but no promises.

Posted: Sat Dec 13, 2008 6:45 am
by vmcburney
Right now, probably the easiest way to do it is on the way out of the database, using user-defined SQL or a stored procedure as the source and calling some of the data masking functions now available in the latest databases. There are some DataStage data masking functions offered by IBM services, but I've never seen them and I don't know how they are implemented. Possibly some type of routines, C or BASIC. You still need to apply them to every column. You could also have a look at Optim from IBM, a product that specialises in data masking and could be used before the data reaches DataStage.
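
A small sketch of masking "on the way out of the database" follows, using Python's built-in `sqlite3` module as a stand-in for a real source database. The function `mask_text`, the `customer` table and the `sensitive` set are all hypothetical; vendor-supplied masking functions in databases such as Oracle are not shown here. The SELECT list is generated from the table's column metadata, which is one way to avoid hand-writing a call for each of 200+ columns.

```python
import hashlib
import sqlite3


def mask_text(value):
    """Hypothetical masking function exposed to SQL; NULL stays NULL."""
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:10]


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call mask_text(column).
conn.create_function("mask_text", 1, mask_text)

conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Smith", "Leeds"), (2, "Jones", None)],
)

# Columns to mask; in practice this list would come from a data classification.
sensitive = {"name", "city"}

# Read the column names from the catalogue and wrap only the sensitive ones.
columns = [info[1] for info in conn.execute("PRAGMA table_info(customer)")]
select_list = ", ".join(
    f"mask_text({column}) AS {column}" if column in sensitive else column
    for column in columns
)
query = f"SELECT {select_list} FROM customer ORDER BY id"

# This is the "user-defined SQL" an extract would run as its source.
masked_rows = conn.execute(query).fetchall()
```

With this approach the unmasked values never leave the database, which matches the earlier point in the thread that data landing unmasked on the ETL server is already exposed.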