Data Masking using parallel jobs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DSR
Participant
Posts: 3
Joined: Fri Jul 11, 2008 9:03 am

Data Masking using parallel jobs

Post by DSR »

Hi,
Could anyone please explain briefly how masking is done using parallel jobs in DataStage 7.5?

What should a typical masking job look like?

Thanks,
DS.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

Can you be more specific about what you mean by "masking"?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There is a data masking for DataStage solution which is due in late 2009! IBM Professional Services have already built one using routines but they are not sharing it unless you engage them directly. The main question is whether you really need data masking. Most sites make the ETL server secure and don't need to mask the data on it.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Others also have data obfuscation routines available. But is that what you mean by "masking"?
DSR
Participant
Posts: 3
Joined: Fri Jul 11, 2008 9:03 am

Masking

Post by DSR »

ray.wurlod wrote:Welcome aboard.

Can you be more specific about what you mean by "masking"? ...
Masking is like decoding or encrypting the original records in the table into other values, which may be selected from other files.
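As a rough illustration of the "values selected from other files" idea, here is a minimal Python sketch of substitution masking. It is not DataStage code. The replacement pool, the `substitute` helper and the `demo-secret` key are all hypothetical; in a real job the pool would more likely be read from a reference file.

```python
import hashlib
import hmac

# Hypothetical replacement pool; in practice this might be loaded from a reference file.
REPLACEMENT_NAMES = ["Alex Reed", "Sam Cole", "Jo Park", "Lee Hunt", "Kim West"]


def substitute(value, pool, secret=b"demo-secret"):
    """Pick a replacement from the pool; the same input always maps to the same entry."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


rows = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Mary Jones"},
    {"id": 3, "name": "John Smith"},
]

# Replace only the sensitive column; keys such as "id" are left untouched.
masked_rows = [{**row, "name": substitute(row["name"], REPLACEMENT_NAMES)} for row in rows]
```

Because the choice is driven by a keyed hash, repeated values stay consistent across rows and runs, which matters if the masked column is later used in a join. With a small pool, different originals can map to the same replacement.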
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

You need to find out what needs to be masked, why it needs to be masked, and how robust that masking needs to be. Credit card numbers, for example, need 106 bit encryption or masking that DataStage cannot do, whereas customer names could just be scrambled with an external C++ routine. There are several options: applications outside DataStage that encrypt; masking functions for Oracle databases that you could apply on load; a robust masking solution from IBM, the Optim product, that protects test data; the Encode and Decode stages; and Java calls and web service calls to third-party encryption and hashing products. If you mask too early you can't transform the data. If unmasked data lands on the DataStage server, it is already exposed and no amount of masking will undo that.
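To make the "customer names could just be scrambled" option concrete, here is a minimal Python sketch of a deterministic scramble. It is only an illustration of the technique, not the external C++ routine mentioned above; the `scramble_name` function and the `demo-secret` value are assumptions. It is deliberately weak and would not be suitable for credit card numbers.

```python
import hashlib
import random


def scramble_name(name, secret="demo-secret"):
    """Shuffle the letters of a name, leaving spaces and punctuation in place."""
    # Seed the generator from the secret plus the value so that the same
    # name is always scrambled the same way.
    seed = hashlib.sha256((secret + name).encode("utf-8")).hexdigest()
    rng = random.Random(seed)

    letters = [c for c in name if c.isalpha()]
    rng.shuffle(letters)

    # Put the shuffled letters back into the alphabetic positions only.
    shuffled = iter(letters)
    return "".join(next(shuffled) if c.isalpha() else c for c in name)


masked = scramble_name("John Smith")
```

The result keeps the original length and the same set of letters, so it is easy to reverse-engineer for short or unusual names. That trade-off is why the robustness question has to be answered first.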
DSR
Participant
Posts: 3
Joined: Fri Jul 11, 2008 9:03 am

Post by DSR »

vmcburney wrote:You need to find out what needs to be masked, why it needs to be masked, and how robust that masking needs to be. Credit card numbers, for example, need 106 bit encryption or masking that DataStage cannot do, whereas customer names could just be scrambled with an external C++ routine. There are several options: applications outside DataStage that encrypt; masking functions for Oracle databases that you could apply on load; a robust masking solution from IBM, the Optim product, that protects test data; the Encode and Decode stages; and Java calls and web service calls to third-party encryption and hashing products. If you mask too early you can't transform the data. If unmasked data lands on the DataStage server, it is already exposed and no amount of masking will undo that.

Thank you,
I understand the idea. I am working on developing masking jobs and hope to find more info.
datastagenewbie
Participant
Posts: 64
Joined: Tue Sep 23, 2008 9:54 am

Data Obfuscation

Post by datastagenewbie »

ray.wurlod wrote:Others also have data obfuscation routines available. But is that what you mean by "masking"? ...

Where can I find the Data Obfuscation routines?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vmcburney wrote:There is a data masking for DataStage solution which is due in late 2009! IBM Professional Services have already built one using routines but they are not sharing it unless you engage them directly.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kapil_333
Participant
Posts: 48
Joined: Tue Sep 09, 2008 2:39 am

Post by kapil_333 »

I am also supposed to do data obfuscation on files with 200+ columns.
There must be some automated way to do it in DataStage.
Otherwise, the other way is to take the columns individually and sort each one on an unrelated column.
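
The column-by-column reordering idea can be sketched in a few lines of Python. This is an illustration only, not a DataStage job: `shuffle_columns` is a hypothetical helper, and a random shuffle stands in for sorting on an unrelated column. The effect is the same, in that the values in each column no longer line up with their original rows.

```python
import random


def shuffle_columns(rows, columns, seed=None):
    """Return copies of the rows with each listed column shuffled independently."""
    rng = random.Random(seed)
    masked = [dict(row) for row in rows]
    for col in columns:
        # Pull the column out, reorder it, and write it back row by row.
        values = [row[col] for row in rows]
        rng.shuffle(values)
        for row, value in zip(masked, values):
            row[col] = value
    return masked


rows = [
    {"id": 1, "name": "John Smith", "city": "Denver"},
    {"id": 2, "name": "Mary Jones", "city": "Sydney"},
    {"id": 3, "name": "Ali Khan", "city": "Melbourne"},
]
shuffled = shuffle_columns(rows, ["name", "city"], seed=42)
```

Since the column list is just a parameter, the same loop covers 200+ columns without per-column logic. Note that every original value is still present in the output, so this only breaks the link between columns and does not hide the values themselves.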

I would like to hear from DataStage users how they did it.
NJOY......!!!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The RoadMap presentation at IOD 2008 conference mentioned that this is functionality "they" (IBM) are looking at including in a future release. Possibly as early as 8.2, but no promises.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Right now probably the easiest way to do it is on the way out of the database, using user-defined SQL or a stored procedure as the source to call some of the data masking functions that are now available in the latest databases. There are some DataStage data masking functions offered by IBM services, but I've never seen them and I don't know how they are implemented. Possibly some type of routines, C or BASIC. You still need to apply them to every column. You could also have a look at Optim from IBM, a product that specialises in data masking and could be used prior to the data reaching DataStage.
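
As a sketch of masking "on the way out of the database" with user-defined SQL, the Python example below uses an in-memory SQLite database as a stand-in for the real source. The `mask_text` function, the `customers` table and the list of sensitive columns are all hypothetical; a production database would use its own masking functions, which are not shown here. The select list is generated from the table metadata, which is one way to avoid hand-writing an expression for every column.

```python
import hashlib
import sqlite3


def mask_text(value):
    """Hypothetical masking function: a short hash of the value, NULL stays NULL."""
    if value is None:
        return None
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("mask_text", 1, mask_text)

conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "John Smith", "john@example.com"), (2, "Mary Jones", None)],
)

# Assumed list of columns that need masking.
sensitive = {"name", "email"}

# Build the select list from the table metadata instead of typing each column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
select_list = ", ".join(
    f"mask_text({col}) AS {col}" if col in sensitive else col for col in columns
)
masked = conn.execute(f"SELECT {select_list} FROM customers ORDER BY id").fetchall()
```

The extract then only ever sees masked values, so nothing unmasked lands on the ETL server. An unsalted hash like this one is easy to attack for low-variety data such as names, so treat it as a placeholder for whatever masking function the database actually provides.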