Hi All,
Is there any technique in DataStage that will allow us to masked Productional data to create test data? We have a project that consists of sensitive data - ssn, hourly wage, name, address, etc. We would like to use good test data that is not production data.
Thanks,
Abhi.
Masked Data
Moderators: chulett, rschirm, roy
In the past I've written a number of jobs that take the "live" data and run it through a transform which either randomizes columns or masks information, depending upon what needs to be done. In many cases randomizing would ruin the correlations (i.e. if the customer name is the key for a lookup), so a simple text-based encoding would be used (i.e. subtract 13 from the ascii value of each character) in all occurrences of that name. This can end up being a lot of work if the keys are identical with the data that should be masked. Then again, the big project I was working on was with millions of credit card records and transactions, so the anonymizing step was quite important during the initial test phase.
-
- Premium Member
- Posts: 108
- Joined: Sat Feb 05, 2005 6:52 pm
- Location: US
Diamondabhi,
I won't post any jobs or code for this, but will elaborate a bit:
If you have numerics that need to be masked use the RND() function which returns a pseudo-random number [it is important to seed it with a constant value if you need reproduceable numbers on subsequent runs]. So if you have a currency amount you want masked (i.e. 45432.00) I would either do a completely random number as in RND(9999999)/100 or RND(100)/10*In.Currency to use the original in some fashion. Text fields with names that need to stay consistant between tables (key fields) can be masked:
This example is a bit too simple, but it should give you an idea of what can be done. You could even go in and use a SOUNDEX conversion, but that would give you a lot of duplicates.
Perhaps you could explain if you need a simple or complex masking done - and specifically for which type of constructs.
I won't post any jobs or code for this, but will elaborate a bit:
If you have numerics that need to be masked use the RND() function which returns a pseudo-random number [it is important to seed it with a constant value if you need reproduceable numbers on subsequent runs]. So if you have a currency amount you want masked (i.e. 45432.00) I would either do a completely random number as in RND(9999999)/100 or RND(100)/10*In.Currency to use the original in some fashion. Text fields with names that need to stay consistant between tables (key fields) can be masked:
Code: Select all
Function Mask(InputString)
StringLen = LEN(InputString)
Ans = ''
FOR i = 1 TO StringLen
Modulus = MOD(i,3)
IF Modulus=0 THEN Ans := CHAR(SEQ(InputString[i,1])-1)
ELSE IF Modulus=1 THEN Ans := CHAR(SEQ(InputString[i,1])+1)
ELSE Ans := InputString[i,1]
NEXT i
Perhaps you could explain if you need a simple or complex masking done - and specifically for which type of constructs.
-
- Premium Member
- Posts: 108
- Joined: Sat Feb 05, 2005 6:52 pm
- Location: US
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom