Data from Log File

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Hello,

Below is the sample data from a log file

sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

There are around 100M such records in this log file, and the position of the SSN line differs from paragraph to paragraph.

Please suggest how to pull the SSN value from the log file and make it some number.

Thank you
Thanks,
Surya
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Read the records as one long string field, filter out any that do not start with 'SSN:' and then use Field() to get the second field from that ":" delimited string. Convert.
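For readers who want to see the logic outside DataStage, here is a minimal Python sketch of the steps described above. It is only an illustration, not a DataStage job: `extract_ssns` is a hypothetical helper name, `str.split(":")[1]` stands in for Field() with ":" as the delimiter, and `int()` stands in for the final conversion.

```python
def extract_ssns(lines):
    """Return the SSN values found in an iterable of log lines, as integers."""
    ssns = []
    for line in lines:
        record = line.rstrip("\n")
        # Keep only the records that start with 'SSN:'.
        if not record.startswith("SSN:"):
            continue
        # Second ':'-delimited field, with any padding spaces trimmed.
        value = record.split(":")[1].strip()
        # The "Convert" step: string to number.
        ssns.append(int(value))
    return ssns


# Sample records taken from the original post.
sample = [
    "sampleXXXXXXXXXXXXXX",
    "SSN: 123456789",
    "XXXX",
    "",
    "Sample2XXXXXXXXXXXX",
    "SSN:987654321",
    "XXXX",
]
print(extract_ssns(sample))
```

Note that converting to a number drops any leading zeros, so a value that must stay nine characters wide would need to remain a string.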
-craig

"You can never have too many knives" -- Logan Nine Fingers
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Thank you!

So the complete file will be a single field? Also, the target should look the same as the source, with only the SSN values changed.

Please correct me if I am wrong.
Thanks,
Surya
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

I read the file with a Sequential File stage, and the data looks good when I view it.

Now the records containing an SSN can be filtered out, but how can I recombine them with the records that were dropped earlier?


Thanks again!
Thanks,
Surya
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You never mentioned that part in your original post. I was under the impression all you wanted were the SSN values, which is why I provided the answer that I did. Can you be more specific with regard to exactly what your end result should be?
-craig

"You can never have too many knives" -- Logan Nine Fingers
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Sorry for the confusion.
Below is the source
sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

Below is the Target
sampleXXXXXXXXXXXXXX
SSN: 576195198
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:348231890
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

I tried doing it as you said, and in the Transformer I can use an If condition and change the value as needed. My main issue is that I need to pass this data to the masking stage, and all I can do in the masking stage is find the position of the SSN in that particular record and mask it. But how do I handle the other records in the log file? That is where I cannot move further.

Please suggest a solution.
Thanks,
Surya
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So that's what you meant by "make it some number". Based on your description and the subject of your post I thought you wanted to extract it as "data from the log file" and convert it from a string to a number, not mask it. Always helps to fully explain what you are trying to do sooner rather than later.

I have zero experience with the Masking stage. Does it need to be bypassed for the records that don't need masking or can you not tell it which ones are just pass through? Anyone know?
-craig

"You can never have too many knives" -- Logan Nine Fingers
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

If there are several fields, we can apply masking functions to only the fields that need masking, but in our case we have only one field, and it needs masking only for the records that contain an SSN.
Thanks,
Surya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do you need to be able to reverse the mapping to a masked value?

In either case, creating a routine is probably the easiest way to go. The routine needs to generate the same (?) unique replacement for each SSN.
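As a rough illustration of what "the same unique replacement for each SSN" can mean, here is a Python sketch. It is not a DataStage routine and not the masking pack's algorithm. It assumes nine-digit SSNs and uses a swapped-in technique: a small keyed Feistel permutation with cycle-walking, so equal inputs always give equal outputs and distinct inputs never collide. The names `mask_ssn` and `DEMO_KEY` are hypothetical, and anyone holding the key could invert the mapping, so this only suits the case where that is acceptable.

```python
import hashlib
import hmac

DOMAIN = 10**9                    # nine-digit SSN space (assumption)
HALF_BITS = 15                    # 2 * 15 = 30 bits, and 2**30 > 10**9
HALF_MASK = (1 << HALF_BITS) - 1
DEMO_KEY = b"demo-key-not-for-production"  # hypothetical key


def _round(key, rnd, value):
    """Keyed round function: 15 pseudo-random bits from HMAC-SHA256."""
    msg = bytes([rnd]) + value.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big") & HALF_MASK


def _permute30(key, x):
    """Four-round Feistel network: a permutation of the 30-bit integers."""
    left, right = x >> HALF_BITS, x & HALF_MASK
    for rnd in range(4):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def mask_ssn(ssn, key=DEMO_KEY):
    """Map a nine-digit SSN string to another nine-digit string."""
    x = int(ssn)
    if not 0 <= x < DOMAIN:
        raise ValueError("expected at most nine digits")
    y = _permute30(key, x)
    # Cycle-walking: re-apply the permutation until the result is nine digits.
    while y >= DOMAIN:
        y = _permute30(key, y)
    return f"{y:09d}"


print(mask_ssn("123456789"), mask_ssn("987654321"))
```

A plain keyed hash reduced to nine digits would be simpler, but with roughly 100M values in a space of 10^9 it would produce collisions, which is why a permutation is sketched here instead.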
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Thanks very much!

Reverse the mapping? Do you mean creating a separate flow to the masking pack?

How can the routine be created? By writing my own routine?
In the list of records, do I need to search for wherever it says SSN?

As given below

Below is the source
sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

Below is the Target
sampleXXXXXXXXXXXXXX
SSN: 576195198
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:348231890
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

Can the Field() function be used to find the position of the SSN? It will be in the same position within its record, but I am not sure which line it will be on: here it is on the second line, but sometimes it is on the third.

Please suggest how to use it to search for the word SSN in the records.
Thanks,
Surya
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

By 'reverse the mapping to a masked value' Ray was asking if you had a need to be able to undo or 'unmask' the masked value back to its original value. Hopefully not as masking is a one-way process.

Not seeing the problem with SSN. As noted earlier, all it seems you need to do from the examples you posted is check to see if the first four characters are 'SSN:'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Thank you!

It is not possible to unmask the value again, as the masking pack is irreversible.

I got the idea of assigning a sequence number to every record and sending only the records with an SSN to the masking pack. After that, the two links (SSN records and non-SSN records) go to a Funnel stage, followed by a sort on the sequence number so that the record order remains the same. Hope that works!
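The flow of that idea can be sketched in Python as follows. This is only a model of the job design, not DataStage code: `mask_log` is a hypothetical name, the `mask` argument stands in for the masking pack, and `demo_mask` is a placeholder that merely reverses the digits so the effect is visible.

```python
import re

# 'SSN:' prefix with optional spaces, the digits, then anything that follows.
SSN_LINE = re.compile(r"^(SSN:\s*)(\d+)(.*)$")


def mask_log(lines, mask):
    """Mask SSN records only and return all records in their original order."""
    # Step 1: give every record a sequence number.
    numbered = list(enumerate(lines))

    # Step 2: split into two links.
    ssn_link = [(seq, rec) for seq, rec in numbered if rec.startswith("SSN:")]
    other_link = [(seq, rec) for seq, rec in numbered if not rec.startswith("SSN:")]

    # Step 3: only the SSN link goes through the masking step.
    masked_link = [
        (seq, SSN_LINE.sub(lambda m: m.group(1) + mask(m.group(2)) + m.group(3), rec))
        for seq, rec in ssn_link
    ]

    # Step 4: funnel both links together, then sort on the sequence number.
    funnelled = other_link + masked_link
    funnelled.sort(key=lambda pair: pair[0])
    return [rec for _, rec in funnelled]


def demo_mask(digits):
    """Placeholder for the masking pack: reverses the digits."""
    return digits[::-1]


source = [
    "sampleXXXXXXXXXXXXXX",
    "SSN: 123456789",
    "XXXX",
    "",
    "Sample2XXXXXXXXXXXX",
    "SSN:987654321",
    "XXXX",
]
for record in mask_log(source, demo_mask):
    print(record)
```

In a parallel job the sequence number would presumably have to be assigned before the data is partitioned, otherwise the final sort could not restore the original file order.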
Thanks,
Surya