Data from Log File

Posted: Tue Jul 16, 2013 2:25 pm
by suryadev
Hello,

Below is the sample data from a log file

sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

There are around 100M such records in this log file, and the position of the SSN line differs from one paragraph to the next.

Please suggest how to pull the SSN value from the log file and make it some number.

Thank you

Posted: Tue Jul 16, 2013 2:56 pm
by chulett
Read the records as one long string field, filter out any that do not start with 'SSN:' and then use Field() to get the second field from that ":" delimited string. Convert.
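
The same logic can be sketched outside DataStage. The Python below is only an illustration of the filter-then-Field() idea, not a DataStage routine; the function name `extract_ssns` and the sample list are assumptions made for the example.

```python
def extract_ssns(lines):
    """Yield the SSN from every line that starts with 'SSN:'."""
    for line in lines:
        # Filter: keep only the records that begin with the 'SSN:' tag.
        if line.startswith("SSN:"):
            # Rough equivalent of Field(line, ":", 2): the second ':'-delimited field.
            value = line.split(":")[1].strip()
            # "Convert": turn the string into a number.
            yield int(value)


sample = [
    "sampleXXXXXXXXXXXXXX",
    "SSN: 123456789",
    "XXXX",
    "Sample2XXXXXXXXXXXX",
    "SSN:987654321",
]
print(list(extract_ssns(sample)))  # [123456789, 987654321]
```

Note that converting to an integer drops any leading zeros, so the value would need to stay a string if those matter.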

Posted: Tue Jul 16, 2013 2:58 pm
by suryadev
Thank you!

So the complete file will be read as a single field? Also, the target should look the same as the source, with only the SSN values changed.

Please correct me if I am wrong

Posted: Tue Jul 16, 2013 3:57 pm
by suryadev
I read the file with a Sequential File stage, and the data looks good when I view it.

Now I can filter the records that contain SSN, but how can I recombine them with the records that were dropped earlier?


Thanks again!

Posted: Tue Jul 16, 2013 4:14 pm
by chulett
You never mentioned that part in your original post. I was under the impression all you wanted were the SSN values, which is why I provided the answer that I did. Can you be more specific with regard to exactly what your end result should be?

Posted: Tue Jul 16, 2013 9:14 pm
by suryadev
Sorry for the confusion.
Below is the source
sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

Below is the Target
sampleXXXXXXXXXXXXXX
SSN: 576195198
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:348231890
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

I tried doing it as you said, and in the Transformer I can use an If condition and change the value as needed. My main issue is that I need to pass this data to the masking stage. All I can do in the masking stage is find the position of the SSN in that particular record and mask it, but I do not know how to handle the other records in the log file. That is where I am stuck.

Please suggest a solution.

Posted: Tue Jul 16, 2013 9:35 pm
by chulett
So that's what you meant by "make it some number". Based on your description and the subject of your post I thought you wanted to extract it as "data from the log file" and convert it from a string to a number, not mask it. Always helps to fully explain what you are trying to do sooner rather than later.

I have zero experience with the Masking stage. Does it need to be bypassed for the records that don't need masking or can you not tell it which ones are just pass through? Anyone know?

Posted: Wed Jul 17, 2013 3:10 pm
by suryadev
If there are several fields, we can apply masking functions only to the fields that need masking. In our case there is only one field, and it needs to be masked only for the records that contain an SSN.

Posted: Wed Jul 17, 2013 4:50 pm
by ray.wurlod
Do you need to be able to reverse the mapping to a masked value?

In either case, creating a routine is probably the easiest way to go. The routine needs to generate the same (?) unique replacement for each SSN.
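
As a rough idea of what such a routine could look like, here is a minimal Python sketch (not DataStage BASIC) that derives a repeatable nine-digit replacement from a keyed hash. The name `mask_ssn` and the key value are hypothetical. It is one-way for anyone without the key, and it always returns the same output for the same input, but it does not guarantee uniqueness: two different SSNs can occasionally map to the same nine digits, so a lookup table would be needed if strict uniqueness is required.

```python
import hashlib
import hmac

# Hypothetical secret; not from the thread. Keep the real one out of the job design.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_ssn(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Return a 9-digit replacement that is always the same for the same SSN."""
    digest = hmac.new(key, ssn.encode("ascii"), hashlib.sha256).digest()
    # Reduce the digest to a number below 10**9, then left-pad to nine digits.
    number = int.from_bytes(digest, "big") % 10**9
    return f"{number:09d}"


masked = mask_ssn("123456789")
print(masked, masked == mask_ssn("123456789"))
```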

Posted: Mon Jul 29, 2013 9:36 am
by suryadev
Thanks very much!

Reverse the mapping? Do you mean creating a separate flow to the masking pack?

How can the routine be created? By writing my own routine?
Do I need to search the list of records for wherever it says SSN?

As given below

Below is the source
sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

Below is the Target
sampleXXXXXXXXXXXXXX
SSN: 576195198
XXXX
XXXXXXXXXX
XXXXXXXXXXX

Sample2XXXXXXXXXXXX
SSN:348231890
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX

Can the Field function be used to find the position of SSN? It will be in the same position within the record when the data is read as records, but I am not sure where it will be in terms of lines, as it is sometimes on the second line and sometimes on the third.

Please suggest how to use it to search for the word SSN in the records.

Posted: Mon Jul 29, 2013 10:10 am
by chulett
By 'reverse the mapping to a masked value' Ray was asking if you had a need to be able to undo or 'unmask' the masked value back to its original value. Hopefully not as masking is a one-way process.

Not seeing the problem with SSN. As noted earlier, all it seems you need to do, from the examples you posted, is check whether the first four characters are 'SSN:'.
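
To illustrate that check, here is a small Python sketch (not Transformer syntax) that tests the first four characters and rewrites only the matching lines, passing everything else through unchanged. The function `mask_line` and the digit-reversing stand-in for the mask are assumptions made for the example; the stand-in is not real masking.

```python
def mask_line(line, mask):
    """Return the line with its SSN replaced if it starts with 'SSN:'."""
    if line[:4] != "SSN:":
        return line  # not an SSN record: pass through untouched
    rest = line[4:]
    digits = rest.strip()
    if not digits:
        return line  # tag present but no value: nothing to mask
    # Replace only the value, so any space after the colon is preserved.
    return "SSN:" + rest.replace(digits, mask(digits), 1)


def reverse_digits(value):
    # Stand-in for the real masking function, used only to show the flow.
    return value[::-1]


for record in ["sampleXXXXXXXXXXXXXX", "SSN: 123456789", "XXXX", "SSN:987654321"]:
    print(mask_line(record, reverse_digits))
```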

Posted: Tue Jul 30, 2013 9:49 am
by suryadev
Thank you!

It is not possible to unmask the values again, as the masking pack is irreversible.

I got the idea of generating a sequence number for all the records and sending only the records with SSN to the masking pack. After that, the two links (SSN records and non-SSN records) go to a Funnel stage, followed by a sort on the sequence number so that the record order remains the same. Hope that works!
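
The flow described above can be sketched in Python to check that the order really is restored. This is an illustration only, not a job design: `mask_preserving_order` and `stand_in_mask` are hypothetical names, and the stand-in simply swaps digits for '#' in place of the masking pack. It assumes the sequence number is assigned while the records are still in their original file order.

```python
def stand_in_mask(record):
    # Placeholder for the masking pack: swaps digits for '#' only to show the flow.
    return "".join("#" if ch.isdigit() else ch for ch in record)


def mask_preserving_order(lines, mask):
    # Step 1: tag every record with a sequence number.
    numbered = list(enumerate(lines))
    # Step 2: split into two links, SSN records and non-SSN records.
    ssn_link = [(n, rec) for n, rec in numbered if rec.startswith("SSN:")]
    other_link = [(n, rec) for n, rec in numbered if not rec.startswith("SSN:")]
    # Step 3: only the SSN link goes through masking.
    masked_link = [(n, mask(rec)) for n, rec in ssn_link]
    # Step 4: funnel the links together, then sort on the sequence number.
    funnelled = masked_link + other_link
    funnelled.sort(key=lambda pair: pair[0])
    return [rec for _, rec in funnelled]


source = ["sampleXXXXXXXXXXXXXX", "SSN: 123456789", "XXXX", "",
          "Sample2XXXXXXXXXXXX", "SSN:987654321", "XXXX"]
for record in mask_preserving_order(source, stand_in_mask):
    print(record)
```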