Data from Log File
Hello,
Below is sample data from a log file:
sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX
Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX
There are around 100M such records in this log file, and the position of the SSN line differs from one paragraph to the next.
Please suggest how I can pull the SSN value from the log file and make it some number.
Thank you
Thanks,
Surya
You never mentioned that part in your original post. I was under the impression that all you wanted were the SSN values, which is why I provided the answer that I did. Can you be more specific about exactly what your end result should be?
-craig
"You can never have too many knives" -- Logan Nine Fingers
Sorry for the confusion.
Below is the source
sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX
Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX
Below is the Target
sampleXXXXXXXXXXXXXX
SSN: 576195198
XXXX
XXXXXXXXXX
XXXXXXXXXXX
Sample2XXXXXXXXXXXX
SSN:348231890
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX
I tried doing it as you said, and in the Transformer I can use an If condition and change the value as needed. My main issue is that I need to pass this data to the masking stage, and all I can do there is find the position of the SSN in that particular record and mask it. How do I handle the other records in the log file? That is where I am stuck.
Please suggest a solution.
Thanks,
Surya
So that's what you meant by "make it some number". Based on your description and the subject of your post I thought you wanted to extract it as "data from the log file" and convert it from a string to a number, not mask it. Always helps to fully explain what you are trying to do sooner rather than later.
I have zero experience with the Masking stage. Does it need to be bypassed for the records that don't need masking or can you not tell it which ones are just pass through? Anyone know?
-craig
"You can never have too many knives" -- Logan Nine Fingers
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Do you need to be able to reverse the mapping to a masked value?
In either case, creating a routine is probably the easiest way to go. The routine needs to generate the same (?) unique replacement for each SSN.
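To illustrate what "the same replacement for each SSN" could look like, here is a minimal Python sketch of the logic only. It is not the Masking stage's method and not DataStage routine code. The function name `mask_ssn` and the `SECRET_KEY` value are hypothetical. It uses a keyed hash reduced to nine digits, so the same input always gives the same output, but two different SSNs could in principle collide, so uniqueness is not guaranteed.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a protected store.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_ssn(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Map a nine-digit SSN to a repeatable nine-digit replacement."""
    digest = hmac.new(key, ssn.encode("ascii"), hashlib.sha256).digest()
    # Reduce the digest to nine decimal digits. Different inputs can collide,
    # so this gives repeatability, not guaranteed uniqueness.
    number = int.from_bytes(digest, "big") % 1_000_000_000
    return f"{number:09d}"
```

Because the hash is keyed and truncated, there is no practical way to get back to the original SSN from the output. If true uniqueness matters, a lookup table of already-issued replacements would be one way to enforce it.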
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks very much!
Reverse the mapping? Do you mean creating a separate flow to the masking pack?
How can the routine be created? By writing my own routine?
In the list of records, do I need to search for wherever it says SSN?
As given below:
Below is the source
sampleXXXXXXXXXXXXXX
SSN: 123456789
XXXX
XXXXXXXXXX
XXXXXXXXXXX
Sample2XXXXXXXXXXXX
SSN:987654321
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX
Below is the Target
sampleXXXXXXXXXXXXXX
SSN: 576195198
XXXX
XXXXXXXXXX
XXXXXXXXXXX
Sample2XXXXXXXXXXXX
SSN:348231890
XXXX
XXXXXXXXXX
XXXXXXXXXXXXX
Can the Field function be used to find the position of the SSN? It will be in the same position when searched for within a record, but I am not sure where it will be in terms of columns, as it is sometimes on the second line and sometimes on the third.
Please suggest how to use it to search for the word SSN in the records.
Thanks,
Surya
By 'reverse the mapping to a masked value' Ray was asking if you had a need to be able to undo or 'unmask' the masked value back to its original value. Hopefully not, as masking is a one-way process.
Not seeing the problem with SSN. As noted earlier, all it seems you need to do, from the examples you posted, is check whether the first four characters are 'SSN:'.
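As a rough sketch of that check, written in Python rather than as a Transformer expression: the function names and the regular expression are assumptions for illustration, and the pattern assumes a nine-digit value with optional spaces after the colon, as in the posted samples.

```python
import re

# Hypothetical pattern: "SSN:" at the start of the line, optional spaces,
# then exactly nine digits (matching the two sample layouts in the thread).
SSN_LINE = re.compile(r"^(SSN:\s*)(\d{9})\s*$")


def is_ssn_line(line: str) -> bool:
    """Return True when the first four characters are 'SSN:'."""
    return line[:4] == "SSN:"


def split_ssn_line(line: str):
    """Return (prefix, ssn) for an SSN line, or None for any other line."""
    match = SSN_LINE.match(line)
    return (match.group(1), match.group(2)) if match else None
```

Keeping the prefix separate from the digits means the original spacing ("SSN: " or "SSN:") can be put back unchanged around whatever replacement value is produced.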
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thank you!
It is not possible to unmask again, as the masking pack is irreversible.
I got the idea of generating a sequence number for all the records, sending only the records with an SSN to the masking pack, and then feeding the two links (SSN records and non-SSN records) into a Funnel stage and sorting on the sequence number so that the record order remains the same. Hope that works!
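The sequence-number idea can be sketched outside DataStage to check the logic. This is a small in-memory Python illustration, not a job design: `mask_preserving_order` and its `mask_line` parameter are hypothetical names, and `mask_line` stands in for whatever the masking pack does to an SSN record.

```python
from typing import Callable, Iterable, List, Tuple


def mask_preserving_order(
    lines: Iterable[str],
    mask_line: Callable[[str], str],
) -> List[str]:
    """Mask only the SSN records while keeping the original record order."""
    # Step 1: tag every record with a sequence number.
    numbered: List[Tuple[int, str]] = list(enumerate(lines))

    # Step 2: split the records into two "links".
    ssn_link = [(seq, line) for seq, line in numbered if line.startswith("SSN:")]
    other_link = [(seq, line) for seq, line in numbered if not line.startswith("SSN:")]

    # Step 3: only the SSN link goes through the masking step.
    masked_link = [(seq, mask_line(line)) for seq, line in ssn_link]

    # Step 4: funnel both links together and sort on the sequence number.
    funneled = masked_link + other_link
    funneled.sort(key=lambda pair: pair[0])
    return [line for _, line in funneled]
```

For example, calling it with the sample records and a stand-in such as `lambda line: "SSN:MASKED"` returns the records in their original order with only the SSN lines changed. With around 100M records a real job would stream the data rather than hold it in memory, and in a parallel job the sequence number would need to be unique across partitions.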
Thanks,
Surya