Country Codes Extraction
Moderators: chulett, rschirm, roy
Country Codes Extraction
Hi,
I have a column which contains calling party number and i want to extacct the country code from that number and i am using string matching technique e.g. First check for 5 digit and then 4 digit till the last digit and find that value in a string. i have to traverse 5 time for each row in worse case, the problem i am facing is the performance bottleneck that server processes only 50 to 70 rows per sec and if i remove that code it processes at very high speed.
i want expert opinion about how to extract the country codes efficiently as i have to process 80 to 90 million rows on average.
I have a column which contains calling party number and i want to extacct the country code from that number and i am using string matching technique e.g. First check for 5 digit and then 4 digit till the last digit and find that value in a string. i have to traverse 5 time for each row in worse case, the problem i am facing is the performance bottleneck that server processes only 50 to 70 rows per sec and if i remove that code it processes at very high speed.
i want expert opinion about how to extract the country codes efficiently as i have to process 80 to 90 million rows on average.
Regards,
Affan
"Questioning is Half Knowledge"
Affan
"Questioning is Half Knowledge"
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
There's no way around the fact that you have to check five-digit prefix, four-digit prefix, and so on.
Read-cached hashed file is the fastest mechanism "out of the box" to determine whether you have a hit.
Otherwise you can create a routine that compares against a bit map, or a map of 1 and 0 characters in a dimensioned array initialized into COMMON. I have posted this solution previously, and implemented it for a large telco (for variable length area codes, but the same technique can be used).
Read-cached hashed file is the fastest mechanism "out of the box" to determine whether you have a hit.
Otherwise you can create a routine that compares against a bit map, or a map of 1 and 0 characters in a dimensioned array initialized into COMMON. I have posted this solution previously, and implemented it for a large telco (for variable length area codes, but the same technique can be used).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray, Thanks a lot for the prompt reply as i was really praying for some help as it mid night and i am stuck in this problem.
It went totally out of my head
can you please explain it a little bit further how to do it as i am not that much experienced in DS.
Thanks in advance, waiting for your reply![Smile :)](./images/smilies/icon_smile.gif)
It went totally out of my head
![Embarassed :oops:](./images/smilies/icon_redface.gif)
can you please explain it a little bit further how to do it as i am not that much experienced in DS.
Thanks in advance, waiting for your reply
![Smile :)](./images/smilies/icon_smile.gif)
Regards,
Affan
"Questioning is Half Knowledge"
Affan
"Questioning is Half Knowledge"
We can help you further if you more clearly explain your requirements with an example of data. Ray's not the only one who can help you. ![Wink :wink:](./images/smilies/icon_wink.gif)
![Wink :wink:](./images/smilies/icon_wink.gif)
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Yes, we have gurus like you as well
but he was the only one who replied late nite
anyways heres an example
i have a n number 00923009424280 and theres another number 00966503021412
and now i want to extarct the country codes; in first it would be 92 and in the second case it would be 966 and in other cases the country code lenght will tend to vary based on the country being called.
so in DS i want to extract this code in an optimal technique, i have lookup table too where all the country codes are stored.
Let me know if you need further explantion too.
Thanks in advance.
![Smile :)](./images/smilies/icon_smile.gif)
![Smile :)](./images/smilies/icon_smile.gif)
anyways heres an example
i have a n number 00923009424280 and theres another number 00966503021412
and now i want to extarct the country codes; in first it would be 92 and in the second case it would be 966 and in other cases the country code lenght will tend to vary based on the country being called.
so in DS i want to extract this code in an optimal technique, i have lookup table too where all the country codes are stored.
Let me know if you need further explantion too.
Thanks in advance.
Regards,
Affan
"Questioning is Half Knowledge"
Affan
"Questioning is Half Knowledge"
Maffan,
in the past I've done a couple of TELCO projects and when I have had to do this I've used a branching decision tree structure. Due to the mechanism used with variable lengths you will not be able to make this into a single condition.
I also found that it makes more sense to order the if-then-else statements or CASE or similar structure by expected frequency of country codes for that provider and not necessarily by length of country prefix.
in the past I've done a couple of TELCO projects and when I have had to do this I've used a branching decision tree structure. Due to the mechanism used with variable lengths you will not be able to make this into a single condition.
I also found that it makes more sense to order the if-then-else statements or CASE or similar structure by expected frequency of country codes for that provider and not necessarily by length of country prefix.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Summary of Technique
In a before-stage routine initialize a string to Str("0",100000) then populate any position in the string with 1 if that position corresponds with a valid country code. For example ValidCodes[966,1] = "1". (You could also do this with a DataStage job that reads the table and assembles the string in a stage variable; run the results through an Aggregator stage setting Last as the aggregate function.)
To test any country code, compare against ValidCodes[InLink.CountryCode,1] in a routine whose type is transform function.
In a before-stage routine initialize a string to Str("0",100000) then populate any position in the string with 1 if that position corresponds with a valid country code. For example ValidCodes[966,1] = "1". (You could also do this with a DataStage job that reads the table and assembles the string in a stage variable; run the results through an Aggregator stage setting Last as the aggregate function.)
To test any country code, compare against ValidCodes[InLink.CountryCode,1] in a routine whose type is transform function.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.