Country Codes Extraction

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Country Codes Extraction

Post by maffan76 »

Hi,
I have a column which contains calling party number and i want to extacct the country code from that number and i am using string matching technique e.g. First check for 5 digit and then 4 digit till the last digit and find that value in a string. i have to traverse 5 time for each row in worse case, the problem i am facing is the performance bottleneck that server processes only 50 to 70 rows per sec and if i remove that code it processes at very high speed.

i want expert opinion about how to extract the country codes efficiently as i have to process 80 to 90 million rows on average.
Regards,
Affan
"Questioning is Half Knowledge"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There's no way around the fact that you have to check five-digit prefix, four-digit prefix, and so on.

Read-cached hashed file is the fastest mechanism "out of the box" to determine whether you have a hit.

Otherwise you can create a routine that compares against a bit map, or a map of 1 and 0 characters in a dimensioned array initialized into COMMON. I have posted this solution previously, and implemented it for a large telco (for variable length area codes, but the same technique can be used).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Ray, Thanks a lot for the prompt reply as i was really praying for some help as it mid night and i am stuck in this problem.

It went totally out of my head :oops:

can you please explain it a little bit further how to do it as i am not that much experienced in DS.

Thanks in advance, waiting for your reply :)
Regards,
Affan
"Questioning is Half Knowledge"
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Ray,
i would really appreciate if you could share that code so that i can undertand how to do it.

Thanks in advance.
Regards,
Affan
"Questioning is Half Knowledge"
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

We can help you further if you more clearly explain your requirements with an example of data. Ray's not the only one who can help you. :wink:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Affan,

Post the code what you tried, that will also help them to change if required.

regards
kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Time zones! I like to sleep occasionally.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Yes, we have gurus like you as well :) but he was the only one who replied late nite :)

anyways heres an example

i have a n number 00923009424280 and theres another number 00966503021412
and now i want to extarct the country codes; in first it would be 92 and in the second case it would be 966 and in other cases the country code lenght will tend to vary based on the country being called.
so in DS i want to extract this code in an optimal technique, i have lookup table too where all the country codes are stored.

Let me know if you need further explantion too.

Thanks in advance.
Regards,
Affan
"Questioning is Half Knowledge"
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Maffan,

in the past I've done a couple of TELCO projects and when I have had to do this I've used a branching decision tree structure. Due to the mechanism used with variable lengths you will not be able to make this into a single condition.

I also found that it makes more sense to order the if-then-else statements or CASE or similar structure by expected frequency of country codes for that provider and not necessarily by length of country prefix.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Summary of Technique

In a before-stage routine initialize a string to Str("0",100000) then populate any position in the string with 1 if that position corresponds with a valid country code. For example ValidCodes[966,1] = "1". (You could also do this with a DataStage job that reads the table and assembles the string in a stage variable; run the results through an Aggregator stage setting Last as the aggregate function.)

To test any country code, compare against ValidCodes[InLink.CountryCode,1] in a routine whose type is transform function.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply