Range Lookup

varshanswamy · Post by **varshanswamy** » Mon Feb 14, 2005 11:03 pm

I have 2 database stages
First database stage contains a number field called IP_NUMBER.

Second database stage contains fields called

IP_FROM
IP_TO
COUNTRY_NAME
COUNTRY_CODE
DOMAIN_NAME

I have done a sparse lookup and a query in the database as follows
that is to obtain COUNTRY_CDOE, COUNTRY_NAME, DOMAIN_NAME based on the IP_NUMBER.

SELECT
COUNTRY_CODE,
COUNTRY_NAME,
DOMAIN_NAME
FROM
IP_LOOKUP
WHERE
ORCHESTRATE.IP_NUMBER
BETWEEN
IP_FROM
AND
IP_TO

The problem is since it is a sparse lookup, it is taking a very long time. Even though the table has been indexed, it is executing at the rate of one record per second, there are 1 million records, to be processed, so it is really very slow. Please suggest on this.

sudharsanan · Post by **sudharsanan** » Tue Feb 15, 2005 1:04 am

Does COUNTRY_CDOE, COUNTRY_NAME, DOMAIN_NAME and IP_NUMBER are also keys in this table or not..or IP_NUMBER is the only key.. are you using the concept of surrogate keys in this table..give us more info or the datamodel so that we can find why this is taking more time to execute...

varshanswamy · Post by **varshanswamy** » Tue Feb 15, 2005 1:57 am

The problem is that there are no keys here, There are no surrogate keys too,
I have fiven the structure of the 2 tables , based on the IP_NUMBER from one table I need to lookup in another table if it is present in the range and derive the rest of the values of COUNTRY_CODE, COUNTRY_NAME and DOMAIN_NAME. The values of IP_FROM and IP_TO are also not constant.
so this is the problem

sudharsanan · Post by **sudharsanan** » Tue Feb 15, 2005 5:42 am

I think since you done have a key in the second lookup it is taking more time to retrive the result.. can you try using a sort stage to sort the incoming data from the database.. this might retrive the data quickly..i am guessing

..just give a try and let us know what has happened...

aparnanarale · Post by **aparnanarale** » Thu Feb 17, 2005 11:00 pm

There is not much difference even if I sort the data, so please let me know what to do

HSBCdev · Post by **HSBCdev** » Fri Feb 18, 2005 2:58 am

I want to ask this question too, but I use 2 Dataset stages instead.

So how can I build up a DS job that can get the same result as the SQL statement.

Thanks a lot !

varshanswamy wrote:I have 2 database stages
First database stage contains a number field called IP_NUMBER.

Second database stage contains fields called

IP_FROM
IP_TO
COUNTRY_NAME
COUNTRY_CODE
DOMAIN_NAME

I have done a sparse lookup and a query in the database as follows
that is to obtain COUNTRY_CDOE, COUNTRY_NAME, DOMAIN_NAME based on the IP_NUMBER.

SELECT
COUNTRY_CODE,
COUNTRY_NAME,
DOMAIN_NAME
FROM
IP_LOOKUP
WHERE
ORCHESTRATE.IP_NUMBER
BETWEEN
IP_FROM
AND
IP_TO

The problem is since it is a sparse lookup, it is taking a very long time. Even though the table has been indexed, it is executing at the rate of one record per second, there are 1 million records, to be processed, so it is really very slow. Please suggest on this.

Gazelle · Post by **Gazelle** » Sun Feb 20, 2005 11:32 pm

Our friends at Ascential Support wrote a cusom stage for us... the problem is that it "collects" the entire million rows before doing the range lookup.

The rather inelegant solution we implemented was to join against all reference rows (I think it's called a cartesian product) and then have another stage that filters out only those rows where IP_NUMBER >= IP_FROM and IP_NUMBER <= IP_TO.

To do the join, we added a "dummy_key" column to each side of the join (and set its value to "1").

It's a pretty data-heavy solution, though, so I don't know whether it will really help with the execution time.

I hope it helps.

- g
"If at first you don't succeed, use a bigger hammer."

HSBCdev · Post by **HSBCdev** » Mon Feb 21, 2005 3:44 am

Gazelle, in fact, the method you mention we have already used before, but the performance is really poor.

How about your custom stage ?

Or any other solution ?

Gazelle wrote:Our friends at Ascential Support wrote a cusom stage for us... the problem is that it "collects" the entire million rows before doing the range lookup.

The rather inelegant solution we implemented was to join against all reference rows (I think it's called a cartesian product) and then have another stage that filters out only those rows where IP_NUMBER >= IP_FROM and IP_NUMBER <= IP_TO.

To do the join, we added a "dummy_key" column to each side of the join (and set its value to "1").

It's a pretty data-heavy solution, though, so I don't know whether it will really help with the execution time.

I hope it helps.

- g
"If at first you don't succeed, use a bigger hammer."

varshanswamy · Post by **varshanswamy** » Mon Feb 21, 2005 10:20 pm

I have implemented the same by writing a server job and calling it in the form of a shared container in the parallel job. I still need to check on performance. In the server job, I have put the records of the IP_LOOKUP table into a hash file, and later in another job I have used the Universe Stage to do the lookup based on the range IP_NUMBER > IP_FROM and IP_NUMBER < IP_TO. But I would appreciate if anyone could give me a solution to this in PX itself.

Gazelle · Post by **Gazelle** » Tue Feb 22, 2005 12:09 am

It is much the same in PX. We just don't use a hash file.
On the incoming stream, we add a column called "dummy_key" and give it a value of "1".
On the reference stream, we also add a column called "dummy_key" and give it a value of "1".
A "Join" is done on the key "dummy_key", which pairs each rows from the input stream with all rows on the reference stream.
The joined rows then go to a "Transformer" stage where we only let through rows where IP_NUMBER >= IP_FROM AND IP_NUMBER <= IP_TO.

I can't give much info about the Custom Stage, other than:
I am having a bit of trouble getting it to work; it can only use one key (and it's max and min value); it needs the partioning set to "Entire".
I can't see that it would provide much of a performance improvement (since I believe it collects all the incoming rows before it starts processing), but it sure looks a lot more elegant!
Speak to Ascential Support for more info.

- g

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Feb 22, 2005 3:43 am

You said you have 2 database stages. Does it imply that the 2 tables reside in separate db or in the same db in different tables?

If they reside on the same db, you may be better-off using an SQL join instead of a lookup. You can use outer join if you wish to include non-matching rows.

varshanswamy · Post by **varshanswamy** » Tue Feb 22, 2005 4:29 am

I want to know one thing how is it possible to filter based on IP_NUMBER >= IP_FROM AND IP_NUMBER <= IP_TO in transformer do you mean
adding it in the constraint. Because I would be having a cartesian product now of the records when I join using a "dummy_key".

Gazelle wrote:It is much the same in PX. We just don't use a hash file.
On the incoming stream, we add a column called "dummy_key" and give it a value of "1".
On the reference stream, we also add a column called "dummy_key" and give it a value of "1".
A "Join" is done on the key "dummy_key", which pairs each rows from the input stream with all rows on the reference stream.
The joined rows then go to a "Transformer" stage where we only let through rows where IP_NUMBER >= IP_FROM AND IP_NUMBER <= IP_TO.

I can't give much info about the Custom Stage, other than:
I am having a bit of trouble getting it to work; it can only use one key (and it's max and min value); it needs the partioning set to "Entire".
I can't see that it would provide much of a performance improvement (since I believe it collects all the incoming rows before it starts processing), but it sure looks a lot more elegant!
Speak to Ascential Support for more info.

- g

varshanswamy · Post by **varshanswamy** » Tue Feb 22, 2005 4:31 am

I have a single database, the problem why I cant join is there are no keys. But I will still try the outer join.

Sainath.Srinivasan wrote:You said you have 2 database stages. Does it imply that the 2 tables reside in separate db or in the same db in different tables?

If they reside on the same db, you may be better-off using an SQL join instead of a lookup. You can use outer join if you wish to include non-matching rows.

Gazelle · Post by **Gazelle** » Tue Feb 22, 2005 6:33 pm

Yes, the filtering was done in the Transformer stage as a "constraint".

HSBCdev · Post by **HSBCdev** » Tue Feb 22, 2005 7:44 pm

I think after join stage use Filter stage is much better ( Transformer 's performance is not good).

Besides, Gazelle, Would you please send me your custom stage (source code), I would like to have a look and study.

Many many thanks !

Gazelle wrote:Yes, the filtering was done in the Transformer stage as a "constraint".

DSXchange

Range Lookup

Range Lookup

Re: Range Lookup

Re: Range Lookup