Range Lookup
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
Range Lookup
I have 2 database stages
First database stage contains a number field called IP_NUMBER.
Second database stage contains fields called
IP_FROM
IP_TO
COUNTRY_NAME
COUNTRY_CODE
DOMAIN_NAME
I have done a sparse lookup and a query in the database as follows
that is to obtain COUNTRY_CDOE, COUNTRY_NAME, DOMAIN_NAME based on the IP_NUMBER.
SELECT
COUNTRY_CODE,
COUNTRY_NAME,
DOMAIN_NAME
FROM
IP_LOOKUP
WHERE
ORCHESTRATE.IP_NUMBER
BETWEEN
IP_FROM
AND
IP_TO
The problem is since it is a sparse lookup, it is taking a very long time. Even though the table has been indexed, it is executing at the rate of one record per second, there are 1 million records, to be processed, so it is really very slow. Please suggest on this.
First database stage contains a number field called IP_NUMBER.
Second database stage contains fields called
IP_FROM
IP_TO
COUNTRY_NAME
COUNTRY_CODE
DOMAIN_NAME
I have done a sparse lookup and a query in the database as follows
that is to obtain COUNTRY_CDOE, COUNTRY_NAME, DOMAIN_NAME based on the IP_NUMBER.
SELECT
COUNTRY_CODE,
COUNTRY_NAME,
DOMAIN_NAME
FROM
IP_LOOKUP
WHERE
ORCHESTRATE.IP_NUMBER
BETWEEN
IP_FROM
AND
IP_TO
The problem is since it is a sparse lookup, it is taking a very long time. Even though the table has been indexed, it is executing at the rate of one record per second, there are 1 million records, to be processed, so it is really very slow. Please suggest on this.
-
- Participant
- Posts: 64
- Joined: Fri Jul 16, 2004 7:53 am
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
The problem is that there are no keys here, There are no surrogate keys too,
I have fiven the structure of the 2 tables , based on the IP_NUMBER from one table I need to lookup in another table if it is present in the range and derive the rest of the values of COUNTRY_CODE, COUNTRY_NAME and DOMAIN_NAME. The values of IP_FROM and IP_TO are also not constant.
so this is the problem
I have fiven the structure of the 2 tables , based on the IP_NUMBER from one table I need to lookup in another table if it is present in the range and derive the rest of the values of COUNTRY_CODE, COUNTRY_NAME and DOMAIN_NAME. The values of IP_FROM and IP_TO are also not constant.
so this is the problem
-
- Participant
- Posts: 64
- Joined: Fri Jul 16, 2004 7:53 am
-
- Participant
- Posts: 6
- Joined: Mon Feb 07, 2005 11:54 pm
-
- Premium Member
- Posts: 141
- Joined: Tue Mar 16, 2004 8:22 am
- Location: HSBC - UK and India
- Contact:
Re: Range Lookup
I want to ask this question too, but I use 2 Dataset stages instead.
So how can I build up a DS job that can get the same result as the SQL statement.
Thanks a lot !
So how can I build up a DS job that can get the same result as the SQL statement.
Thanks a lot !
varshanswamy wrote:I have 2 database stages
First database stage contains a number field called IP_NUMBER.
Second database stage contains fields called
IP_FROM
IP_TO
COUNTRY_NAME
COUNTRY_CODE
DOMAIN_NAME
I have done a sparse lookup and a query in the database as follows
that is to obtain COUNTRY_CDOE, COUNTRY_NAME, DOMAIN_NAME based on the IP_NUMBER.
SELECT
COUNTRY_CODE,
COUNTRY_NAME,
DOMAIN_NAME
FROM
IP_LOOKUP
WHERE
ORCHESTRATE.IP_NUMBER
BETWEEN
IP_FROM
AND
IP_TO
The problem is since it is a sparse lookup, it is taking a very long time. Even though the table has been indexed, it is executing at the rate of one record per second, there are 1 million records, to be processed, so it is really very slow. Please suggest on this.
Our friends at Ascential Support wrote a cusom stage for us... the problem is that it "collects" the entire million rows before doing the range lookup.
The rather inelegant solution we implemented was to join against all reference rows (I think it's called a cartesian product) and then have another stage that filters out only those rows where IP_NUMBER >= IP_FROM and IP_NUMBER <= IP_TO.
To do the join, we added a "dummy_key" column to each side of the join (and set its value to "1").
It's a pretty data-heavy solution, though, so I don't know whether it will really help with the execution time.
I hope it helps.
- g
"If at first you don't succeed, use a bigger hammer."
The rather inelegant solution we implemented was to join against all reference rows (I think it's called a cartesian product) and then have another stage that filters out only those rows where IP_NUMBER >= IP_FROM and IP_NUMBER <= IP_TO.
To do the join, we added a "dummy_key" column to each side of the join (and set its value to "1").
It's a pretty data-heavy solution, though, so I don't know whether it will really help with the execution time.
I hope it helps.
- g
"If at first you don't succeed, use a bigger hammer."
-
- Premium Member
- Posts: 141
- Joined: Tue Mar 16, 2004 8:22 am
- Location: HSBC - UK and India
- Contact:
Gazelle, in fact, the method you mention we have already used before, but the performance is really poor.
How about your custom stage ?
Or any other solution ?
How about your custom stage ?
Or any other solution ?
Gazelle wrote:Our friends at Ascential Support wrote a cusom stage for us... the problem is that it "collects" the entire million rows before doing the range lookup.
The rather inelegant solution we implemented was to join against all reference rows (I think it's called a cartesian product) and then have another stage that filters out only those rows where IP_NUMBER >= IP_FROM and IP_NUMBER <= IP_TO.
To do the join, we added a "dummy_key" column to each side of the join (and set its value to "1").
It's a pretty data-heavy solution, though, so I don't know whether it will really help with the execution time.
I hope it helps.
- g
"If at first you don't succeed, use a bigger hammer."
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
Re: Range Lookup
I have implemented the same by writing a server job and calling it in the form of a shared container in the parallel job. I still need to check on performance. In the server job, I have put the records of the IP_LOOKUP table into a hash file, and later in another job I have used the Universe Stage to do the lookup based on the range IP_NUMBER > IP_FROM and IP_NUMBER < IP_TO. But I would appreciate if anyone could give me a solution to this in PX itself.
It is much the same in PX. We just don't use a hash file.
On the incoming stream, we add a column called "dummy_key" and give it a value of "1".
On the reference stream, we also add a column called "dummy_key" and give it a value of "1".
A "Join" is done on the key "dummy_key", which pairs each rows from the input stream with all rows on the reference stream.
The joined rows then go to a "Transformer" stage where we only let through rows where IP_NUMBER >= IP_FROM AND IP_NUMBER <= IP_TO.
I can't give much info about the Custom Stage, other than:
I am having a bit of trouble getting it to work; it can only use one key (and it's max and min value); it needs the partioning set to "Entire".
I can't see that it would provide much of a performance improvement (since I believe it collects all the incoming rows before it starts processing), but it sure looks a lot more elegant!
Speak to Ascential Support for more info.
- g
On the incoming stream, we add a column called "dummy_key" and give it a value of "1".
On the reference stream, we also add a column called "dummy_key" and give it a value of "1".
A "Join" is done on the key "dummy_key", which pairs each rows from the input stream with all rows on the reference stream.
The joined rows then go to a "Transformer" stage where we only let through rows where IP_NUMBER >= IP_FROM AND IP_NUMBER <= IP_TO.
I can't give much info about the Custom Stage, other than:
I am having a bit of trouble getting it to work; it can only use one key (and it's max and min value); it needs the partioning set to "Entire".
I can't see that it would provide much of a performance improvement (since I believe it collects all the incoming rows before it starts processing), but it sure looks a lot more elegant!
Speak to Ascential Support for more info.
- g
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
I want to know one thing how is it possible to filter based on IP_NUMBER >= IP_FROM AND IP_NUMBER <= IP_TO in transformer do you mean
adding it in the constraint. Because I would be having a cartesian product now of the records when I join using a "dummy_key".
adding it in the constraint. Because I would be having a cartesian product now of the records when I join using a "dummy_key".
Gazelle wrote:It is much the same in PX. We just don't use a hash file.
On the incoming stream, we add a column called "dummy_key" and give it a value of "1".
On the reference stream, we also add a column called "dummy_key" and give it a value of "1".
A "Join" is done on the key "dummy_key", which pairs each rows from the input stream with all rows on the reference stream.
The joined rows then go to a "Transformer" stage where we only let through rows where IP_NUMBER >= IP_FROM AND IP_NUMBER <= IP_TO.
I can't give much info about the Custom Stage, other than:
I am having a bit of trouble getting it to work; it can only use one key (and it's max and min value); it needs the partioning set to "Entire".
I can't see that it would provide much of a performance improvement (since I believe it collects all the incoming rows before it starts processing), but it sure looks a lot more elegant!
Speak to Ascential Support for more info.
- g
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
I have a single database, the problem why I cant join is there are no keys. But I will still try the outer join.
Sainath.Srinivasan wrote:You said you have 2 database stages. Does it imply that the 2 tables reside in separate db or in the same db in different tables?
If they reside on the same db, you may be better-off using an SQL join instead of a lookup. You can use outer join if you wish to include non-matching rows.
-
- Premium Member
- Posts: 141
- Joined: Tue Mar 16, 2004 8:22 am
- Location: HSBC - UK and India
- Contact:
I think after join stage use Filter stage is much better ( Transformer 's performance is not good).
Besides, Gazelle, Would you please send me your custom stage (source code), I would like to have a look and study.
Many many thanks !
Besides, Gazelle, Would you please send me your custom stage (source code), I would like to have a look and study.
Many many thanks !
Gazelle wrote:Yes, the filtering was done in the Transformer stage as a "constraint".