Date Range lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Date Range lookup

Post by kris007 »

Hi,
I have the following situation:

Table 1 (primary link; not really a table, but the input to the Lookup stage):
key, purchdate
1, 2006/07/26

Table 2 (lookup reference table, in DB2):
key, effdate, expirydate
1, 2006/06/24, 2006/06/26
1, 2006/06/26, 2006/07/28

I have to get the records whose keys match and whose purchdate falls between effdate and expirydate, i.e.
Table1.key=Table2.key and table2.effdate<table1.purchdate<=table2.expirydate.
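In Python pseudocode (just to illustrate the logic I am after, not DataStage syntax), the match I need is:

```python
from datetime import date

# Primary link: (key, purchdate)
table1 = [(1, date(2006, 7, 26))]

# Reference table in DB2: (key, effdate, expirydate)
table2 = [
    (1, date(2006, 6, 24), date(2006, 6, 26)),
    (1, date(2006, 6, 26), date(2006, 7, 28)),
]

# Keep rows where the keys match and purchdate falls in the range:
# effdate < purchdate <= expirydate
matches = [
    (k1, purch, eff, exp)
    for (k1, purch) in table1
    for (k2, eff, exp) in table2
    if k1 == k2 and eff < purch <= exp
]
print(matches)  # only the 2006/06/26 - 2006/07/28 reference row qualifies
```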

Can anyone suggest a design for this lookup?
Thanks in advance,
Kris

Where's the "Any" key?-Homer Simpson
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

Hi Kris007,
Can you try doing a lookup of the DB2 table against the input stream on the primary key, and then writing a constraint something like this:

Code: Select all

table2.effdate<table1.purchdate and table1.purchdate<=table2.expirydate

kris007 wrote: I have to get the records whose keys match and purchdate falls in between effdate and expirydate i.e., Table1.key=Table2.key and table2.effdate<table1.purchdate<=table2.expirydate.
I am not sure whether it works or not, but I am just trying to give an idea...

Thank you..
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

No, I cannot refer to Table2 columns when putting a condition in the Lookup stage.
I can only put conditions on the input columns or on prior reference links.
Kris

Where's the "Any" key?-Homer Simpson
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

I am sorry, I was thinking of a Server design, but this is a parallel job... You are right, it is not possible to put a constraint on the Lookup stage.
But I think you can do a full outer join of the tables on the key, pass the result to a Filter stage, and give it the condition there.
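In Python pseudocode (just an illustration of the two steps, not DataStage syntax, using Kris's sample data), the join-then-filter idea looks like:

```python
from datetime import date

stream = [(1, date(2006, 7, 26))]                      # input link: (key, purchdate)
reference = [
    (1, date(2006, 6, 24), date(2006, 6, 26)),         # (key, effdate, expirydate)
    (1, date(2006, 6, 26), date(2006, 7, 28)),
]

# Step 1: join on the key alone (what the Join stage would do).
# Every reference row with a matching key comes through.
joined = [
    (k, purch, eff, exp)
    for (k, purch) in stream
    for (rk, eff, exp) in reference
    if k == rk
]

# Step 2: apply the date-range condition (what the Filter stage would do).
filtered = [row for row in joined if row[2] < row[1] <= row[3]]

print(len(joined), len(filtered))  # 2 rows after the join, 1 after the filter
```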
kris007 wrote:No, I cannot refer Table2 columns when putting condition in the lookup stage.
I can only put conditions on the input columns or prior reference links.
Thank you
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Hi Kris,

Do a lookup on the key column and use a Modify stage to check the range constraint. Or else dump the data into temp tables and run the SQL query against them.

PX is :evil: when trying to do non-equi-joins.

Sam
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

us1aslam1us wrote:Hi Kris,

Do a lookup on the key column and use a modify stage to check the range constraint.

Sam
Sam, can I use a Modify stage to check a range constraint? Or did you mean the Filter stage?
meena wrote: But I think you can make full outer join of the tables based on the key and pass this to filter stage and give the condition.....
Meena, I think I am going to do that.
Thank you both.
Kris

Where's the "Any" key?-Homer Simpson
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Sorry, I meant to say the Filter stage.

Thanks
Sam
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

That is spot on: you need a Lookup stage followed by a Filter stage. Resist the urge to put in a Transformer with a constraint, as it will be slower in this instance. Better to have lookup - filter - transformer than lookup - transformer.

DataStage 8 (Hawk) adds range lookup functionality to the Lookup stage, so you can look up and filter in the one stage.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

vmcburney wrote:Resist the urge to put in a transformer with a constraint as it will be slower in this instance.
Why is that? How does the Transformer function in parallel jobs? I mean, in Server it doesn't take much time.
Kris

Where's the "Any" key?-Homer Simpson
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

The Transformer is a good all-rounder, offering constraints, transforms, stage variables etc. The problem is that it performs all those transforms on all rows before the constraints are applied. This means that if the input to your lookup has 100 million records and your range lookup expands it to 300 million, your Transformer will transform all 300 million rows and then filter out the rows it doesn't need, back down to 100 million.

With lookup - filter - transformer you get 100 million rows into the Lookup stage, 300 million into the Filter stage, and just 100 million into the Transformer. This can make your job some 20% faster.

It is one of the few times I choose a specialised parallel stage instead of a Transformer. If, on the other hand, you can live with the 20% load, or your range lookup doesn't multiply your data too badly, you can stick with a Transformer.
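As simple arithmetic (using the illustrative 100 million input / 3x expansion figures above):

```python
input_rows = 100_000_000           # rows into the lookup
expansion = 3                      # each row matches ~3 reference rows
expanded = input_rows * expansion  # rows coming out of the range lookup

# Design A: lookup -> transformer (constraint inside the Transformer).
# Every derivation runs on the expanded row set before the constraint trims it.
rows_transformed_a = expanded

# Design B: lookup -> filter -> transformer.
# The cheap Filter trims first, so only the surviving rows are transformed.
rows_transformed_b = input_rows

print(rows_transformed_a, rows_transformed_b)  # 300000000 100000000
```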
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vmcburney wrote:The problem is that it performs all those transforms on all rows before the constraints are applied.
Well, that's just... silly. The Server variant is smart enough not to do that; I would have thought the Parallel version would be even smarter! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

If I had DataStage running I would run a quick test to see. I am assuming that transforms are run first followed by constraints. You could test it by creating a transformer with a reject link, a constraint that blocked all rows and a transform function that triggered a rejection. If the constraint runs first you get no rejects, if the constraint runs last you get all rows rejected due to the failed transform.

Anyone got time to run one of these?
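The experiment can also be simulated outside DataStage. Here is a plain Python sketch of the two possible orderings (illustrative only, not how the engine is implemented):

```python
rows = [1, 2, 3]

def transform(row):
    raise ValueError("forced transform failure")   # always triggers a reject

def constraint(row):
    return False                                   # blocks every row

def run(order):
    """Process rows either transform-first or constraint-first;
    return the number of rejected rows."""
    rejects = 0
    for row in rows:
        if order == "transform_first":
            try:
                transform(row)
            except ValueError:
                rejects += 1        # row rejected by the failed transform
                continue
            if not constraint(row):
                pass                # row silently dropped by the constraint
        else:  # constraint_first
            if not constraint(row):
                continue            # dropped before the transform ever runs
            try:
                transform(row)
            except ValueError:
                rejects += 1
    return rejects

# If transforms run first, every row lands on the reject link;
# if the constraint runs first, nothing is rejected.
print(run("transform_first"), run("constraint_first"))  # 3 0
```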
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

Hi Vincent,

Looks like you are right! I tested a job as you described, and the entire set of input records got rejected.
vmcburney wrote:If I had DataStage running I would run a quick test to see. I am assuming that transforms are run first followed by constraints. You could test it by creating a transformer with a reject link, a constraint that blocked all rows and a transform function that triggered a rejection. If the constraint runs first you get no rejects, if the constraint runs last you get all rows rejected due to the failed transform.
So it means that constraints run only after the entire set of input data has passed through all the transforms, which would be a performance bottleneck in such scenarios.

Bird
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Thanks Vincent. I have one basic question about the Transformer which I often wonder about: what partitioning is best for the Transformer, and what is the default? Also, when I sort the data in a database stage and then use a Transformer for some conversion functions before a Join stage, would it unsort the data by default? If so, will key partitioning in the Transformer stage help retain the sort order?
Thanks
Kris

Where's the "Any" key?-Homer Simpson
SriKara
Premium Member
Posts: 30
Joined: Wed Jun 01, 2005 8:40 am
Location: UK

Post by SriKara »

kris007 wrote: What partitioning is best for transformer and what is the default one?
"Auto" partitioning is the default, and in my opinion it is the best unless you have a specific data scenario which warrants another partitioning method.
kris007 wrote: When I sort the data in database stage and for some conversion functions I use transformer before JOIN stage would it unsort the data by default
The "Auto" method tries to retain the partitioning of the previous stage, though not necessarily. So we can explicitly specify the "Same" partitioning method in the Transformer stage to make sure.
But the tricky question is: if we use the "Same" partitioning method, the records will stay in the same partition, but will the sort order within each partition be preserved?

Whenever I design my jobs, I "hash-key partition and sort" the data initially and carry it through the different stages using the "Same" partitioning method. This is useful when the key columns stay the same over the various processing stages. My observation is that the sorted records remain in order when "Same" partitioning is used.
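Here is a rough Python sketch of that behaviour (hash-partition sorted rows, then pass each partition through unchanged; `% num_partitions` stands in for the real hash partitioner):

```python
# Hash-partition sorted rows across 2 partitions by key, then check that
# passing the partitions through as-is ("Same") preserves the sort order.
rows = sorted([(5, "e"), (2, "b"), (9, "x"), (2, "a"), (5, "f")])

num_partitions = 2
partitions = [[] for _ in range(num_partitions)]
for row in rows:
    partitions[row[0] % num_partitions].append(row)  # modulus as a stand-in hash

# "Same" partitioning: each downstream stage sees the partition untouched,
# so the order within each partition is exactly what the sort produced.
for part in partitions:
    assert part == sorted(part)  # sort order preserved within each partition
```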

The question raised by Kris is rather interesting, and I am dying to hear more comments from the seniors.
Regards,
SriKara