Date Range lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Date Range lookup

Post by kris007 »

Hi,
I have the following situation:

Table 1 (primary link; not really a table, but the input to the Lookup stage):
key, purchdate
1, 2006/07/26

Table 2 (lookup reference table, in DB2):
key, effdate, expirydate
1, 2006/06/24, 2006/06/26
1, 2006/06/26, 2006/07/28

I have to get the records whose keys match and whose purchdate falls between effdate and expirydate, i.e.
Table1.key=Table2.key and table2.effdate<table1.purchdate<=table2.expirydate.
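In Python pseudocode (just to illustrate the logic I am after, not DataStage syntax), the match I need is:

```python
from datetime import date

# Primary link: (key, purchdate)
table1 = [(1, date(2006, 7, 26))]

# Reference table in DB2: (key, effdate, expirydate)
table2 = [
    (1, date(2006, 6, 24), date(2006, 6, 26)),
    (1, date(2006, 6, 26), date(2006, 7, 28)),
]

# Keep rows where the keys match and purchdate falls in the range:
# effdate < purchdate <= expirydate
matches = [
    (k1, purch, eff, exp)
    for (k1, purch) in table1
    for (k2, eff, exp) in table2
    if k1 == k2 and eff < purch <= exp
]
print(matches)  # only the 2006/06/26 - 2006/07/28 reference row qualifies
```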

Can anyone suggest a design for this lookup?
Thanks in advance,
Kris

Where's the "Any" key?-Homer Simpson
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

Hi Kris007,
Can you try doing a lookup of the DB2 table against the input stream on the primary key, and then writing a constraint something like this:

Code: Select all

table2.effdate<table1.purchdate and table1.purchdate<=table2.expirydate

kris007 wrote: I have to get the records whose keys match and purchdate falls in between effdate and expirydate i.e., Table1.key=Table2.key and table2.effdate<table1.purchdate<=table2.expirydate.
I am not sure whether it works or not, but I am just trying to give an idea...

Thank you..
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

No, I cannot refer to Table2 columns when putting a condition in the Lookup stage.
I can only put conditions on the input columns or on prior reference links.
Kris

Where's the "Any" key?-Homer Simpson
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

I am sorry, I was thinking of a Server design, but this is a parallel job... You are right, it is not possible to put a constraint on the Lookup stage.
But I think you can do a full outer join of the tables on the key, pass the result to a Filter stage, and give it the condition there.
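In Python pseudocode (just an illustration of the two steps, not DataStage syntax, using Kris's sample data), the join-then-filter idea looks like:

```python
from datetime import date

stream = [(1, date(2006, 7, 26))]                      # input link: (key, purchdate)
reference = [
    (1, date(2006, 6, 24), date(2006, 6, 26)),         # (key, effdate, expirydate)
    (1, date(2006, 6, 26), date(2006, 7, 28)),
]

# Step 1: join on the key alone (what the Join stage would do).
# Every reference row with a matching key comes through.
joined = [
    (k, purch, eff, exp)
    for (k, purch) in stream
    for (rk, eff, exp) in reference
    if k == rk
]

# Step 2: apply the date-range condition (what the Filter stage would do).
filtered = [row for row in joined if row[2] < row[1] <= row[3]]

print(len(joined), len(filtered))  # 2 rows after the join, 1 after the filter
```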
kris007 wrote:No, I cannot refer Table2 columns when putting condition in the lookup stage.
I can only put conditions on the input columns or prior reference links.
Thank you
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Hi Kris,

Do a lookup on the key column and use a Modify stage to check the range constraint. Or else dump the data into temp tables and run the SQL query against them.

PX is :evil: when trying to do non-equi-joins.

Sam
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

us1aslam1us wrote:Hi Kris,

Do a lookup on the key column and use a modify stage to check the range constraint.

Sam
Sam, can I use a Modify stage to check a range constraint? Or did you mean the Filter stage?
meena wrote: But I think you can make full outer join of the tables based on the key and pass this to filter stage and give the condition.....
Meena, I think I am going to do that.
Thank you both.
Kris

Where's the "Any" key?-Homer Simpson
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Sorry, I meant to say the Filter stage.

Thanks
Sam
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

That is spot on: you need a Lookup stage followed by a Filter stage. Resist the urge to put in a Transformer with a constraint, as it will be slower in this instance. Better to have lookup - filter - transformer than lookup - transformer.

DataStage 8 (Hawk) adds range lookup functionality to the Lookup stage, so you can look up and filter in the one stage.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

vmcburney wrote:Resist the urge to put in a transformer with a constraint as it will be slower in this instance.
Why is that? How does the Transformer function in parallel jobs? I mean, in Server it doesn't take much time.
Kris

Where's the "Any" key?-Homer Simpson
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

The Transformer is a good all-rounder, offering constraints, transforms, stage variables etc. The problem is that it performs all those transforms on all rows before the constraints are applied. This means that if the input to your lookup has 100 million records and your range lookup expands it to 300 million, your Transformer will transform all 300 million rows and then filter out the rows it doesn't need, back down to 100 million.

With lookup - filter - transformer you get 100 million rows into the Lookup stage, 300 million into the Filter stage, and just 100 million into the Transformer. This can make your job some 20% faster.

It is one of the few times I choose a specialised parallel stage instead of a Transformer. If, on the other hand, you can live with the 20% load, or your range lookup doesn't multiply your data too badly, you can stick with a Transformer.
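As simple arithmetic (using the illustrative 100 million input / 3x expansion figures above):

```python
input_rows = 100_000_000           # rows into the lookup
expansion = 3                      # each row matches ~3 reference rows
expanded = input_rows * expansion  # rows coming out of the range lookup

# Design A: lookup -> transformer (constraint inside the Transformer).
# Every derivation runs on the expanded row set before the constraint trims it.
rows_transformed_a = expanded

# Design B: lookup -> filter -> transformer.
# The cheap Filter trims first, so only the surviving rows are transformed.
rows_transformed_b = input_rows

print(rows_transformed_a, rows_transformed_b)  # 300000000 100000000
```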
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vmcburney wrote:The problem is that it performs all those transforms on all rows before the constraints are applied.
Well, that's just... silly. The Server variant is smart enough not to do that; I would have thought the Parallel version would be even smarter! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

If I had DataStage running I would run a quick test to see. I am assuming that transforms are run first followed by constraints. You could test it by creating a transformer with a reject link, a constraint that blocked all rows and a transform function that triggered a rejection. If the constraint runs first you get no rejects, if the constraint runs last you get all rows rejected due to the failed transform.

Anyone got time to run one of these?
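The experiment can also be simulated outside DataStage. Here is a plain Python sketch of the two possible orderings (illustrative only, not how the engine is implemented):

```python
rows = [1, 2, 3]

def transform(row):
    raise ValueError("forced transform failure")   # always triggers a reject

def constraint(row):
    return False                                   # blocks every row

def run(order):
    """Process rows either transform-first or constraint-first;
    return the number of rejected rows."""
    rejects = 0
    for row in rows:
        if order == "transform_first":
            try:
                transform(row)
            except ValueError:
                rejects += 1        # row rejected by the failed transform
                continue
            if not constraint(row):
                pass                # row silently dropped by the constraint
        else:  # constraint_first
            if not constraint(row):
                continue            # dropped before the transform ever runs
            try:
                transform(row)
            except ValueError:
                rejects += 1
    return rejects

# If transforms run first, every row lands on the reject link;
# if the constraint runs first, nothing is rejected.
print(run("transform_first"), run("constraint_first"))  # 3 0
```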
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

Hi Vincent,

Looks like you are right! I tested a job as you described, and the entire set of input records got rejected.
vmcburney wrote:If I had DataStage running I would run a quick test to see. I am assuming that transforms are run first followed by constraints. You could test it by creating a transformer with a reject link, a constraint that blocked all rows and a transform function that triggered a rejection. If the constraint runs first you get no rejects, if the constraint runs last you get all rows rejected due to the failed transform.
So it means that constraints run only after the entire set of input data has passed through all the transforms, which would be a performance bottleneck in such scenarios.

Bird
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Thanks Vincent. I have one basic question about the Transformer which I often wonder about: what partitioning is best for the Transformer, and what is the default? Also, when I sort the data in a database stage and then use a Transformer for some conversion functions before a Join stage, would it unsort the data by default? If so, will key partitioning in the Transformer stage help retain the sort order?
Thanks
Kris

Where's the "Any" key?-Homer Simpson
SriKara
Premium Member
Posts: 30
Joined: Wed Jun 01, 2005 8:40 am
Location: UK

Post by SriKara »

kris007 wrote: What partitioning is best for transformer and what is the default one?
"Auto" partitioning is the default, and in my opinion it is the best unless you have a specific data scenario which warrants another partitioning method.
kris007 wrote: When I sort the data in database stage and for some conversion functions I use transformer before JOIN stage would it unsort the data by default
The "Auto" method tries to retain the partitioning of the previous stage, though not necessarily. So we can explicitly specify the "Same" partitioning method in the Transformer stage to make sure.
But the tricky question is: if we use the "Same" partitioning method, the records will stay in the same partition, but will the sort order within each partition be preserved?

Whenever I design my jobs, I "hash-key partition and sort" the data initially and carry it through the different stages using the "Same" partitioning method. This is useful when the key columns stay the same over the various processing stages. My observation is that the sorted records remain in order when "Same" partitioning is used.
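Here is a rough Python sketch of that behaviour (hash-partition sorted rows, then pass each partition through unchanged; `% num_partitions` stands in for the real hash partitioner):

```python
# Hash-partition sorted rows across 2 partitions by key, then check that
# passing the partitions through as-is ("Same") preserves the sort order.
rows = sorted([(5, "e"), (2, "b"), (9, "x"), (2, "a"), (5, "f")])

num_partitions = 2
partitions = [[] for _ in range(num_partitions)]
for row in rows:
    partitions[row[0] % num_partitions].append(row)  # modulus as a stand-in hash

# "Same" partitioning: each downstream stage sees the partition untouched,
# so the order within each partition is exactly what the sort produced.
for part in partitions:
    assert part == sorted(part)  # sort order preserved within each partition
```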

The question raised by Kris is rather interesting, and I am dying to hear more comments from the seniors.
Regards,
SriKara