Page 1 of 1

Order By before Hashed Stage

Posted: Thu Oct 23, 2008 3:49 am
by Amit_111
Hi All,

I have a simple question, Do we need to provide a sorted data as input to a Hashed Stage so that a proper and correct lookup logic can be applied on the hashed file????

Thanks in Advance.

Posted: Thu Oct 23, 2008 4:54 am
by ArndW
No. The hashing algorithm determines how the data is stored. If you sort your input stream by the key column you use for the subsequent lookup then you might gain a miniscule amount of time since each lookup will have a higher chance of hitting the same record, which is probably still in memory and quickly reachable.

Posted: Sat Nov 01, 2008 8:33 am
by Amit_111
But When I use my query as is and do Lookup on it .... I miss out some records which are actually present in the LookUp and which should have matched..............And when I order by my Lookup query and take data in the Hashed file all records wihich actually should match goes in the target.......What could be the issue then out here in the LookUp ??

Posted: Sat Nov 01, 2008 8:40 am
by ArndW
Can you explain your job design in more detail? The hashed-file lookup ordering should not affect the outcome.

Posted: Sat Nov 01, 2008 11:08 am
by chulett
Probably a victim of not understanding the destructive overwrite nature of the beast. Also known as "When Keys Collide". :wink:

Posted: Thu Nov 13, 2008 6:36 am
by Amit_111
Sorry Craig, I didn't get your point and what your were pointing towards.

I will just explain my Job design. I have some records from the Source tables. I have a Lookup query which brings the Policy Numbers and other requred feilds with it in a Hashed file on which a Lookup is done.

Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.

Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.

BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.

I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??

Posted: Thu Nov 13, 2008 8:22 am
by chulett
The "order by" won't affect the actual lookup itself but can affect the hashed file population. Where exactly do you "order by"? And "rejected" in this case means the lookup fails? :?

Posted: Sat Nov 15, 2008 3:30 am
by pxraja
Amit_111 wrote:
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.

Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.

BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.

I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
Hi,
what is the primary key field?
The above issue comes to me when the same key field comes twice hash stage will update the same record.

when you use order by you are getting the record you want, Is that the key column or check for redundancy of the records.

Posted: Sat Nov 15, 2008 3:33 am
by pxraja
Amit_111 wrote:
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.

Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.

BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.

I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
Hi,
what is the primary key field?
The above issue comes to me when the same key field comes twice hash stage will update the same record.

when you use order by you are getting the record you want, Is that the key column or check for redundancy of the records.

Posted: Sat Nov 15, 2008 8:28 am
by chulett
Something we've already tried to explain. Hence my comments about "destructive overwrite" (the official name and something easily searched for here) and "population v. lookup".

Posted: Mon Nov 17, 2008 12:01 am
by Amit_111
chulett wrote:The "order by" won't affect the actual lookup itself but can affect the hashed file population. Where exactly do you "order by"? And "rejected" in this case means the looku ...
I am doing the 'Order By' while selecting the records through the Oracle Stage i.e. just before populating them in the Hashed File. Yess the Rejected records I am talking about are the once which fails in a LookUp. The Column 'Policy Number' on which I am doing 'order by' is a Primary Key in the Source table itself. so a duplicate records will not be present for it.

Posted: Mon Nov 17, 2008 1:16 am
by chulett
What are the Key fields in the hashed file? That is the pertinent information needed here.