Order By before Hashed Stage
Moderators: chulett, rschirm, roy
Order By before Hashed Stage
Hi All,
I have a simple question, Do we need to provide a sorted data as input to a Hashed Stage so that a proper and correct lookup logic can be applied on the hashed file????
Thanks in Advance.
I have a simple question, Do we need to provide a sorted data as input to a Hashed Stage so that a proper and correct lookup logic can be applied on the hashed file????
Thanks in Advance.
No. The hashing algorithm determines how the data is stored. If you sort your input stream by the key column you use for the subsequent lookup then you might gain a miniscule amount of time since each lookup will have a higher chance of hitting the same record, which is probably still in memory and quickly reachable.
But When I use my query as is and do Lookup on it .... I miss out some records which are actually present in the LookUp and which should have matched..............And when I order by my Lookup query and take data in the Hashed file all records wihich actually should match goes in the target.......What could be the issue then out here in the LookUp ??
Sorry Craig, I didn't get your point and what your were pointing towards.
I will just explain my Job design. I have some records from the Source tables. I have a Lookup query which brings the Policy Numbers and other requred feilds with it in a Hashed file on which a Lookup is done.
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.
Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.
BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.
I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
I will just explain my Job design. I have some records from the Source tables. I have a Lookup query which brings the Policy Numbers and other requred feilds with it in a Hashed file on which a Lookup is done.
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.
Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.
BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.
I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
Hi,Amit_111 wrote:
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.
Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.
BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.
I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
what is the primary key field?
The above issue comes to me when the same key field comes twice hash stage will update the same record.
when you use order by you are getting the record you want, Is that the key column or check for redundancy of the records.
Hi,Amit_111 wrote:
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.
Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.
BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.
I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
what is the primary key field?
The above issue comes to me when the same key field comes twice hash stage will update the same record.
when you use order by you are getting the record you want, Is that the key column or check for redundancy of the records.
I am doing the 'Order By' while selecting the records through the Oracle Stage i.e. just before populating them in the Hashed File. Yess the Rejected records I am talking about are the once which fails in a LookUp. The Column 'Policy Number' on which I am doing 'order by' is a Primary Key in the Source table itself. so a duplicate records will not be present for it.chulett wrote:The "order by" won't affect the actual lookup itself but can affect the hashed file population. Where exactly do you "order by"? And "rejected" in this case means the looku ...