Order By before Hashed Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Amit_111
Participant
Posts: 134
Joined: Sat Mar 24, 2007 11:37 am

Order By before Hashed Stage

Post by Amit_111 »

Hi All,

I have a simple question, Do we need to provide a sorted data as input to a Hashed Stage so that a proper and correct lookup logic can be applied on the hashed file????

Thanks in Advance.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No. The hashing algorithm determines how the data is stored. If you sort your input stream by the key column you use for the subsequent lookup then you might gain a miniscule amount of time since each lookup will have a higher chance of hitting the same record, which is probably still in memory and quickly reachable.
Amit_111
Participant
Posts: 134
Joined: Sat Mar 24, 2007 11:37 am

Post by Amit_111 »

But When I use my query as is and do Lookup on it .... I miss out some records which are actually present in the LookUp and which should have matched..............And when I order by my Lookup query and take data in the Hashed file all records wihich actually should match goes in the target.......What could be the issue then out here in the LookUp ??
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Can you explain your job design in more detail? The hashed-file lookup ordering should not affect the outcome.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Probably a victim of not understanding the destructive overwrite nature of the beast. Also known as "When Keys Collide". :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Amit_111
Participant
Posts: 134
Joined: Sat Mar 24, 2007 11:37 am

Post by Amit_111 »

Sorry Craig, I didn't get your point and what your were pointing towards.

I will just explain my Job design. I have some records from the Source tables. I have a Lookup query which brings the Policy Numbers and other requred feilds with it in a Hashed file on which a Lookup is done.

Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.

Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.

BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.

I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The "order by" won't affect the actual lookup itself but can affect the hashed file population. Where exactly do you "order by"? And "rejected" in this case means the lookup fails? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
pxraja
Participant
Posts: 142
Joined: Fri Mar 14, 2008 10:22 am

Post by pxraja »

Amit_111 wrote:
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.

Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.

BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.

I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
Hi,
what is the primary key field?
The above issue comes to me when the same key field comes twice hash stage will update the same record.

when you use order by you are getting the record you want, Is that the key column or check for redundancy of the records.
pxraja
Participant
Posts: 142
Joined: Fri Mar 14, 2008 10:22 am

Post by pxraja »

Amit_111 wrote:
Now I need to check if the records coming from source are present in my LookUp. If yes I need to take forward the corresponding fields also from the Lookup reocrds.

Here when I order By the Lookup records by Policy Number (Which is numeric DataType) I get the Lookup logic correct i.e. if records are present the corresponding records from LookUp go ahead.

BUt when I remove the Order By Clause, Some source records present in the Lookup are still getting rejected.

I didnt understand why this weird thing is happening and how the "Order By" clause is playing a role out here??
Hi,
what is the primary key field?
The above issue comes to me when the same key field comes twice hash stage will update the same record.

when you use order by you are getting the record you want, Is that the key column or check for redundancy of the records.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Something we've already tried to explain. Hence my comments about "destructive overwrite" (the official name and something easily searched for here) and "population v. lookup".
-craig

"You can never have too many knives" -- Logan Nine Fingers
Amit_111
Participant
Posts: 134
Joined: Sat Mar 24, 2007 11:37 am

Post by Amit_111 »

chulett wrote:The "order by" won't affect the actual lookup itself but can affect the hashed file population. Where exactly do you "order by"? And "rejected" in this case means the looku ...
I am doing the 'Order By' while selecting the records through the Oracle Stage i.e. just before populating them in the Hashed File. Yess the Rejected records I am talking about are the once which fails in a LookUp. The Column 'Policy Number' on which I am doing 'order by' is a Primary Key in the Source table itself. so a duplicate records will not be present for it.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What are the Key fields in the hashed file? That is the pertinent information needed here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply