help needed with hash file with key combinations

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ashok
Participant
Posts: 43
Joined: Tue Jun 22, 2004 3:04 pm

help needed with hash file with key combinations

Post by ashok »

Hi,
I need help with below problem I am getting,

All data in file is in sorted in order on 4 columns (S_NO, VC_NO, CITY, DATE ),

I am trying to pull first record from combination of three keys (S_NO, VC_NO, CITY), when I try to use hash file stage it is pulling correct key combination but not the first record I need, let me give a example,

Example:
INPUT RECORDS:

S_NO | VC_NO | CITY| DATE|
1001 | X1002 | CITY1 | 2005/04/22
1001 | X1002 | CITY1 | 2001/05/01
1002 | X2002 | CITY2 | 2005/06/21
1003 | X3002 | CITY3 | 2005/04/22
1003 | X3002 | CITY3 | 2004/01/05


OUT RECORDS:

S_NO | VC_NO | CITY| DATE|
1002 | X2002 | CITY2 | 2005/06/21
1001 | X1002 | CITY1 | 2001/05/01
1003 | X3002 | CITY3 | 2005/04/22
This is what I am getting when I use hash stage with key combination of (S_NO, VC_NO, CITY)

What I need in output file is:

S_NO | VC_NO | CITY| DATE|
1001 | X1002 | CITY1 | 2005/04/22
1002 | X2002 | CITY2 | 2005/06/21
1003 | X3002 | CITY3 | 2005/04/22

Thank you,
Ashok
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need to understand that hash files do not support duplicate key combinations and resolve all differences via Destructive Overwrite. This means that the last version of the key written to the hash is the winner and the one that you'll pull out later.

Resort your data such that the desired 'first' key combination record is written to the hash last and you'll get back your needed output.
-craig

"You can never have too many knives" -- Logan Nine Fingers
amsh76
Charter Member
Charter Member
Posts: 118
Joined: Wed Mar 10, 2004 10:58 pm

Post by amsh76 »

Remember the hash file will have the last record of multiple key records. As Craig suggested you need to sort your data in such a fashion that the record you need is the last record for that key combination...
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

You can avoid writing the 2nd record into a hash file by checking these columns using stage variables. If the 1st rec and 2nd rec key columns matches, then do not write the record into hash file.

HTWH.

Regards
Saravanan
vmalviya
Participant
Posts: 4
Joined: Mon Jul 04, 2005 12:00 am

Post by vmalviya »

Hi saravana cuold u give me idiathat how we can remov duplicate row using stage variable
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Hi,

Sort the input data using the key columns and declare 2 stage variables for each key column one would hold previous rec value and other one holds current rec value. Prev rec value would be null when you read the first rec and you need to handle this when the values are assigned to that variable. Have the constraint in the transformer that write the output rec if current & prev row value is different. By this method you can avoid writing the subsequent records for the same key.

HTWH.

Regards
Saravanan
talk2shaanc
Charter Member
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

If i have understood correctly, then you want only that record in hash file which is latest(latest-for a given key column the date column is max).

Soln: Just sort all your key columns ascending and the date column descending.
Shantanu Choudhary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Isn't this a good job for an Aggregator stage with some combination of Max or Last as the aggregate function?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply