Page 1 of 1

help needed with hash file with key combinations

Posted: Mon Jul 04, 2005 8:02 pm
by ashok
Hi,
I need help with below problem I am getting,

All data in file is in sorted in order on 4 columns (S_NO, VC_NO, CITY, DATE ),

I am trying to pull first record from combination of three keys (S_NO, VC_NO, CITY), when I try to use hash file stage it is pulling correct key combination but not the first record I need, let me give a example,

Example:
INPUT RECORDS:

S_NO | VC_NO | CITY| DATE|
1001 | X1002 | CITY1 | 2005/04/22
1001 | X1002 | CITY1 | 2001/05/01
1002 | X2002 | CITY2 | 2005/06/21
1003 | X3002 | CITY3 | 2005/04/22
1003 | X3002 | CITY3 | 2004/01/05


OUT RECORDS:

S_NO | VC_NO | CITY| DATE|
1002 | X2002 | CITY2 | 2005/06/21
1001 | X1002 | CITY1 | 2001/05/01
1003 | X3002 | CITY3 | 2005/04/22
This is what I am getting when I use hash stage with key combination of (S_NO, VC_NO, CITY)

What I need in output file is:

S_NO | VC_NO | CITY| DATE|
1001 | X1002 | CITY1 | 2005/04/22
1002 | X2002 | CITY2 | 2005/06/21
1003 | X3002 | CITY3 | 2005/04/22

Thank you,
Ashok

Posted: Mon Jul 04, 2005 8:22 pm
by chulett
You need to understand that hash files do not support duplicate key combinations and resolve all differences via Destructive Overwrite. This means that the last version of the key written to the hash is the winner and the one that you'll pull out later.

Resort your data such that the desired 'first' key combination record is written to the hash last and you'll get back your needed output.

Posted: Mon Jul 04, 2005 8:46 pm
by amsh76
Remember the hash file will have the last record of multiple key records. As Craig suggested you need to sort your data in such a fashion that the record you need is the last record for that key combination...

Posted: Tue Jul 05, 2005 12:33 am
by elavenil
You can avoid writing the 2nd record into a hash file by checking these columns using stage variables. If the 1st rec and 2nd rec key columns matches, then do not write the record into hash file.

HTWH.

Regards
Saravanan

Posted: Tue Jul 05, 2005 2:25 am
by vmalviya
Hi saravana cuold u give me idiathat how we can remov duplicate row using stage variable

Posted: Tue Jul 05, 2005 2:32 am
by elavenil
Hi,

Sort the input data using the key columns and declare 2 stage variables for each key column one would hold previous rec value and other one holds current rec value. Prev rec value would be null when you read the first rec and you need to handle this when the values are assigned to that variable. Have the constraint in the transformer that write the output rec if current & prev row value is different. By this method you can avoid writing the subsequent records for the same key.

HTWH.

Regards
Saravanan

Posted: Tue Jul 05, 2005 4:53 am
by talk2shaanc
If i have understood correctly, then you want only that record in hash file which is latest(latest-for a given key column the date column is max).

Soln: Just sort all your key columns ascending and the date column descending.

Posted: Tue Jul 05, 2005 3:08 pm
by ray.wurlod
Isn't this a good job for an Aggregator stage with some combination of Max or Last as the aggregate function?