Page 1 of 1

Unique record counter-Hash partitioning before transformer

Posted: Tue Apr 15, 2014 11:33 am
by Maximus_Jack
Hi
Before posting this, i tried myself and searched a lot but couldnt find a solution.

my scenario is,
Using an SQL i'm retriving around 80 columns and passing it to a transformer, in the transformer i need to assign a number in incremental order for every unique record that comes in based on a particular column.

but the catch is, i'm doing a hash paritioning in the input link of the transformer, so the number of rows that comes to each partition may not be the same, in that case the usual logic using numpartitions, @partitionnum is not working...

any help on this is appreciated

thanks
MJ

Posted: Tue Apr 15, 2014 12:17 pm
by zulfi123786
I am not sure If I understood it correctly, you need to count the number of unique records base on a column for the entire input data ?

If so, in the transformer as you are doing hash partitioning, have a stage variable to count the unique records per partition pass it to output and finally have a aggregator to find max of this value then reaggregare in sequential mode to find the sum of max values.

Posted: Tue Apr 15, 2014 1:21 pm
by Maximus_Jack
sorry if my explanation was not clear, now i have edited my post, hope it helps..

a number has to be generated for every unique value that comes in based on a particular, this is one of the requirement i need to handle, i dont want to create a seperate job for doing this functionality alone.. because the other columns has to go through a lot of transformation in the transformer, so i'm striving to get this done as part of this job itself.. is there any way?

thanks
MJ

Posted: Tue Apr 15, 2014 1:44 pm
by zulfi123786
If you dont want to settle with holes in the generated column then you can either run the job on sequential mode (which you wouldn't as if it was then this thread would not have existed :D ) or use NextSurrogateKey() with a block size of 1 which again kind of serializes your processing.

Posted: Tue Apr 15, 2014 4:48 pm
by Maximus_Jack
Hi all...

any help on this .... thanks MJ

Posted: Thu Apr 17, 2014 9:15 pm
by Maximus_Jack
Hi All... is there any solution for this ???

thanks
MJ

Posted: Fri Apr 18, 2014 12:55 pm
by bharathappriyan
Hi,
If i am not wrong, your reuqirement is to assign a number to unique value for a particular column. But it doesn't need to be in order.
if that is the case, when read the data from table create a column with rownum. copy the data in two links. In One link copy the rownum & unique value column and use remove duplicate based on the unique value. then join the data with other link. Or you can do that in the query itself using row_over() function.

Thanks,
Bharathappriyan

Posted: Tue Apr 22, 2014 4:14 am
by prasson_ibm
Hi,

My advice is define a stage veriable in Transformer stage.Hash partition the input column and compare the current record with the previous record,if both are different then increment the counter veriable otherwise let it be same.

Posted: Tue Apr 22, 2014 2:32 pm
by ray.wurlod
I'd use the ROWNUM or equivalent function in the SQL.