Page 1 of 1

Concatenating multiple rows from the same column

Posted: Tue Feb 12, 2008 8:23 am
by Mark_E
Hi, i am trying to concatenate multiple rows from the same column based on a number into a single coulmn. rows per number can be quite large.

Ex from:

col1 col2
1 aa
1 bb
1 cc
1 dd
2 ww
2 er

into this

1 aa bb cc dd
2 ww er

any pointers would be greatly appreciated.

thanks

Posted: Tue Feb 12, 2008 8:49 am
by chulett
Define 'quite large'.

For rows coming in in a guaranteed order by col1, you can use stage variables to do the concatenation. Standard 'group change detection' on Col1 and, while Col1 is unchanged, build up a list of Col2 values in another stage variable. When Col1 changes, dump out the previous aggregate and start over. Your problem will be what to do with the last group, as there won't be anything to trigger it. Typically, people append a know 'last' row to the input set and use that to know when you are reading the last record. That or pass in a total record count as a job parameter and know you are at the end when you read that record number.

For a 'sortless' solution, I use a hashed file. Use a reference lookup to check to see if you have seen Col1 yet. When you first hit any Col1 value (lookup.NOTFOUND) write the record with the first Col2 value to that same hashed file. For every 'hit' append the current Col2 value to the lookup (aggregate) Col2 values from the hashed file. When you are done, read back the contents of the hashed file. Make sure your lookup isn't cached for this to work properly.

Posted: Tue Feb 12, 2008 9:14 am
by Mark_E
hi,

quite large as possibly 50 plus.

thanks for all the information

Posted: Tue Feb 12, 2008 11:48 am
by Teej
50? Ptth.

Just use a Stage Variable within Transformer to keep on appending based on your keys, outputting the results, then do a remove duplicate to eliminate everything but the last record that came out.

Posted: Tue Feb 12, 2008 12:38 pm
by nivas
Three steps involved.
1. In the SQL order the records on first column.
2. Inside the transform use stage variables and concatenate second columns for the same first column value.
3. out put the records from transformer to Hash File making the first column as the key column in hash file.

Posted: Tue Feb 12, 2008 3:39 pm
by ray.wurlod
Build the rows in Transformer stage then run them through an Aggregator stage set to deliver the Last row from each group.

Posted: Wed Feb 13, 2008 5:07 am
by asitagrawal
Hi Ray,

Could you please help me to understand the approach suggested by you.

In a similar case, I had already used the approach suggested by Chulett i.e using the hashed file.

Posted: Wed Feb 13, 2008 8:15 am
by chulett
Hadn't thought of that approach. I would think it means...

Starting with sorted input, use stage variables to concatenate values per 'key' but don't worry about constraining the output. You'd get records like...

1, A
1, AB
1, ABC
2, A
2, AB

Etc. Then send the rows through an aggregator, group on the key field and set the aggregation method for the data field to 'Last'.

:D

Posted: Wed Feb 13, 2008 8:18 am
by asitagrawal
Like this approach too.... Fantastic !! :idea:

Posted: Wed Feb 13, 2008 10:44 am
by Mark_E
hi guys, thanks for all your advice. i got it working by creating it in the stagevariables and keeping the last row as suggested.

thanks again

Posted: Wed Feb 13, 2008 10:47 am
by ArndW
Mark_e - just remember to add a dummy line to the data so that the last and final group change is detected, otherwise your output will be missing one or more lines.

Posted: Wed Feb 13, 2008 3:29 pm
by ray.wurlod
"Last" in the Aggregator means you don't need to do that. :D