Concatenating multiple rows from the same column

Mark_E · Post by **Mark_E** » Tue Feb 12, 2008 8:23 am

Hi, i am trying to concatenate multiple rows from the same column based on a number into a single coulmn. rows per number can be quite large.

Ex from:

col1 col2
1 aa
1 bb
1 cc
1 dd
2 ww
2 er

into this

1 aa bb cc dd
2 ww er

any pointers would be greatly appreciated.

thanks

chulett · Post by **chulett** » Tue Feb 12, 2008 8:49 am

Define 'quite large'.

For rows coming in in a guaranteed order by col1, you can use stage variables to do the concatenation. Standard 'group change detection' on Col1 and, while Col1 is unchanged, build up a list of Col2 values in another stage variable. When Col1 changes, dump out the previous aggregate and start over. Your problem will be what to do with the last group, as there won't be anything to trigger it. Typically, people append a know 'last' row to the input set and use that to know when you are reading the last record. That or pass in a total record count as a job parameter and know you are at the end when you read that record number.

For a 'sortless' solution, I use a hashed file. Use a reference lookup to check to see if you have seen Col1 yet. When you first hit any Col1 value (lookup.NOTFOUND) write the record with the first Col2 value to that same hashed file. For every 'hit' append the current Col2 value to the lookup (aggregate) Col2 values from the hashed file. When you are done, read back the contents of the hashed file. Make sure your lookup isn't cached for this to work properly.

Mark_E · Post by **Mark_E** » Tue Feb 12, 2008 9:14 am

hi,

quite large as possibly 50 plus.

thanks for all the information

Teej · Post by **Teej** » Tue Feb 12, 2008 11:48 am

50? Ptth.

Just use a Stage Variable within Transformer to keep on appending based on your keys, outputting the results, then do a remove duplicate to eliminate everything but the last record that came out.

nivas · Post by **nivas** » Tue Feb 12, 2008 12:38 pm

Three steps involved.
1. In the SQL order the records on first column.
2. Inside the transform use stage variables and concatenate second columns for the same first column value.
3. out put the records from transformer to Hash File making the first column as the key column in hash file.

ray.wurlod · Post by **ray.wurlod** » Tue Feb 12, 2008 3:39 pm

Build the rows in Transformer stage then run them through an Aggregator stage set to deliver the Last row from each group.

asitagrawal · Post by **asitagrawal** » Wed Feb 13, 2008 5:07 am

Hi Ray,

Could you please help me to understand the approach suggested by you.

In a similar case, I had already used the approach suggested by Chulett i.e using the hashed file.

chulett · Post by **chulett** » Wed Feb 13, 2008 8:15 am

Hadn't thought of that approach. I would think it means...

Starting with sorted input, use stage variables to concatenate values per 'key' but don't worry about constraining the output. You'd get records like...

1, A
1, AB
1, ABC
2, A
2, AB

Etc. Then send the rows through an aggregator, group on the key field and set the aggregation method for the data field to 'Last'.

:D

asitagrawal · Post by **asitagrawal** » Wed Feb 13, 2008 8:18 am

Like this approach too.... Fantastic !!

Mark_E · Post by **Mark_E** » Wed Feb 13, 2008 10:44 am

hi guys, thanks for all your advice. i got it working by creating it in the stagevariables and keeping the last row as suggested.

thanks again

ArndW · Post by **ArndW** » Wed Feb 13, 2008 10:47 am

Mark_e - just remember to add a dummy line to the data so that the last and final group change is detected, otherwise your output will be missing one or more lines.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 13, 2008 3:29 pm

"Last" in the Aggregator means you don't need to do that. :D