Concatenating multiple rows from the same column

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Mark_E
Premium Member
Posts: 72
Joined: Fri Jan 23, 2004 3:04 am

Concatenating multiple rows from the same column

Post by Mark_E »

Hi, I am trying to concatenate multiple rows from the same column, grouped by a number, into a single column. The number of rows per group can be quite large.

Ex from:

col1 col2
1 aa
1 bb
1 cc
1 dd
2 ww
2 er

into this

1 aa bb cc dd
2 ww er

Any pointers would be greatly appreciated.

thanks
mark_e
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Define 'quite large'.

For rows arriving in a guaranteed order by Col1, you can use stage variables to do the concatenation. Standard 'group change detection' on Col1: while Col1 is unchanged, build up a list of Col2 values in another stage variable. When Col1 changes, write out the previous aggregate and start over. Your problem will be what to do with the last group, as there won't be anything to trigger it. Typically, people append a known 'last' row to the input set and use it to detect that the last record is being read. That, or pass in a total record count as a job parameter and know you are at the end when you read that record number.
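
A minimal sketch of that group-change logic, in plain Python standing in for the stage-variable derivations (the row data, the space separator, and the sentinel handling are illustrative, not DataStage syntax):

# Illustrative Python mimicking the stage-variable logic described above.
rows = [(1, "aa"), (1, "bb"), (1, "cc"), (1, "dd"), (2, "ww"), (2, "er")]
rows.append((None, None))            # the appended 'known last' row that flushes the final group

prev_key = None                      # plays the role of a svPrevKey stage variable
agg = ""                             # plays the role of a svList stage variable
output = []

for key, value in rows:
    if prev_key is not None and key != prev_key:
        output.append((prev_key, agg))   # group change: dump the previous aggregate
        agg = ""
    if key is not None:
        agg = value if agg == "" else agg + " " + value
    prev_key = key

print(output)                        # [(1, 'aa bb cc dd'), (2, 'ww er')]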

For a 'sortless' solution, I use a hashed file. Use a reference lookup to check whether you have seen the Col1 value yet. The first time you hit any Col1 value (lookup.NOTFOUND), write the record with that first Col2 value to the same hashed file. For every 'hit', append the current Col2 value to the aggregate Col2 value looked up from the hashed file and write it back. When you are done, read back the contents of the hashed file. Make sure your lookup isn't cached for this to work properly.
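
The hashed-file idea sketched the same way, with a Python dict standing in for the (uncached) hashed file keyed on Col1; the lookup/NOTFOUND handling is simplified and the names are illustrative:

# Illustrative Python; the dict plays the role of the hashed file keyed on Col1.
rows = [(1, "aa"), (1, "bb"), (1, "cc"), (1, "dd"), (2, "ww"), (2, "er")]
hashed = {}                           # hashed file contents: Col1 -> aggregated Col2 values

for key, value in rows:
    if key not in hashed:             # lookup.NOTFOUND: first time this Col1 value is seen
        hashed[key] = value           # write the record with the first Col2 value
    else:                             # a 'hit': append to the looked-up aggregate and write it back
        hashed[key] = hashed[key] + " " + value

# When done, read back the contents of the hashed file
for key in sorted(hashed):
    print(key, hashed[key])           # 1 aa bb cc dd  /  2 ww er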
Last edited by chulett on Tue Feb 12, 2008 9:31 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mark_E
Premium Member
Posts: 72
Joined: Fri Jan 23, 2004 3:04 am

Post by Mark_E »

hi,

Quite large, as in possibly 50-plus rows per group.

thanks for all the information
mark_e
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

50? Ptth.

Just use a stage variable within the Transformer to keep appending based on your keys, output every row, then do a remove-duplicates step that eliminates everything but the last record for each key.
nivas
Participant
Posts: 117
Joined: Sun Mar 21, 2004 4:40 pm

Post by nivas »

Three steps are involved:
1. In the SQL, order the records on the first column.
2. Inside the Transformer, use stage variables to concatenate the second-column values for rows sharing the same first-column value.
3. Output the records from the Transformer to a hashed file, making the first column the key column of the hashed file. Because each new row overwrites the previous one for that key, only the final (fully concatenated) row per key remains.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Build the rows in a Transformer stage, then run them through an Aggregator stage set to deliver the Last row from each group.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Hi Ray,

Could you please help me understand the approach you suggested?

In a similar case, I had already used the approach suggested by chulett, i.e. using the hashed file.
Share to Learn, and Learn to Share.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hadn't thought of that approach. I would think it means...

Starting with sorted input, use stage variables to concatenate values per 'key' but don't worry about constraining the output. You'd get records like...

1, A
1, AB
1, ABC
2, A
2, AB

Etc. Then send the rows through an Aggregator, group on the key field, and set the aggregation method for the data field to 'Last'.
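
Roughly, that Transformer-plus-Aggregator('Last') idea looks like this in Python, with dict overwriting standing in for the Aggregator grouped on the key with the data column set to 'Last' (row data and names are illustrative):

# Illustrative Python for the 'emit every running concatenation, keep the last per key' idea.
rows = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B")]   # sorted input

# Transformer: emit the running concatenation for every row, no output constraint
running, prev_key, transformed = "", None, []
for key, value in rows:
    running = value if key != prev_key else running + value
    prev_key = key
    transformed.append((key, running))    # 1,A  1,AB  1,ABC  2,A  2,AB

# Aggregator grouped on the key, data column set to 'Last': keep the final row per key
last_per_key = {}
for key, concat in transformed:
    last_per_key[key] = concat            # later rows overwrite earlier ones

print(last_per_key)                       # {1: 'ABC', 2: 'AB'}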

:D
-craig

"You can never have too many knives" -- Logan Nine Fingers
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

I like this approach too... fantastic! :idea:
Share to Learn, and Learn to Share.
Mark_E
Premium Member
Posts: 72
Joined: Fri Jan 23, 2004 3:04 am

Post by Mark_E »

Hi guys, thanks for all your advice. I got it working by building the concatenation in stage variables and keeping the last row per key, as suggested.

thanks again
mark_e
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Mark_E - just remember to add a dummy line to the data so that the final group change is detected; otherwise your output will be missing the last group.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Last" in the Aggregator means you don't need to do that. :D
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.