Concatenating multiple rows from the same column
Moderators: chulett, rschirm, roy
Concatenating multiple rows from the same column
Hi, i am trying to concatenate multiple rows from the same column based on a number into a single coulmn. rows per number can be quite large.
Ex from:
col1 col2
1 aa
1 bb
1 cc
1 dd
2 ww
2 er
into this
1 aa bb cc dd
2 ww er
any pointers would be greatly appreciated.
thanks
Ex from:
col1 col2
1 aa
1 bb
1 cc
1 dd
2 ww
2 er
into this
1 aa bb cc dd
2 ww er
any pointers would be greatly appreciated.
thanks
mark_e
Define 'quite large'.
For rows coming in in a guaranteed order by col1, you can use stage variables to do the concatenation. Standard 'group change detection' on Col1 and, while Col1 is unchanged, build up a list of Col2 values in another stage variable. When Col1 changes, dump out the previous aggregate and start over. Your problem will be what to do with the last group, as there won't be anything to trigger it. Typically, people append a know 'last' row to the input set and use that to know when you are reading the last record. That or pass in a total record count as a job parameter and know you are at the end when you read that record number.
For a 'sortless' solution, I use a hashed file. Use a reference lookup to check to see if you have seen Col1 yet. When you first hit any Col1 value (lookup.NOTFOUND) write the record with the first Col2 value to that same hashed file. For every 'hit' append the current Col2 value to the lookup (aggregate) Col2 values from the hashed file. When you are done, read back the contents of the hashed file. Make sure your lookup isn't cached for this to work properly.
For rows coming in in a guaranteed order by col1, you can use stage variables to do the concatenation. Standard 'group change detection' on Col1 and, while Col1 is unchanged, build up a list of Col2 values in another stage variable. When Col1 changes, dump out the previous aggregate and start over. Your problem will be what to do with the last group, as there won't be anything to trigger it. Typically, people append a know 'last' row to the input set and use that to know when you are reading the last record. That or pass in a total record count as a job parameter and know you are at the end when you read that record number.
For a 'sortless' solution, I use a hashed file. Use a reference lookup to check to see if you have seen Col1 yet. When you first hit any Col1 value (lookup.NOTFOUND) write the record with the first Col2 value to that same hashed file. For every 'hit' append the current Col2 value to the lookup (aggregate) Col2 values from the hashed file. When you are done, read back the contents of the hashed file. Make sure your lookup isn't cached for this to work properly.
Last edited by chulett on Tue Feb 12, 2008 9:31 am, edited 1 time in total.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
Hadn't thought of that approach. I would think it means...
Starting with sorted input, use stage variables to concatenate values per 'key' but don't worry about constraining the output. You'd get records like...
1, A
1, AB
1, ABC
2, A
2, AB
Etc. Then send the rows through an aggregator, group on the key field and set the aggregation method for the data field to 'Last'.
:D
Starting with sorted input, use stage variables to concatenate values per 'key' but don't worry about constraining the output. You'd get records like...
1, A
1, AB
1, ABC
2, A
2, AB
Etc. Then send the rows through an aggregator, group on the key field and set the aggregation method for the data field to 'Last'.
:D
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
Mark_e - just remember to add a dummy line to the data so that the last and final group change is detected, otherwise your output will be missing one or more lines.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: