Generating Multiple rows in output from single row in input
Hi All,
Our project has a requirement where each input row contains three fields: F1, F2 and F3.
I have to generate an output file with four fields: F1, F2, F3 and F4.
Depending on the value of the input field F1, one or more output rows will be generated.
For example, the input file structure is:
F1, F2, F3
The values are:
3, a, b
Since the value of F1 is 3, I will need 3 rows in the output, as below:
F1, F2, F3, F4
3, a, b, 1
3, a, b, 2
3, a, b, 3
Note: the value of F1 can be anything; there is no upper limit.
Any advice towards achieving this will be of great help.
Regards,
Dhiren
I think you're going to need a custom stage for this. My first thought was that you have a horizontal pivot here, but you would need to generate an arbitrary number of columns in the row (based on the value of F1) prior to the Pivot stage. And DataStage doesn't work with an arbitrary number of columns in a row. Is there a finite upper limit to F1 values? If so, this approach may be viable. F1, F2 and F3 would be pivot keys. (Yes, I note that you indicated that there is no upper limit, but it's always worth checking.)
In the custom stage you would need a loop that generated F1 rows for each input row, with an incrementing value for F4 reset to 1 for each new row being processed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
How do you feel about a row with 10003 columns? The last 10000 would have very similar derivations: If x > F1 Then @NULL Else x (for x running from 1 to 10000). Not pretty; I think the custom stage is for you.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Can you use an external target stage to write to a delimited sequential file? Can you use awk? If so, I think this awk tidbit might do the trick.
Code:
#!/bin/ksh
# Usage: script.ksh <output-file> <field-separator>
# Reads delimited rows on stdin and writes $1 copies of each row,
# appending a counter as the fourth field.
FNAME="${1}"
AFS="${2}"
awk -v AFS="${AFS}" -v FNAME="${FNAME}" 'BEGIN { FS = AFS; OFS = AFS }
{
    # Emit one output row per unit of the first field, F4 counting up from 1
    for (COUNT = 1; COUNT <= $1; COUNT++)
        print $1, $2, $3, COUNT > FNAME
}'
exit 0
Hi Dhiren,
I think you can do it by using a dimension table as well. You need to create a table which contains all possible values of the F1 field, then join (inner join) this table with the source table. After that you need to filter for the records which satisfy the condition (for example, the dimension value <= F1). Try this; maybe it can help you.
Actually C won't work. Come to think of it, C variables won't hold values with line feeds. The workaround could have been writing to a file, but that would require the execution to be sequential; parallel writes won't work. I guess it's either a Basic transformer or an awk script. (Correction: C will work, but it has to be sequential.)
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
That's news to me. So my sprintf(LineCount, "%d lines written.\n\n", lines); statement, which has worked for many years, isn't valid?
Last edited by ray.wurlod on Wed Feb 14, 2007 6:24 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Well, I wrote a small piece of code in the morning and was stuck on how to return a value containing all the rows; I was getting some junk data. Hence my misinterpretation and the assumption I made earlier. But your example, Ray, was like a wake-up call, as I use that all the time too. I looked at my code again and added strcpy(All, "") in the code below; that got rid of the junk data I was getting.
Dhiren, you can use this code after you do thorough testing. I tested it at home by manually passing data; it should work just fine.
Anyway, here is the complete routine.
Code:
// Header files
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Function that generates the duplicated rows.
// Returns a malloc'd buffer holding one CSV line per generated row;
// the caller is responsible for free()ing it.
char *generateRows(char *F1, char *F2, char *F3)
{
    const int SIZE = 1000;

    // Convert F1 to get the number of duplicates to generate
    int intF1 = atoi(F1);

    // Scratch buffer for one line, plus the accumulating result.
    // Allocate at least one block so intF1 == 0 still yields a valid,
    // empty string.
    char *temp = (char *)malloc(4 * SIZE);
    char *All = (char *)malloc((intF1 + 1) * SIZE);

    // Initialize the result so strcat starts from an empty string
    strcpy(All, "");

    // Duplicate the record, appending F4 = 1..intF1 to each copy
    for (int i = 1; i <= intF1; i++)
    {
        sprintf(temp, "%s,%s,%s,%d\n", F1, F2, F3, i);
        strcat(All, temp);
    }

    free(temp);
    return All;
}
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I think a buildop would be much easier. Then there are also no restrictions on sequential vs. parallel; that always throws a red flag for me.
Only two things to keep in mind if you want to manually control how the records are output: 1) drop the record at the beginning, which essentially clears the buffer so you can populate it yourself; 2) for each record you create, transfer the data to the buffer and output (write) it.
We use buildops to handle situations similar to this, where the COBOL copybook has a single record with an occurs statement embedded. In these cases we'll read one record in and then interrogate the data within the occurs to determine if a new record should be created. If there are a max of 10, we may create 0 to 10 records as output from the single input.
Dhiren, your situation can be very easily handled in a buildop - I think this may be what Ray was talking about.
I used the following pieces to construct the buildop:
Input schema/table definition:
Code:
record
(
f1:int32;
f2:string[1];
f3:string[1];
)
Output schema/table definition:
Code:
record
(
f1:int32;
f2:string[1];
f3:string[1];
f4:int32;
)
Per-record logic:
Code:
// Drop default record, output will be handled manually
dropRecord();
int i;
for (i = 1; i <= in.f1; i++) {
out.f1 = in.f1;
out.f2 = in.f2;
out.f3 = in.f3;
out.f4 = i;
// Transfer record to buffer and output
transferRecord();
outputRecord();
}
If you are interested, here is a link where we used similar logic to handle an occurs in our input data:
Splitting One Record to Many
Multiple record entries to be produced from a single row.
Hope this helps.
Brad.
bcarlson wrote: I think a buildop would be much easier. Then there are also no restrictions on sequential vs. parallel. That always throws a red flag for me.
No restrictions for sequential vs. parallel in a routine either; I think it depends upon how a person wants to do it.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.