Generating Multiple rows in output from single row in input

dhiren · Post by **dhiren** » Fri Nov 11, 2005 3:34 am

Hi All
In our project requirement, where there is a single row in input containing 3 fields F1, F2 and F3.

I have to generate an output file where there are four fields F1,F2,F3 and F4.

Now depending on the value in the input field F1, single/multiple output rows will be generated.

For eg.
The input file structure is
F1, F2, F3

The values are:
3 , a, b

Since the value of F1 is 3, i will need 3 rows in output as below:

F1, F2, F3 , F4
3, a , b , 1
3, a , b , 2
3, a , b , 3

Note: The value of F1 can be anything ; there is no upper limit for this.

Any advice towards achieving this will be of great help.

Regards,
Dhiren

ray.wurlod · Post by **ray.wurlod** » Fri Nov 11, 2005 4:18 am

I think you're going to need a custom stage for this. My first thought was that you have a horizontal pivot here, but you would need to generate an arbitrary number of columns in the row (based on the value of F1) prior to the Pivot stage. And DataStage doesn't work with an arbitrary number of columns in a row. Is there a finite upper limit to F1 values? If so, this approach may be viable. F1, F2 and F3 would be pivot keys. (Yes, I note that you indicated that there is no upper limit, but it's always worth checking.)

In the custom stage you would need a loop that generated F1 rows for each input row, with an incrementing value for F4 reset to 1 for each new row being processed.

dhiren · Post by **dhiren** » Fri Nov 11, 2005 4:29 am

Ray,
Since we are dealing with the input processed data, i see that there will be no change in the upper limit.
The upper limit can be assumed to be 10000 here.

Regards,
Dhiren

ray.wurlod · Post by **ray.wurlod** » Fri Nov 11, 2005 6:51 am

How do you feel about a row with 10003 columns? The last 10000 will have very similar derivations: If x > F1 Then @NULL Else x (for x running from 1 to 10000). Not pretty, I think the custom stage is for you.

Ultramundane · Post by **Ultramundane** » Fri Nov 11, 2005 9:36 am

Can you use an external target stage to write to a delimited sequential file? Can you use awk? If so, I think this awk tidbit might do the trick.

Code: Select all

#!/bin/ksh

FNAME="${1}"
AFS="${2}"

awk -vAFS="${AFS}" -vFNAME="${FNAME}" 'BEGIN{FS=AFS;OFS=AFS;}
{
   COUNT=1;
   while ( COUNT <= $1 )
   {
      print $1,$2,$3,COUNT >FNAME;
      COUNT++;
   }
}'

exit 0

bhavna.jha · Post by **bhavna.jha** » Wed Feb 14, 2007 12:43 am

hi Dhiren,

i think you can do it by using dimentation table also. you need to create a table which contain all possible value of F1 fields than join(inner join) this table with source table. After that u need to apply filter records which satisfy the condition f1<=3 like. try this may be this can help u.[/quote]

DSguru2B · Post by **DSguru2B** » Wed Feb 14, 2007 7:51 am

awk will definately do the trick. You can also write a C routine which takes in your three columns and returns the generated rows.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 14, 2007 2:38 pm

Or you can write the same C code in a Build stage, which is my original suggestion.

DSguru2B · Post by **DSguru2B** » Wed Feb 14, 2007 2:51 pm

Actually C wont work. Come to think of it, C variables wont hold values with line feeds. The workaround could have been writing to a file but that would require the execution to be sequential. Parallel writes wont work. I guess its either a Basic transformer or an awk script. C will work but it has to be sequential.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 14, 2007 2:57 pm

That's news to me. So my sprintf(LineCount, "%d lines written.\n\n", lines); statement, which has worked for many years, isn't valid?

DSguru2B · Post by **DSguru2B** » Wed Feb 14, 2007 3:07 pm

My apologies. I was trying to build the same but was messing up at a place. I take my words back. If I get this done in a while Ill post the code.
My apologies once again.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 14, 2007 3:08 pm

Ah, the ever-hasty DSguruji!

DSguru2B · Post by **DSguru2B** » Wed Feb 14, 2007 4:57 pm

Well I wrote a small piece of code in the morning and was stuck in between how to return a value with all rows. I was getting some junk data. Therefore my misinterepretation and hence my assumption that I made earlier. But your example Ray, was like a wake up call as I use that all the time too. I looked at my code again and added strcpy(All,"") in my code below. It just got rid of the junk data that I was getting.
Anyways, here is the complete routine.

Code: Select all

//Header Files
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

//Function that generates the rows
char * generateRows(char* F1, char* F2, char* F3)
{
  //Initialize variables
  const int SIZE = 1000;
  int intF1;
  
  char* temp = (char* )malloc(4*SIZE);

 //typecasting of F1 to get the number of duplicates
  intF1 = atoi(F1);

  char* All = (char* )malloc(intF1*SIZE);

//Initialize variable
  strcpy(All, "");

//Duplicating records and storing in variable All
  for (int i = 1; i <= intF1 ; i++)
  {
     sprintf(temp, "%s,%s,%s,%i\n", F1, F2, F3, i);
     strcat(All, temp);
	 
  }

  free(temp);
  return All;
}

dhiren, you can use this code after you do thorough testing. I tested it at home by manually passing data. It should word just fine.

bcarlson · Post by **bcarlson** » Wed Feb 21, 2007 4:09 pm

I think a buildop would be much easier. Then there are also no restrictions on sequential vs. parallel. That always throws a red flag for me.

We use buildops to handle situations similar to this, where the COBOL copybook has a single record with an occurs statement embedded. In these cases we'll read one record in and then interrogate the data within the occurs to determine if a new record should be created. If there are a max of 10, we may create 0 to 10 records as output from the single input.

Dhiren, your situation can be very easily handled in a buildop - I think this may be what Ray was talking about.

I used the following pieces to construct the buildop:

Input schema/table definition:

Code: Select all

record
(
        f1:int32;
        f2:string[1];
        f3:string[1];
)

Output schema/table definition:

Code: Select all

record
(
        f1:int32;
        f2:string[1];
        f3:string[1];
        f4:int32;
)

Per-record logic:

Code: Select all

// Drop default record, output will be handled manually 
dropRecord(); 

int i;

for (i = 1; i <= in.f1; i++) {
    out.f1 = in.f1;
    out.f2 = in.f2;
    out.f3 = in.f3;
    
    out.f4 = i;

    // Transfer record to buffer and output
    transferRecord(); 
    outputRecord();
}

Only 2 things to keep in mind if you want to manually control how the records are output. 1) drop the record at the beginning - this essentially clears the buffer so you can populate it yourself. 2) For each record you create, transfer the data to the buffer and output (write) it.

If you are interested, here is a link where we used similar logic to handle an occurs in our input data:

Splitting One Record to Many

Multiple record entries to be produced from a single row.

Hope this helps.

Brad.

DSguru2B · Post by **DSguru2B** » Wed Feb 21, 2007 4:17 pm

bcarlson wrote:I think a buildop would be much easier. Then there are also no restrictions on sequential vs. parallel. That always throws a red flag for me.

No restrictions for sequential vs. parallel in a routine as well. I think it depends upon how a person wants to do it.