Multiple record entries to be produced from a single row.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Multiple record entries to be produced from a single row.

Post by thebird »

Hi,

I have a comma-delimited source, which is to be loaded into a dataset.

Each incoming row has 4 fields containing telephone numbers, and I have to create a separate entry in the dataset for each telephone number that is populated.

If there is only 1 telephone number, then 1 entry; if there are 2 telephone numbers, then 2 entries have to be made in the dataset.
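
For example (illustrative values, not real data), a row with two populated telephone fields should produce two entries:

Code: Select all

Input row:     1001,5551111111,5552222222,,
Output rows:   1001,5551111111
               1001,5552222222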

How can this be done, and which stage should be used for this? Can we use the Pivot stage for this requirement?

Thanks in advance.

Regards,

Thebird.
alhamilton
Participant
Posts: 12
Joined: Fri Apr 30, 2004 9:11 am

Post by alhamilton »

There may be several ways. In the past, I set up a lookup table with a single number column holding values 1 through whatever maximum I needed, then used an ODBC multi-row lookup against that table. Once each input record can produce multiple output records, you can use the output constraint to decide whether each copy should actually be written.
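
To sketch the idea (the table and column names here are hypothetical):

Code: Select all

NUMBERS table:  n = 1, 2, 3, 4   (one row per possible phone field)

Multi-row lookup: every input record comes back once per value of n.
Output constraint: keep the copy for n only if phone_n is populated;
                   map (account_nbr, phone_n) to the output columns.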
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

You could use the Pivot stage. You could also use multiple output links from a Transformer (or a Filter stage, if you can use it to check the qualifying information) and funnel the results, roughly as sketched below.
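
Something like this (link names and constraint expressions are just illustrative):

Code: Select all

                         +--> link1 (constraint: phone1 <> '') --+
Source --> Transformer --+--> link2 (constraint: phone2 <> '') --+--> Funnel --> Dataset
                         +--> link3 (constraint: phone3 <> '') --+
                         +--> link4 (constraint: phone4 <> '') --+

Each link maps account_nbr plus its own phone field to the same output columns.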
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Just wanted to throw out another option, since I haven't seen it mentioned yet, here or in other related topics. We use buildops at our shop. They handle the ETL all at once in one buildop stage rather than spreading it across a multitude of other stages.

A lot of our data comes from sources that use occurrences, mostly mainframe sources with COBOL copybooks using OCCURS clauses. Fun stuff for a Unix guy. :wink:

If we have occurrences, we import the file so that the OCCURS fields are loaded as vectors (arrays). That makes them a lot easier to process in the buildop. Sometimes there are several fields within the OCCURS; in that case the group gets imported as a vector of subrecords, and you can use the Split Subrecord stage to create a vector for each field in the subrecord, as sketched below.
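
As a hypothetical illustration (the field names are made up), a two-field OCCURS group imports as a vector of subrecords like this, which the Split Subrecord stage can then turn into one vector per field:

Code: Select all

record (
    account_nbr: decimal[20,0];
    contact[5]: subrec (
        phone_nbr: decimal[10,0];
        phone_type: string[1];
    );
)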

Anyway, for this example, let's keep it simple with just one repeated field. The input schema ends up looking like this:

Code: Select all

record (
    account_nbr: decimal[20,0];
    phone_nbr[5]: decimal[10,0];
)
Then the code for the buildop:

Code: Select all

// Drop the default output record; output is handled manually
dropRecord();

// Phone number occurs up to 5 times.
// Create a new output record for each phone_nbr > 0.

int occurs_lmt = 5;
int occurs_cnt = 0;

for (int i = 0; i < occurs_lmt; i++) {
    if (in.phone_nbr[i] <= 0)
    {
        // This occurrence is empty, so skip it
        continue;
    }

    // Move NOT NULL / key fields
    out.account_nbr = in.account_nbr;
    out.phone_occr_nbr = i + 1;

    // Move occurrence fields
    out.phone_nbr = in.phone_nbr[i];

    // Write the record to the output link
    transferRecord();
    outputRecord();

    // Increment the occurrence counter
    occurs_cnt++;
}
By default, PX writes the current record to an output buffer and then automatically outputs it to the stream. When you need to create multiple output records from a single input record, you have to tell it to ignore (or drop) that buffered record (dropRecord()) and write each record explicitly yourself. In the code above, if a phone number is 0 we do not output a record; if it is > 0, we create one. The transferRecord() and outputRecord() calls take the current record, write it to the buffer, and output it to the stream (your output link).

Your output schema looks like this:

Code: Select all

record (
    account_nbr: decimal[20,0];
    phone_occr_nbr: smallint;
    phone_nbr: decimal[10,0];
)
I know not everyone likes buildops or writing C code, but they really are pretty easy. We started off using Torrent about 5-6 years ago, so when DataStage came along it was far easier to continue with buildops than to switch to all sorts of built-in stages. In the long run, it seems to be much more efficient, even for maintenance, since most of your ETL logic can be contained within one buildop.

Hope this helps.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

I do support the concept of using buildops, but when you do not have expert C/C++ staff, they can become a maintenance nightmare.

Plus, migrating buildops has its own interesting set of problems.

There are advantages and disadvantages to everything, naturally.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Yes, it helps to have some C/C++ background, but most of our staff had little or no background when they started. All it takes is one or two people who DO know C and/or C++, and the ball starts rolling.

I just don't want people to get scared off when they see "C/C++". For the most part, the buildop code is extremely simple.