Hi,
I have a comma delimited source , which is to be loaded to a dataset.
I have 4 fields with different telephone numbers coming in, and have to create different entries in the dataset depending upon the number of telephone numbers.
If there is only 1 telephone, then 1 entry; if there are 2 telephones then 2 entries have to be made in the dataset.
How can this be done and which is the stage to be used for this? Can we use the Pivot stage for this requirement?
Thanks in advance.
Regards,
Thebird.
Multiple record entries to be produced from a single row.
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 12
- Joined: Fri Apr 30, 2004 9:11 am
Just wanted to throw out another option, since I haven't seen it mentioend yet, here or in other related topics. We use Buildops at our shop. They handle the ETL all at once in one buildop stage rather than a multitude of other stages.
A lot of our data comes from source that use occurences. Mostly mainframe sources with COBOL copybooks using OCCURS clauses. Fun stuff for a Unix guy.
If we have occurences, we import the file so that the "occurs" fields are loaded as vectors (arrays). That makes it a lot easier to process in the buildop. Sometimes there are several fields within the occurs. In that case it gets imported as a vector of subrecords. Then you can use the Split-Subrecord stage to create a vector of each field in the subrecord.
Anyway, for this example, let's keep it simple with just one repeated field. The input schema ends up looking like this:
Then the code for the buildop:
By default, PX writes the current record to an output buffer and then automatically outputs it to the stream. When you need to create multiple records out of many, you have to tell it to ignore (or drop) that buffered record (dropRecord()) and let you do it explicitly. In the code above, if the phone number is 0 we do not output a record. If it is > 0, then we create a record. The transferRecord() and outputRecord() take the current record and write it to the buffer and output it to the stream (your output link).
Your output looks like this:
I know not everyone likes buildops or writing C code. But they really are pretty easy. We started off using Torrent about 5-6 years ago, so when DataStage came along it was far easier to continue with the buildops than to switch to all sorts of built-in stages. In the long run, it seems to be much more efficient. Even for maintenance, since most of your ETL logic can be contained within one buildop.
Hope this helps.
A lot of our data comes from source that use occurences. Mostly mainframe sources with COBOL copybooks using OCCURS clauses. Fun stuff for a Unix guy.
![Wink :wink:](./images/smilies/icon_wink.gif)
If we have occurences, we import the file so that the "occurs" fields are loaded as vectors (arrays). That makes it a lot easier to process in the buildop. Sometimes there are several fields within the occurs. In that case it gets imported as a vector of subrecords. Then you can use the Split-Subrecord stage to create a vector of each field in the subrecord.
Anyway, for this example, let's keep it simple with just one repeated field. The input schema ends up looking like this:
Code: Select all
record (
account_nbr: decimal[20,0];
phone_nbr[5]: decimal[10,0];
)
Code: Select all
// Drop default record, output handled manually
dropRecord();
// Phone Number occurs up to 5 times.
// - Create new record if phone_nbr > 0
occurs_lmt = 5;
occurs_cnt = 0;
for (int i=0; i<occurs_lmt; i++) {
if ( in.phone_nbr[i] <= 0 )
{
// Record is empty, so skip it
continue;
}
// Move NOT NULL or key fields
out.account_nbr = in.account_nbr;
out.phone_occr_nbr = i + 1;
// Move occurence fields
out.phone_nbr = in.phone_nbr[i];
// Write record
transferRecord();
outputRecord();
// Increment occurs counter
occurs_cnt++;
}
Your output looks like this:
Code: Select all
record (
account_nbr: decimal[20,0];
phone_occr_nbr: smallint;
phone_nbr: decimal[10,0];
)
Hope this helps.
Yes, it helps to have some C/C++ background, but most of our staff had little or no background when they started. All it takes is one or two people that DO know C and/or C++ and the ball starts rolling.
I just don't want people to get scared off when they see "C/C++". For the most part, it (the buildop code) is extremely simple.
I just don't want people to get scared off when they see "C/C++". For the most part, it (the buildop code) is extremely simple.