creating multiple output record layouts based on rec type


cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

creating multiple output record layouts based on rec type

Post by cspeter8 »

I have a large input file containing multiple record types. Each record type has a different record layout. Is there a way to make one read-pass through the input file and, for each record type encountered, select a specific output link with a record layout correct for that record type?

I am trying to avoid mapping my output columns by substringing from a single big data input field. I'd like to have one big data input field after the rectype field, and overlay one of 5 different record layouts on that same record buffer on the output side, depending on which output link is selected.

Or is there a way to do this in 5 different stages, with just one output link in each stage?

I found a handy way to do this in DataStage Server with complex flat files, but the DataStage PX Complex Flat File stage doesn't allow different record layouts on each output link that overlay the same storage area the way DataStage Server does.

thanks
Stephen Peterson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Why don't you just containerize the Server CFF stage and use it in your PX job?

Or

Is there a problem using a Server job to just normalize the data into separate files and then have PX jobs pick it up from there?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

Post by cspeter8 »

I've never used containers, but could give it a try to see if I can make it work that way. This option sounds better than creating a separate Server job to do the preliminary processing; I'd like to avoid landing the files unnecessarily.
Stephen Peterson
cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

Post by cspeter8 »

Looking into containers, it appears they all allow only one input link and one output link for the container's interface to the outside.

This doesn't seem workable - I need 5 output links on my container. Am I missing something in your suggestion?

Thanks
Stephen Peterson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Five containers, each with a different record format output.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

Post by cspeter8 »

Kenneth,

Five containers suggests to me 5 passes through the data, which doesn't seem very efficient. For the sake of discussion, if efficiency did not matter, why not just grep the data for the specific record type in each of 5 Sequential File stages? Where's the benefit of using containers? Can you elaborate?
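(To illustrate what I mean by grep: assuming, purely for the sake of example, that the record type sat in the first byte, each of the 5 Sequential File stages could use its Filter option to run a command along the lines of

grep '^A'

so only that one record type ever reaches that stage's output link.)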

Thanks
Stephen Peterson
bsreenu
Participant
Posts: 22
Joined: Mon Aug 16, 2004 3:57 pm

CFF - creating multiple output record layouts based on rec type

Post by bsreenu »

We have the same situation as described above, and we are unable to use the CFF stage (in PX).
Is there any solution yet?

I want to know whether the CFF stage (in PX) can handle multiple record types, and how. Each record type has a different record layout, and each record is delimited by a newline character (\n).

Thanks..!
seanc217
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

Post by seanc217 »

I have the same problem. I wound up writing a generic file splitter to split the record types out; in some of the files I had to split them out further. The code was tedious to write, but it works well. We wrote the prototype in VB, and I converted it to Java because we run in both Windows and Unix environments.
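For anyone curious, the core of such a splitter can be quite small. Here's a minimal Java sketch (not our actual code; it assumes newline-delimited records with the record type in the first 2 bytes, takes the input path as the first argument, and writes one output file per type):

import java.io.*;
import java.util.*;

public class RecordSplitter {
    public static void main(String[] args) throws IOException {
        // One output writer per record type, opened on first encounter
        Map<String, PrintWriter> outputs = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                String recType = line.length() >= 2 ? line.substring(0, 2) : "__";
                PrintWriter out = outputs.computeIfAbsent(recType, t -> {
                    try {
                        // e.g. input.dat -> input.dat.01, input.dat.02, ...
                        return new PrintWriter(new FileWriter(args[0] + "." + t));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                out.println(line);
            }
        } finally {
            outputs.values().forEach(PrintWriter::close);
        }
    }
}

One read pass, one file per record type; the parsing of each type into its real layout then happens downstream.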

The Hawk beta CFF stage is supposed to support multiple record formats. I have not had a chance to try it out yet.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

cspeter8 wrote:Kenneth,

Five containers suggests to me 5 passes through the data. This doesn't seem very efficient. For sake of discussion, if efficiency did not matter, then why not just grep the data for the specific record-type in each of 5 sequential stages? Where's the benefit of using containers? Can you elaborate more?

Thanks
Never replied, sorry. I don't like containers, but they're handy if a Server stage, like the CFF, has more features and you'd rather use it than the PX CFF stage. Since you have to read one row at a time from the stage, a parallel read is really not an option anyway, so the Server stage inside a container is an option.

5 stages reading the same file seems inefficient, but you're dealing with record types, so that's a situation you can't change. If you could have different record types go to different output links, leveraging a single read of the source, that'd be cool. Alas, ...
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

dt12946, a quick question:
I have a record where the record type is at the end of the record. I need to use the Column Import stage to import columns for the fields that come before the record-type field.
What is the best solution for this?

Thanks
bpsprecher
Premium Member
Posts: 21
Joined: Mon Mar 08, 2004 11:42 am

Post by bpsprecher »

lfong,

Prior to your Column Import, you need to be able to separate the record type from the rest of the data. I suppose you are asking how to determine where your record type is, because the records aren't fixed length. You can place a Transformer (or Modify, if brave) between your Sequential File and Switch (or Filter) stages to peel off your value.

Use a derivation for your Record_Type column like:
ALLDATA[2]

to strip off the last 2 bytes of a record.
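For example, if a record arrives as ALLDATA = 'SMITH00042X7' (a made-up value), the derivation above yields 'X7': the [2] takes the last 2 bytes of the string.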

Good luck,
Brian
Brian Sprecher
IBM
Lenexa, KS
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

dt12946,
this is the situation I have:
Record A
field1 pic x(4)
field2 pic 99
rectyp pic x
field5 pic x(5)
field6 pic 9(9)

Record B
field1 pic 9
field2 pic 99999
rectyp pic x
rec4 pic x(14)
field6 pic 9(9)


How would I split a record into the two layouts above based on rectyp?

Thanks
bpsprecher
Premium Member
Posts: 21
Joined: Mon Mar 08, 2004 11:42 am

Post by bpsprecher »

Here's how I'd do it...

Start with a Sequential File stage with the following metadata:
COL1 Char(6)
RecTyp Char(1)
COL3 VarChar(23)

Then run the stream through a Switch stage to produce 2 links (streams); one for each RecTyp.

Then run each of the 2 links through 2 Column Import stages in series (giving a total of 4 Column Import stages in your job). On each link, the first Column Import breaks up COL1 into field1 & field2, and the second breaks up COL3 into field5 & field6 (for Record A) or rec4 & field6 (for Record B).

This is possible because you know the exact location of RecTyp in every record; it's the 7th byte. Also, COL1 can be Char, but COL3 needs to be VarChar since its size will vary (14 bytes for Record A and 23 bytes for Record B).

Here's a tip -- define your metadata on the output link of the Column Import stage prior to doing anything else. It makes this scary-looking stage actually one of the easiest to use.
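To make the breakout concrete, here's the same logic written out in Java (just a sketch; the job does all of this declaratively, and the RecTyp values 'A' and 'B' plus the sample lines are assumed for illustration):

public class RecTypSplit {
    // RecTyp is always the 7th byte, regardless of layout
    static void route(String rec) {
        String col1 = rec.substring(0, 6);   // COL1 Char(6)
        char recTyp = rec.charAt(6);         // RecTyp Char(1)
        String col3 = rec.substring(7);      // COL3 VarChar(23): 14 or 23 bytes
        if (recTyp == 'A') {
            // Record A: field1 x(4) + field2 99, then field5 x(5) + field6 9(9)
            System.out.printf("A: %s|%s|%s|%s%n",
                col1.substring(0, 4), col1.substring(4),
                col3.substring(0, 5), col3.substring(5));
        } else if (recTyp == 'B') {
            // Record B: field1 9 + field2 99999, then rec4 x(14) + field6 9(9)
            System.out.printf("B: %s|%s|%s|%s%n",
                col1.substring(0, 1), col1.substring(1),
                col3.substring(0, 14), col3.substring(14));
        }
    }

    public static void main(String[] args) {
        route("WXYZ12AABCDE123456789");          // a Record A line (21 bytes)
        route("912345BABCDEFGHIJKLMN987654321"); // a Record B line (30 bytes)
    }
}

The Switch stage plays the role of the if/else, and each Column Import stage plays the role of one group of substrings.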

Brian
Brian Sprecher
IBM
Lenexa, KS
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

dt12946 wrote:Here's how I'd do it...

Start with a Sequential File stage with the following metadata:
COL1 Char(6)
RecTyp Char(1)
COL3 VarChar(23)

Then run the stream through a Switch stage to produce 2 links (streams); one for each RecTyp.

Then run each of the 2 links through 2 Column Import stages in series (giving a total of 4 Column Import stages in your job). On each link, the first Column Import breaks up COL1 into field1 & field2, and the second breaks up COL3 into field5 & field6 (for Record A) or rec4 & field6 (for Record B).

This is possible because you know the exact location of RecTyp in every record; it's the 7th byte. Also, COL1 can be Char, but COL3 needs to be VarChar since its size will vary (14 bytes for Record A and 23 bytes for Record B).

Here's a tip -- define your metadata on the output link of the Column Import stage prior to doing anything else. It makes this scary-looking stage actually one of the easiest to use.

Brian
I just set up this solution and ran into a problem: the output from the first Column Import stage becomes the input for the second Column Import stage.
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

Not a problem after all. Once I imported the target record type, I was able to map everything.

Thanks