creating multiple output record layouts based on rec type


cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

creating multiple output record layouts based on rec type

Post by cspeter8 »

I have a large input file containing multiple record types. Each record type has a different record layout. Is there a way to make one read-pass through the input file and, for each record type encountered, select a specific output link with a record layout correct for that record type?

I am trying to avoid mapping my output columns by substringing from a single big data input field. I'd like to have one big data input field after the rectype field, and overlay one of 5 different record layouts on that same record buffer on the output side, depending on which output link is selected.

Or is there a way to do this in 5 different stages, with just one output link in each stage?

I found a handy way to do this in DataStage Server with complex flat files, but the DataStage PX Complex Flat File stage doesn't allow different record layouts on each output link that overlay the same storage area the way DataStage Server does.

thanks
Stephen Peterson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Why don't you just containerize the Server CFF stage and use it in your PX job?

Or

Is there a problem using a Server job to just normalize the data into separate files and then have PX jobs pick it up from there?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

Post by cspeter8 »

I've never used containers, but could give it a try to see if I can make it work that way. This option sounds better than creating a separate Server job to do the preliminary processing; I'd like to avoid landing the files unnecessarily.
Stephen Peterson
cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

Post by cspeter8 »

Looking into containers, it appears they all allow only one input link and one output link for the container's interface to the outside.

This doesn't seem workable - I need 5 output links on my container. Am I missing something in your suggestion?

Thanks
Stephen Peterson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Five containers, each with a different record format output.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
cspeter8
Participant
Posts: 5
Joined: Wed Nov 02, 2005 12:31 pm

Post by cspeter8 »

Kenneth,

Five containers suggests to me 5 passes through the data, which doesn't seem very efficient. For the sake of discussion, if efficiency did not matter, why not just grep the data for the specific record type in each of 5 Sequential File stages? Where's the benefit of using containers? Can you elaborate?
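(To illustrate what I mean by grep: assuming, purely for the sake of example, that the record type sat in the first byte, each of the 5 Sequential File stages could use its Filter option to run a command along the lines of

grep '^A'

so only that one record type ever reaches that stage's output link.)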

Thanks
Stephen Peterson
bsreenu
Participant
Posts: 22
Joined: Mon Aug 16, 2004 3:57 pm

CFF - creating multiple output record layouts based on rec type

Post by bsreenu »

We have the same situation as described above, and we are unable to use the CFF stage (in PX).
Is there any solution yet?

I want to know whether the CFF stage (in PX) can handle multiple record types, and how. Each record type has a different record layout, and each record is delimited by a newline character (\n).

Thanks..!
seanc217
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

Post by seanc217 »

I have the same problem. I wound up writing a generic file splitter to split the record types out; in some of the files I had to split them out further. The code was tedious to write, but it works well. We wrote the prototype in VB, and I converted it to Java because we run in both Windows and Unix environments.
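For anyone curious, the core of such a splitter can be quite small. Here's a minimal Java sketch (not our actual code; it assumes newline-delimited records with the record type in the first 2 bytes, takes the input path as the first argument, and writes one output file per type):

import java.io.*;
import java.util.*;

public class RecordSplitter {
    public static void main(String[] args) throws IOException {
        // One output writer per record type, opened on first encounter
        Map<String, PrintWriter> outputs = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                String recType = line.length() >= 2 ? line.substring(0, 2) : "__";
                PrintWriter out = outputs.computeIfAbsent(recType, t -> {
                    try {
                        // e.g. input.dat -> input.dat.01, input.dat.02, ...
                        return new PrintWriter(new FileWriter(args[0] + "." + t));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                out.println(line);
            }
        } finally {
            outputs.values().forEach(PrintWriter::close);
        }
    }
}

One read pass, one file per record type; the parsing of each type into its real layout then happens downstream.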

The Hawk beta CFF stage is supposed to support multiple record formats. I have not had a chance to try it out yet.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

cspeter8 wrote:Kenneth,

Five containers suggests to me 5 passes through the data. This doesn't seem very efficient. For sake of discussion, if efficiency did not matter, then why not just grep the data for the specific record-type in each of 5 sequential stages? Where's the benefit of using containers? Can you elaborate more?

Thanks
Never replied, sorry. I don't like containers, but they're handy if a Server stage, like the CFF, has more features and you'd rather use it than the PX CFF stage. Since you have to read one row at a time from the stage, a parallel read is really not an option anyway, so the Server stage inside a container is an option.

5 stages reading the same file seems inefficient, but you're dealing with record types, so that's a situation you can't change. If you could have different record types go to different output links, leveraging a single read of the source, that'd be cool. Alas, ...
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

dt12946, a quick question:
I have a record where the record type is at the end of the record. I need to use the Column Import stage to import columns for the fields that come before the record-type field.
What is the best solution for this?

Thanks
bpsprecher
Premium Member
Posts: 21
Joined: Mon Mar 08, 2004 11:42 am

Post by bpsprecher »

lfong,

Prior to your Column Import, you need to be able to separate the record type from the rest of the data. I suppose you are asking how to determine where your record type is, because the records aren't fixed length. You can place a Transformer (or Modify, if brave) between your Sequential File and Switch (or Filter) stages to peel off your value.

Use a derivation for your Record_Type column like:
ALLDATA[2]

to strip off the last 2 bytes of a record.
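For example, if a record arrives as ALLDATA = 'SMITH00042X7' (a made-up value), the derivation above yields 'X7': the [2] takes the last 2 bytes of the string.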

Good luck,
Brian
Brian Sprecher
IBM
Lenexa, KS
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

dt12946,
this is the situation I have:
Record A
field1 pic x(4)
field2 pic 99
rectyp pic x
field5 pic x(5)
field6 pic 9(9)

Record B
field1 pic 9
field2 pic 99999
rectyp pic x
rec4 pic x(14)
field6 pic 9(9)


How would I split a record into the two layouts above based on rectyp?

Thanks
bpsprecher
Premium Member
Posts: 21
Joined: Mon Mar 08, 2004 11:42 am

Post by bpsprecher »

Here's how I'd do it...

Start with a Sequential File stage with the following metadata:
COL1 Char(6)
RecTyp Char(1)
COL3 VarChar(23)

Then run the stream through a Switch stage to produce 2 links (streams); one for each RecTyp.

Then run each of the 2 links through 2 Column Import stages in series (giving a total of 4 Column Import stages in your job). On each link, the first Column Import breaks up COL1 into field1 & field2, and the second breaks up COL3 into field5 & field6 (for Record A) or rec4 & field6 (for Record B).

This is possible because you know the exact location of RecTyp in every record; it's the 7th byte. Also, COL1 can be Char, but COL3 needs to be VarChar since its size will vary (14 bytes for Record A and 23 bytes for Record B).

Here's a tip -- define your metadata on the output link of the Column Import stage prior to doing anything else. It makes this scary-looking stage actually one of the easiest to use.
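To make the breakout concrete, here's the same logic written out in Java (just a sketch; the job does all of this declaratively, and the RecTyp values 'A' and 'B' plus the sample lines are assumed for illustration):

public class RecTypSplit {
    // RecTyp is always the 7th byte, regardless of layout
    static void route(String rec) {
        String col1 = rec.substring(0, 6);   // COL1 Char(6)
        char recTyp = rec.charAt(6);         // RecTyp Char(1)
        String col3 = rec.substring(7);      // COL3 VarChar(23): 14 or 23 bytes
        if (recTyp == 'A') {
            // Record A: field1 x(4) + field2 99, then field5 x(5) + field6 9(9)
            System.out.printf("A: %s|%s|%s|%s%n",
                col1.substring(0, 4), col1.substring(4),
                col3.substring(0, 5), col3.substring(5));
        } else if (recTyp == 'B') {
            // Record B: field1 9 + field2 99999, then rec4 x(14) + field6 9(9)
            System.out.printf("B: %s|%s|%s|%s%n",
                col1.substring(0, 1), col1.substring(1),
                col3.substring(0, 14), col3.substring(14));
        }
    }

    public static void main(String[] args) {
        route("WXYZ12AABCDE123456789");          // a Record A line (21 bytes)
        route("912345BABCDEFGHIJKLMN987654321"); // a Record B line (30 bytes)
    }
}

The Switch stage plays the role of the if/else, and each Column Import stage plays the role of one group of substrings.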

Brian
Brian Sprecher
IBM
Lenexa, KS
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

dt12946 wrote:Here's how I'd do it...

Start with a Sequential File stage with the following metadata:
COL1 Char(6)
RecTyp Char(1)
COL3 VarChar(23)

Then run the stream through a Switch stage to produce 2 links (streams); one for each RecTyp.

Then run each of the 2 links through 2 Column Import stages in series (giving a total of 4 Column Import stages in your job). On each link, the first Column Import breaks up COL1 into field1 & field2, and the second breaks up COL3 into field5 & field6 (for Record A) or rec4 & field6 (for Record B).

This is possible because you know the exact location of RecTyp in every record; it's the 7th byte. Also, COL1 can be Char, but COL3 needs to be VarChar since its size will vary (14 bytes for Record A and 23 bytes for Record B).

Here's a tip -- define your metadata on the output link of the Column Import stage prior to doing anything else. It makes this scary-looking stage actually one of the easiest to use.

Brian
I just set up this solution and ran into a problem: the output from the first Column Import stage becomes the input for the second Column Import stage.
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

Not a problem after all. Once I imported the target record type, I was able to map everything.

Thanks