creating multiple output record layouts based on rec type
I have a large input file containing multiple record types. Each record type has a different record layout. Is there a way to make one read pass through the input file and, for each record type encountered, select a specific output link with a record layout correct for that record type?
I am trying to avoid mapping my output columns via substringing from a single big input data field. I'd like to have one big data input field after the rectype field, and overlay one of 5 different record layouts on that same record buffer on the output side, depending on which output link is selected.
Or is there a way to do this in 5 different stages, with just one output link in each stage?
I found a handy way to do this in Datastage server with complex flat files, but Datastage PX complex flat files don't allow different record layouts on each output link that overlay the same storage area the way Datastage server does.
thanks
Stephen Peterson
Why don't you just containerize the Server CFF stage and use it in your PX job?
Or
Is there a problem using a Server job to just normalize the data into separate files and then have PX jobs pick it up from there?
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Kenneth,
Five containers suggests to me 5 passes through the data. This doesn't seem very efficient. For sake of discussion, if efficiency did not matter, then why not just grep the data for the specific record-type in each of 5 sequential stages? Where's the benefit of using containers? Can you elaborate more?
Thanks
Stephen Peterson
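For the sake of that discussion, the five-pass filter approach amounts to something like the following sketch (Python rather than grep or a DataStage stage; the record-type position and function name are assumptions, not anything from the job design):

```python
# Sketch of the multi-pass alternative: one filter pass per record type,
# each pass re-reading the whole source file.
def extract_type(in_path, rectype, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line[6] == rectype:  # assumed: type code is the 7th byte
                dst.write(line)
```

Running this once per record type means N full reads of the source, which is exactly the inefficiency being questioned here.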
CFF-creating multiple output record layouts based on rec typ
We have the same situation as described above and we are unable to use CFF stage (in PX).
Is there any solution yet?
I want to know if the CFF stage (in PX) can handle multiple record types, and how? Each record type has a different record layout, and each record is delimited by a newline character (\n).
Thanks..!
I have the same problem. I wound up writing a generic file splitter to split the record types out. In some of the files I had to further split them out. It was tedious to write the code, but it works well. We wrote the prototype in VB; I just converted it to Java because we run in both Windows and Unix environments.
The Hawk beta CFF stage is supposed to support multi-format files. I have not had a chance to try it out yet.
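The core of such a generic splitter can be sketched briefly (in Python rather than the VB/Java versions described; the record-type offset and the output naming scheme are made-up assumptions):

```python
# Hypothetical single-pass splitter: routes each line of a multi-record-type
# file to a per-type output file, keyed on a type code at a fixed offset.
RECTYPE_START, RECTYPE_LEN = 6, 1  # assumed: type code is the 7th byte

def split_by_rectype(in_path):
    handles = {}
    try:
        with open(in_path) as src:
            for line in src:
                rectype = line[RECTYPE_START:RECTYPE_START + RECTYPE_LEN]
                if rectype not in handles:
                    # one output file per record type, opened lazily
                    handles[rectype] = open(f"{in_path}.type_{rectype}", "w")
                handles[rectype].write(line)
    finally:
        for fh in handles.values():
            fh.close()
```

The point of the design is the single read pass: every record is dispatched to its type-specific file as it is seen, after which each file carries exactly one layout.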
Never replied, sorry. I don't like containers, but they're handy if a Server stage, like the CFF, has more features and you'd rather use it than the PX CFF stage. Since you have to read one row at a time from the stage, a parallel read is really not an option, so the Server stage inside a container is an option.
cspeter8 wrote:
Kenneth,
Five containers suggests to me 5 passes through the data. This doesn't seem very efficient. For sake of discussion, if efficiency did not matter, then why not just grep the data for the specific record-type in each of 5 sequential stages? Where's the benefit of using containers? Can you elaborate more?
Thanks
5 stages reading the same file seems inefficient, but you're dealing with record types so that's the situation you can't change. If you could have different record types go to different output links, leveraging a single read from the source, that'd be cool. Alas, ...
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
lfong,
Prior to your Column Import you need to be able to separate the record type from the rest of the data. I suppose you are asking how to determine where your record type is because the records aren't fixed length. You can place a Transformer (or Modify, if brave) between your Sequential File & Switch (or Filter) stages to peel off your value.
Use a derivation for your Record_Type column like:
ALLDATA[2]
to strip off the last 2 bytes of a record.
Good luck,
Brian
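In DataStage BASIC, the substring expression `ALLDATA[2]` returns the last two characters of the string, which is what makes this work on variable-length records. A rough Python equivalent of the derivation (the field name `ALLDATA` comes from the post; the newline stripping is an assumption about how the record arrives):

```python
def record_type(alldata: str) -> str:
    """Rough equivalent of the DataStage derivation ALLDATA[2]:
    take the last 2 bytes of the record as its type code."""
    return alldata.rstrip("\n")[-2:]
```

Because the slice counts from the end, the record's total length never matters, which is exactly why this derivation works ahead of the Switch or Filter stage.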
Brian Sprecher
IBM
Lenexa, KS
Here's how I'd do it...
Start with a Sequential File stage with the following metadata:
COL1 Char(6)
RecTyp Char(1)
COL3 VarChar(23)
Then run the stream through a Switch stage to produce 2 links (streams); one for each RecTyp.
Then run the 2 links through 2 Column Import stages (giving a total of 4 Column Import stages in your job). The first Column Import will break up COL1 into field1 & field2, the second Column Import will break up COL3 into field5 & field6 (for Record A) and rec4 & field6 (for Record B).
This is possible because you know the exact location of RecTyp in every record; it's the 7th byte. Also, COL1 can be Char, but COL3 needs to be VarChar since its size will vary (14 bytes for Record A and 23 bytes for Record B).
Here's a tip -- define your metadata on the output link of the Column Import stage prior to doing anything else. It makes this scary-looking stage actually one of the easiest to use.
Brian
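Outside DataStage, the Switch-plus-Column-Import design above can be sketched as follows. The 6/1 split, the 7th-byte RecTyp, the COL3 lengths (14 and 23 bytes), and the output column names come from the post; the sub-field widths within COL1 and COL3 are made-up assumptions:

```python
# Sketch of the design: read COL1/RecTyp/COL3 at fixed offsets (the
# Sequential File stage), branch on RecTyp (the Switch stage), then
# re-parse COL3 with a per-record-type layout (the Column Import stages).
LAYOUTS = {
    "A": [("field5", 7), ("field6", 7)],   # COL3 is 14 bytes for Record A
    "B": [("rec4", 11), ("field6", 12)],   # COL3 is 23 bytes for Record B
}

def parse_record(line):
    col1, rectype, col3 = line[:6], line[6], line[7:].rstrip("\n")
    row = {"field1": col1[:3], "field2": col1[3:]}  # assumed 3/3 split of COL1
    pos = 0
    for name, width in LAYOUTS[rectype]:            # "switch" on RecTyp
        row[name] = col3[pos:pos + width]
        pos += width
    return rectype, row
```

The one fixed fact the whole design leans on is the position of RecTyp: everything before it can be Char, and everything after it is a single VarChar that each branch re-imports with its own layout.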
Brian Sprecher
IBM
Lenexa, KS
I just set up this solution and came up with a problem: the output from the first Column Import stage becomes the input for the second Column Import stage.
dt12946 wrote:
Here's how I'd do it...
Start with a Sequential File stage with the following metadata:
COL1 Char(6)
RecTyp Char(1)
COL3 VarChar(23)
Then run the stream through a Switch stage to produce 2 links (streams); one for each RecTyp.
Then run the 2 links through 2 Column Import stages (giving a total of 4 Column Import stages in your job). The first Column Import will break up COL1 into field1 & field2, the second Column Import will break up COL3 into field5 & field6 (for Record A) and rec4 & field6 (for Record B).
This is possible because you know the exact location of RecTyp in every record; it's the 7th byte. Also, COL1 can be Char, but COL3 needs to be VarChar since its size will vary (14 bytes for Record A and 23 bytes for Record B).
Here's a tip -- define your metadata on the output link of the Column Import stage prior to doing anything else. It makes this scary-looking stage actually one of the easiest to use.
Brian