folding a single record into multiple records at fixed width

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

folding a single record into multiple records at fixed width

Post by news78 »

We get a data flat file which has just one line. We need to fold/split this record at a fixed width and load the records into a oracle table.

E.g.
1A000AER000UTQQ should get converted to(assuming width of 5)
1A000
AER00
0UTQQ
Then these 3 records will be loaded into a table.

Whats the best way to go about this? One option is to use unix fold command from within datastage and then load the records. The volume is considerable, we are expecting number of records(after the split) ~ 9 million rows

Any suggestions?
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Is it a single column input and single column output?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

DSguru2B wrote:Is it a single column input and single column output?
Yes it is Single column output. Input as I said is one line, which I assume can be considered as single col in input.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well then in the transformer, you can substring and add Char(013):Char(010) after every fifth position. This way you are adding a unix new line character which will split them.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

DSguru2B wrote:Well then in the transformer, you can substring and add Char(013):Char(010) after every fifth position. This way you are adding a unix new line character which will split them.
Thing that am concerned about is the volume. As I said this one line will be large. Approx 603000000 characters. Which after split gets converted to approx 9 million rows. Does DS read the entire line in memory or this will not be a performance impact?
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Substrings are pretty fast. It should not have any performance impact. Run a test on 25% of your expected input.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

DSguru2B wrote:Substrings are pretty fast. It should not have any performance impact. Run a test on 25% of your expected input.

OK this may be a silly question, but is there any function to substring(I tried link.record[1,5]) after "every" 5 chars as you suggested or do I need to write a custom routine to achieve this?
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Just the way you showed.

Code: Select all

in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

DSguru2B wrote:Just the way you showed.

Code: Select all

in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...
OK. Two points:
A. I tried
in.col[1,5]:char(013):char(010):in.col[6,5]
this does not work in parallel job it prints just first record. In server job, it works fine but prints a ^M character at end of each line.
Then in Server job the following works fine:
in.col[1,5]:char(010):in.col[6,5]

Any idea why parallel is not working. My job design is
[SeqFile] > [Transformer] > [SeqFile]

B. I can't go with the hardcoded approach above, of specifing each limit, since its a huge line. I guess i will need to create a string using some custom routine that will append newline at every 5 chars and then pass that string to the output.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

O yes. It is unix and not windows :oops: . You only need char(010).
Check the row count of your target. How do you know its not showing up?
Also try to run in sequential mode and see if its working.

To parse the input record, you need to write a custom C program that inserts a new line character after every 5th position. If you decide to go with a server job, you need to do the same with in a routine.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Just a thought.

Couldn't you treat the input as fixed length with a 'record length' of 5? If you look at a regular fixed length file, it looks like one gigantic record with an end-of-file termination. What is the difference between that and your 1 line input record?

Set the file type to fixed length with a record length of 5.

Brad.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Brad - you have hit upon the solution that I was thinking about as well. Just declare the original with one column of fixed width and no line terminators, then read it in.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

.... except that there IS a line terminator as every 16th character in the source file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Re: folding a single record into multiple records at fixed w

Post by ArndW »

Ray -
news78 wrote:We get a data flat file which has just one line...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Wait for "that was only an example, we actually have one column but multiple rows".
:wink:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply