Page 1 of 2

folding a single record into multiple records at fixed width

Posted: Tue Jun 19, 2007 10:01 am
by news78
We get a data flat file which has just one line. We need to fold/split this record at a fixed width and load the records into a oracle table.

E.g.
1A000AER000UTQQ should get converted to(assuming width of 5)
1A000
AER00
0UTQQ
Then these 3 records will be loaded into a table.

Whats the best way to go about this? One option is to use unix fold command from within datastage and then load the records. The volume is considerable, we are expecting number of records(after the split) ~ 9 million rows

Any suggestions?

Posted: Tue Jun 19, 2007 10:05 am
by DSguru2B
Is it a single column input and single column output?

Posted: Tue Jun 19, 2007 10:38 am
by news78
DSguru2B wrote:Is it a single column input and single column output?
Yes it is Single column output. Input as I said is one line, which I assume can be considered as single col in input.

Posted: Tue Jun 19, 2007 10:49 am
by DSguru2B
Well then in the transformer, you can substring and add Char(013):Char(010) after every fifth position. This way you are adding a unix new line character which will split them.

Posted: Tue Jun 19, 2007 11:01 am
by news78
DSguru2B wrote:Well then in the transformer, you can substring and add Char(013):Char(010) after every fifth position. This way you are adding a unix new line character which will split them.
Thing that am concerned about is the volume. As I said this one line will be large. Approx 603000000 characters. Which after split gets converted to approx 9 million rows. Does DS read the entire line in memory or this will not be a performance impact?

Posted: Tue Jun 19, 2007 11:14 am
by DSguru2B
Substrings are pretty fast. It should not have any performance impact. Run a test on 25% of your expected input.

Posted: Tue Jun 19, 2007 12:11 pm
by news78
DSguru2B wrote:Substrings are pretty fast. It should not have any performance impact. Run a test on 25% of your expected input.

OK this may be a silly question, but is there any function to substring(I tried link.record[1,5]) after "every" 5 chars as you suggested or do I need to write a custom routine to achieve this?

Posted: Tue Jun 19, 2007 12:44 pm
by DSguru2B
Just the way you showed.

Code: Select all

in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...

Posted: Tue Jun 19, 2007 1:41 pm
by news78
DSguru2B wrote:Just the way you showed.

Code: Select all

in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...
OK. Two points:
A. I tried
in.col[1,5]:char(013):char(010):in.col[6,5]
this does not work in parallel job it prints just first record. In server job, it works fine but prints a ^M character at end of each line.
Then in Server job the following works fine:
in.col[1,5]:char(010):in.col[6,5]

Any idea why parallel is not working. My job design is
[SeqFile] > [Transformer] > [SeqFile]

B. I can't go with the hardcoded approach above, of specifing each limit, since its a huge line. I guess i will need to create a string using some custom routine that will append newline at every 5 chars and then pass that string to the output.

Posted: Tue Jun 19, 2007 1:58 pm
by DSguru2B
O yes. It is unix and not windows :oops: . You only need char(010).
Check the row count of your target. How do you know its not showing up?
Also try to run in sequential mode and see if its working.

To parse the input record, you need to write a custom C program that inserts a new line character after every 5th position. If you decide to go with a server job, you need to do the same with in a routine.

Posted: Tue Jun 19, 2007 3:49 pm
by bcarlson
Just a thought.

Couldn't you treat the input as fixed length with a 'record length' of 5? If you look at a regular fixed length file, it looks like one gigantic record with an end-of-file termination. What is the difference between that and your 1 line input record?

Set the file type to fixed length with a record length of 5.

Brad.

Posted: Tue Jun 19, 2007 3:57 pm
by ArndW
Brad - you have hit upon the solution that I was thinking about as well. Just declare the original with one column of fixed width and no line terminators, then read it in.

Posted: Tue Jun 19, 2007 7:17 pm
by ray.wurlod
.... except that there IS a line terminator as every 16th character in the source file.

Re: folding a single record into multiple records at fixed w

Posted: Tue Jun 19, 2007 11:09 pm
by ArndW
Ray -
news78 wrote:We get a data flat file which has just one line...

Posted: Wed Jun 20, 2007 1:24 am
by ray.wurlod
Wait for "that was only an example, we actually have one column but multiple rows".
:wink: