folding a single record into multiple records at fixed width

news78 · Post by **news78** » Tue Jun 19, 2007 10:01 am

We get a flat data file that contains just one line. We need to fold (split) this record at a fixed width and load the resulting records into an Oracle table.

E.g.
1A000AER000UTQQ should get converted to (assuming a width of 5):
1A000
AER00
0UTQQ
Then these 3 records will be loaded into a table.

What's the best way to go about this? One option is to run the Unix fold command from within DataStage and then load the records. The volume is considerable: we expect about 9 million rows after the split.

Any suggestions?

DSguru2B · Post by **DSguru2B** » Tue Jun 19, 2007 10:05 am

Is it a single column input and single column output?

news78 · Post by **news78** » Tue Jun 19, 2007 10:38 am

DSguru2B wrote:Is it a single column input and single column output?

Yes, it is a single-column output. The input, as I said, is one line, which I assume can be treated as a single column.

DSguru2B · Post by **DSguru2B** » Tue Jun 19, 2007 10:49 am

Well then in the transformer, you can substring and add Char(013):Char(010) after every fifth position. This way you are adding a unix new line character which will split them.

news78 · Post by **news78** » Tue Jun 19, 2007 11:01 am

DSguru2B wrote:Well then in the transformer, you can substring and add Char(013):Char(010) after every fifth position. This way you are adding a unix new line character which will split them.

What I am concerned about is the volume. As I said, this one line will be large: approximately 603000000 characters, which after the split becomes approximately 9 million rows. Does DS read the entire line into memory, or will this not have a performance impact?

DSguru2B · Post by **DSguru2B** » Tue Jun 19, 2007 11:14 am

Substrings are pretty fast. It should not have any performance impact. Run a test on 25% of your expected input.
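
On the memory question, one way to sanity-check the approach outside DataStage is to fold the line in blocks, so that memory use depends on the block size rather than on the length of the line. The following is only a rough sketch in Python: the function name fold_stream and the block size are made up for illustration, and it assumes any carriage returns or line feeds in the input are terminators to be dropped rather than data.

```python
import io

def fold_stream(src, dst, width, block_records=65536):
    """Copy src to dst, writing a newline after every `width` characters.

    Reads about width * block_records characters per iteration, so the
    whole input line never has to be held in memory at once.
    Returns the number of records written.
    """
    pending = ""  # characters read but not yet written as a full record
    count = 0
    while True:
        block = src.read(width * block_records)
        if not block:
            break
        # Assumption: CR/LF characters are terminators, not data.
        pending += block.replace("\r", "").replace("\n", "")
        usable = len(pending) - len(pending) % width
        for i in range(0, usable, width):
            dst.write(pending[i:i + width] + "\n")
            count += 1
        pending = pending[usable:]
    if pending:  # a final record shorter than `width`
        dst.write(pending + "\n")
        count += 1
    return count

# Small in-memory demonstration using the example from the first post.
demo_out = io.StringIO()
demo_count = fold_stream(io.StringIO("1A000AER000UTQQ\n"), demo_out, 5)
print(demo_count)
print(demo_out.getvalue(), end="")
```

For a real file, src and dst would be file objects returned by open() in text mode. Timing this on a fraction of the expected input gives a baseline to compare against the job.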

news78 · Post by **news78** » Tue Jun 19, 2007 12:11 pm

DSguru2B wrote:Substrings are pretty fast. It should not have any performance impact. Run a test on 25% of your expected input.

OK, this may be a silly question, but is there a function to substring after "every" 5 chars as you suggested (I tried link.record[1,5]), or do I need to write a custom routine to achieve this?

DSguru2B · Post by **DSguru2B** » Tue Jun 19, 2007 12:44 pm

Just the way you showed.

in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...

news78 · Post by **news78** » Tue Jun 19, 2007 1:41 pm

DSguru2B wrote:Just the way you showed.
in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...

OK. Two points:
A. I tried
in.col[1,5]:char(013):char(010):in.col[6,5]
This does not work in a parallel job: it prints just the first record. In a server job it works fine, but prints a ^M character at the end of each line.
Then, in a server job, the following works fine:
in.col[1,5]:char(010):in.col[6,5]

Any idea why the parallel job is not working? My job design is:
[SeqFile] > [Transformer] > [SeqFile]

B. I can't go with the hardcoded approach above, specifying each limit, since it's a huge line. I guess I will need to build the string with a custom routine that appends a newline after every 5 chars, and then pass that string to the output.
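
On the ^M in point A: that is how many Unix tools display a carriage return, which is what Char(013) produces. A small Python sketch, used here purely for illustration and not part of the DataStage job, shows what a reader that splits on line feed alone sees in each case:

```python
CR = chr(13)  # the character DataStage produces with Char(013)
LF = chr(10)  # the character DataStage produces with Char(010)

crlf_joined = "1A000" + CR + LF + "AER00"
lf_joined = "1A000" + LF + "AER00"

# A Unix-style reader splits on LF only, so the CR stays attached
# to the end of the first record and shows up as ^M.
records_crlf = crlf_joined.split(LF)
records_lf = lf_joined.split(LF)

print(records_crlf)  # ['1A000\r', 'AER00']
print(records_lf)    # ['1A000', 'AER00']
```

This only accounts for the stray ^M in the server job output. It says nothing about why the parallel job prints a single record.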

DSguru2B · Post by **DSguru2B** » Tue Jun 19, 2007 1:58 pm

Oh yes. It is Unix and not Windows. You only need char(010).
Check the row count of your target. How do you know it's not showing up?
Also try running in sequential mode and see if it's working.

To parse the input record, you need to write a custom C program that inserts a newline character after every 5th position. If you decide to go with a server job, you need to do the same within a routine.
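
Whatever language the routine ends up in, the logic is a loop over fixed-width slices. A minimal sketch in Python, for illustration only (the name fold_string is hypothetical, and a C program or server routine would need its own equivalent):

```python
def fold_string(text, width, newline="\n"):
    """Return `text` cut into `width`-character pieces joined by `newline`.

    No terminator is added after the last piece, and a final piece
    shorter than `width` is kept as is.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pieces = [text[i:i + width] for i in range(0, len(text), width)]
    return newline.join(pieces)

# The example from the first post, folded at width 5.
print(fold_string("1A000AER000UTQQ", 5))
```

This version builds the whole result in memory, which is fine for showing the logic but probably not what you want for a line of several hundred million characters.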

bcarlson · Post by **bcarlson** » Tue Jun 19, 2007 3:49 pm

Just a thought.

Couldn't you treat the input as fixed length with a 'record length' of 5? If you look at a regular fixed length file, it looks like one gigantic record with an end-of-file termination. What is the difference between that and your 1 line input record?

Set the file type to fixed length with a record length of 5.

Brad.

ArndW · Post by **ArndW** » Tue Jun 19, 2007 3:57 pm

Brad - you have hit upon the solution that I was thinking about as well. Just declare the original with one column of fixed width and no line terminators, then read it in.

ray.wurlod · Post by **ray.wurlod** » Tue Jun 19, 2007 7:17 pm

.... except that there IS a line terminator as every 16th character in the source file.
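
Whether a terminator matters is easy to check outside DataStage. The sketch below is in Python and is not a description of how the Sequential File stage works internally. The function name is invented, and stripping the trailing terminator before reading is just one possible way to handle it.

```python
import io

def read_fixed_records(stream, width):
    """Yield successive records of up to `width` bytes from a binary stream."""
    while True:
        record = stream.read(width)
        if not record:
            return
        yield record

# 15 data bytes followed by a line terminator as the 16th byte.
raw = b"1A000AER000UTQQ\n"

# Reading as pure fixed width treats the terminator as data,
# so it comes back as a bogus, short fourth record.
naive = list(read_fixed_records(io.BytesIO(raw), 5))

# Dropping the trailing terminator first leaves exactly three records.
cleaned = list(read_fixed_records(io.BytesIO(raw.rstrip(b"\r\n")), 5))

print(naive)
print(cleaned)
```

If the file really is one line with a single terminator at the end, only the last record is affected. If there are terminators inside the data, every record after the first one shifts.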

ArndW · Post by **ArndW** » Tue Jun 19, 2007 11:09 pm

Ray -

news78 wrote:We get a data flat file which has just one line...

ray.wurlod · Post by **ray.wurlod** » Wed Jun 20, 2007 1:24 am

Wait for "that was only an example, we actually have one column but multiple rows".
