folding a single record into multiple records at fixed width
Moderators: chulett, rschirm, roy
folding a single record into multiple records at fixed width
We get a data flat file which has just one line. We need to fold/split this record at a fixed width and load the records into a oracle table.
E.g.
1A000AER000UTQQ should get converted to(assuming width of 5)
1A000
AER00
0UTQQ
Then these 3 records will be loaded into a table.
Whats the best way to go about this? One option is to use unix fold command from within datastage and then load the records. The volume is considerable, we are expecting number of records(after the split) ~ 9 million rows
Any suggestions?
E.g.
1A000AER000UTQQ should get converted to(assuming width of 5)
1A000
AER00
0UTQQ
Then these 3 records will be loaded into a table.
Whats the best way to go about this? One option is to use unix fold command from within datastage and then load the records. The volume is considerable, we are expecting number of records(after the split) ~ 9 million rows
Any suggestions?
Thing that am concerned about is the volume. As I said this one line will be large. Approx 603000000 characters. Which after split gets converted to approx 9 million rows. Does DS read the entire line in memory or this will not be a performance impact?DSguru2B wrote:Well then in the transformer, you can substring and add Char(013):Char(010) after every fifth position. This way you are adding a unix new line character which will split them.
DSguru2B wrote:Substrings are pretty fast. It should not have any performance impact. Run a test on 25% of your expected input.
OK this may be a silly question, but is there any function to substring(I tried link.record[1,5]) after "every" 5 chars as you suggested or do I need to write a custom routine to achieve this?
Just the way you showed.
Code: Select all
in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
OK. Two points:DSguru2B wrote:Just the way you showed.Code: Select all
in.col[1,5]:char(013):char(010):in.col[6,5]:char(013):char(010) and so on...
A. I tried
in.col[1,5]:char(013):char(010):in.col[6,5]
this does not work in parallel job it prints just first record. In server job, it works fine but prints a ^M character at end of each line.
Then in Server job the following works fine:
in.col[1,5]:char(010):in.col[6,5]
Any idea why parallel is not working. My job design is
[SeqFile] > [Transformer] > [SeqFile]
B. I can't go with the hardcoded approach above, of specifing each limit, since its a huge line. I guess i will need to create a string using some custom routine that will append newline at every 5 chars and then pass that string to the output.
O yes. It is unix and not windows . You only need char(010).
Check the row count of your target. How do you know its not showing up?
Also try to run in sequential mode and see if its working.
To parse the input record, you need to write a custom C program that inserts a new line character after every 5th position. If you decide to go with a server job, you need to do the same with in a routine.
Check the row count of your target. How do you know its not showing up?
Also try to run in sequential mode and see if its working.
To parse the input record, you need to write a custom C program that inserts a new line character after every 5th position. If you decide to go with a server job, you need to do the same with in a routine.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Just a thought.
Couldn't you treat the input as fixed length with a 'record length' of 5? If you look at a regular fixed length file, it looks like one gigantic record with an end-of-file termination. What is the difference between that and your 1 line input record?
Set the file type to fixed length with a record length of 5.
Brad.
Couldn't you treat the input as fixed length with a 'record length' of 5? If you look at a regular fixed length file, it looks like one gigantic record with an end-of-file termination. What is the difference between that and your 1 line input record?
Set the file type to fixed length with a record length of 5.
Brad.
Brad - you have hit upon the solution that I was thinking about as well. Just declare the original with one column of fixed width and no line terminators, then read it in.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: folding a single record into multiple records at fixed w
Ray -
news78 wrote:We get a data flat file which has just one line...
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: