Flat file loading on UNIX

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
rkumar28
Participant
Posts: 43
Joined: Tue Mar 30, 2004 9:39 am

Flat file loading on UNIX

Post by rkumar28 »

Hi All,

I have a situation where I am doing the load from flat file to teradata tables using Datastage. I have more than 300,000 rows in the flat file that has about 16 columns. The flat file is pipe dilimited and is on UNIX. This flat file is created from two different source with same metadata.

The problem is third column(10 digit character) got switched with the fourth column(three digit character) somewhere in the middle of the flatfile i.e. somewhere in the file the fourth column became the third column and vice versa. This probably happened when the data from two sources are merged. I am trying to put all the rows with three digit character in fourth column into one file and ten digit character rows in another file. Is there a unix script/command that I can use to divide this file into two different file so that I can load all data sucessfully. Due to some problem I am not able to generate the file again.

I will really appreciate any help in this regards.

Thanks
Raj
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do you really need to create two files? I would think you could work with the file as is, if the mix-up is as simple as it sounds. :?

Declare both the 3rd and 4th fields as 10 characters on the input side. Check the LENgth of the actual data and swap fields if the data in the 3rd column is 3 digits long. Do this in the middle of your processing job and either continue on to Teradata or write out a new (corrected) file that can then be bulk loaded.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rsrikant
Participant
Posts: 58
Joined: Sat Feb 28, 2004 12:35 am
Location: Silver Spring, MD

Post by rsrikant »

Hi Craig,

What if the 10 digit character in the 3rd column has a value of length <=3?

If all the rows have a value of length >3 in the 10 digit character column, probably what you said can work out.

By what raj mentioned it's not clear whether all the rows are of length >3 in the 10 digit character column or not.

Correct me if I am wrong.

Thanks,
Srikanth
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

We shall see, when Raj next visits. :wink: That's why I prefaced things like I did - if the fields are of a consistant length, 10 versus 3, it should be easy to correct.

After that, it will depend on exactly what is in the fields. '10 digit character' doesn't tell me specifically if it is 10 digits or 10 characters, meaning numeric or text or alphanumeric data, or if (as you mentioned) the data is actually of some size other than exactly 3 or 10.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There's an easy DataStage job solution. Read the file and write another, with four output columns.
Field001 and Field002 are directly derived from input.
Field003 is derived as

Code: Select all

If Len(inlink.Field003) = 10 Then inlink.Field003 Else inlink.Field004
and Field004 is derived as

Code: Select all

If Len(inlink.Field004) = 3 Then inlink.Field004 Else inlink.Field003
A more efficient design would use a stage variable (called, say, svTest) that determines (and is derived as)

Code: Select all

(Len(inlink.Field003) = 10)
Then, output.Field003 is derived as

Code: Select all

If svTest Then inlink.Field003 Else inlink.Field004 
and output.Field004 is derived as

Code: Select all

If svTest Then inlink.004 Else inlink.Field003
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rsrikant
Participant
Posts: 58
Joined: Sat Feb 28, 2004 12:35 am
Location: Silver Spring, MD

Post by rsrikant »

Vow!!! that's simply a great solution.

Thanks Ray.. I am adding to my favorites. :D

Srikanth
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wow Ray? That's exactly what I said. (where's a big pouty face when you need one?) :cry: Ray just provided the actual code to go along with it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkumar28
Participant
Posts: 43
Joined: Tue Mar 30, 2004 9:39 am

Flat file loading on UNIX

Post by rkumar28 »

Thanks Ray, Craig and Srikant...for providing the info. This really helped.

Also, wanted to share....I found something today to do this in Unix using awk utility:

awk -F'|' '{if(length($3==3)f="outfile1;else f="outfile2;print > f}' INPUT_Source_File

Its look for 3 digit number in column 3 and put data in outfile1 and everything else goes into the outfile2. But I feel doing it with datastage as Ray suggested is more simpler than creating two different files.

chulett wrote:Wow Ray? That's exactly what I said. (where's a big pouty face when you need one?) :cry: Ray just provided the actual code to go along with it.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You could have used print with $3 and $4 in awk, thereby only needing one output file. But, heck, you've got DataStage. 8)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply