Continuous Delimiters

rcil · Post by **rcil** » Sun May 15, 2005 11:22 pm

Hello All,

I need help in reading the input flat file which is in the format below

123;000456;32.11;89765;;;;;

I like to write it to two different files one taking out the extra semicolon and the second populating 0 to those extra columns

Code: Select all

Result:
1) 123;000456;32.11;89765
2)123;000456;32.11;89765;0;0;0;0;

thanks

ArndW · Post by **ArndW** » Mon May 16, 2005 12:50 am

If you read your sourcefile with a definition of just one column then a simple derivation of "TRIM(In.BigColumn,";","T") to remove extraneous delimiters. In the second case define all of the columns explicitly in your input column descriptions and then output the fields as numeric

ray.wurlod · Post by **ray.wurlod** » Mon May 16, 2005 2:27 am

For the first file trim the trailing ";" characters.

Code: Select all

Trim(TheString, ";", "T")

For the second file, convert to a dynamic array, empty strings to zero, then back to delimited.

Code: Select all

Convert(@FM, ";", Oconvs(Convert(";", @FM, TheString),"S;*;0;0"))

rcil · Post by **rcil** » Mon May 16, 2005 10:52 pm

Thanks for all the help. The problem I have is I am not able to read my source file through datastage. The file is in the format below with carriage return (^M) at the end of each line and all the rows are not with the same number of columns

Code: Select all

123;abc;42001;10.00;56789;;;;^M
128;bdc;32111;12.00;^M
890;fgh;53000;19.00;78987;;;;^M
368;bdc;85141;22.00;^M

I tried couple of ways to read the data like removed the ^M using dos2unix command, tried to put everything into one single string making the format to fixed width but none of the techniques worked. Most of the time it throws error as too many columns and sometime datasource is empty. I tried to use the sed command to remove the trailing ';' but couldn't get it right.

Hope I will get some help.

thanks

ArndW · Post by **ArndW** » Tue May 17, 2005 12:26 am

rcil,

you would need to define this file as having one column that is defined as longer than the maximum data length. You can set your line terminators to use just CR and not <CR><LF> (use Unix file type).

Then use the OCONV in a transform as listed above. No need to use UNIX commands.

kris · Post by **kris** » Tue May 17, 2005 5:04 pm

Code: Select all

123;abc;42001;10.00;56789;;;;^M
128;bdc;32111;12.00;^M
890;fgh;53000;19.00;78987;;;;^M
368;bdc;85141;22.00;^M

Those ^M characters are inserted when you copy these files on your PC.

If this is some thing you need to worry about in the real time processing, then you should choose the approach which Arnd suggested.

If it is one time job and you want to get rid of those ^M's. You could actually fix your file and process.

you can fix your file from command line

Code: Select all

sed -e "s/^M//" yourfilename > newfilename

Note: Do not copy paste this. '^M' in the code is actually two characters <cntrl>V<cntrl>M. so type <cntrl>V<cntrl>M and you would see only ^M.

Kris~