Line breaks within fields in a file

DataStage_Sterling · Post by **DataStage_Sterling** » Wed Apr 13, 2016 9:17 am

We are migrating data from an Access database with DataStage 8.7. The data is extracted from Access as tab-delimited .txt files, which are read and loaded into an Oracle database using DataStage.

We are facing a problem with ^M (carriage return) characters inside fields. (^M at the end of a record is not an issue; we can read it successfully with Record Delimiter String = DOS format.)

**Job Design**
Seq File -> Transformer -> Oracle connector

**Source Data example**
1 These are comments Charleston WV^M
2 These are comments in^M
second line New York NY^M
3 One line comments Chicago IL^M

Lines 1 and 3 are read correctly in DataStage, but line 2 errors out:

Delimiter for field "FIELD3" not found; input: {second}, at offset:

I tried to resolve this in a UNIX pre-processing script but was not successful. Is there a way to force DataStage to read line 2 completely?

Thanks for your time

Satish

qt_ky · Post by **qt_ky** » Wed Apr 13, 2016 9:22 am

The usual advice is to use a Server job or a Server Shared Container within a Parallel job. The Server job's Sequential File stage has properties on the metadata tab to handle line terminators.

ray.wurlod · Post by **ray.wurlod** » Wed Apr 13, 2016 5:06 pm

Why/how was your pre-processing unsuccessful?

Using the server Sequential File stage is an easy solution (provided that the string is quoted).

DataStage_Sterling · Post by **DataStage_Sterling** » Wed Apr 13, 2016 8:40 pm

Here is the closest thing I found for this, but it did not serve my purpose. (I am not a UNIX expert, by the way.)
http://stackoverflow.com/questions/6081 ... 7#36607027

1. Server jobs are no longer allowed in our company.
2. From the documentation, I see that shared containers are possible only on SMP systems. We have a grid architecture.
3. The strings are not quoted, as the data already has quotes in many fields.
https://www.ibm.com/support/knowledgece ... iners.html

UCDI · Post by **UCDI** » Thu Apr 14, 2016 8:23 am

Preprocess it with a simple C program (or insert your favorite language), then.
From what you posted, you can read one integer and then start a loop: keep reading until you reach the next integer, write everything read so far as a single line, insert a line break, and repeat. It's probably 10 lines of code. There are limits to what a shell script can do, but a real language can solve this in less time than it took me to type this post.
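
A minimal sketch of that idea in Python rather than C. It assumes, based only on the sample data in the first post, that every real record starts with an integer key followed by a tab, and that a line which does not start that way is the continuation of a field from the previous record. The names `merge_broken_records` and `rewrite_file`, the space used to join the pieces, and the file paths are all hypothetical choices for illustration.

```python
import re

# Assumption: a real record always begins with an integer key and a tab.
RECORD_START = re.compile(r"^\d+\t")


def merge_broken_records(lines, joiner=" "):
    """Yield one string per logical record, with line terminators removed.

    A line that does not start with "<integer><tab>" is treated as the
    continuation of the previous record and is appended to it.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop the ^M and the newline
        if RECORD_START.match(line):
            if current is not None:
                yield current
            current = line
        elif current is None:
            # Text before the first record (for example a header row).
            yield line
        else:
            current += joiner + line
    if current is not None:
        yield current


def rewrite_file(src_path, dst_path):
    """Write a cleaned copy of src_path with one DOS-terminated record per line."""
    # newline="" keeps Python from translating line endings on read or write.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        for record in merge_broken_records(src):
            dst.write(record + "\r\n")
```

Calling `rewrite_file("extract.txt", "extract_clean.txt")` (placeholder paths) would produce a file that can still be read with Record Delimiter String = DOS format. The heuristic breaks if a continuation line itself happens to start with digits and a tab; if the number of columns is fixed, counting tabs per record would be a stricter check.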