Line breaks within fields in a file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DataStage_Sterling
Participant
Posts: 26
Joined: Wed Jul 17, 2013 9:00 am

Post by DataStage_Sterling »

We are migrating data from an Access database to Oracle using DataStage 8.7. The data is extracted from Access as tab-delimited .txt files, which DataStage then reads and loads into the Oracle database.

We are facing a problem with ^M (carriage return) characters within the fields. (The ^M at the end of each record is not an issue; we read it successfully with Record Delimiter String = DOS format.)

Job Design
Seq File -> Transformer -> Oracle connector


Source Data example
1 These are comments Charleston WV^M
2 These are comments in^M
second line New York NY^M
3 One line comments Chicago IL^M

Lines 1 and 3 are read correctly in DataStage; however, line 2 errors out with:

Delimiter for field "FIELD3" not found; input: {second}, at offset: 

I tried to resolve this with a Unix pre-processing script but was not successful. Is there a way to force DataStage to read line 2 as one complete record?
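A minimal pre-processing sketch in Python, assuming (from the sample above, since the forum display drops the tabs) that every record has exactly four tab-separated fields (ID, comments, city, state), that no field contains a literal tab, and that embedded line breaks only ever fall inside a field. It keeps appending physical lines, joined with a space, until the buffer holds the expected number of tab delimiters, then writes the rebuilt record with a CRLF ending so the existing Record Delimiter String = DOS format setting still applies. `EXPECTED_FIELDS`, `rejoin_records` and `rejoin_file` are hypothetical names.

```python
"""Hypothetical pre-processor: rejoin records split by embedded line breaks.

Assumes every logical record has EXPECTED_FIELDS tab-separated fields and
that line breaks never fall inside the tab delimiters themselves.
"""

EXPECTED_FIELDS = 4  # assumption: ID, comments, city, state


def rejoin_records(lines, expected_fields=EXPECTED_FIELDS, joiner=" "):
    """Yield logical records (without line endings) from physical lines."""
    buffer = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop CRLF, LF or a bare CR at line end
        buffer = line if buffer is None else buffer + joiner + line
        # A complete record contains expected_fields - 1 tab delimiters.
        if buffer.count("\t") >= expected_fields - 1:
            yield buffer
            buffer = None
    if buffer is not None:
        yield buffer  # incomplete trailing record: passed through unchanged


def rejoin_file(in_path, out_path, expected_fields=EXPECTED_FIELDS):
    """Rewrite in_path to out_path with one record per CRLF-terminated line."""
    # newline="" splits on \r\n, \n or a lone \r without translating them;
    # latin-1 maps every byte to one character, so bytes round-trip unchanged.
    with open(in_path, newline="", encoding="latin-1") as src, \
            open(out_path, "w", newline="", encoding="latin-1") as dst:
        for record in rejoin_records(src, expected_fields):
            dst.write(record + "\r\n")
```

For example, `rejoin_file("extract.txt", "extract_fixed.txt")` (hypothetical file names) could be run before the DataStage job, which would then read the fixed file. Counting delimiters does not depend on what the first field looks like, but it does require knowing the field count in advance.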

Thanks for your time

Satish
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

The usual advice is to use a Server job or a Server Shared Container within a Parallel job. The Server job's Sequential File stage has properties on the metadata tab to handle line terminators.
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why/how was your pre-processing unsuccessful?

Using the server Sequential File stage is an easy solution (provided that the string is quoted).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DataStage_Sterling
Participant
Posts: 26
Joined: Wed Jul 17, 2013 9:00 am

Post by DataStage_Sterling »

Here is the closest thing I found, but it did not serve my purpose (I am not a UNIX expert, by the way):
http://stackoverflow.com/questions/6081 ... 7#36607027

1. Server jobs are no longer allowed in our company.
2. From the documentation, I see that shared containers are possible only on SMP systems, and we have a grid architecture.
3. The strings are not quoted, because many fields already contain quote characters.
https://www.ibm.com/support/knowledgece ... iners.html
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Then preprocess it with a simple C program (or your favorite language).
From what you posted, you can read one integer, then start a loop: keep reading until you hit another integer, dumping the data as a single line; when you find the next integer, insert a line break and repeat. It's probably 10 lines of code. There are limits to what a shell script can do, but a real language can solve this in less time than it took me to type this post.
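The leading-integer approach described above could look roughly like this Python sketch (Python rather than C only for brevity). It assumes, as the sample suggests, that every record begins with an integer ID followed by a tab, and that no continuation line starts that way. A continuation line beginning with, say, "2 more" does not match because no tab follows the digits, but one that begins with digits and a tab would be misread as a new record. `RECORD_START` and `rejoin_by_leading_id` are hypothetical names.

```python
import re

# Assumption: a new record starts with an integer ID followed by a tab.
RECORD_START = re.compile(r"\d+\t")


def rejoin_by_leading_id(lines, joiner=" "):
    """Yield logical records, appending continuation lines to the current one."""
    record = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop CRLF, LF or a bare CR at line end
        if RECORD_START.match(line) or record is None:
            # Start a new record. The first line always opens one, so a
            # header row (if present) is preserved as its own record.
            if record is not None:
                yield record
            record = line
        else:
            # No leading ID: this line continues the field that was broken.
            record += joiner + line
    if record is not None:
        yield record
```

Each yielded record can be written out with a "\r\n" ending so the existing DOS record delimiter setting still applies. This variant needs no field count, but it relies entirely on the ID pattern being unambiguous.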