New line character issue for sequential file stage

austin_316 · Post by **austin_316** » Mon Jul 11, 2011 7:31 am

Hi,
We are facing an issue with a Sequential File stage in one of our jobs.
In the job we fetch data from an Oracle stage, pass it through a Remove Duplicates stage, and then store the data in a Sequential File stage.

Code:

OracleStage1------------>Funnel----->RemoveDuplicates----->Copy----->SequentialFile
                         ^
                         | 
                         |
                         |
                 Oraclestage2

Some columns in the table contain the \n newline character in their data, so when I store the values in the sequential file, one record is read as two records instead of one.

Data in sequential file

Col1||Col2||Col3||Col4
rec1||abd||def||ghi
rec2||abc
asd||kol||llp
rec3||jghg||jghr||kjgng

We store this data in a sequential file because it is the input for a jar file that we call in an after-job subroutine.
Can anyone please tell me how we can replace this newline character with some combination of characters like ';:;:;:'?
We don't have a Transformer stage in the job and we want to do this without using one.

Any help is highly appreciated.

Thanks.

pandeesh · Post by **pandeesh** » Mon Jul 11, 2011 8:18 am

One thing you can do is correct it in the source itself by using the query below:

Code:

UPDATE table_name SET cl_name = REPLACE(cl_name, CHR(10), '') WHERE INSTR(cl_name, CHR(10)) > 0;

Ravi.K · Post by **Ravi.K** » Mon Jul 11, 2011 9:12 am

While writing to the Sequential File stage, use the command below in the Filter option.

tr -d '\n'

pandeesh · Post by **pandeesh** » Mon Jul 11, 2011 10:12 am

Ravi.K wrote:While writing to the Sequential File stage, use the command below in the Filter option.

tr -d '\n'

If you use this, all the records from the source table are loaded into the target as a single record, because the '\n' between one record and the next is removed as well. Everything ends up on a single line.
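
To make the effect concrete, here is a minimal shell sketch. The `sample` function is a hypothetical stand-in for the rows arriving at the file stage, using the data from the first post; it is not part of the actual job.

```shell
# Hypothetical sample: three records, the second with a newline embedded in a field.
sample() {
  printf 'rec1||abd||def||ghi\nrec2||abc\nasd||kol||llp\nrec3||jghg||jghr||kjgng\n'
}

# tr cannot tell an embedded newline from a record terminator, so it deletes
# both kinds. The result contains no newline at all, and wc -l reports 0 lines.
sample | tr -d '\n' | wc -l
```

The embedded newline is gone, but so is every record boundary, which is why the target sees a single record.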

pandeesh · Post by **pandeesh** » Mon Jul 11, 2011 10:28 am

I want to remove the '\n' character from a particular field and load the result into a file.
How can this be achieved in a Transformer stage?
I have tried Exchange() and tr -d in the filter command.
Neither gives the expected result.

Any other suggestions welcome!

Thanks

FranklinE · Post by **FranklinE** » Mon Jul 11, 2011 10:48 am

An old trick I learned from an assembler programmer that seems good regardless of the environment:

1) Use some other special character for the end-of-record mark, one you are sure will not appear in the rest of the data.

2) Go ahead and remove all '\n' from every record.

3) Convert the end-of-record mark to '\n'.

Good luck.

pandeesh · Post by **pandeesh** » Mon Jul 11, 2011 10:53 am

FranklinE wrote:An old trick I learned from an assembler programmer that seems good regardless of the environment:

1) Use some other special character for the end-of-record mark, one you are
.

How do we do this?
In which stage do we need to do it?

Thanks

FranklinE · Post by **FranklinE** » Mon Jul 11, 2011 11:36 am

It depends on where the '\n' is causing you the trouble. Without seeing your code, I would expect one of two possibilities:

1) If you don't control this until the file stage, that is where you set the alternate value for end-of-record, in Format/Record level/Record delimiter string. After that, you use another job or a Unix script to strip the bad '\n' and finish with converting the end-of-record mark.

2) When you read from Oracle, use a transformer to remove the character. When you get to the file stage, the only place it should appear is at the end of each record because your file stage will put it there.

There are details that you have to figure out in between. Good luck.
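
As an illustration of the Unix-script half of option 1, here is a minimal sketch. It assumes the file stage was configured to end each record with the single byte 0x1E (the ASCII record separator) instead of a newline. That byte and the name `fix_records` are illustrative choices, not something taken from the job; whether the stage accepts that particular byte as a record delimiter would need to be checked.

```shell
# Assumption: every record in the raw extract ends with byte 0x1E (octal 036),
# so any newline still present must be embedded field data.
fix_records() {
  # Remove all newlines, then turn the end-of-record mark into a newline.
  tr -d '\n' | tr '\036' '\n'
}

# Two records; the second has a newline embedded in its second field.
printf 'rec1||abd\036rec2||abc\nasd||kol\036' | fix_records
# prints:
# rec1||abd
# rec2||abcasd||kol
```

In practice this would run as `fix_records < raw_file > clean_file`, with both file names being placeholders.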

pandeesh · Post by **pandeesh** » Mon Jul 11, 2011 11:50 am

Thanks, Franklin.

I plan to remove \n in the source Oracle stage itself by using REPLACE(col_name, CHR(10), '').

ray.wurlod · Post by **ray.wurlod** » Mon Jul 11, 2011 4:07 pm

Have you checked with your client whether they want to keep those newline characters? Or is yours a site with no data governance processes whatsoever?

pandeesh · Post by **pandeesh** » Mon Jul 11, 2011 7:38 pm

ray.wurlod wrote:Have you checked with your client whether they want to keep those newline characters? Or is yours a site with no data governance processes whatsoever?

Nowadays we manually update the particular column and then run the job.
It would be fine if we handled this in the source stage itself.
There will not be any impact on the original table, however.
It will just write to a sequential file without \n.

chulett · Post by **chulett** » Mon Jul 11, 2011 9:14 pm

Wait... suddenly this is your thread? You really need to stop doing that.

austin_316 · Post by **austin_316** » Mon Jul 11, 2011 9:32 pm

ray.wurlod wrote:Have you checked with your client whether they want to keep those newline characters? Or is yours a site with no data governance processes whatsoever?

We are actually having this problem. I want to convert those \n characters to some combination of characters when I pass the data to the sequential file, and after the jar file has finished working on it I want to convert that combination back to \n. Another problem is that we are not sure in which columns this newline character might appear, so we want to do this conversion for all columns. Is there any way to do this other than a Transformer? Even in a Transformer, I think I would have to specify the Convert() function in every column derivation.
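
One way this round trip could look outside a Transformer is sketched below. It relies on the alternate end-of-record mark suggested earlier in the thread, here assumed to be the single byte 0x1E, so that an embedded newline can be told apart from a record terminator regardless of which column it sits in. The function names `encode_newlines` and `decode_newlines` are hypothetical, and the sketch assumes the jar file passes the ';:;:;:' placeholder through unchanged.

```shell
# Assumption: records arrive terminated by byte 0x1E (octal 036), not by "\n".
# Every newline inside a record becomes the placeholder, whatever column it is in.
encode_newlines() {
  awk 'BEGIN { RS = "\036" } { gsub(/\n/, ";:;:;:"); print }'
}

# After the jar file has run: turn each placeholder back into a newline.
decode_newlines() {
  awk '{ gsub(/;:;:;:/, "\n"); print }'
}

printf 'rec1||abd\036rec2||abc\nasd||kol\036' | encode_newlines
# prints:
# rec1||abd
# rec2||abc;:;:;:asd||kol
```

Because the substitution works on the whole record, no per-column derivation is needed. After decoding, the file again contains multi-line records, so whatever reads it next has to cope with that.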

samyamkrishna · Post by **samyamkrishna** » Mon Jul 11, 2011 11:06 pm

Hi,

You can create a wrapper stage containing tr "\n" "abc" and pass all the columns through the wrapper.

Regards,
Samyam

ray.wurlod · Post by **ray.wurlod** » Tue Jul 12, 2011 1:36 am

That's not how tr works!
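
A small sketch of the difference, for anyone following along: tr translates character by character, so it cannot swap one character for a multi-character string. The function names and the awk alternative are illustrative additions, not something proposed in the thread.

```shell
# tr pairs character N of the first set with character N of the second set.
# "\n" is paired with "a"; "b" and "c" have no partner and are never used.
tr_attempt() {
  tr '\n' 'abc'
}

# A true string replacement needs another tool. Here awk writes "abc" in place
# of each line terminator by using it as the output record separator.
string_replace() {
  awk 'BEGIN { ORS = "abc" } { print }'
}

printf 'one\ntwo\n' | tr_attempt       # prints: oneatwoa
echo
printf 'one\ntwo\n' | string_replace   # prints: oneabctwoabc
echo
```

Note that both commands still treat every newline alike, so neither one by itself distinguishes an embedded newline from a record terminator.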
