
External source stage skipping blank lines

Posted: Wed Feb 15, 2012 11:58 am
by chetan.c
Hi,

I'm using the command below in the Source Program property of the External Source stage.
tar -xOvf /home/files/Reques_folder.tar

The output from the External Source stage is skipping the blank lines.

However, when I enter the same command at the Unix prompt it returns the correct data from the files, including the blank lines.
Further settings in the External Source stage:
Final delimiter: end
Record delimiter: UNIX newline
Delimiter: none
Quote: none

I'm trying to read the data into a single column.

Kindly let me know what the issue could be.


Thanks,
Chetan

Re: External source stage skipping blank lines

Posted: Wed Feb 15, 2012 10:57 pm
by chetan.c
Hi,

Is it a problem with the configuration?
Please guide me towards a solution.

Thanks,
Chetan.C

Posted: Thu Feb 16, 2012 1:00 am
by kandyshandy
You mean to say there are blank lines (empty lines, or just a newline character) in your data files?

And the External Source stage is skipping/ignoring those lines?

What is the datatype of your column?

Posted: Thu Feb 16, 2012 3:36 am
by chetan.c
Actually there are ^M characters in the data.

When I use tar -xOvf /home/TL012938524.tar | sed 's/^M/ /g' at the command prompt I get the correct output, but if I use it in the External Source stage the ^M is not removed.
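
For reference, the ^M above is a literal carriage return (typed as Ctrl-V then Ctrl-M), which is easy to lose when pasting the command into a stage property. A sketch of an equivalent pipeline that avoids the literal character, using tr to delete the carriage returns rather than replace them with spaces:

Code:

# extract to stdout and strip carriage returns; avoids the literal ^M
tar -xOf /home/TL012938524.tar | tr -d '\r'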

Thanks

Posted: Thu Feb 16, 2012 4:04 am
by kandyshandy
So it is the other way around from what you said earlier.

OK. Is it expecting a backslash in front of the special character? :roll:

Posted: Thu Feb 16, 2012 4:24 am
by ray.wurlod
Specify your record delimiter string property in the Sequential File stage as "DOS style".
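
On the Format tab of the stage, that looks something like this (property naming from memory, so treat it as a sketch; the exact label can vary by version):

Record delimiter string: DOS format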

Posted: Mon Feb 20, 2012 6:36 am
by chetan.c
ray.wurlod wrote:Specify your record delimiter string property in the Sequential File stage as "DOS style". ...
Hi Ray,
I tried using DOS style, but I still have the same problem.

Also, when I load a tar file which has around 7000 records, the job aborts saying it consumed more than 100000 bytes.
I read the posts here and understood about the APT_MAX_DELIMITED_READ_SIZE environment variable.

But my question is: if the stage cannot find the record delimiter, why does it show data from the file in different rows when I view data?

Thanks,
Chetan.C

Posted: Mon Feb 20, 2012 3:03 pm
by Kryt0n
I would suggest Unix newline is right if it isn't finding the Windows newline. Are you sure these are actually blank lines and not just the run-over of the previous line?
Try running a couple of lines (including the blank ones) through "od -c" or "od -x" and check what each is delimited by.
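
For example, using the tar command from the earlier post (a sketch; head just limits how much gets dumped):

Code:

# dump the first few lines byte by byte to see the exact delimiters
tar -xOf /home/TL012938524.tar | head -4 | od -c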

Posted: Tue Feb 21, 2012 2:13 am
by chetan.c
Hi Kryt0n,

I just passed the file through od -xc and got the output below.
The first two lines of the file look like this:
[iamie]
TLTSID=E53F96B42B72102BB3D22D89C82DEAEC.

The second portion of the od output shows the blank line followed by the next line, which has data.

So it is DOS style, right? As there is \r\n.
And for the blank line it has \r\n\r\n.

But one thing that still bothers me is why the External Source stage can't find the record delimiter. I read the stage documentation and even tried \r\n in the External Source stage.

Can you let me know where I'm going wrong?

Thanks.

Code:

0000000 695b 6d61 6569 0d5d 540a 544c 4953 3d44
          [   i   a   m   i   e   ]  \r  \n   T   L   T   S   I   D   =
0000260 0a0d 0a0d 655b 766e 0d5d 520a 5145 425f
         \r  \n  \r  \n   [   e   n   v   ]  \r  \n   R   E   Q   _   B

Posted: Tue Feb 21, 2012 3:22 pm
by Kryt0n
Definitely Windows, and definitely a blank line, from that...

Currently, if you set it to Unix newline it processes the populated lines (with a Control-M at the end), but Windows newline errors with 'no record delimiter'?

Posted: Wed Feb 22, 2012 2:09 am
by chetan.c
Yes, Windows newline errors with no record delimiter.
I did read a post here about the "consumed more than 100000 bytes" error, for which the user increased APT_MAX_DELIMITED_READ_SIZE and the job worked.

But I don't want to do that, at least not without finding out why this is happening.

Any thoughts?

thanks,
Chetan.C

Posted: Wed Feb 22, 2012 3:11 pm
by Kryt0n
I really can't say, and as I'm working on Windows, I can't really replicate it.

My suggestion would be to do the tar beforehand and then read (using a Sequential File stage) the file it extracts...
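
Something along these lines (a sketch; the extract directory is illustrative):

Code:

# extract to a working directory first, then point a Sequential File
# stage at the extracted files (/tmp/extract is an illustrative path)
mkdir -p /tmp/extract
tar -xf /home/files/Reques_folder.tar -C /tmp/extract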

Posted: Wed Feb 22, 2012 11:06 pm
by chetan.c
Thanks for the response.

But the client actually does not want these files written to disk.
So I am staying with this route.


Thanks,
Chetan

Posted: Thu Feb 23, 2012 8:02 am
by chetan.c
I resolved the External Source stage error about consuming more bytes than defined by increasing APT_MAX_DELIMITED_READ_SIZE and APT_DEFAULT_TRANSPORT_BLOCK_SIZE.

That seems to have solved the "consumed more than 100000 bytes" error.
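
For anyone who hits the same error, the change looks something like this (both values below are illustrative, not the exact ones from my job):

Code:

# set as environment variables in the job or project properties;
# values are illustrative only (the default read limit is 100000 bytes)
APT_MAX_DELIMITED_READ_SIZE=500000
APT_DEFAULT_TRANSPORT_BLOCK_SIZE=262144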

But I still did not find out why it says the record delimiter was not found.

Anyway, since I'm now facing a new issue with the job running very slowly, I will open a new thread.

Thanks,
Chetan