External source stage skipping blank lines

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

External source stage skipping blank lines

Post by chetan.c »

Hi,

I'm using the command below in the Source Program property of an External Source stage:
tar -xOvf /home/files/Reques_folder.tar

The output from the External Source stage is skipping the blank lines.

However, when I enter the same command at the Unix prompt it gives the correct data from the files, including the blank lines.
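
For reference, one way to check what the stage actually receives is to count the blank lines straight from the tar stream (same file as above; if the second count comes back 0 while lines look blank, those lines contain invisible characters rather than being truly empty):

Code:

tar -xOvf /home/files/Reques_folder.tar | wc -l        # total lines
tar -xOvf /home/files/Reques_folder.tar | grep -c '^$' # truly empty lines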
Further settings in the External Source stage:
Final delimiter: End
Record delimiter: UNIX newline
Delimiter: None
Quote: None

I'm trying to read the data into one single column.

Kindly let me know what the issue could be.


Thanks,
Chetan
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Re: External source stage skipping blank lines

Post by chetan.c »

Hi,

Is it a problem in the configuration?
Please guide me to a solution.

Thanks,
Chetan.C
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

You mean to say there are blank lines (empty lines or just a new line character) in your data files?

Is the External Source stage skipping/ignoring those lines?

What is the datatype of your column?
Kandy
_________________
Try and try again… You will succeed at last!!
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Actually there are ^M characters in the data.

When I use tar -xOvf /home/TL012938524.tar | sed 's/^M/ /g' at the command prompt I get the correct output, but when I use it in the External Source stage the ^M characters are not removed.
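
One thing to watch: the ^M in that sed command must be a literal carriage-return character (typed at the prompt as Ctrl-V then Ctrl-M). If it gets into the stage as a plain caret followed by M, sed treats the caret as a start-of-line anchor and the command no longer touches the carriage returns at all, which would explain the difference between the prompt and the stage. A sketch of an alternative that avoids the control character entirely (it also deletes the \r instead of leaving a space behind):

Code:

tar -xOvf /home/TL012938524.tar | tr -d '\r'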

Thanks
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

So it is the other way around from what you said earlier.

OK. Is it expecting a backslash in front of the special character? :roll:
Kandy
_________________
Try and try again… You will succeed at last!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Specify your record delimiter string property in the Sequential File stage as "DOS style".
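
A quick way to see what "DOS style" means at the byte level, as a sketch with made-up data: each record ends in the two bytes \r \n (hex 0d 0a), and a blank line is just two of those back to back:

Code:

printf 'line1\r\n\r\nline3\r\n' | od -c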
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

ray.wurlod wrote:Specify your record delimiter string property in the Sequential File stage as "DOS style". ...
Hi Ray,
I tried DOS style but still have the same problem.

Also, when I load a tar file which has around 7,000 records, the job aborts saying it consumed more than 100000 bytes.
I read the posts here and understood about the APT_MAX_DELIMITED_READ_SIZE environment variable.

But my question is: if the stage cannot find the record delimiter, then why does it show data from the file in different rows when I view data?

Thanks,
Chetan.C
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

I would suggest Unix newline is right if it isn't finding a Windows newline. Are you sure these are actually blank lines and not just run-over from the previous line?
Try running a couple of lines (including the blank ones) through "od -c" or "od -x" and check what each is delimited by.
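
Something along these lines should do it (file name as per your first post; head keeps the dump small):

Code:

tar -xOvf /home/files/Reques_folder.tar | head -4 | od -c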
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Hi Kryt0n,

I just passed the file through od -xc and got the output below.
The first two lines of the file look like this:
[iamie]
TLTSID=E53F96B42B72102BB3D22D89C82DEAEC.

The second portion of the output is the blank line followed by the next line, which has data.

So it is DOS style, right? As there is \r\n.
And the blank line has \r\n\r\n.

But one thing that still bothers me is why the External Source stage can't find the record delimiter. I read the stage documentation and even tried \r\n in the External Source stage.

Can you let me know where I'm going wrong?

Thanks.

Code:

0000000 695b 6d61 6569 0d5d 540a 544c 4953 3d44
          [   i   a   m   i   e   ]  \r  \n   T   L   T   S   I   D   =
0000260 0a0d 0a0d 655b 766e 0d5d 520a 5145 425f
         \r  \n  \r  \n   [   e   n   v   ]  \r  \n   R   E   Q   _   B
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Definitely Windows, and definitely a blank line, going by that...

Currently, if you set it to Unix newline it processes the populated lines (with a Control-M at the end), but Windows newline errors with no record delimiter?
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Yes, Windows newline errors with no record delimiter.
I did read a post here about the "consumed more than 100000 bytes" error, where the user increased APT_MAX_DELIMITED_READ_SIZE and the job worked.

But I don't want to do that, at least not until I find out why this is happening.

Any thoughts?

thanks,
Chetan.C
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

I really can't say, and as I'm working on Windows I can't really replicate it.

My suggestion would be to do the tar beforehand and then read the files it extracts using a Sequential File stage...
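
For example, as a before-job command (paths are hypothetical):

Code:

# Extract once to a staging directory, then point a Sequential File
# stage at the extracted files
tar -xvf /home/files/Reques_folder.tar -C /home/files/staging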
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Thanks for the response.

But the client does not want these files written to disk, so I'm going this route.


Thanks,
Chetan
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

I resolved the External Source stage error about consuming more bytes than defined by increasing APT_MAX_DELIMITED_READ_SIZE and APT_DEFAULT_TRANSPORT_BLOCK_SIZE.

That seems to have solved the "consumed more than 100000 bytes" error.
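
For reference, those are job-level environment variables; something like the following, with illustrative values only (the 100000-byte default is the limit the error message quoted; defaults vary by version):

Code:

APT_MAX_DELIMITED_READ_SIZE=500000          # default scan limit is 100000 bytes
APT_DEFAULT_TRANSPORT_BLOCK_SIZE=1048576    # transport block size in bytes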

But I still haven't worked out why it says the record delimiter was not found.

Anyway, since I'm now facing a new issue with the job running very slowly, I will open a new thread.

Thanks,
Chetan