Regarding unzip in WinNT


DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Hi,

I am transferring a .gz file from a UNIX server with the FTP stage in ASCII format and then trying to unzip it with the gunzip command in a command stage. The file unzips, but the problem is that the target has the same number of rows as the source, whereas I expected the target to have more rows than the source after unzipping.

Will gunzip work correctly in a WinNT environment? My OS is WinNT.

I am unable to find the reason why I am getting the same number of rows.

Can anyone help me here?
RD
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

gunzip will correctly decompress the file on Windows if you have it installed. You are most likely seeing an issue because UNIX line termination is usually <LF>, while the Windows default is <CR><LF>.

If that isn't your problem, perhaps you could explain why you expect different numbers of rows.
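
For anyone who wants to check the line-termination point above on their own file, here is a minimal Python sketch. The function name and the sample bytes are made up for illustration; it only counts terminators and is not part of DataStage.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs and bare LF bytes in a byte string."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to get the bare LFs.
    bare_lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": bare_lf}


# Hypothetical sample data: the same three rows in both conventions.
unix_text = b"row1\nrow2\nrow3\n"
windows_text = b"row1\r\nrow2\r\nrow3\r\n"

print(count_line_endings(unix_text))     # {'crlf': 0, 'lf': 3}
print(count_line_endings(windows_text))  # {'crlf': 3, 'lf': 0}
```

Reading the file in binary mode (for example `open(path, "rb").read()`) before passing it in keeps Python from translating the terminators itself.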
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Hi ArndW

I am FTPing the .gz file, say May07.gz (it contains a zipped ASCII file, May07), from the UNIX server using the FTP stage, with Data representation set to ASCII and Line termination set to Unix (LF). In the command stage I run gunzip May07, and the target is a sequential file whose Line termination is also set to Unix style (LF).

The column name I am using is Data.

This is what I am doing in my DataStage job.

The job runs successfully.

But I am unable to view the data in the correct format, that is, as text, and I also get 23914 rows, the same in source and target.

Can you please help me sort out this issue?
RD
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I don't know what the issue is.

You have FTP'd the file from UNIX to Windows and issued gunzip to unzip the file. Can you use notepad.exe or wordpad.exe to view this file? If you do a "wc -l" on both the UNIX and Windows copies, are the counts identical?

If they are identical, what happens when you try a view-data in your job? Do you get metadata or data errors?
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Hi ArndW

The issue I am facing is:

I need the data as a text file after unzipping, but the data in the file appears in an unrecognizable format.

Another doubt I have after observing the job: the FTP stage retrieves 23914 rows, and after unzipping I also get 23914 rows in the sequential file. Can the unzipped file have the same number of rows as the zip file?

Or should we get more rows in the target file, since the file size should increase once it is unzipped?

Can you please let me know?
RD
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You didn't mention that you were using the FTP DataStage stage instead of a unix/windows FTP command before. The FTP stage is best when transporting data on a row-by-row basis for processing. It is not intended for what you are doing. I suggest you use a normal FTP command to get your file, then gunzip it, then process it in DataStage.
If you insist on using the FTP stage, then you need to ensure that you are doing a binary transfer and you should write the output to an external stage that gunzips the stream into a pipe, which the same job can then read. I do not recommend doing this; I've used that approach in the past and it is unstable and error prone and needlessly complex.
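
The "fetch the file, then gunzip it, then process it" sequence suggested above could be scripted roughly as follows. This is only a sketch in Python using the standard ftplib and gzip modules; the function name and the file names in the comment are assumptions for illustration, and an operating system FTP script would do the same job.

```python
import gzip
import shutil


def fetch_and_gunzip(ftp, remote_name, local_gz, local_out):
    """Download remote_name in binary mode, then decompress it locally.

    ftp is assumed to be an already logged-in ftplib.FTP object.
    """
    # retrbinary transfers the bytes unchanged (binary, not ASCII, mode).
    with open(local_gz, "wb") as fh:
        ftp.retrbinary("RETR " + remote_name, fh.write)
    # Decompress as a stream so the whole file never sits in memory.
    with gzip.open(local_gz, "rb") as src, open(local_out, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Hypothetical usage (host, user and password are placeholders):
#   from ftplib import FTP
#   with FTP("unix-host.example.com") as ftp:
#       ftp.login("user", "password")
#       fetch_and_gunzip(ftp, "May07.gz", "May07.gz", "May07")
```

The decompressed file can then be read by an ordinary Sequential File stage.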
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Hi ArndW,

On the remote UNIX server, the zip file is 24 MB, and when I unzip it there it shows 39 MB.

When I fetch the same zip file with the FTP stage, the file size is the same as on the UNIX server.

But when I unzip it in the DataStage job, it should show 39 MB, as on the UNIX server; instead it shows only 24 MB even after unzipping.

How can I solve this?

Please help me in this regard.
RD
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Can anybody help with this, please?
RD
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The problem is with your assumption that the FTP stage unzips anything. It does not. The FTP stage is intended to transfer a file a row at a time, and feed those rows into your job.

Since that's not what you want, either unzip the file at source (perhaps using a "before transfer" command in the FTP stage), or write a script to effect the FTP so that the compressed file arrives on the DataStage server machine, where it can be unzipped either using an operating system level command (such as gunzip) or a filter command (such as gunzip) in a Sequential File stage, which will read stdout of its filter command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Hi Ray,

Here I am using the FTP stage just to get the .gz file, and after that I am using a command stage to unzip it.

The command stage does unzip the file, but the file should be larger after unzipping, as it is when I unzip it on the UNIX server.

How can I achieve this?

Can you please help me?
RD
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You've now had two people recommend not using the FTP stage. You state that the command stage is unzipping... but that the file is corrupt. You need to find out where your error is happening. Try FTPing a NON-zipped file. Does that work? If yes, then your unzip portion is introducing the problem (I doubt it). If no, you need to understand how to work the FTP stage or take the advice and don't use it in this case.
k1980pc
Participant
Posts: 9
Joined: Fri Jun 15, 2007 3:31 am

Re: Regarding unzip in WinNT

Post by k1980pc »

DSRajesh wrote: Hi,

I am doing FTP the .gz file from unix server using ftp stage with ascii format and trying to unzip that using command stage using gunzip command ....

I guess it boils down to this...

A .gz file is a binary file. It needs to be FTP'd in binary mode (not ASCII). Once you FTP it in ASCII mode, the file is corrupt. The command stage is then unable to unzip it, and you are left with a file that is the same size as the one on the server.

Try changing the FTP mode to binary or auto and it should work, though you don't need the FTP stage for all this.
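
To see why an ASCII-mode transfer breaks a .gz file, here is a small Python sketch. It only imitates the LF to CRLF translation that a UNIX-to-Windows ASCII transfer typically applies; the helper names and the sample rows are made up, and a real FTP client or the FTP stage may behave differently in detail.

```python
import gzip
import random
import zlib


def ascii_mode_transfer(data: bytes) -> bytes:
    """Imitate a UNIX-to-Windows ASCII-mode transfer: every LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")


def can_gunzip(data: bytes) -> bool:
    """Return True if data decompresses cleanly as gzip."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        return False


# Made-up sample rows; a fixed seed keeps the run repeatable.
rng = random.Random(0)
rows = "".join(f"{i},{rng.random()}\n" for i in range(5000)).encode("ascii")
compressed = gzip.compress(rows)

print(can_gunzip(compressed))                       # binary-mode copy
print(can_gunzip(ascii_mode_transfer(compressed)))  # ASCII-mode copy
```

The compressed stream is effectively arbitrary bytes, so some of them happen to equal LF; rewriting those bytes shifts everything after them and the stream no longer decodes.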
RELAXEN UND WATSCHEN DER BLINKENLICHTEN
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

DSRajesh wrote: Another doubt I have after observing the job: the FTP stage retrieves 23914 rows, and after unzipping I also get 23914 rows in the sequential file. Can the unzipped file have the same number of rows as the zip file?

:? What? There are no 'rows' in a zip file. You are seriously confused and making this way more difficult than it needs to be.

Unzip the file then use the FTP stage to transfer the file. Or ftp the compressed file outside of DataStage, then you can gunzip either in the Filter command of the Sequential File stage or again do that as part of your ftp script post transfer.
-craig

"You can never have too many knives" -- Logan Nine Fingers
k1980pc
Participant
Posts: 9
Joined: Fri Jun 15, 2007 3:31 am

Post by k1980pc »

chulett wrote:
DSRajesh wrote: Another doubt I have after observing the job: the FTP stage retrieves 23914 rows, and after unzipping I also get 23914 rows in the sequential file. Can the unzipped file have the same number of rows as the zip file?
:? What? There are no 'rows' in a zip file......

Though UNIX or the MKS Toolkit will show output for wc -l myzipfile.gz... just to confuse mere mortals like us. I think it should throw an error message like "unzip me first" :D

DSRajesh,
run gzcat gzipfile.gz | wc -l to get the real row count.
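
If gzcat is not available on the Windows box, the same count can be had with a few lines of Python. This is just a sketch using the standard gzip module; the function name is made up, and it counts newline bytes in the decompressed data, which is what wc -l does.

```python
import gzip


def gz_line_count(path):
    """Count newline characters in the decompressed content of a .gz file."""
    count = 0
    with gzip.open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            count += chunk.count(b"\n")
    return count
```

Calling `gz_line_count("gzipfile.gz")` on both the UNIX copy and the transferred copy gives numbers that can be compared directly.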
RELAXEN UND WATSCHEN DER BLINKENLICHTEN
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Dear All,

When I try to access the .gz file in binary mode, it throws this error:

Unable buffer_to_row error message.

Can anyone help me understand why I am getting this error?
RD