Final output in a file greater than 2GB size

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Final output in a file greater than 2GB size

Post by pkothana »

Hi All,
It is required to read from a file set, do some transformations, and send the final output in a file to a third party. The output can be bigger than 2GB, but as we have an operating system restriction that a file's size can't be greater than 2GB, I was wondering in what format we can have the output and how we can send that across to the third party?
I would appreciate it if anybody can suggest the best possible way of doing this.

Thanks & Regards
Pinkesh
Peytot
Participant
Posts: 145
Joined: Wed Jun 04, 2003 7:56 am
Location: France

Post by Peytot »

You can zip your file in PX :?:
or split your data in your transform: the first 2GB in file A, 2-4GB in file B, ...
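At the OS level, the same split-or-compress idea can be sketched with standard tools. A minimal example, using a small generated file as a stand-in for the real multi-GB output (the file name and piece sizes here are hypothetical; on the real file you would use something like `-b 1900m` to stay under 2GB):

```shell
# Small sample file standing in for the real >2GB output.
seq 1 100000 > final_output.dat

# Split into fixed-size pieces named final_output.dat.aa, .ab, ...
split -b 200k final_output.dat final_output.dat.

# Alternatively, compress the whole file; text data often shrinks well.
gzip -c final_output.dat > final_output.dat.gz
```

The pieces can be reassembled with `cat final_output.dat.a? > final_output.dat` on the receiving side.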

Pey
kjanes
Participant
Posts: 144
Joined: Wed Nov 06, 2002 2:16 pm

Post by kjanes »

Are you running DS on UNIX/AIX or Windows? AIX, and probably other UNIX flavors, do not have a 2GB OS limitation.

We are building files over 2GB on AIX.
Kevin Janes
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Final output in a file greater than 2GB size

Post by bigpoppa »

You can zip your files on output or you could write out a fileset instead of a single file. Writing a fileset is an export option.

-BP
jseclen
Participant
Posts: 133
Joined: Wed Mar 05, 2003 4:19 pm
Location: Lima - Peru. Sudamerica
Contact:

Re: Final output in a file greater than 2GB size

Post by jseclen »

Hi Pinkesh,

In your job you can define 2 links: on the first link use the constraint INROWNUM <= 100000, and on the second INROWNUM > 100000.

(These values are just examples; you have to define the real range.)

Afterwards, you can use both files as input to the other process. :lol:
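The same row-range split can be sketched outside the job with standard tools, which may help for testing. A minimal example with a small hypothetical input and a boundary of 100 rows instead of 100000, so the sketch runs quickly:

```shell
# Hypothetical input; substitute the real file and boundary.
seq 1 250 > input.txt

head -n 100 input.txt > file_a.txt    # rows with INROWNUM <= 100
tail -n +101 input.txt > file_b.txt   # rows with INROWNUM > 100
```

Concatenating `file_a.txt` and `file_b.txt` reproduces the original input, so no rows are lost at the boundary.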
Saludos,

Miguel Seclén
Lima - Peru
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Final output in a file greater than 2GB size

Post by Teej »

bigpoppa wrote:You can zip your files on output or you could write out a fileset instead of a single file. Writing a fileset is an export option.
Oh? We have been doing it by creating buildop stages to handle the file output. I will definitely have to research that feature (filesets) further tomorrow.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Post by pkothana »

Hi,
Thanks for your suggestions and time.

I tried both the options i.e. Compressing a file and using a Fileset.

The problem with the Compress stage is that it needs a minimum of one output stream, and I could not find a stage that can hold the output of the Compress stage (i.e. data in compressed format).

The problem with the Fileset stage is that it creates the files at the location specified for the node. Also, we can only have as many files in the file set as the number of nodes. I tried giving the maximum file size in the Fileset stage, but the job aborts if the output is larger than the max size specified.
What I want is: if the output size is 6 GB, then either I should be able to create 3 files of 2 GB each, or I should be able to compress the file and send it to the specified location.
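Both requirements can be met in one pass at the OS level by compressing the stream and cutting the compressed output into pieces under the limit. A sketch with a small stand-in file (names and sizes here are hypothetical; on the real 6 GB file you would use something like `-b 1900m`):

```shell
# Small stand-in for the real 6 GB output.
seq 1 100000 > output.dat

# Compress on the fly and cut the compressed stream into pieces,
# each safely under the OS file-size limit.
gzip -c output.dat | split -b 50k - output.dat.gz.

# The third party reassembles and decompresses:
cat output.dat.gz.* | gunzip -c > output.check
```

This avoids ever materializing a single file over the limit on either side of the transfer.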

Can anybody help me out with this?

Thanks & Regards

Pinkesh
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

pkothana wrote: The problem with the Fileset stage is that it creates the files at the location specified for the node. Also, we can only have as many files in the file set as the number of nodes. I tried giving the maximum file size in the Fileset stage, but the job aborts if the output is larger than the max size specified.
What I want is: if the output size is 6 GB, then either I should be able to create 3 files of 2 GB each, or I should be able to compress the file and send it to the specified location.
Okay, I was able to play with Fileset yesterday, and here's my discovery:

Filesets can be allocated to multiple mountpoints per node (as long as the "one file per node" option is kept at False). So, basically, within your configuration file, you need something like this:

Code: Select all

"node1"
...
files "[space]" {}
scratch "[space]" {}
export "/mount1..." {}
export "/mount2..." {}
export "/mount3..." {}
...
I know it's export, but I forgot what the other two normal options are. Keep adding multiple mountpoints, and you CAN use the same mountpoints for each node (the naming conventions prevent the files from overwriting each other).

However, you can not really control partitioning that way. So if you have 3 GB, the above example will still give you 3 files even if you run on one node. Other than that, the solution solves your needs.

I decided that Filesets are not for our needs, as we need far more granular control of the output (a file per value of a specified key field, and the ability to name the prefix our way). The data we're throwing out generally finds its way to our clients, so we need that control. If we had both capabilities while preserving the concept of the Fileset stage, it would find its way into our feed programs. Until that happens, we use our own BuildOp stages.
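For what it's worth, the per-key-value split described above can be sketched outside DataStage with awk. A minimal example with a hypothetical comma-separated feed whose first field is the key, and `feed_` as our own chosen prefix:

```shell
# Hypothetical feed: first comma-separated field is the key.
printf 'east,1\nwest,2\neast,3\n' > input.csv

# One output file per key value, named with our own prefix --
# the granular control described above.
awk -F',' '{ print > ("feed_" $1 ".csv") }' input.csv
```

This produces `feed_east.csv` and `feed_west.csv`, each holding only the rows for its key.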

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Post Reply