Final output in a file greater than 2GB size

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Final output in a file greater than 2GB size

Post by pkothana »

Hi All,
It is required to read from a file set, do some transformations, and send the final output in a file to a third party. The output can be bigger than 2GB, but as we have an operating system restriction that a file's size can't be greater than 2GB, I was wondering in what format we can have the output and how we can send that across to the third party?
I would appreciate it if anybody can suggest the best possible way of doing this.

Thanks & Regards
Pinkesh
Peytot
Participant
Posts: 145
Joined: Wed Jun 04, 2003 7:56 am
Location: France

Post by Peytot »

You can zip your file in PX :?:
or split your data in your transform: the first 2GB in file A, 2-4GB in file B, ...
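At the OS level, the same split-or-compress idea can be sketched with standard tools. A minimal example, using a small generated file as a stand-in for the real multi-GB output (the file name and piece sizes here are hypothetical; on the real file you would use something like `-b 1900m` to stay under 2GB):

```shell
# Small sample file standing in for the real >2GB output.
seq 1 100000 > final_output.dat

# Split into fixed-size pieces named final_output.dat.aa, .ab, ...
split -b 200k final_output.dat final_output.dat.

# Alternatively, compress the whole file; text data often shrinks well.
gzip -c final_output.dat > final_output.dat.gz
```

The pieces can be reassembled with `cat final_output.dat.a? > final_output.dat` on the receiving side.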

Pey
kjanes
Participant
Posts: 144
Joined: Wed Nov 06, 2002 2:16 pm

Post by kjanes »

Are you running DS on UNIX/AIX or Windows? AIX, and probably other UNIX flavors, do not have a 2GB OS limitation.

We are building files over 2GB on AIX.
Kevin Janes
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Final output in a file greater than 2GB size

Post by bigpoppa »

You can zip your files on output or you could write out a fileset instead of a single file. Writing a fileset is an export option.

-BP
jseclen
Participant
Posts: 133
Joined: Wed Mar 05, 2003 4:19 pm
Location: Lima - Peru. Sudamerica
Contact:

Re: Final output in a file greater than 2GB size

Post by jseclen »

Hi Pinkesh,

In your job you can define 2 links: on the first link use the constraint INROWNUM <= 100000, and on the second INROWNUM > 100000.

(These values are just examples; you have to define the real range.)

Afterwards, you can use both files as input to the other process. :lol:
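The same row-range split can be sketched outside the job with standard tools, which may help for testing. A minimal example with a small hypothetical input and a boundary of 100 rows instead of 100000, so the sketch runs quickly:

```shell
# Hypothetical input; substitute the real file and boundary.
seq 1 250 > input.txt

head -n 100 input.txt > file_a.txt    # rows with INROWNUM <= 100
tail -n +101 input.txt > file_b.txt   # rows with INROWNUM > 100
```

Concatenating `file_a.txt` and `file_b.txt` reproduces the original input, so no rows are lost at the boundary.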
Saludos,

Miguel Seclén
Lima - Peru
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Final output in a file greater than 2GB size

Post by Teej »

bigpoppa wrote:You can zip your files on output or you could write out a fileset instead of a single file. Writing a fileset is an export option.
Oh? We have been doing it by creating buildop stages to handle the file output. I will definitely have to research that feature (filesets) further tomorrow.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Post by pkothana »

Hi,
Thanks for your suggestions and time.

I tried both the options i.e. Compressing a file and using a Fileset.

The problem with the Compress stage is that it needs a minimum of one output stream, and I could not find a stage that can hold the output of the Compress stage (i.e. data in compressed format).

The problem with the Fileset stage is that it creates the files at the location specified for the node. Also, we can only have as many files in the file set as the number of nodes. I tried giving the maximum file size in the Fileset stage, but the job aborts if the output is larger than the max size specified.
What I want is: if the output size is 6 GB, then either I should be able to create 3 files of 2 GB each, or I should be able to compress the file and send it to the specified location.
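Both requirements can be met in one pass at the OS level by compressing the stream and cutting the compressed output into pieces under the limit. A sketch with a small stand-in file (names and sizes here are hypothetical; on the real 6 GB file you would use something like `-b 1900m`):

```shell
# Small stand-in for the real 6 GB output.
seq 1 100000 > output.dat

# Compress on the fly and cut the compressed stream into pieces,
# each safely under the OS file-size limit.
gzip -c output.dat | split -b 50k - output.dat.gz.

# The third party reassembles and decompresses:
cat output.dat.gz.* | gunzip -c > output.check
```

This avoids ever materializing a single file over the limit on either side of the transfer.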

Can anybody help me out with this?

Thanks & Regards

Pinkesh
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

pkothana wrote: The problem with the Fileset stage is that it creates the files at the location specified for the node. Also, we can only have as many files in the file set as the number of nodes. I tried giving the maximum file size in the Fileset stage, but the job aborts if the output is larger than the max size specified.
What I want is: if the output size is 6 GB, then either I should be able to create 3 files of 2 GB each, or I should be able to compress the file and send it to the specified location.
Okay, I was able to play with Fileset yesterday, and here's my discovery:

Filesets can be allocated to multiple mountpoints per node (as long as the "one file per node" option is kept at False). So, basically, within your configuration file, you need something like this:

Code: Select all

"node1"
...
files "[space]" {}
scratch "[space]" {}
export "/mount1..." {}
export "/mount2..." {}
export "/mount3..." {}
...
I know it's export, but I forgot what the other two normal options are. Keep adding multiple mountpoints, and you CAN use the same mountpoints for each node (the naming conventions prevent the files from overwriting each other).

However, you can not really control partitioning that way. So if you have 3 GB, the above example will still give you 3 files even if you run on one node. Other than that, the solution solves your needs.

I decided that Filesets are not for our needs, as we need far more granular control of the output (a file per value of a specified key field, and the ability to name the prefix our way). The data we're throwing out generally finds its way to our clients, so we need that control. If we had both capabilities while preserving the concept of the Fileset stage, it would find its way into our feed programs. Until that happens, we use our own BuildOp stages.
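For what it's worth, the per-key-value split described above can be sketched outside DataStage with awk. A minimal example with a hypothetical comma-separated feed whose first field is the key, and `feed_` as our own chosen prefix:

```shell
# Hypothetical feed: first comma-separated field is the key.
printf 'east,1\nwest,2\neast,3\n' > input.csv

# One output file per key value, named with our own prefix --
# the granular control described above.
awk -F',' '{ print > ("feed_" $1 ".csv") }' input.csv
```

This produces `feed_east.csv` and `feed_west.csv`, each holding only the rows for its key.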

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Post Reply