
flat files, memory

Posted: Thu Apr 27, 2006 6:32 am
by tostay2003
Hi all,

I have a question regarding flat files and would be pleased if someone could answer it. Say I have two tables in Oracle coming from different schemas, one being the lookup.

I know that using hashed files and flat files speeds up processing time. At the same time I am concerned about the size of these files, as they will be in the server repository.

How should we plan the design (whether to use flat files, hashed files, or Oracle stages directly), based on these factors:

a) the disk space allowed to flat files by UNIX (is there a restriction by DataStage on it?)
b) keeping in mind that the information in the tables might increase as days go on.

Hope I posed the question well and in the right section of the forum.

Regards

Posted: Thu Apr 27, 2006 6:46 am
by chulett
There's no DataStage restriction on flat file sizes. Your operating system or environment may impose a limit on file sizes.

I'm assuming when you say 'memory' you are actually referring to 'disk storage'. It's always best to land or hash data, in my opinion. If you are concerned about the amount of space being used and the fact that the files will be 'in the server repository', then:

1) Park them on disks separate from your server installation. Yes, they will need to be on the same machine, but they don't have to be (and shouldn't be) where you installed DataStage - i.e. in the Projects - if that's what you mean by 'in the server repository'.

2) Buy more disk.

Posted: Thu Apr 27, 2006 6:47 am
by DSguru2B
If it's in the same database, just different schemas, and both the input and the lookup are huge, I would do the lookup in just the OCI stage. This way you avoid bringing all the data over to the DataStage server first and then doing the join. All you're doing in this case is passing the SQL statement to the database, leaving the join to the database, and getting the result back.
My 2 cents. :)

Posted: Thu Apr 27, 2006 6:49 am
by ArndW
Since you are doing a lookup, you are going to be restricted to using either a hashed file or another type of table lookup.

The limit on sequential file sizes is imposed by your operating system; some systems still have a 2GB limit in place, and on many systems this limit can be changed at the filesystem level and/or at the user level.
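As a quick way to see what your UNIX account allows, here is a small Python sketch (not DataStage-specific) that reads the per-process file size limit the OS would enforce on any file DataStage writes:

```python
import resource

# Query the per-process maximum file size (RLIMIT_FSIZE), the same
# limit that `ulimit -f` reports at the shell. RLIM_INFINITY means
# the OS places no per-process cap; otherwise the soft limit is the
# effective ceiling, in bytes, on any file this process can create.
soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)

if soft == resource.RLIM_INFINITY:
    print("no per-process file size limit")
else:
    print(f"file size limited to {soft} bytes")
```

Note that even with an unlimited process limit, the filesystem itself (or a 32-bit build of an application) can still cap individual files at 2GB.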

DataStage hashed files are also limited to around 2GB of data by default, but you can declare 64BIT when creating them in order to allow larger files to be used.

{oops, didn't see the other responses when I typed this}

Posted: Thu Apr 27, 2006 7:09 am
by kcbland
DSguru2B wrote:If it's in the same database, just different schemas, and both the input and the lookup are huge, I would do the lookup in just the OCI stage. This way you avoid bringing all the data over to the DataStage server first and then doing the join. All you're doing in this case is passing the SQL statement to the database, leaving the join to the database, and getting the result back.
My 2 cents. :)
To clarify: only if you use a single OCI stage with a join query will the entire work be done within the database. If you use an OCI reference lookup stage, it behaves just like a reference cursor: for every input row the OCI reference lookup will query the database, which is probably nowhere near the speed we can achieve with a hashed file method.
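The distinction can be illustrated outside DataStage. This is a Python/sqlite3 sketch (the table and column names are invented for the example, and sqlite3 stands in for Oracle) showing the single join query versus firing one lookup query per input row:

```python
import sqlite3

# Hypothetical tables standing in for the two Oracle schemas.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders    (order_id INTEGER, cust_id INTEGER);
    CREATE TABLE customers (cust_id INTEGER, name TEXT);
    INSERT INTO orders    VALUES (1, 10), (2, 20);
    INSERT INTO customers VALUES (10, 'Acme'), (20, 'Bolt');
""")

# Single-query approach: the database performs the join once,
# analogous to one OCI stage with user-defined join SQL.
joined = con.execute("""
    SELECT o.order_id, c.name
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()
print(joined)  # [(1, 'Acme'), (2, 'Bolt')]

# Per-row approach: one query per input row, analogous to an OCI
# reference lookup issuing a cursor fetch for every row processed.
per_row = []
for order_id, cust_id in con.execute(
        "SELECT order_id, cust_id FROM orders ORDER BY order_id").fetchall():
    name = con.execute(
        "SELECT name FROM customers WHERE cust_id = ?", (cust_id,)
    ).fetchone()[0]
    per_row.append((order_id, name))

assert per_row == joined
```

Both produce the same rows, but the per-row version pays one database round trip per input row, which is why it falls so far behind a single join or a hashed file lookup at volume.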

Posted: Thu Apr 27, 2006 7:11 am
by DSguru2B
Actually, that's what I meant: a single OCI stage.

Posted: Thu Apr 27, 2006 7:39 am
by tostay2003
Thanks everyone,

chulett, yes, I meant disk storage; sorry I wasn't careful with my wording.

ArndW, how do we declare 64BIT? I mean, where?

One more question:

Is it advisable to create hashed files and flat files in a separate job or in the same job?

Regards

Posted: Thu Apr 27, 2006 7:49 am
by tostay2003
I just had a look at some of the flat files and they are very large, approx. 4,400,000,000 bytes (over 4 GB). How come? Does this mean that the UNIX administrators can change the size limit on the files?

Posted: Thu Apr 27, 2006 8:24 am
by ArndW
The default file limit depends upon the flavour and version of UNIX you are using. It seems that files greater than 2GB are enabled on your system.

I'll save you the effort of doing a search for 64BIT and point you at this sample thread, which deals with how to create a 64BIT hashed file to store more than 2GB of information.

Posted: Thu Apr 27, 2006 9:53 am
by tostay2003
Thanks everyone,

it was very helpful.

Posted: Thu Apr 27, 2006 11:18 pm
by sb_akarmarkar
You can also use the Hashed File Calculator utility, where you can specify the size of your file and it displays the syntax to create the hashed file.


Thanks,
Anupam
akarmarkar@smart-bridge.co.in