flat files, memory

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am


Post by tostay2003 »

Hi all,

Got a doubt regarding flat files. Would be pleased if someone could answer it. Say I have got two tables in Oracle coming from different schemas, one being the lookup.

I know that using hash files and flat files speeds up the processing time. At the same time, I am concerned about the size of these files, as they will be in the server repository.

How should we plan the design (whether to incorporate flat files, hash files or Oracle stages directly), based on these factors:

a) the file size limit that UNIX allows for flat files (is there a restriction by DataStage on it?)
b) keeping in mind that the amount of information in the tables might increase as time goes on.

Hope I posed the question well and in the right section of the forum.

Regards
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no DataStage restriction on flat file sizes. Your operating system or environment may impose a limit on file sizes.

I assume when you say 'memory' you are actually referring to 'disk storage'. It's always best to land or hash data, in my opinion. If you are concerned about the amount of space being used and the fact that they will be 'in the server repository', then:

1) Park them on disks separate from your server installation. Yes, they will need to be on the same machine, but they don't have to be (and shouldn't be) where you installed DataStage - i.e. in the Projects - if that's what you mean by 'in the server repository'.

2) Buy more disk.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If it's in the same database, just different schemas, and both the input and the lookup are huge, I would do the lookup in just the OCI stage. This way you avoid bringing all the data to the DataStage server first and then doing the join. All you're doing in this case is passing the SQL statement to the database, leaving the join to the database, and getting the result back.
My 2 cents. :)
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
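A hedged sketch of that "let the database do the join" approach: the kind of user-defined query you would put in a single OCI stage. The schema and table names (schema_a.facts, schema_b.lookup) are hypothetical, as is the sqlplus connect string:

```shell
# Write the join query that a single OCI stage would send to Oracle.
# Both tables live in the same database, so the join runs there.
cat > join.sql <<'EOF'
SELECT f.key_col, f.amount, l.description
FROM   schema_a.facts  f,
       schema_b.lookup l
WHERE  f.key_col = l.key_col;
EOF

# To try the query outside DataStage (requires an Oracle client):
#   sqlplus -s user/password@dbname @join.sql
cat join.sql
```

Testing the SQL in sqlplus first is a cheap way to confirm the join and its performance before pasting it into the stage.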
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Since you are doing a lookup, you are going to be restricted to using either a hashed file or another type of table lookup.

The limit on sequential file sizes is imposed by your operating system; some systems still have a 2 GB limit in place, and on many systems this limit can be changed at the filesystem level and/or at the user level.

DataStage hashed files are limited to around 2 GB of data by default as well, but you can declare 64BIT when creating them in order to allow larger files to be used.

{oops, didn't see the other responses when I typed this}
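A hedged sketch of what "declare 64BIT when creating them" looks like from the engine's TCL shell (uv). The file name MY_LOOKUP_HASH and the file type (30, a dynamic hashed file) are example values; the command must be run from inside the project directory:

```shell
# Write the UniVerse TCL command that creates a hashed file with
# 64-bit internal addressing, lifting the ~2 GB ceiling.
cat > create64.txt <<'EOF'
CREATE.FILE MY_LOOKUP_HASH 30 64BIT
EOF

# Against a live engine (requires a DataStage server install):
#   cd /path/to/project && $DSHOME/bin/uv < create64.txt
cat create64.txt
```

Alternatively, the Hashed File stage's create-file options expose the same 64BIT choice from the Designer.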
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

DSguru2B wrote:If it's in the same database, just different schemas, and both the input and the lookup are huge, I would do the lookup in just the OCI stage. This way you avoid bringing all the data to the DataStage server first and then doing the join. All you're doing in this case is passing the SQL statement to the database, leaving the join to the database, and getting the result back.
My 2 cents. :)
To clarify, only if you use a single OCI stage with a join query will the entire work be done within the database. If you use an OCI reference lookup stage, it's just like a reference cursor: for every input row the OCI reference lookup will query the database, which is probably nowhere near the speed we can achieve with a hashed file method.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Actually, that's what I meant: in a single OCI stage.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

Thanks everyone,

Chulett, yes, it is disk storage; sorry, I wasn't careful with my wording.

ArndW, how do we declare 64BIT? I mean, where?

One more question,

is it advisable to create the hash files and flat files in a separate job or in the same job?

Regards
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

I just had a look at some of the flat files and they are very large, approx 440000000 KB (420 GB). How come? Does this mean that the UNIX administrators can change the size limit on the files?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The default file size limit depends upon the flavour and version of UNIX you are using. It seems that files greater than 2 GB are enabled on your system.

I'll save you the effort of doing a search for 64BIT and point you at this sample thread, which deals with how to create a 64BIT hashed file to store more than 2 GB of information.
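For an existing hashed file that is already near the limit, the companion operation is a RESIZE to 64BIT rather than a re-create. A hedged sketch; the file name is an example, and the asterisks keep the file's current type, modulo and separation:

```shell
# Write the UniVerse TCL command that converts an existing hashed
# file to 64-bit internal addressing in place.
cat > resize64.txt <<'EOF'
RESIZE MY_LOOKUP_HASH * * * 64BIT
EOF

# Against a live engine (requires a DataStage server install):
#   cd /path/to/project && $DSHOME/bin/uv < resize64.txt
cat resize64.txt
```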
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

Thanks everyone,

it was very helpful
sb_akarmarkar
Participant
Posts: 232
Joined: Fri Sep 30, 2005 4:52 am
Contact:

Post by sb_akarmarkar »

You can also use the Hash File Calculator utility, where you can specify the size of your file and the calculator displays the syntax to create the hash file.


Thanks,
Anupam
akarmarkar@smart-bridge.co.in