flat files, memory
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 97
- Joined: Tue Feb 21, 2006 6:45 am
Hi all,
Got a doubt regarding flat files. Would be pleased if someone could answer it. Say I have two tables in Oracle coming from different schemas, one being the lookup.
I know that using hash files and flat files speeds up the processing time. At the same time I am concerned about the size of these files, as they will be in the server repository.
How should we plan the design (whether to incorporate flat files, hash files, or Oracle stages directly), based on these factors:
a) the memory UNIX allows for the flat files (is there a restriction by DataStage on it?)
b) keeping in mind that the information in the tables might increase as days go on.
Hope I posed the question well and in the right section of the forum.
Regards
There's no DataStage restriction on flat file sizes. Your operating system or environment may impose a limit on file sizes.
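To see what the OS will actually allow, you can check the per-process file-size limit (the same thing `ulimit -f` reports in the shell) rather than guessing. A minimal sketch in Python, assuming a Unix-like system:

```python
# Report the per-process file-size limit the OS enforces on any file a
# process writes, flat files included. Unix only; mirrors `ulimit -f`.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)

def describe(limit):
    # RLIM_INFINITY means the OS imposes no per-process cap; the
    # filesystem itself may still have one (e.g. 2Gb on older systems).
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

print("soft limit:", describe(soft))
print("hard limit:", describe(hard))
```

If this reports a 2Gb soft limit, raising it (where the hard limit and filesystem allow) is a user-level change, not a DataStage one.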
I'm assuming that when you say 'memory' you are actually referring to 'disk storage'. Always best to land or hash data, in my opinion. If you are concerned about the amount of space being used and the fact that they will be 'in the server repository', then:
1) Park them on disks separate from your server installation. Yes, they will need to be on the same machine, but they don't have to be (and shouldn't be) where you installed DataStage - i.e. in the Projects - if that's what you mean by 'in the server repository'.
2) Buy more disk.
-craig
"You can never have too many knives" -- Logan Nine Fingers
If it's in the same database, just different schemas, and the size of both input and lookup are huge, I would do the lookup in just the OCI stage. This way you avoid bringing all the data into the DataStage server first and then doing the join. All you're doing in this case is passing the SQL statement to the database, leaving the join to the database and getting the result back.
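The idea can be sketched outside DataStage. In this illustration sqlite3 stands in for Oracle and the table and column names are made up; the point is that a single SQL statement ships the join to the database, so only the joined rows come back over the connection:

```python
# Sketch: push the join to the database instead of fetching both tables.
# sqlite3 stands in for Oracle; the schema below is purely illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders  (order_id INTEGER, region_id INTEGER, amount REAL);
    CREATE TABLE regions (region_id INTEGER, region_name TEXT);
    INSERT INTO orders  VALUES (1, 10, 99.0), (2, 20, 45.5);
    INSERT INTO regions VALUES (10, 'EMEA'), (20, 'APAC');
""")

# One statement, one round trip: the database performs the join and
# returns only the result -- the effect of user-defined SQL in an OCI stage.
rows = con.execute("""
    SELECT o.order_id, o.amount, r.region_name
    FROM   orders o
    JOIN   regions r ON r.region_id = o.region_id
    ORDER  BY o.order_id
""").fetchall()
print(rows)  # [(1, 99.0, 'EMEA'), (2, 45.5, 'APAC')]
```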
My 2 cents. :)
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Since you are doing a lookup you are going to be restricted to using either a Hashed file or another type of table lookup.
The limit on sequential file sizes is imposed by your operating system; some systems still have a 2Gb limit in place. On many systems this limit can be changed at the filesystem level and/or at the user level.
DataStage hashed files default to be limited at around 2Gb of data as well, but you can declare 64BIT when creating them in order to allow larger files to be used.
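As best I recall, the commands for this (run at the TCL/uvsh prompt, or from the Administrator command window) look something like the following; the file name is made up, and you should check your release's documentation for the exact syntax:

```
CREATE.FILE MyBigLookup DYNAMIC 64BIT
RESIZE MyBigLookup * * * 64BIT
```

The first creates a new dynamic hashed file with 64-bit addressing; the second converts an existing hashed file in place.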
{oops, didn't see the other responses when I typed this}
DSguru2B wrote:
> If it's in the same database, just different schemas, and the size of both input and lookup are huge, I would do the lookup in just the OCI stage.

To clarify, only if you use a single OCI stage with a join query will the entire work be done within the database. If you use an OCI reference lookup stage it's just like a reference cursor: for every input row the OCI reference lookup will query the database, which is probably nowhere near the speed we can achieve with a hashed file method.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
The default file limit depends upon the flavour of UNIX you are using and the version. It seems that files greater than 2Gb are enabled on your system.
I'll save you the effort of doing a search for 64BIT and point you at this sample thread that deals with how to create a 64BIT hashed file to store more than 2Gb of information.
-
- Participant
- Posts: 232
- Joined: Fri Sep 30, 2005 4:52 am
- Contact:
You can also use the Hashed File Calculator utility, where you can specify the size of your file and the calculator displays the syntax to create the hashed file.
Thanks,
Anupam
akarmarkar@smart-bridge.co.in