RamonaReed wrote: I am trying to find out what is more efficient, using hash tables as lookups or using OCI lookups. Some of our dimension tables have over 60 fields that are all part of the key. Is there some set rule as to when we should use hash tables or when to use an OCI lookup?
Thanks for any input.
Mona
Code:
seq --> xfm --> hash
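Conceptually that hash build is just "read the source, apply the transform, write keyed rows, last write for a key wins." A toy equivalent in Python, purely for illustration (the sample rows are invented; the real work happens inside the DataStage job):

Code:
# Toy stand-in for seq --> xfm --> hash: build a keyed lookup structure.
# Sample rows are invented for illustration.
rows = [("C100", "Alice"), ("C200", "Bob"), ("C100", "Alice v2")]

lookup = {}
for business_key, value in rows:
    lookup[business_key] = value   # like a hash file, the last write for a key wins

print(lookup)   # {'C100': 'Alice v2', 'C200': 'Bob'}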
Code:
MOD(inlink.customer_key, NumberOfPartitions) = MyPartitionNumber - 1
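To see how that constraint spreads rows across instances, here is a minimal sketch in Python (the key values and partition count are made up; in the job, the expression lives in a transformer constraint):

Code:
# Illustration of MOD(inlink.customer_key, NumberOfPartitions) = MyPartitionNumber - 1.
# Instance numbers run 1..N while MOD() yields 0..N-1, hence the "+ 1".
NUM_PARTITIONS = 4

def owning_instance(customer_key):
    return customer_key % NUM_PARTITIONS + 1

for key in (10, 11, 101, 4000):
    print(key, "-> instance", owning_instance(key))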
Code:
stg_CUSTOMER_#MyPartitionNumber#
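The #MyPartitionNumber# piece is ordinary job-parameter substitution in the file name, so each instance reads and writes its own hash file. The same idea sketched in Python (names taken from the example above):

Code:
# Each instance gets its own staging hash file name, as in stg_CUSTOMER_#MyPartitionNumber#.
def partition_file_name(partition_number):
    return f"stg_CUSTOMER_{partition_number}"

print([partition_file_name(n) for n in (1, 2, 3)])
# ['stg_CUSTOMER_1', 'stg_CUSTOMER_2', 'stg_CUSTOMER_3']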
Code:
seq --> xfm w/lookups --> xfm w/lookups --> xfm w/lookups --> seq (inserts file)
                                                          --> seq (updates file)
Code:
MOD(inlink.customer_key, NumberOfPartitions) = MyPartitionNumber - 1
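The insert/update split at the end of that design is just "lookup hit goes to the updates file, lookup miss goes to the inserts file." A sketch of the mechanics, with a dict standing in for the hash file and all names invented:

Code:
# Lookup-driven insert/update split. The dict plays the hash file's role;
# keys, values and file names are invented for the example.
existing = {"C100": 1001, "C200": 1002}      # business key -> surrogate key

with open("customer_inserts.txt", "w") as ins, \
     open("customer_updates.txt", "w") as upd:
    for business_key, name in [("C100", "Alice"), ("C300", "Bob")]:
        if business_key in existing:          # hit  -> update stream
            upd.write(f"{existing[business_key]}\t{name}\n")
        else:                                 # miss -> insert stream
            ins.write(f"{business_key}\t{name}\n")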
I was wondering why you do not have a compound key in the hash file. This works very well and usually eliminates duplicates. Let's say your key is country plus telephone number: you may never get a duplicate. But if it were store plus customer, you could, because store 10 with customer 11 collides with store 101 with customer 1. DataStage uses @TM, char(251), as the key field separator, and that character is almost never part of the data stream. You may already be doing this, but I think a lot of users are unclear on how simple compound keys are in hash files, and on the flaws of concatenating them into a single field.

When creating the real-key-to-integer-key map, one good approach is to concatenate the real keys into a character string and put it onto the dimension table. Then, when you do lookups, you merely need to string together the key in the job and look it up as one char string in the hash table. The char string usually gives you pretty good performance on the lookup. I have called the string field 'dim_char_ky_fld' (since 1994).
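To make the store/customer collision concrete, a tiny Python sketch; chr(251) plays the part of @TM, and the key values come from the example above:

Code:
TM = chr(251)   # stand-in for DataStage's @TM key separator

# Naive concatenation: (store 10, customer 11) collides with (store 101, customer 1).
print("10" + "11" == "101" + "1")             # True  -> duplicate key
# With a separator the parts stay distinct.
print("10" + TM + "11" == "101" + TM + "1")   # False -> keys stay unique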
peternolan9 wrote: In your suggestions you would have the input file read by a transformer and have 'mypartitionnumber' passed as a parameter, is that correct?
Yes. One master job, with N instantiated clones. The job uses a partitioning algorithm so that each instance gets only its portion of the source data.
peternolan9 wrote: My question is this... we have a really neat way of invoking jobs, where a routine fetches the parameters from a file and passes them to the job. So we don't have the ability to pass a parameter like 'mypartitionnumber' to a job. Is there something that gets allocated to the job that we could read, like some 'instance number', or some other way besides a parameter to tell the job which number it is? For example, flat files, hash files, etc. must refer to different files...
Sure. No matter what, your job control has to know about the N instances of the master job it runs, and each instance requires an invocation ID. Your routine, which fetches parameters from a file, could easily generate an extra parameter to add to its list of parameters and values. If you make your invocation ID contain the partition number, then you can derive the partition number from the invocation ID.
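One way to do that derivation, assuming an invocation ID of the form JobName.PartitionNumber (the format is an assumption; use whatever shape your job control actually emits):

Code:
# Hypothetical: pull the partition number out of an invocation ID
# shaped like "<JobName>.<PartitionNumber>", e.g. "CustLoad.3".
def partition_from_invocation_id(invocation_id):
    return int(invocation_id.rsplit(".", 1)[1])

print(partition_from_invocation_id("CustLoad.3"))   # 3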
peternolan9 wrote: Also, my understanding on caching is that there is a 1GB limit for ALL hash files at any single point in time. So breaking hash files into smaller portions but trying to load all the small portions at once will not help... Am I mistaken on this? We will have a 40-48GB machine when we get to where we are going, so we literally have tons and tons of memory. My initial plan was to have a small number of large hash files shared by all jobs; this didn't work out, so I have a large number of smaller hash files, but many of these are in the 400-500MB range, so we will run over the 1GB limit right away... or is there something we are missing?
Really?
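Whatever the real ceiling turns out to be, a quick back-of-envelope check of whether a set of hash files fits under a given cache limit is trivial; the file sizes and the 1GB figure below are just the numbers from the post, not confirmed limits:

Code:
# Back-of-envelope: total hash file size vs. an assumed cache ceiling.
# All figures are taken from the discussion above, not verified limits.
CACHE_LIMIT_MB = 1024
hash_files_mb = {"hCustomer": 450, "hProduct": 500, "hGeography": 120}

total = sum(hash_files_mb.values())
verdict = "fits" if total <= CACHE_LIMIT_MB else "over the limit"
print(f"{total} MB of {CACHE_LIMIT_MB} MB cache: {verdict}")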