Lookup Speed

Post questions here relating to DataStage Server Edition for areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

mcolen
Premium Member
Posts: 31
Joined: Wed Aug 11, 2004 8:59 am
Location: Florida

Lookup Speed

Post by mcolen »

Is there any speed difference between using a stage variable of If not(lnkLkUpBranch.NOTFOUND) and lnkLkUpBranch.Source_System_Id = 15 Then @TRUE Else @FALSE, and setting not(lnkLkUpBranch.NOTFOUND) and lnkLkUpBranch.Source_System_Id = 15 directly as the constraint on a write to a hashed file?
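As a minimal sketch, assuming each row can be modelled as a Python dict (the NOTFOUND and Source_System_Id keys stand in for the lookup link columns; this is a hypothetical model, not DataStage code), the two designs route rows identically. Any speed difference therefore has to come from how often the expression is evaluated, not from the logic itself.

```python
from typing import Dict, List

# Hypothetical row model: dict keys stand in for the lookup link columns.
Row = Dict[str, object]


def condition(row: Row) -> bool:
    # Mirrors: not(lnkLkUpBranch.NOTFOUND) and lnkLkUpBranch.Source_System_Id = 15
    return (not row["NOTFOUND"]) and row["Source_System_Id"] == 15


def route_with_constraint(rows: List[Row]) -> List[Row]:
    # The expression is used directly as the output-link constraint.
    return [row for row in rows if condition(row)]


def route_with_stage_variable(rows: List[Row]) -> List[Row]:
    # A stage variable is assigned @TRUE/@FALSE first; the constraint then tests it.
    out = []
    for row in rows:
        sv_match = True if condition(row) else False  # If ... Then @TRUE Else @FALSE
        if sv_match:
            out.append(row)
    return out


sample = [
    {"NOTFOUND": False, "Source_System_Id": 15},
    {"NOTFOUND": True, "Source_System_Id": 15},
    {"NOTFOUND": False, "Source_System_Id": 7},
]
print(route_with_constraint(sample) == route_with_stage_variable(sample))  # True
```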
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I don't think it will make a considerable difference. It's just that stage variables are evaluated first for each row, so @TRUE or @FALSE is decided before the constraints are checked.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mcolen
Premium Member
Posts: 31
Joined: Wed Aug 11, 2004 8:59 am
Location: Florida

Speed

Post by mcolen »

It seems to make a large speed difference: 220M source rows coming from a hashed file and going to 2 hashed files based on source_cd.
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

Hi
I prefer to write a constraint rather than use the other option.
thumsup9
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

I find improved times when I put the logic in stage variables. Maybe someone can explain why.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Re: Speed

Post by Krazykoolrohit »

mcolen wrote: seems to make a large speed diff 220m source rows coming from hashed file to 2 hashed files based on source_cd
It will make a difference, true, but not one you would stand up and take notice of. It should be minor. That's what DSguru2B meant, I suppose.
mcolen
Premium Member
Posts: 31
Joined: Wed Aug 11, 2004 8:59 am
Location: Florida

Speed

Post by mcolen »

Since I am talking about a difference of over 600 rows a second, with this amount of data that could mean a difference of over 2 days to source this data.
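The size of the gap depends on both throughput rates, not only on their difference: for N rows, the elapsed-time difference is N/r_slow - N/r_fast. A minimal sketch with hypothetical rates (only the 220M row count comes from the thread) shows that a 600 rows/s gap is worth about two days only when the slower job runs at around 600 rows/s or below; at higher rates the same gap costs much less.

```python
SECONDS_PER_DAY = 86_400


def elapsed_seconds(rows: int, rows_per_second: float) -> float:
    # Time to push `rows` through a job at a steady throughput.
    return rows / rows_per_second


def difference_in_days(rows: int, slow_rate: float, fast_rate: float) -> float:
    # Delta T = N / r_slow - N / r_fast, converted to days.
    gap = elapsed_seconds(rows, slow_rate) - elapsed_seconds(rows, fast_rate)
    return gap / SECONDS_PER_DAY


ROWS = 220_000_000  # source row count from the thread

# Hypothetical rate pairs, each 600 rows/s apart.
print(round(difference_in_days(ROWS, 600, 1_200), 2))    # 2.12
print(round(difference_in_days(ROWS, 1_200, 1_800), 2))  # 0.71
```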
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

So what is the question now? You have the results. There will be a considerable difference when you deal with a very large number of rows, as in your case. The speed difference grows as the data volume grows. How much? That's a different question.
Kris

Where's the "Any" key? - Homer Simpson
mcolen
Premium Member
Posts: 31
Joined: Wed Aug 11, 2004 8:59 am
Location: Florida

Speed

Post by mcolen »

I may have the answer, but I was hoping for confirmation that I am correct from one of the DS experts, i.e. Ray, Craig, or Ken.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Each constraint is ALWAYS evaluated for every source row. Each stage variable is evaluated for every source row. If you have multiple output links, it makes sense to use a stage variable to hold a result if that result is needed in more than one constraint.

If you have one output link, there should be no difference between a plain constraint and a constraint that uses a stage variable, apart from the minor cost of one extra line to assign the stage variable on each row.
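This can be checked with a small counting model. It is a hypothetical Python sketch, not DataStage internals: it assumes a second output link whose constraint is the negation of the first (for example, the other hashed file), and it counts how many times the shared expression runs. With two links the constraint-only design evaluates the expression twice per row, while the stage-variable design evaluates it once; with a single link the counts are equal.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


class CountingCondition:
    """Hypothetical stand-in for the lookup expression that counts its evaluations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, row: Row) -> bool:
        self.calls += 1
        return (not row["NOTFOUND"]) and row["Source_System_Id"] == 15


def run_transformer(rows: List[Row], use_stage_variable: bool,
                    second_link: bool = True) -> Tuple[int, List[Row], List[Row]]:
    cond = CountingCondition()
    matched: List[Row] = []
    others: List[Row] = []
    for row in rows:
        if use_stage_variable:
            sv_match = cond(row)  # stage variable: evaluated once per row
            if sv_match:  # constraint on link 1 reads the variable
                matched.append(row)
            if second_link and not sv_match:  # constraint on link 2 reads it too
                others.append(row)
        else:
            if cond(row):  # constraint on link 1 evaluates the expression
                matched.append(row)
            if second_link and not cond(row):  # link 2 evaluates it again
                others.append(row)
    return cond.calls, matched, others


rows = [{"NOTFOUND": False, "Source_System_Id": 15},
        {"NOTFOUND": False, "Source_System_Id": 7}] * 2
print(run_transformer(rows, use_stage_variable=True)[0])   # 4
print(run_transformer(rows, use_stage_variable=False)[0])  # 8
```

In the single-link case the only extra work in the stage-variable design is the assignment itself, which is the "one extra line" described above.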

You may have experienced a dirty cache in your testing. If you ran the job once, you may have had data sitting in the OS's reading/writing cache and felt the benefit when running the job a second time. This "warmed up" cache is handy sometimes, but probably not in your experiment.
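One way to reduce the cache effect when comparing two designs is to alternate the runs, throw away a warm-up round, and compare medians. The sketch below assumes each design can be launched from a Python callable; it equalizes cache state across the variants rather than flushing it, so it is not a true cold-cache test.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(variants: Dict[str, Callable[[], None]],
              rounds: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Return median wall-clock seconds per variant, excluding warm-up rounds (rounds >= 1)."""
    timings: Dict[str, List[float]] = {name: [] for name in variants}
    for round_no in range(warmup + rounds):
        order = list(variants.items())
        if round_no % 2:
            order.reverse()  # alternate run order so neither variant always runs first
        for name, run in order:
            start = time.perf_counter()
            run()
            elapsed = time.perf_counter() - start
            if round_no >= warmup:  # warm-up rounds only prime the OS cache
                timings[name].append(elapsed)
    return {name: statistics.median(values) for name, values in timings.items()}
```

Usage, with hypothetical wrappers that each run one version of the job: benchmark({"stage_variable": run_stage_variable_job, "constraint": run_constraint_job}, rounds=3).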
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Technically speaking, it shouldn't make much of a difference. But yes, for a large number of records you might see that small difference add up.
I haven't really played around with it, so I can't give you a definite answer. Maybe our gurus can provide some insight.
Actually, if you think about it, the condition gets evaluated twice: once in the stage variables and then in the constraint. It should be slower rather than faster. Hmm, weird.