User-defined queries

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
bala_135
Premium Member
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

User-defined queries

Post by bala_135 »

Hello All,

I have an Oracle table(X) which has 4 columns.
Two lookupfiles are created from this table.
For each lookupfile(i am using datasets),we are having user-defined queries with diffrent filter conditions.

Ouestion:-Which is the good practice.
1.Fetching all the records from the table and dumping on to one dataset once and using this dataset in the transformation job to filter the user-defined filter conditions.
2.Creating two datasets separatelty with the appropriate user-defined queries in the table(X) twice.

Assuming the records from table(X) is more than a million.

Regards,
Bala.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Unless you are performing a sparse lookup you will always be using two virtual Data Sets for two lookups via a Lookup stage. You can see this happening in the score; each is built using a LUTCreate operator.

So, if the queries are different, your second approach is probably to be preferred, as it would minimize the number of rows in each virtual Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bala_135
Premium Member
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Post by bala_135 »

Hello Ray,

Thanks for the response.I have to re-frame my question I am not doing a sparse lookup.My question is can i create two different lookup files for diffrent userdefined queries or one dataset and use the filter condition in transformer stage or filter stage to yield optimised performance.Then the lookup process.

Regards,
Bala.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can do either. That's my point. Whichever you do you get TWO virtual Data Sets. It's got nothing whatsoever with sparse lookup - it's how normal lookups work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bala_135
Premium Member
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Post by bala_135 »

Thanks Ray,

So i can go with any approach.Pefromance wise it works the same way.Just to clarify one more point
One the first occasion i am touching the database twice.
second one I am touching it once but fetching more rows and giving extra work to the datastage Engine.Any comments on this.

Regards,
Bala.
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

If the queries are completely different then use the second approach orelse use the first one with filter stage to parse the data depending on the filters into two different datasets.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
bala_135
Premium Member
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Post by bala_135 »

Hi uslaslam1us/Ray,

Thanks are lot.
I have got a fair idea.
But my curiosity to strengthen it I am continuing it.
The queries are diffrent but each touch the same table.
I'll give examples.

query1:-Select Eno from Emp where sal>5000;
query2:-Select Dno from Emp where sal<5000
;

so i can do this in two ways.
1.Select Eno,Dno from Emp ;(one user-defined query)
Put the result onto the dataset and then use filter conditions at the ETL level.
2.Treat these queries differenly and put it onto different datasets.
Then perform lookups.
My current scenario is similar to the one.I wish to know the pros and cons touching the database twice vs filter large number of records at ETL.

Note:-These are just examples my queries are complex.

Regards,
Bala.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Your second approach requires half the memory at run time of your first approach.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply