multiple access of a dataset within a job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

multiple access of a dataset within a job

Post by koolnitz »

Hi,

In my job, I am accessing a dataset for lookup purposes in five places. At each Lookup, the key columns are different. The dataset contains around 0.4 million records. Is it advisable to access the same copy of the dataset simultaneously for lookups, as in my scenario?

Secondly, as the lookup dataset is not small, I wanted to use Merge or Join instead of Lookup. But I want to capture all the rejected records during matching, so I am left with the Lookup option only: Join doesn't support a reject link, and Merge rejects the update link's records, not the master link's. Again, I had to go with a Normal lookup, as I cannot use a Sparse lookup with a dataset.
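The reject-handling distinction above can be sketched outside DataStage. This is a hypothetical Python illustration (not DataStage code, and the field names are made up) of normal-lookup semantics with a reject link: matched rows go to the output, unmatched stream rows go to the reject link, which is exactly what Join and Merge cannot provide for the master/stream side.

```python
def normal_lookup(stream_rows, reference_rows, key):
    """Match each stream row against an in-memory reference table.
    Matched rows go to the output link; unmatched rows go to the
    reject link."""
    # Build the in-memory reference table, like a Normal lookup does.
    ref = {row[key]: row for row in reference_rows}
    output, rejects = [], []
    for row in stream_rows:
        match = ref.get(row[key])
        if match is not None:
            # Merge reference columns into the stream row.
            output.append({**row, **match})
        else:
            # No match: the row goes down the reject link.
            rejects.append(row)
    return output, rejects

stream = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
reference = [{"id": 1, "name": "A"}]
out, rej = normal_lookup(stream, reference, "id")
# out -> one matched row; rej -> the unmatched row with id 2
```

A Join would simply drop (or null-pad) the unmatched row, so the reject information would be lost.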

Any suggestions for better design?

Thanks in advance!
Nitin Jain | India

If everything seems to be going well, you have obviously overlooked something.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

On one hand, there shouldn't be any issue referencing the same dataset multiple times in the same job - that really isn't any different from multiple individual jobs accessing the same dataset concurrently, and that works just fine.

On the other hand, the Copy stage would allow you to read the dataset once and then have multiple copies in your process. The difficulty there is how to lay out your job - you could end up with links going all over the place.
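The Copy-stage idea amounts to one pass over the reference data feeding several differently-keyed lookups. A hypothetical Python sketch (not DataStage code; column names are invented) of that single-read, multiple-key-set approach:

```python
def build_lookup_tables(rows, key_sets):
    """Read the reference data once and build one keyed lookup
    table per set of key columns, instead of opening the dataset
    separately for each lookup."""
    tables = {key_set: {} for key_set in key_sets}
    for row in rows:  # single pass over the data, like a Copy fan-out
        for key_set in key_sets:
            k = tuple(row[col] for col in key_set)
            tables[key_set][k] = row
    return tables

rows = [{"cust": 1, "acct": "A", "region": "N"},
        {"cust": 2, "acct": "B", "region": "S"}]
# Two lookups, each with its own key columns, from one read.
tables = build_lookup_tables(rows, [("cust",), ("acct", "region")])
```

Each downstream lookup then probes its own table, which mirrors five reference links fanning out from one Copy stage.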

Brad.
ccatania
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh
Contact:

Post by ccatania »

I'm not sure I fully understand your question about using multiple lookups against one dataset. I have used multiple lookups against the same dataset via the Copy stage, with each lookup performed using its own unique set of key fields. This will enable you to capture your rejections and will centralize your lookup.

:wink:
Charlie
koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

Post by koolnitz »

Guys, thanks for the suggestions. Even I was thinking of using the Copy stage - unfortunately my job then looks like a spider web and becomes very unreadable.
Nitin Jain | India

If everything seems to be going well, you have obviously overlooked something.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

I'm not sure it would be classed as good practice, but until an enhancement is made that lets you put a bend in a link, you can achieve the same effect with a Copy stage.

If the Copy stage isn't doing anything but passing the columns straight through and Force = False, it will be optimised out when the job is compiled.

Of course, if your job really is like a spider's web, rather than just a couple of links crossing over, it's probably better from a maintenance perspective to split the job into several smaller ones.
Post Reply