Need help in storing global parameters/variables

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Need help in storing global parameters/variables

Post by pdntsap »

We have a file and we need to store the distinct values of a particular column. We then need to filter the rows of a second file based on a particular column whose value can be any one of the distinct values stored from the first file.

Let us call our files F1 and F2 and columns C1 and C2.
We read the distinct columns of C1 from file F1. Let us say the distinct values are 1,2,3. The file F2 will be filtered based on C2 being 1 or 2 or 3.

I am thinking that the above logic cannot be developed as a single job. I maybe need to develop a server job to store the distinct values of C1 into a UserStatus variable and then call the UserStatus variable in a parallel job to filter the rows of F2. I have not yet tried this approach. Any solutions to the above logic will be greatly appreciated.

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Just a thought. How about a unique Sort on the first file then a Join with the second file? No need for global anything.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Yes. Unique sort and then join should work but I was looking for ways to avoid using the join stage due to the number of records involved and I also need to repeat the same steps in a number of places and so thought having them stored as global variables might help.
I will implement sort and join and provide an update.

Thanks.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

How many unique values are you expecting in F1? If you were considering dumping them into a UserStatus variable then there must not be too many. Perhaps you could use a lookup instead of the join and avoid needing to sort F2 (unless you have other business rules which require that anyway).

Are you repeating this logic several times within the same job? If so, unique sort F1 once and then use a copy stage to send the output to the various joins/lookups in the job. If across several jobs, a unique sort in a separate job to and dump into a dataset or lookup fileset for later consumption.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Unique sort and join solves the purpose but as I feared takes quite a bit of time.

Thanks James for your suggestions and I will try them also.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Some things just do take time. Nine women can not make a baby in one month.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply