Need help in storing global parameters/variables

pdntsap · Post by **pdntsap** » Wed Oct 19, 2011 4:10 pm

We have a file and we need to store the distinct values of a particular column. We then need to filter the rows of a second file based on a particular column whose value can be any one of the distinct values stored from the first file.

Let us call our files F1 and F2 and columns C1 and C2.
We read the distinct columns of C1 from file F1. Let us say the distinct values are 1,2,3. The file F2 will be filtered based on C2 being 1 or 2 or 3.

I am thinking that the above logic cannot be developed as a single job. I maybe need to develop a server job to store the distinct values of C1 into a UserStatus variable and then call the UserStatus variable in a parallel job to filter the rows of F2. I have not yet tried this approach. Any solutions to the above logic will be greatly appreciated.

Thanks.

ray.wurlod · Post by **ray.wurlod** » Wed Oct 19, 2011 4:24 pm

Just a thought. How about a unique Sort on the first file then a Join with the second file? No need for global anything.

pdntsap · Post by **pdntsap** » Thu Oct 20, 2011 9:15 am

Yes. Unique sort and then join should work but I was looking for ways to avoid using the join stage due to the number of records involved and I also need to repeat the same steps in a number of places and so thought having them stored as global variables might help.
I will implement sort and join and provide an update.

Thanks.

jwiles · Post by **jwiles** » Thu Oct 20, 2011 11:55 am

How many unique values are you expecting in F1? If you were considering dumping them into a UserStatus variable then there must not be too many. Perhaps you could use a lookup instead of the join and avoid needing to sort F2 (unless you have other business rules which require that anyway).

Are you repeating this logic several times within the same job? If so, unique sort F1 once and then use a copy stage to send the output to the various joins/lookups in the job. If across several jobs, a unique sort in a separate job to and dump into a dataset or lookup fileset for later consumption.

Regards,

pdntsap · Post by **pdntsap** » Fri Oct 21, 2011 7:01 am

Unique sort and join solves the purpose but as I feared takes quite a bit of time.

Thanks James for your suggestions and I will try them also.

ray.wurlod · Post by **ray.wurlod** » Fri Oct 21, 2011 2:08 pm

Some things just do take time. Nine women can not make a baby in one month.