How to sort

somu_june · Post by **somu_june** » Fri Jan 06, 2006 3:24 pm

Hi,

I want to split my source file into smaller input files. The file contains material data, so if I split it into smaller files I may end up with the same material in several different files. How can I first sort the file on several fields, so that I can split the sorted file into smaller ones and then apply my own logic to remove duplicates?

Thanks,
Somaraju

ArndW · Post by **ArndW** » Fri Jan 06, 2006 5:11 pm

Somaraju,

the UNIX "sort" command will let you sort on different fields, with or without the option of removing duplicates. The same applies to the DataStage SORT stage; you can choose your columns to sort on. As you've already stated, you need to sort prior to splitting the file into smaller ones.
What exactly are you asking?
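For comparison, here is a minimal Python sketch of sorting on several fields with optional duplicate removal. It assumes comma-delimited records and 0-based key field positions. `sort_records` is a hypothetical helper, not part of DataStage or UNIX `sort`; when duplicates are removed it keeps the first record (in input order) for each key.

```python
from typing import Iterable, List, Sequence


def sort_records(lines: Iterable[str], key_fields: Sequence[int],
                 unique: bool = False, delimiter: str = ",") -> List[str]:
    """Sort delimited lines on the given 0-based field positions.

    If unique is True, keep only the first line for each distinct key
    (similar in spirit to a keyed `sort -u`, but not a reimplementation of it).
    """
    rows = []
    for line in lines:
        text = line.rstrip("\n")
        fields = text.split(delimiter)
        key = tuple(fields[i] for i in key_fields)  # composite sort key
        rows.append((key, text))

    # list.sort is stable, so lines with equal keys keep their input order.
    rows.sort(key=lambda row: row[0])

    result: List[str] = []
    last_key = None
    for key, text in rows:
        if unique and key == last_key:
            continue  # drop later records that repeat the previous key
        result.append(text)
        last_key = key
    return result
```

Because the whole input is sorted at once, records with the same key end up next to each other regardless of where they appeared in the original file.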

somu_june · Post by **somu_june** » Fri Jan 06, 2006 6:47 pm

Hi Arnd,

I have a job that has a Sort stage. The problem is that it only sorts within a single file. For example, if the same material is in the first file and also in the 10th file, will the Sort stage sort across all the files, or will it sort the columns in the first file, then the second, and so on? If it sorts each file individually, I will still have duplicate materials.

Thanks,
Somaraju
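The concern can be illustrated with a small Python sketch, assuming each smaller file is just a list of material codes (hypothetical data). Sorting and de-duplicating each file on its own leaves materials that still appear in more than one file. A k-way merge of the individually sorted files (a standard technique, shown here with `heapq.merge`; this is not a description of what the DataStage Sort stage does internally) yields one globally sorted sequence in which equal materials are adjacent and can be dropped. All function names are hypothetical.

```python
import heapq
from itertools import groupby
from typing import Dict, List, Set


def sort_each_file(files: List[List[str]]) -> List[List[str]]:
    """Sort and de-duplicate each file independently (no cross-file view)."""
    return [sorted(set(f)) for f in files]


def cross_file_duplicates(files: List[List[str]]) -> Set[str]:
    """Return the materials that occur in more than one file."""
    where: Dict[str, Set[int]] = {}
    for idx, f in enumerate(files):
        for material in f:
            where.setdefault(material, set()).add(idx)
    return {m for m, idxs in where.items() if len(idxs) > 1}


def merge_and_dedupe(sorted_files: List[List[str]]) -> List[str]:
    """k-way merge of files that are each already sorted.

    Equal materials become adjacent in the merged stream,
    so groupby keeps exactly one copy of each.
    """
    merged = heapq.merge(*sorted_files)
    return [material for material, _ in groupby(merged)]
```

For example, with files `[["M3", "M1", "M1"], ["M2", "M1"], ["M3", "M4"]]`, sorting each file separately still leaves `M1` and `M3` in two files each, while merging gives `["M1", "M2", "M3", "M4"]`.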

kumar_s · Post by **kumar_s** » Fri Jan 06, 2006 10:19 pm

Hi,

"As you've already stated, you need to sort prior to splitting the file into smaller ones."

Please look at Arnd's statement.
Sort the whole file and later split it.

-Kumar
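A minimal Python sketch of "sort the whole file, then split it", assuming comma-delimited records whose first field is the material code. The hypothetical `split_sorted` helper takes records that have already been sorted as a whole (for example with `sorted(lines, key=material)`) and cuts them into chunks of about `max_records` lines, but never splits a run of records with the same material. Each material therefore lands in exactly one smaller file, so removing duplicates within each file is enough.

```python
from itertools import groupby
from typing import Callable, List, Sequence


def split_sorted(records: Sequence[str], max_records: int,
                 key: Callable[[str], str]) -> List[List[str]]:
    """Split already sorted records into chunks of roughly max_records lines,
    keeping every group of records that share a key in the same chunk."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for _, group_iter in groupby(records, key=key):
        group = list(group_iter)
        # Start a new chunk if adding this whole group would exceed the limit.
        if current and len(current) + len(group) > max_records:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks


def material(record: str) -> str:
    """Hypothetical key: the material code is assumed to be the first field."""
    return record.split(",")[0]
```

A material group larger than `max_records` is kept whole, so the chunk holding it can exceed the limit; that trade-off keeps duplicates from spanning files.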

ray.wurlod · Post by **ray.wurlod** » Sat Jan 07, 2006 3:26 pm

Begin with a plan.

Exactly what is in the single source file? Exactly what do you want in the smaller files? How would you achieve this (in language, not in UNIX/DataStage terms)?

Then you can convert this into the appropriate commands and/or job designs.