Hi,
I want to split my source file into smaller input files. The file contains material data, so if I split it as it is, the same material may end up in different files. How can I first sort the file on particular fields, so that I can split the sorted file into smaller ones and then apply my logic to remove duplicates?
Thanks,
Somaraju
How to sort
somaraju
Somaraju,
the UNIX "sort" command will let you sort on different fields, with or without the option of removing duplicates. The same applies to the DataStage SORT stage; you can choose your columns to sort on. As you've already stated, you need to sort prior to splitting the file into smaller ones.
What exactly are you asking?
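As a rough illustration of sorting on chosen fields, with and without duplicate removal, here is a minimal Python sketch. The comma delimiter, the column positions and the sample rows are all assumptions made up for the example; none of them come from the original post.

```python
import csv
import io

# Hypothetical sample data: material id, plant, description (assumed layout).
SAMPLE = """M200,P1,Bolt
M100,P2,Nut
M100,P1,Nut
M200,P1,Bolt
"""

def sort_rows(text, key_cols, unique=False):
    """Sort delimited rows on the given column indexes.

    With unique=True, keep only the first row for each key,
    roughly what `sort -u` does when given the same key fields.
    """
    rows = list(csv.reader(io.StringIO(text)))
    rows.sort(key=lambda r: tuple(r[c] for c in key_cols))
    if not unique:
        return rows
    result = []
    last_key = None
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key != last_key:  # first row seen for this key
            result.append(row)
            last_key = key
    return result

# Sort on material id, then plant; once keeping duplicates, once dropping them.
sorted_all = sort_rows(SAMPLE, key_cols=[0, 1])
sorted_unique = sort_rows(SAMPLE, key_cols=[0, 1], unique=True)
```

Because the rows are sorted before duplicates are dropped, identical keys sit next to each other, which is why comparing each row only with the previous key is enough.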
Hi Arnv,
I have a job that contains a Sort stage. The problem is that it sorts only the one file it reads. For example, if the same material appears in the first file and again in the 10th file, will the Sort stage sort across all the files, or will it sort the columns in the first file, then the second, and so on? If it sorts each file individually, I will still have duplicate materials across files.
thanks,
somaraju
Exactly what is in the single source file? Exactly what do you want in the smaller files? How would you achieve this (described in plain language, not in UNIX/DataStage terms)?
Then you can convert this into the appropriate commands and/or job designs.
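One possible reading of such a plan, stated in plain language, is: combine all the data, sort it once on the material key, keep one row per key, and only then split the result into smaller pieces. The Python sketch below shows that order of operations. The function names, the in-memory lists standing in for files, and the assumption that the material id is the first comma-separated field are all hypothetical.

```python
from itertools import chain, groupby, islice

def material_id(line):
    """Assumed key: the material id is the first comma-separated field."""
    return line.split(",")[0]

def merge_sort_dedupe(sources, key):
    """Combine all sources, sort once on key, keep one row per key."""
    combined = sorted(chain.from_iterable(sources), key=key)
    # groupby only groups adjacent rows, so the sort must come first.
    return [next(group) for _, group in groupby(combined, key=key)]

def split_chunks(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Hypothetical inputs: two "files" that both contain material M100.
file_1 = ["M300,Washer", "M100,Nut"]
file_2 = ["M100,Nut", "M200,Bolt"]

unique_rows = merge_sort_dedupe([file_1, file_2], key=material_id)
chunks = list(split_chunks(unique_rows, size=2))
```

Sorting the combined data, rather than each file on its own, is what lets a material in the first file meet its duplicate from a later file before the split happens.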
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.