I'm new to the DataStage world and have the following requirement.
I'm creating a parallel job:
Seqfile1 > Transformer > Seqfile2.
Incoming file like:
col1,col2,col3,col4,col5
I need to check whether col1:col2:col3, concatenated, forms a unique combination. Is it possible to handle this in the Transformer stage itself? If not, what other stages can I use? Sort and Filter?
I will have 100k records in my file, so please consider performance when suggesting a solution.
While you could do 'all' of this in a Transformer, it's not really a beginner topic. Either way, sorting would be required. However, what is the rest of your requirement? You haven't specified what your output needs to be or what should happen when a concatenated value isn't unique.
ps. 100k is a nit - the job will take longer to start up and shut down than to process the records, I'll wager. Me, I'd use a Server job.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Good to know. Informatica? The concepts are so similar that we could speak in that language.
So, it sounds like you don't really need to create another file for use by downstream processes, just validate that the original file has no duplicates. That means you could write any duplicates to the target and then, in the Sequence, check whether that file is empty or not, rather than 'failing' the job in some manner.
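To make that validation idea concrete: this is not DataStage code, just a small Python sketch of the equivalent logic - collect the composite keys that occur more than once (those would go to the "duplicates" target file), then treat an empty result as "file is valid". The column names col1..col3 and the sample tuples are assumptions for illustration.

```python
# Sketch of "write duplicates to the target, then check if it's empty".
from collections import Counter

def find_duplicate_keys(records):
    """Count each composite key (first three columns) and return the
    keys that appear more than once - these rows would be written to
    the duplicates target in the job."""
    counts = Counter((r[0], r[1], r[2]) for r in records)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical 5-column rows, as in the incoming file layout.
records = [
    ("A", "B", "C", "x", "y"),
    ("A", "B", "C", "p", "q"),   # duplicate of the first key
    ("D", "E", "F", "r", "s"),
]

dup_keys = find_duplicate_keys(records)
is_valid = not dup_keys  # empty "duplicates file" => input is unique
```

The Sequence-level check then reduces to "is the duplicates file empty", which is much simpler than aborting the job mid-run.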
Is that correct?
-craig
"You can never have too many knives" -- Logan Nine Fingers
You will need to sort the data, unless your source data is guaranteed to be sorted on the columns that make up the composite key.
While you could use something like the Checksum stage to do this for you, I'd just put in a Sort stage, sort on your 3 columns, and add a key change indicator to the output. Then you can filter out any rows where the key change indicator is not set - those are the duplicates.
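For anyone unfamiliar with the key change indicator: the Sort stage can emit an extra column that is 1 on the first row of each key group and 0 on every subsequent row with the same key. Here is a rough Python sketch of that behavior - the function name, column names, and sample rows are all assumptions, not DataStage syntax.

```python
# Sketch of a Sort stage's key change indicator on a 3-column composite key.
def key_change_sort(rows):
    """Sort on (col1, col2, col3) and flag the first row of each key
    group with keyChange=1; repeats of the same key get keyChange=0."""
    rows_sorted = sorted(rows, key=lambda r: (r["col1"], r["col2"], r["col3"]))
    prev_key = None
    for r in rows_sorted:
        key = (r["col1"], r["col2"], r["col3"])
        r["keyChange"] = 1 if key != prev_key else 0
        prev_key = key
        yield r

rows = [
    {"col1": "A", "col2": "B", "col3": "C", "col4": 1, "col5": 2},
    {"col1": "A", "col2": "B", "col3": "C", "col4": 3, "col5": 4},
    {"col1": "X", "col2": "Y", "col3": "Z", "col4": 5, "col5": 6},
]

# Downstream filter: keyChange = 0 means "same key as the previous row",
# i.e. a duplicate of the composite key.
duplicates = [r for r in key_change_sort(rows) if r["keyChange"] == 0]
```

In the job itself this filtering would be a Filter stage (or a Transformer constraint) on the keyChange column, not Python.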