Composite columns should be Unique



s_avneet
Participant
Posts: 22
Joined: Wed Aug 31, 2016 8:28 am


Post by s_avneet »

Hello All

New to the DataStage world, I have the following requirement.

Creating a parallel job:
Seqfile1 > Transform > Seqfile2.

Incoming file like:
col1,col2,col3,col4,col5

I need to check whether col1:col2:col3, concatenated, form a unique combination. Is it possible to handle this in the Transformer stage itself? If not,
what other stages can I use? Sort and Filter?

I will have 100k records in my file, so please consider performance while suggesting any solution :)


S
Avneet
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Welcome!

While you could do 'all' of this in a transformer, it's not really a beginner topic. Either way, sorting would be required. However, what is the rest of your requirement? You haven't specified what your output needs to be or what should happen when a concatenated value isn't unique.
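
For what it's worth, a rough sketch of the Transformer route, assuming the input is already sorted (and hash-partitioned) on the three columns so duplicates arrive back to back - the link and stage variable names here are just illustrative:

   Stage variables (evaluated top to bottom each row):
      svCurrKey:  in.col1 : '|' : in.col2 : '|' : in.col3    (':' concatenates; the '|' delimiter avoids false matches like "ab"+"c" vs "a"+"bc")
      svIsDup:    If svCurrKey = svPrevKey Then 1 Else 0
      svPrevKey:  svCurrKey    (kept last, so it still holds the previous row's key when svIsDup is derived)

Then a constraint of svIsDup = 1 on a rejects link and svIsDup = 0 on the good link. The first row compares against the stage variable's empty initial value, so it won't be flagged as long as the key can never be an empty string.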

ps. 100k is a nit, the job will take longer to start up and shut down than to process the records, I'll wager. Me, I'd use a Server job. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by s_avneet »

Beginner to DataStage, but kind of experienced in the integration world :) :)

Well, the non-unique records are to be captured as rejects, and the job status will be returned to the encapsulating sequence job.

If the job status is OK (no records rejected), the file will be moved from the staging_out folder to Outbound using an Execute Command stage.

Otherwise the job will be terminated, and the file will be moved from staging to backout.
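(The move itself would just be something like "mv /staging_out/<file> /outbound/" in the Execute Command - the paths and file name there are only placeholders, the real ones are still to be decided.)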
Avneet

Post by chulett »

Good to know. Informatica? Concepts are so very similar that we could speak in that language. :wink:

So, it sounds like you don't really need to create another file for use by downstream processes, just validate that the original file has no duplicates. Which means you could write any duplicates to the target file and then, in the sequence, check whether that file is empty or not - rather than 'failing' the job in some manner.
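
Something along these lines in the sequence, for instance - a sketch only, with a made-up path and activity name, assuming a Unix engine:

   Execute Command activity: test -s /staging_out/rejects.txt    (exit status 0 only if the file exists and is non-empty)
   Trigger (Custom): Exec_Check.$ReturnValue = 0    -> duplicates found, back the file out or abort
   Trigger (Custom): Exec_Check.$ReturnValue <> 0   -> no duplicates, move the file on to Outbound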

Is that correct?
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by s_avneet »

Not Informatica, I come from a world of ESB and DataPower :)

You are right Craig.
The major task is to check for duplicates, and I am not able to get it working on the composite columns.
Avneet
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You will need to sort the data, unless your source data is guaranteed to arrive already sorted on the columns that make up the composite key.
While you could use something like the Checksum stage to do this for you, I'd just put in a Sort stage, sort on your 3 columns, and add a key change indicator to the output. Then you can filter out any rows where the key change indicator is not set - those would be the duplicates.
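
Roughly, with default / illustrative names:

   Sort stage: sorting keys col1, col2, col3; hash-partition on the same columns so duplicates land in the same partition; "Create Key Change Column" = True (adds a keyChange column - 1 on the first row of each key group, 0 on the repeats)
   Filter stage: Where clause keyChange = 0 -> send those rows to your reject link/file; the keyChange = 1 rows are your unique set

If I recall correctly the added column is called keyChange by default, but check the Sort stage properties on your version.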