Composite columns should be Unique



s_avneet
Participant
Posts: 22
Joined: Wed Aug 31, 2016 8:28 am


Post by s_avneet »

Hello All

New to the DataStage world, I have the following requirement.

Creating a parallel job:
Seqfile1 > Transform > Seqfile2.

Incoming file like:
col1,col2,col3,col4,col5

I need to check whether col1:col2:col3, concatenated, form a unique combination. Is it possible to handle this in the Transformer stage itself? If not,
what other stages can I use? Sort and Filter?

I will have 100k records in my file, so please consider performance while suggesting any solution :)


S
Avneet
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Welcome!

While you could do 'all' of this in a transformer, it's not really a beginner topic. Either way, sorting would be required. However, what is the rest of your requirement? You haven't specified what your output needs to be or what should happen when a concatenated value isn't unique.
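
For what it's worth, a rough sketch of the Transformer route, assuming the input is already sorted (and hash-partitioned) on the three columns so duplicates arrive back to back - the link and stage variable names here are just illustrative:

   Stage variables (evaluated top to bottom each row):
      svCurrKey:  in.col1 : '|' : in.col2 : '|' : in.col3    (':' concatenates; the '|' delimiter avoids false matches like "ab"+"c" vs "a"+"bc")
      svIsDup:    If svCurrKey = svPrevKey Then 1 Else 0
      svPrevKey:  svCurrKey    (kept last, so it still holds the previous row's key when svIsDup is derived)

Then a constraint of svIsDup = 1 on a rejects link and svIsDup = 0 on the good link. The first row compares against the stage variable's empty initial value, so it won't be flagged as long as the key can never be an empty string.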

ps. 100k is a nit, the job will take longer to start up and shut down than to process the records, I'll wager. Me, I'd use a Server job. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by s_avneet »

Beginner to DataStage, but kind of experienced in the integration world :) :)

Well, the non-unique records are to be captured as rejects, and the job status will be returned to the encapsulating sequence job.

If the job status is OK (no records rejected), the file will be moved from the staging_out folder to Outbound using an Execute Command stage.

Otherwise the job will be terminated, and the file will be moved from staging to backout.
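(The move itself would just be something like "mv /staging_out/<file> /outbound/" in the Execute Command - the paths and file name there are only placeholders, the real ones are still to be decided.)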
Avneet

Post by chulett »

Good to know. Informatica? Concepts are so very similar that we could speak in that language. :wink:

So, it sounds like you don't really need to create another file for use by downstream processes, just validate that the original file has no duplicates. Which means you could write any duplicates to the target file and then, in the sequence, check whether that file is empty or not - rather than 'failing' the job in some manner.
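
Something along these lines in the sequence, for instance - a sketch only, with a made-up path and activity name, assuming a Unix engine:

   Execute Command activity: test -s /staging_out/rejects.txt    (exit status 0 only if the file exists and is non-empty)
   Trigger (Custom): Exec_Check.$ReturnValue = 0    -> duplicates found, back the file out or abort
   Trigger (Custom): Exec_Check.$ReturnValue <> 0   -> no duplicates, move the file on to Outbound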

Is that correct?
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by s_avneet »

Not Informatica, I come from a world of ESB and DataPower :)

You are right Craig.
The major task is to check for duplicates, and I am not able to get it working on the composite columns.
Avneet
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You will need to sort the data, unless your source data is guaranteed to arrive already sorted on the columns that make up the composite key.
While you could use something like the Checksum stage to do this for you, I'd just put in a Sort stage, sort on your 3 columns, and add a key change indicator to the output. Then you can filter out any rows where the key change indicator is not set - those would be the duplicates.
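
Roughly, with default / illustrative names:

   Sort stage: sorting keys col1, col2, col3; hash-partition on the same columns so duplicates land in the same partition; "Create Key Change Column" = True (adds a keyChange column - 1 on the first row of each key group, 0 on the repeats)
   Filter stage: Where clause keyChange = 0 -> send those rows to your reject link/file; the keyChange = 1 rows are your unique set

If I recall correctly the added column is called keyChange by default, but check the Sort stage properties on your version.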