Checking unique values in all records.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

tom
Participant
Posts: 46
Joined: Fri Oct 14, 2005 12:38 am

Checking unique values in all records.

Post by tom »

Hi Dsxians,

I have a requirement to check that the date column value is the same for all records in the source file, using a Transformer stage in the extract job.
For example:
run_date

01/23/2008
01/23/2008
01/23/2008
01/23/2008
01/23/2008

If it is not the same for all records, I want to stop the process.

I tried writing the changed values to a temp file using the condition below, then checking the file size at the sequence level. If there is a changed value (i.e. the temp file size is greater than zero), I stop the process by writing a message to the log with UtilityMessageToLog(" SDSD").

Condition is

Stage variables-

svCurr --> run_date
svUni -->If svPrev <> svCurr Then '1' Else '0'
svPrev --> run_date

Constraint --> svUni = 1

But with this condition, even when all the date values are the same, the first two values are written to the file.

01/23/2008
01/23/2008
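The per-row comparison those stage variables perform can be sketched in awk (a minimal sketch, not the actual Transformer code): `prev` starts out empty, so the first row always looks "changed" and leaks into the output even when every run_date matches.

```shell
# Hypothetical awk sketch of the stage-variable comparison above:
# on row 1, prev is still empty, so the row is flagged as changed.
leaked=$(printf '01/23/2008\n01/23/2008\n01/23/2008\n' |
    awk '$0 != prev { print } { prev = $0 }')
echo "$leaked"
```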

Could you please help me achieve this. Thanks.
Developers corner
rajngt
Participant
Posts: 32
Joined: Wed Jan 04, 2006 6:22 am

Re: Checking unique values in all records.

Post by rajngt »

Modify the stage variable as below:

Stage variables-

svCurr --> run_date
svUni -->If svPrev <> svCurr Then '1' Else '0'
svPrev --> svCurr
rajngt
Participant
Posts: 32
Joined: Wed Jan 04, 2006 6:22 am

Post by rajngt »

Also assign an initial value of null (or whatever you like) to svPrev, and update svUni as below:

svUni --> If svPrev Is Null Then '0' Else (If svPrev <> svCurr Then '1' Else '0')
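In awk terms, this guard amounts to skipping the comparison on the first row (a sketch of the intent, using `NR > 1` in place of the null check):

```shell
# Sketch of the guarded comparison: NR > 1 mirrors the
# "If svPrev Is Null Then '0'" check, so row 1 never flags a change.
leaked=$(printf '01/23/2008\n01/23/2008\n01/23/2008\n' |
    awk 'NR > 1 && $0 != prev { print } { prev = $0 }')
# leaked is now empty: no false change on the first row
```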
tom
Participant
Posts: 46
Joined: Fri Oct 14, 2005 12:38 am

Post by tom »

Hi rajngt,

Thanks for your reply.

Still, 2 records are being written to the temp file.

Is there any alternative way to achieve this?

Thanks
tom
Developers corner
rajngt
Participant
Posts: 32
Joined: Wed Jan 04, 2006 6:22 am

Post by rajngt »

Hi Tom,

Since you have mentioned that the source is a file and you are using the file-size check to kick off the job, my suggestion is this: instead of creating one job to capture the changed records to a file and then checking the file size at the sequence level, try the following.

In the sequencer, execute the sample command below:

cut -f<field number> -d'<delimiter>' <file name> | uniq | wc -l

If your file looks like:

name1,field21,01/23/2008
name2,field22,01/23/2008
name3,field23,01/23/2008
name4,field24,01/23/2008
name5,field25,01/23/2008
name5,field25,01/23/2008

Here the date column is the 3rd one and the delimiter is a comma, so you need to execute the command below:

cut -f3 -d',' <file name> | uniq | wc -l

This should return 1; any other value means the date column contains differing values. Using the trigger option, you can then decide whether to run the job or not.
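Wrapped up, the check might look like the sketch below (the file name and contents are hypothetical). Note that uniq only collapses adjacent duplicates, but that is enough here: any second value in the column, adjacent or not, pushes the count above 1.

```shell
# Sketch: abort when the 3rd comma-separated column (run_date)
# holds more than one distinct value. sample.csv is a made-up file.
printf 'name1,field21,01/23/2008\nname2,field22,01/23/2008\n' > sample.csv
distinct=$(cut -f3 -d',' sample.csv | uniq | wc -l)
if [ "$distinct" -eq 1 ]; then
    echo "run_date uniform: OK to run"
else
    echo "run_date varies: stopping" >&2
    exit 1
fi
```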