DSXchange

Posted: **Fri May 06, 2005 6:41 am**

Hi,
I have a sequential file like this:

X008999999999999999999 001
Y008999999999999999999 001
Y008999999999999999999 002
Y008999999999999999999 003
X008999999999999999999 002
Y008999999999999999999 001
Y008999999999999999999 002
Y008999999999999999999 002
Y008999999999999999999 004
Y008999999999999999999 005
X008999999999999999999 003
Y008999999999999999999 001
Y008999999999999999999 002

For every 'X' record I can have multiple 'Y' records. The last 3 digits show the Seq_no of the record.
First 'X' Record.....001
Second 'X' Record.....002
Third 'X' Record.....003
I want to check for duplicate 'Y' records between two 'X' records.
In the above seq file there is a duplicate 'Y' record with seq_no 002 under the second 'X' record.
How do I check for this? In this case I want to throw an error saying that there are duplicates in the file.

Please throw some light on this.
I have tried some stage variable logic, but I have not succeeded.

TIA.

Posted: **Fri May 06, 2005 6:50 am**

You can solve this using stage variables as shown below.

stgErrorFlag <- (default) 'N'
stgPrevSeqNo <- (default) '000'

stgErrorFlag <- if link.FirstChar = 'Y' and link.SeqNo = stgPrevSeqNo then 'Y' Else 'N'
stgPrevSeqNo <- if link.FirstChar = 'Y' then link.SeqNo Else '000'

In your link constraint, check for stgErrorFlag = 'Y' to write error rows.
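To sanity-check the stage variable logic outside DataStage, here is a minimal Python sketch of the same two derivations. It assumes each record is one line whose first character is the record type and whose last three characters are the sequence number, and that rows arrive in file order; `find_duplicate_rows` and `SAMPLE` are hypothetical names, not DataStage objects. As with the stage variables, the previous sequence number is updated only after the flag is computed, so each 'Y' row is compared with the row just before it. That works as long as the 'Y' sequence numbers within a group are in ascending order.

```python
SAMPLE = [
    "X008999999999999999999 001",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 003",
    "X008999999999999999999 002",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 002",
    "Y008999999999999999999 004",
    "Y008999999999999999999 005",
    "X008999999999999999999 003",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
]


def find_duplicate_rows(lines):
    """Return 1-based row numbers of 'Y' rows whose seq no equals the previous row's."""
    error_rows = []
    prev_seq_no = "000"  # initial value, like stgPrevSeqNo's default
    for row_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        first_char = line[0]
        seq_no = line.rstrip()[-3:]
        # stgErrorFlag: evaluated first, so it still sees the previous row's value
        error_flag = "Y" if first_char == "Y" and seq_no == prev_seq_no else "N"
        # stgPrevSeqNo: an 'X' record resets the comparison for the new group
        prev_seq_no = seq_no if first_char == "Y" else "000"
        if error_flag == "Y":
            error_rows.append(row_no)
    return error_rows


if __name__ == "__main__":
    rows = find_duplicate_rows(SAMPLE)
    if rows:
        print("Duplicate Y records at rows:", rows)  # Duplicate Y records at rows: [8]
```

Row 8 is the second `Y...002` under the second 'X' record, which is the duplicate described in the question.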

Posted: **Sat May 07, 2005 5:46 am**

I find myself wondering if a Complex Flat File stage might help here.

But, if the source data are sorted, an approach using stage variables to detect change-or-otherwise in the Y records will definitely work.

Posted: **Sun May 08, 2005 1:01 am**

Hi,

If I have understood your requirement correctly, you can define a stage variable, say StgCheckDup, with the derivation:

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")
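A minimal Python sketch of the derivation above, assuming RowProcCompareWithPreviousValue only remembers the value passed in its previous call. `make_compare_with_previous` is a hypothetical stand-in written for this sketch, not the SDK routine itself, and Value and Sequence are assumed to be the two blank-separated fields of each record.

```python
SAMPLE = [
    "X008999999999999999999 001",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 003",
    "X008999999999999999999 002",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 002",
    "Y008999999999999999999 004",
    "Y008999999999999999999 005",
    "X008999999999999999999 003",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
]


def make_compare_with_previous():
    """Hypothetical stand-in: True if value equals the value from the previous call."""
    previous = None

    def compare(value):
        nonlocal previous
        same = value == previous
        previous = value  # remember only the most recent value
        return same

    return compare


def check_dup_flags(lines):
    """Return the StgCheckDup value (1 or 0) for each record."""
    compare_with_previous = make_compare_with_previous()
    flags = []
    for line in lines:
        value, sequence = line.split()
        # Nested if: the compare is only called for Y rows whose Sequence is not "001"
        if value[0] == "Y" and sequence != "001":
            flags.append(1 if compare_with_previous(sequence) else 0)
        else:
            flags.append(0)
    return flags


print(check_dup_flags(SAMPLE))
# [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(check_dup_flags(["X0 001", "Y0 001", "Y0 002", "X0 002", "Y0 001", "Y0 002"]))
# [0, 0, 0, 0, 0, 1]
```

Under that assumption, the remembered value is not reset at an 'X' record, so the second call flags the last `Y0 002` only because the previous group also ended with 002. The stage variable version that resets stgPrevSeqNo to '000' on 'X' records does not have this problem.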

Posted: **Sun May 08, 2005 3:54 am**

Hi,
For sorting the input data, I searched this forum and found something like this:

sort -k 1,1n -k 2,2n

Is 'k' the key column name, 1 the column position, and 1n an indication that the column is numeric? Is my assumption correct or not?
Could anybody explain a bit more, please?

TIA

Sunshine2323 wrote: Hi,

If I have understood your requirement correctly, then after sorting the data on the Sequence Number you can define a stage variable, say StgCheckDup, with the derivation:

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")

Posted: **Sun May 08, 2005 3:58 am**

Hi,

Why are you not using the Sort stage?

Posted: **Sun May 08, 2005 4:15 am**

Hi,

-k keydef defines a restricted sort key.

The format of this definition is
field_start[type][,field_end[type]]

which defines a key beginning at field_start and ending at field_end. The characters at field_start and field_end are included in the key field.

You can use this option when you have an alphanumeric field and you want to sort on only the numeric or only the alphabetic part.

The option 1,1n means the key runs from the 1st character to the last character of the 1st field, i.e. the sort uses all the characters of that field, and the n suffix makes the comparison numeric.

Hope this helps
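To make the key definitions concrete, here is a minimal Python sketch that orders lines roughly like `sort -k 1,1 -k 2,2n` on blank-separated fields. It is an analogy, not a reimplementation of sort: it assumes the second field is always a plain integer, and `sort_like_keydefs` is a hypothetical name. Python's `sorted()` is stable, whereas sort may break ties between equal keys by comparing whole lines, so tied rows can come out in a different order.

```python
def sort_like_keydefs(lines):
    """Order lines roughly like `sort -k 1,1 -k 2,2n` on blank-separated fields."""

    def key(line):
        fields = line.split()
        # -k 1,1  : key starts and ends at field 1, compared as a string
        # -k 2,2n : key starts and ends at field 2; the n suffix makes it numeric
        return (fields[0], int(fields[1]))

    return sorted(lines, key=key)


print(sort_like_keydefs(["B 10", "A 9", "A 10", "B 2"]))
# ['A 9', 'A 10', 'B 2', 'B 10']
```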

Posted: **Sun May 08, 2005 5:01 am**

The sort command is slightly different on different platforms. Check man sort to verify what options pertain to your system. For example, a numeric sort may be specified with a -n option rather than suffixing the sort key specification.

Posted: **Sun May 08, 2005 7:48 am**

ray.wurlod wrote:The sort command is slightly different on different platforms. Check man sort to verify what options pertain to your system. For example, a numeric sort may be specified with a -n option rather than suffixing the sort key specification.

Which can make a huge difference if you get this wrong. Make sure you understand when you need a string sort and when you need a numeric sort and specify them accordingly... otherwise you'll send your 'sorted' data to an Aggregator (for example) and it will merrily implode with a 'Row out of Sequence' error.

A string sort is the default; keep that in mind when working with numeric keys that are not all exactly the same length and zero-filled.
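Python's default string comparison works character by character, much like a default string sort, so a few lines are enough to show why unpadded numeric keys go wrong while zero-filled keys of equal length do not. This is only an illustration of the comparison rules, not of any particular sort command.

```python
keys = ["10", "9", "100", "2"]

# String comparison: character by character, so "100" sorts before "2"
string_sorted = sorted(keys)
# Numeric comparison, like asking for a numeric key
numeric_sorted = sorted(keys, key=int)
# Zero-filled keys of the same length sort identically either way
zero_filled = ["010", "009", "100", "002"]

print(string_sorted)        # ['10', '100', '2', '9']
print(numeric_sorted)       # ['2', '9', '10', '100']
print(sorted(zero_filled))  # ['002', '009', '010', '100']
```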

As mentioned, read your man pages. Talk to other UNIX savvy folks where you work. Get it working outside of your DataStage job first then plug the appropriate command in.

Posted: **Sun May 08, 2005 8:07 am**

I don't think sorting is the primary issue to the original question. From the input data described above, I don't see any relationship between x records and y records. Even if the data described is in the correct order, how would you know which x record a y record belonged to? Until this relationship is established, an adequate solution can't be provided.

Posted: **Sun May 08, 2005 8:22 am**

dstechdev wrote:I don't think sorting is the primary issue to the original question.

Maybe yes, maybe no. As you noted, things are still a little... unsettled... yet.

However, the question of sorting came up so it was addressed.

Posted: **Sun May 08, 2005 10:17 am**

Hi,

Yes, indeed you do not need to sort the data; the logic below will work provided the input records arrive in the same order as shown in the question.

If that is not the case, then the data will have to be sorted and a relationship between X and Y will have to be established, as dstechdev mentioned earlier.

You can define a stage variable, say StgCheckDup, with the derivation:

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")

Posted: **Sun May 08, 2005 10:35 am**

What I meant to say here is that if there was indeed a duplicate Y record and it was written to the error output, you wouldn't know which X record/seq it belongs to.
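One way to keep track of which X record a Y record belongs to, assuming (as the sample suggests) that each Y record belongs to the nearest X record above it in file order, is to carry the X sequence number forward and report it with each duplicate. This minimal Python sketch does that with a set of Y sequence numbers per group, a different technique from comparing with the previous row, so it does not depend on the Y sequence numbers being in order. `find_duplicates_by_parent` is a hypothetical name.

```python
SAMPLE = [
    "X008999999999999999999 001",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 003",
    "X008999999999999999999 002",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 002",
    "Y008999999999999999999 004",
    "Y008999999999999999999 005",
    "X008999999999999999999 003",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
]


def find_duplicates_by_parent(lines):
    """Return (x_seq, y_seq) pairs for Y records repeated under the same X record."""
    duplicates = []
    current_x = None  # seq no of the most recent X record (None before the first X)
    seen = set()      # Y seq nos already seen under current_x
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record_type = line[0]
        seq_no = line.rstrip()[-3:]
        if record_type == "X":
            current_x = seq_no
            seen = set()  # start a fresh group
        elif record_type == "Y":
            if seq_no in seen:
                duplicates.append((current_x, seq_no))
            seen.add(seq_no)
    return duplicates


print(find_duplicates_by_parent(SAMPLE))  # [('002', '002')]
```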

Seq_File