
Seq_File

Posted: Fri May 06, 2005 6:41 am
by pandu80
Hi,
I have a sequential file like this:

X008999999999999999999 001
Y008999999999999999999 001
Y008999999999999999999 002
Y008999999999999999999 003
X008999999999999999999 002
Y008999999999999999999 001
Y008999999999999999999 002
Y008999999999999999999 002
Y008999999999999999999 004
Y008999999999999999999 005
X008999999999999999999 003
Y008999999999999999999 001
Y008999999999999999999 002


For every 'X' record I can have multiple 'Y' records. The last 3 digits show the sequence number (Seq_no) of the record.
First 'X' Record.....001
Second 'X' Record.....002
Third 'X' Record.....003
I want to check for duplicate 'Y' records between two 'X' records.
In the above file there is a duplicate 'Y' record with Seq_no 002 under the second 'X' record.
How do I check for this? In this case I want to raise an error saying that there are duplicates in the file.

Please shed some light on this.
I have tried some stage variable logic, but I have not succeeded.

TIA.

Posted: Fri May 06, 2005 6:50 am
by Sainath.Srinivasan
You can solve this using stage variables, as shown below.

stgErrorFlag <- (default) 'N'
stgPrevSeqNo <- (default) '000'

stgErrorFlag <- if link.FirstChar = 'Y' and link.SeqNo = stgPrevSeqNo then 'Y' Else 'N'
stgPrevSeqNo <- if link.FirstChar = 'Y' then link.SeqNo Else '000'

In your link constraint, check for stgErrorFlag = 'Y' to write error rows.
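
For anyone who wants to try this logic outside DataStage first, here is a minimal Python sketch of the same two stage variables. It is only an illustration, not DataStage code: the function name is made up, and it assumes each line is a record followed by a space and the 3-digit sequence number, as in the sample file. Like the stage variables, it only catches a duplicate that immediately follows the row it duplicates.

```python
def flag_adjacent_duplicates(lines):
    """Flag a Y row whose SeqNo equals the SeqNo of the Y row just before it."""
    prev_seq_no = "000"  # plays the role of stgPrevSeqNo (default '000')
    flagged = []
    for line_no, line in enumerate(lines, start=1):
        record, seq_no = line.split()
        first_char = record[0]
        # stgErrorFlag is evaluated before stgPrevSeqNo is updated
        error_flag = first_char == "Y" and seq_no == prev_seq_no
        # an X row resets the remembered sequence number
        prev_seq_no = seq_no if first_char == "Y" else "000"
        if error_flag:
            flagged.append((line_no, line))
    return flagged


sample = [
    "X008999999999999999999 001",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 003",
    "X008999999999999999999 002",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 002",
    "Y008999999999999999999 004",
    "Y008999999999999999999 005",
    "X008999999999999999999 003",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
]

for line_no, line in flag_adjacent_duplicates(sample):
    print(f"Duplicate at line {line_no}: {line}")
```

On the sample file this reports line 8, the second Y row with Seq_no 002 under the second X record.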

Posted: Sat May 07, 2005 5:46 am
by ray.wurlod
I find myself wondering if a Complex Flat File stage might help here.

But, if the source data are sorted, an approach using stage variables to detect change-or-otherwise in the Y records will definitely work.

Posted: Sun May 08, 2005 1:01 am
by Sunshine2323
Hi,

If I have understood your requirement correctly, you can define a stage variable, say StgCheckDup, with the derivation:

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")

Posted: Sun May 08, 2005 3:54 am
by pandu80
Hi,
To sort the input data, I searched this forum and found something like this:

sort -k 1,1n -k 2,2n

Here, is 'k' the key column name, is 1 the column position, and does 1n indicate that the column is numeric? Is my assumption correct?
Could anybody explain this a bit more, please?

TIA
Sunshine2323 wrote:Hi,

If I have understood your requirement correctly then after sorting the data on the Sequence Number

You can define a stage variable say StgCheckDup with derivation,

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")


Posted: Sun May 08, 2005 3:58 am
by Sunshine2323
Hi,

Why are you not using the SORT STAGE?

Posted: Sun May 08, 2005 4:15 am
by Sunshine2323
Hi,

-k keydef defines a restricted sort key.

The format of this definition is
field_start[type][,field_end[type]]

which defines a key beginning at field_start and ending at field_end. The characters at field_start and field_end are included in the key field.

You can use this option when you have an alphanumeric field and you want to sort on only the numeric or only the alphabetic part.

So 1,1n means the key starts at field 1 and ends at field 1, i.e. the sort uses all the characters of the first field, and the n suffix asks for a numeric comparison of that key.
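
To see what a pair of keys like -k 1,1n -k 2,2n does, here is a rough Python emulation. It assumes whitespace-separated fields that are both integers, the sample rows and the function name are made up, and it does not model locale handling or any platform-specific behaviour of the real sort command.

```python
def sort_two_numeric_keys(lines):
    """Order lines by field 1 as a number, then by field 2 as a number."""
    def key(line):
        fields = line.split()
        # first key: field 1 only, numeric; second key: field 2 only, numeric
        return (int(fields[0]), int(fields[1]))
    return sorted(lines, key=key)


rows = ["2 10", "1 3", "2 9", "1 20"]  # hypothetical input
for row in sort_two_numeric_keys(rows):
    print(row)
```

Each key is restricted to a single field, so the second field only decides the order among rows whose first field is equal.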

Hope this helps :)

Posted: Sun May 08, 2005 5:01 am
by ray.wurlod
The sort command is slightly different on different platforms. Check man sort to verify what options pertain to your system. For example, a numeric sort may be specified with a -n option rather than suffixing the sort key specification.

Posted: Sun May 08, 2005 7:48 am
by chulett
ray.wurlod wrote:The sort command is slightly different on different platforms. Check man sort to verify what options pertain to your system. For example, a numeric sort may be specified with a -n option rather than suffixing the sort key specification.
Which can make a huge difference if you get this wrong. Make sure you understand when you need a string sort and when you need a numeric sort and specify them accordingly... otherwise you'll send your 'sorted' data to an Aggregator (for example) and it will merrily implode with a 'Row out of Sequence' error. :wink: A string sort is the default, keep that in mind when working with numeric keys that are not all the exact same length and zero filled.

As mentioned, read your man pages. Talk to other UNIX savvy folks where you work. Get it working outside of your DataStage job first then plug the appropriate command in.
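
A small Python sketch of the string versus numeric difference, for anyone who has not been bitten by it yet. The key values are made up; the point is only that keys of different lengths come out in a different order depending on how they are compared, and that zero filling makes the two orders agree.

```python
keys = ["10", "9", "100", "2"]  # hypothetical keys, not zero filled

# String comparison goes character by character, so "100" sorts before "2"
string_order = sorted(keys)

# Numeric comparison looks at the value
numeric_order = sorted(keys, key=int)

# Zero filling to a fixed width makes a plain string sort match the numeric one
padded_order = sorted(k.zfill(3) for k in keys)

print("string :", string_order)
print("numeric:", numeric_order)
print("padded :", padded_order)
```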

Posted: Sun May 08, 2005 8:07 am
by dstechdev
I don't think sorting is the primary issue to the original question. From the input data described above, I don't see any relationship between x records and y records. Even if the data described is in the correct order, how would you know which x record a y record belonged to? Until this relationship is established, an adequate solution can't be provided.

Posted: Sun May 08, 2005 8:22 am
by chulett
dstechdev wrote:I don't think sorting is the primary issue to the original question.
Maybe yes, maybe no. As you noted, things are still a little... unsettled... yet. :wink:

However, the question of sorting came up so it was addressed.

Posted: Sun May 08, 2005 10:17 am
by Sunshine2323
Hi,

Yes, indeed you do not need to sort the data. The logic below will work provided the input records arrive in the same order as shown in the question.

If that is not the case, then sorting will have to be done and a relation between X and Y will have to be established, as mentioned earlier by dstechdev.

You can define a stage variable, say StgCheckDup, with the derivation:

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")

Posted: Sun May 08, 2005 10:35 am
by dstechdev
What I meant to say here is that if there was indeed a duplicate Y record and it was written out as an error, you wouldn't know which X record/seq it belonged to.
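
One possible way around that, sketched in Python rather than DataStage: carry the Seq_no of the most recent X record forward and report it alongside each duplicate. This assumes the file arrives in the order shown in the original post, so that every Y row belongs to the X row above it; the function name and the output layout are made up for illustration. It also remembers every Y Seq_no seen since the last X row, so it catches duplicates that are not next to each other.

```python
def find_duplicates_by_group(lines):
    """Return (x_seq_no, y_seq_no, line_no) for each repeated Y row within an X group."""
    current_x = None   # Seq_no of the most recent X record
    seen = set()       # Y Seq_nos seen since that X record
    duplicates = []
    for line_no, line in enumerate(lines, start=1):
        record, seq_no = line.split()
        if record.startswith("X"):
            current_x = seq_no
            seen = set()  # a new X record starts a new group
        elif record.startswith("Y"):
            if seq_no in seen:
                duplicates.append((current_x, seq_no, line_no))
            seen.add(seq_no)
    return duplicates


rows = [
    "X008999999999999999999 001",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 003",
    "X008999999999999999999 002",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 002",
    "Y008999999999999999999 004",
    "Y008999999999999999999 005",
    "X008999999999999999999 003",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
]

for x_seq, y_seq, line_no in find_duplicates_by_group(rows):
    print(f"X record {x_seq}: duplicate Y record {y_seq} at line {line_no}")
```

In a Transformer the same idea would presumably need one more stage variable holding the last X Seq_no, which could then be written to the error link with the rejected row.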