Seq_File

pandu80
Participant
Posts: 50
Joined: Fri Apr 08, 2005 5:56 pm

Post by pandu80 »

Hi,
I have a seq file like

X008999999999999999999 001
Y008999999999999999999 001
Y008999999999999999999 002
Y008999999999999999999 003
X008999999999999999999 002
Y008999999999999999999 001
Y008999999999999999999 002
Y008999999999999999999 002
Y008999999999999999999 004
Y008999999999999999999 005
X008999999999999999999 003
Y008999999999999999999 001
Y008999999999999999999 002


For every 'X' record I can have multiple 'Y' records. The last 3 digits show the Seq_no of the record.
First 'X' Record.....001
Second 'X' Record.....002
Third 'X' Record.....003
I want to check for duplicate 'Y' records between two 'X' records.
In the above seq_file there is a duplicate 'Y' record with seq_no 002 under the second 'X' record.
How do I check for this? In this case I want to throw an error saying that there are duplicates in the file.

Please shed some light on this.
I have tried some stage variable logic, but I have not succeeded.

TIA.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can solve this using stage variables, as shown below.

Initial values:
stgErrorFlag <- 'N'
stgPrevSeqNo <- '000'

Derivations (in this order, so that stgErrorFlag is evaluated while stgPrevSeqNo still holds the previous row's value):
stgErrorFlag <- if link.FirstChar = 'Y' and link.SeqNo = stgPrevSeqNo then 'Y' else 'N'
stgPrevSeqNo <- if link.FirstChar = 'Y' then link.SeqNo else '000'

In your link constraint, check for stgErrorFlag = 'Y' to write error rows.
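For anyone who wants to try the logic outside DataStage first, here is a minimal Python sketch of the same two stage variables. It is only an illustration: the function name is made up, and it assumes the record type is the first character and the sequence number is the last three characters, as in the sample file.

```python
def flag_adjacent_duplicates(lines):
    """Return the 'Y' rows whose sequence number equals that of the row just before."""
    prev_seq_no = "000"  # plays the role of stgPrevSeqNo (initial value '000')
    flagged = []
    for line in lines:
        record = line.rstrip()
        first_char = record[0]   # assumed layout: record type in column 1
        seq_no = record[-3:]     # assumed layout: sequence number in the last 3 characters
        # stgErrorFlag is computed before stgPrevSeqNo is updated
        error_flag = "Y" if first_char == "Y" and seq_no == prev_seq_no else "N"
        # stgPrevSeqNo is reset to '000' whenever an 'X' row arrives
        prev_seq_no = seq_no if first_char == "Y" else "000"
        if error_flag == "Y":
            flagged.append(record)
    return flagged


sample = [
    "X008999999999999999999 001",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 003",
    "X008999999999999999999 002",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 002",
    "Y008999999999999999999 004",
    "Y008999999999999999999 005",
    "X008999999999999999999 003",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
]
print(flag_adjacent_duplicates(sample))
```

Note that this compares each row only with the row immediately before it, so it catches a duplicate only when the two 'Y' rows are next to each other.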
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I find myself wondering if a Complex Flat File stage might help here.

But, if the source data are sorted, an approach using stage variables to detect change-or-otherwise in the Y records will definitely work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sunshine2323
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

Post by Sunshine2323 »

Hi,

If I have understood your requirement correctly, you can define a stage variable, say StgCheckDup, with the derivation:

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")
Last edited by Sunshine2323 on Sun May 08, 2005 10:10 am, edited 1 time in total.
Warm Regards,
Amruta Bandekar

If A equals success, then the formula is: A = X + Y + Z, X is work. Y is play. Z is keep your mouth shut.
--Albert Einstein

Post by pandu80 »

Hi,
To sort the input data, I searched this forum and found something like this:

sort -k 1,1n -k 2,2n

Is 'k' the key column option, 1 the column position, and does 1n indicate that the column is numeric? Is my assumption correct?
Could anybody explain this a bit more, please?

TIA
Sunshine2323 wrote:Hi,

If I have understood your requirement correctly, then after sorting the data on the Sequence Number

You can define a stage variable say StgCheckDup with derivation,

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")

Post by Sunshine2323 »

Hi,

Why are you not using the SORT STAGE?

Post by Sunshine2323 »

Hi,

-k keydef defines a restricted sort key.

The format of this definition is
field_start[type][,field_end[type]]

which defines a key beginning at field_start and ending at field_end. The characters at field_start and field_end are included in the key field.

You can use this option when you have an alphanumeric field and want to sort on only its numeric part or only its alphabetic part.

In -k 1,1n the key starts and ends at field 1, so all the characters of the first field are used; the n suffix makes the comparison numeric.

Hope this helps :)
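To make the key definitions concrete, here is a rough Python sketch of what a two-key sort such as `sort -k 1,1 -k 2,2n` does: order by the first field as text, then by the second field as a number. The function name and the sample rows are invented for illustration, and it assumes every line has exactly two whitespace-separated fields with an integer in the second one. The real sort command also depends on your locale and platform, so treat this as an approximation only.

```python
def sort_by_field1_then_numeric_field2(lines):
    """Approximate a two-key sort: field 1 as text, then field 2 as a number."""
    def sort_key(line):
        fields = line.split()  # split on runs of whitespace
        return (fields[0], int(fields[1]))
    return sorted(lines, key=sort_key)


rows = ["B 10", "A 9", "A 10", "B 2"]
print(sort_by_field1_then_numeric_field2(rows))
# ['A 9', 'A 10', 'B 2', 'B 10']
```

Be aware that sorting the file from the original question on its first field would place every 'X' record ahead of every 'Y' record, so the grouping of 'Y' records under their 'X' record would be lost.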

Post by ray.wurlod »

The sort command is slightly different on different platforms. Check man sort to verify what options pertain to your system. For example, a numeric sort may be specified with a -n option rather than suffixing the sort key specification.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:The sort command is slightly different on different platforms. Check man sort to verify what options pertain to your system. For example, a numeric sort may be specified with a -n option rather than suffixing the sort key specification.
Which can make a huge difference if you get this wrong. Make sure you understand when you need a string sort and when you need a numeric sort and specify them accordingly... otherwise you'll send your 'sorted' data to an Aggregator (for example) and it will merrily implode with a 'Row out of Sequence' error. :wink: A string sort is the default, keep that in mind when working with numeric keys that are not all the exact same length and zero filled.

As mentioned, read your man pages. Talk to other UNIX savvy folks where you work. Get it working outside of your DataStage job first then plug the appropriate command in.
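A small Python sketch of the string versus numeric difference, using made-up key values that are not all the same length. It only illustrates the ordering problem; it is not the sort command itself.

```python
keys = ["10", "9", "100", "2"]

# A string sort compares character by character, so "100" comes before "2".
string_sorted = sorted(keys)
print(string_sorted)    # ['10', '100', '2', '9']

# A numeric sort compares the values as numbers.
numeric_sorted = sorted(keys, key=int)
print(numeric_sorted)   # ['2', '9', '10', '100']

# Zero filling to a fixed width makes the string order agree with the numeric order.
padded_sorted = sorted(k.zfill(3) for k in keys)
print(padded_sorted)    # ['002', '009', '010', '100']
```

This is why keys that are zero filled to the same length, like the 001, 002, 003 sequence numbers in the original file, come out in the same order from either kind of sort.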
-craig

"You can never have too many knives" -- Logan Nine Fingers
dstechdev
Participant
Posts: 10
Joined: Thu May 27, 2004 6:54 am
Location: Plano, Texas

Post by dstechdev »

I don't think sorting is the primary issue to the original question. From the input data described above, I don't see any relationship between x records and y records. Even if the data described is in the correct order, how would you know which x record a y record belonged to? Until this relationship is established, an adequate solution can't be provided.

Post by chulett »

dstechdev wrote:I don't think sorting is the primary issue to the original question.
Maybe yes, maybe no. As you noted, things are still a little... unsettled... yet. :wink:

However, the question of sorting came up so it was addressed.

Post by Sunshine2323 »

Hi,

Yes, indeed, you do not need to sort the data. The logic below will work provided the input records arrive in the same order as specified in the question.

If that is not the case, then sorting will have to be done and a relation between X and Y will have to be established, as mentioned earlier by dstechdev.

You can define a stage variable, say StgCheckDup, with the derivation:

if Left(DSLink3.Value,1)='Y' and DSLink3.Sequence<>"001" then if RowProcCompareWithPreviousValue(DSLink3.Sequence) then 1 else 0 else 0

In the output column derivation you can say

Value------If StgCheckDup=0 then DSLink3.Value else UtilityMessageToLog("Duplicate in file")
Sequence-----If StgCheckDup=0 then DSLink3.Sequence else UtilityMessageToLog("Duplicate in file")

Post by dstechdev »

What I meant to say is that if there were indeed a duplicate Y record and it was rejected as an error, you wouldn't know which X record/seq it belonged to.
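One way to keep that relationship, sketched in Python rather than in a Transformer: remember the sequence number of the most recent 'X' record and report it alongside each duplicate 'Y'. This assumes the file arrives in its original order, that the record type is the first character, and that the sequence number is the last three characters. The function name is hypothetical.

```python
def find_duplicates_by_group(lines):
    """Return (x_seq_no, y_seq_no) pairs for 'Y' rows repeated under the same 'X' row."""
    current_x = None   # sequence number of the most recent 'X' record
    seen = set()       # 'Y' sequence numbers seen since that 'X' record
    duplicates = []
    for line in lines:
        record = line.rstrip()
        if not record:
            continue   # skip blank lines
        rec_type, seq_no = record[0], record[-3:]
        if rec_type == "X":
            current_x = seq_no
            seen = set()   # a new 'X' record starts a new group
        elif rec_type == "Y":
            if seq_no in seen:
                duplicates.append((current_x, seq_no))
            else:
                seen.add(seq_no)
    return duplicates


rows = [
    "X008999999999999999999 002",
    "Y008999999999999999999 001",
    "Y008999999999999999999 002",
    "Y008999999999999999999 002",
    "Y008999999999999999999 004",
]
print(find_duplicates_by_group(rows))
# [('002', '002')]
```

Because the seen set is kept for the whole group, this also catches duplicates that are not next to each other, which a compare-with-previous-row check would miss.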
john