How do I compare?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

aladap12
Participant
Posts: 60
Joined: Fri Jul 20, 2007 1:15 pm
Location: NO

How do I compare?

Post by aladap12 »

DS Gurus,

I have a problem in my job.

I have three input columns: ID, date, seq.

My requirement is that I have to check the ID and date columns for duplicate entries. If there are duplicate entries on ID and date, then I have to check seq.

Further, I have to write "xx" to a new column in the output for the greater seq number.

How do I achieve this in DataStage?


Thanks
Kr
sreddy
Participant
Posts: 144
Joined: Sun Oct 21, 2007 9:13 am

Re: How do I compare?

Post by sreddy »

Aladap12

This is not a DataStage issue; you can write a SQL query to remove the duplicates at the source database stage (user-defined SQL).

-- Oracle: delete every row that has a higher ROWID than some other row
-- with the same col1/col2, keeping one row per duplicate group
DELETE FROM table_name A
WHERE A.ROWID > ANY (
    SELECT B.ROWID FROM table_name B
    WHERE A.col1 = B.col1
      AND A.col2 = B.col2
);
-----------------------------------------------------
-- Equivalent form: keep the row with the smallest ROWID per key
DELETE FROM table_name A
WHERE ROWID > (
    SELECT MIN(B.ROWID) FROM table_name B
    WHERE A.key_values = B.key_values
);




You can try these.





aladap12 wrote: DS Gurus,

I have a problem in my job.

I have three input columns: ID, date, seq.

My requirement is that I have to check the ID and date columns for duplicate entries. If there are duplicate entries on ID and date, then I have to check seq.

Further, I have to write "xx" to a new column in the output for the greater seq number.

How do I achieve this in DataStage?

Thanks
Kr
SReddy
dwpractices@gmail.com
Analyzing Performance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use sorted data, and stage variables in a Transformer stage to detect duplicates. Search the forum - this technique has been described before.
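
For example (a minimal sketch only; link and column names are assumptions), with the input sorted on ID and date, and the stage variables declared in this order:

   svKey = InLink.ID : "|" : InLink.DateCol
   svIsDup = (COMPARE(svKey, svPrevKey) = 0)
   svPrevKey = svKey

svIsDup is then available in your output column derivations.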

Please ignore the advice from sreddy; it is too narrowly focussed. For example there is no guarantee that the source is a database table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Never mind the fact that there's no indication of a need to delete anything, just detect.
-craig

"You can never have too many knives" -- Logan Nine Fingers
aladap12
Participant
Posts: 60
Joined: Fri Jul 20, 2007 1:15 pm
Location: NO

Thanks for your response, Ray

Post by aladap12 »

Thanks, Ray. I have searched the forum but did not find any answer to my problem. Anyway, I would like to explain my issue more elaborately.

My source data contains several records.

My job design is like this: one driving table, a few lookup tables, and I am writing to a hashed file.

We have a set of jobs like this. As per my business requirement, I am processing all the records from the driving table and getting some fields from the lookup tables in order to make complete records.

My input data is like:

col1    col2      col3
------------------------
123 ,   6/24/08 , 237
124 ,   6/24/08 , 238
123 ,   6/24/08 , 239
124 ,   6/23/08 , 240



Right now I am selecting all the records and writing them to the hashed file.

My client asked me to send records like this:

If the values in col1 and col2 are the same (both fields have duplicate entries), then I have to check column 3. Further, I have to create a new column in the hashed file and send "xx" to that field for the greater value in col3.

Can we achieve this in DataStage Server jobs? Please let me know.

Appreciate your help 8)
Thanks
Kr



ray.wurlod wrote:Use sorted data, and stage variables in a Transformer stage to detect duplicates. Search the forum - this technique has been described before.

Please ignore the advice from sreddy; it is too narro ...
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Give us an example of what the output data should look like, given your input data example.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Also, if the data is going to a hashed file, then the key value rows are simply going to overwrite each other as they occur. That is most likely not the desired result, but it is something to be aware of just in case.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
aladap12
Participant
Posts: 60
Joined: Fri Jul 20, 2007 1:15 pm
Location: NO

Thanks for your reply

Post by aladap12 »

chulett wrote:Give us an example of what the output data should look like, given your input data example. ...
Thanks, Ray. I have searched the forum but did not find any answer to my problem. Anyway, I would like to explain my issue more elaborately.

My source data contains several records.

My job design is like this: one driving table, a few lookup tables, and I am writing to a hashed file.

We have a set of jobs like this. As per my business requirement, I am processing all the records from the driving table and getting some fields from the lookup tables in order to make complete records.

My input data is like:

col1    col2      col3
------------------------
123 ,   6/24/08 , 237
124 ,   6/24/08 , 238
123 ,   6/24/08 , 239
124 ,   6/23/08 , 240



Right now I am selecting all the records and writing them to the hashed file. In addition to this, my client asked me to send records like this:

If the values in col1 and col2 are the same (both fields have duplicate entries), then I have to check column 3. Further, I have to create a new column in the hashed file and send "xx" to that field for the greater value in col3.


The output should be like:

col1    col2      col3    col4
--------------------------------
123 ,   6/24/08 , 237 ,   000
124 ,   6/24/08 , 238 ,   000
123 ,   6/24/08 , 239 ,   45
124 ,   6/23/08 , 240 ,   000


And in my hashed file I have around 70 other columns, of which around 6 are unique (key) columns.

Hope you understand my point.

Can we achieve this in DataStage Server jobs? Please let me know.

Appreciate your help
Thanks
Kr
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Exactly as I posted earlier. Use stage variables to check for "changed" (Col1 and Col2) and "larger" (Col3) in data sorted by Col1, Col2 and Col3.
Derive your output columns accordingly.
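In outline (a sketch only; this assumes Col3 is sorted descending within each Col1/Col2 group, so that the first row of a group carries the largest Col3), reusing the svIsDup stage variable from my earlier post:

   Col4 derivation: If Not(svIsDup) Then "45" Else "000"

(Note this marks the single row of a non-duplicated key as well.)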
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Re: Thanks for your reply

Post by PhilHibbs »

aladap12 wrote: My input data is like:

col1 col2 col3
------------------------------
123 , 6/24/08 , 237
124 , 6/24/08 , 238
123 , 6/24/08 , 239
124 , 6/23/08 , 240

The output should be like:

col1 col2 col3 Col4
------------------------------
123 , 6/24/08 , 237 , 000
124 , 6/24/08 , 238 , 000
123 , 6/24/08 , 239 , 45
124 , 6/23/08 , 240 , 000
What is the 45 value? Is this just an indicator that that row is a duplicate key and that the col3 value is greater? Is it indicating that the col3 value is the highest for that col1/col2 key pair? What if this was your input:

col1 col2 col3
------------------------------
123 , 6/24/08 , 237
123 , 6/24/08 , 239
123 , 6/24/08 , 241

Would you want the first two occurrences of 123,6/24/08 to both have 000 in the output and the third have 45? If so, you could process your data once to find the maximum value for each key (sort it and write to a hashed file, for example), and then do a lookup to check whether the current record is the highest value for the key. Alternatively, sort it by the key and by descending col3, and on the first occurrence of a key, write out the record and the 45 value to indicate the highest value, and write zero for subsequent occurrences of the same key.

Key change logic is implemented by having three Stage Variables like this:

svKey = inputlink.col1 : inputlink.col2
svKeyChanged = COMPARE( svKey, svOldKey ) <> 0
svOldKey = svKey

This works by comparing the new value of svKey with the old value of svOldKey, because the derivations are performed in the order that they are in the Stage Variables box, so svOldKey will have the value "left over" from the previous record. Make sure you init your SV's! The COMPARE function is important since "0" and "000" compare as equal when using the = operator.
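
To illustrate (a quick sketch of the behaviour):

   If "0" = "000" Then ...              ;* true: both operands look numeric, so = compares them as numbers
   If COMPARE("0", "000") = 0 Then ...  ;* false: COMPARE works character by character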

In order to do key change logic, your data needs to be sorted on the key, which your example is not. So, whatever your solution, you need to process the data twice, even if the first is just a sort on the key.
Phil Hibbs | Capgemini
Technical Consultant
aladap12
Participant
Posts: 60
Joined: Fri Jul 20, 2007 1:15 pm
Location: NO

Thanks, Ray

Post by aladap12 »

ray.wurlod wrote:Exactly as I posted earlier. Use stage variables to check for "changed" (Col1 and Col2) and "larger" (Col3) in data sorted by Col1, Col2 and Col3.
Derive your output columns accordingly. ...


Thanks, Ray.

Could you please explain in detail what I have to do to achieve this?

Appreciate your time and help


Thanks
Kr
aladap12
Participant
Posts: 60
Joined: Fri Jul 20, 2007 1:15 pm
Location: NO

Thanks for your reply

Post by aladap12 »

PhilHibbs wrote: What is the 45 value? Is this just an indicator that that row is a duplicate key and that the col3 value is greater? Is it indicating that the col3 value is the highest for that col1/col2 key pair? What if this was your input:

col1 col2 col3
------------------------------
123 , 6/24/08 , 237
123 , 6/24/08 , 239
123 , 6/24/08 , 241

Would you want the first two occurrences of 123,6/24/08 to both have 000 in the output and the third have 45? If so, you could process your data once to find the maximum value for each key (sort it and write to a hashed file, for example), and then do a lookup to check whether the current record is the highest value for the key. Alternatively, sort it by the key and by descending col3, and on the first occurrence of a key, write out the record and the 45 value to indicate the highest value, and write zero for subsequent occurrences of the same key.


Let me answer your question:

Value 45 is an indicator that tells the user this record has the highest seq number.
If the values in col1 and col2 have duplicate entries, then I have to check col3 for the higher base_row_id, and further I have to pass an indicator in the output file with a value like 45 (a new column in the output file).

Secondly
col1 col2 col3
------------------------------
123 , 6/24/08 , 237
123 , 6/24/08 , 239
123 , 6/24/08 , 241

If my input is like the above, then my output should be like:

col1    col2      col3    col4
--------------------------------
123 ,   6/24/08 , 237 ,   000
123 ,   6/24/08 , 239 ,   000
123 ,   6/24/08 , 241 ,   45

Hope you understand my problem

Thanks
Kr
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

OK. The answer is the same: either you process the file once and keep a record (in a hashed file) of the highest col3 value for every key pair, or you process the file in descending order of col1, col2, col3 and use key change logic to detect a new key pair (which will always be the highest value for that key due to the descending sort).
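
A sketch of that second approach (untested; link and column names are assumptions), with the input sorted by col1 and col2 ascending and col3 descending, and stage variables in this order:

   svKey = InLink.col1 : "|" : InLink.col2
   svNewKey = (COMPARE(svKey, svPrevKey) <> 0)
   svPrevKey = svKey

col4 derivation: If svNewKey Then "45" Else "000"

Note this flags the single row of a non-duplicated key as well; if those rows must stay "000", take the first approach and also keep a per-key row count in the hashed file, flagging only where the count is greater than 1 and col3 equals the stored maximum.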
Phil Hibbs | Capgemini
Technical Consultant