How do I compare?
Moderators: chulett, rschirm, roy
DS Gurus,
I have a problem in my job.
I have three input columns: ID, date, seq.
My requirement is to check the ID and date columns for duplicate entries. If I have duplicate entries on ID and date, then I have to check seq, and write "xx" to a new column in the output for the row with the greater seq number.
How do I achieve this in DataStage?
Thanks
Kr
Re: How do I compare?
Aladap12
This is not a DataStage issue; you can write a SQL query to remove duplicates at the source database stage (user-defined):

DELETE FROM table_name A
WHERE A.rowid > ANY (
    SELECT B.rowid FROM table_name B
    WHERE A.col1 = B.col1
      AND A.col2 = B.col2
);

Or, equivalently:

DELETE FROM table_name A
WHERE A.rowid > (
    SELECT MIN(B.rowid) FROM table_name B
    WHERE A.key_values = B.key_values
);

You can try it.
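To see what those correlated DELETEs leave behind, here is a rough Python simulation (an illustration only, not DataStage code; it assumes rowid order follows the order the rows appear in, so "first occurrence" plays the role of MIN(rowid)):

```python
# Simulate the correlated DELETE: keep only the first row (lowest "rowid")
# for each (col1, col2) key; later rows with the same key are the duplicates
# that the DELETE would remove.
rows = [
    (123, "6/24/08", 237),
    (124, "6/24/08", 238),
    (123, "6/24/08", 239),  # duplicate key (123, 6/24/08) -> would be deleted
    (124, "6/23/08", 240),
]

seen = set()
kept = []
for row in rows:
    key = (row[0], row[1])
    if key not in seen:      # first occurrence stands in for MIN(rowid)
        seen.add(key)
        kept.append(row)

print(kept)
```

Note that the original poster actually wants to flag the duplicate with the higher seq, not delete it, so this only shows what the SQL itself does.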
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Use sorted data, and stage variables in a Transformer stage to detect duplicates. Search the forum - this technique has been described before.
Please ignore the advice from sreddy; it is too narrowly focussed. For example, there is no guarantee that the source is a database table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks for your response, Ray
Thanks, Ray. I have searched the forum but didn't find an answer to my problem, so let me explain the issue more elaborately.

My source data contains several records. My job design is: one driving table, a few lookup tables, and I am writing to a hashed file. We have a set of jobs like this. As per my business requirement, I process all the records from the driving table and get some fields from the lookup tables in order to make complete records.

My input data is like:
col1 col2 col3
------------------------------
123 , 6/24/08 , 237
124 , 6/24/08 , 238
123 , 6/24/08 , 239
124 , 6/23/08 , 240

Right now I am selecting all the records and writing them to the hashed file.

My client asked me to send records like this: if the values in col1 and col2 are the same (both fields have duplicate entries), then I have to check column 3; further, I have to create a new column in the hashed file and send "xx" to that field for the greater value in col3.

Can we achieve this in DataStage server jobs? Please let me know.
Appreciate your help
Thanks
Kr
Also, if the data is going to a hashed file, then the key value rows are simply going to overwrite each other as they occur. That is most likely not the desired result, but it is something to be aware of just in case.
Bestest!
John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services
"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
Thanks for your reply
chulett wrote: Give us an example of what the output data should look like, given your input data example. ...

Thanks, Ray. I have searched the forum but didn't find an answer, so let me explain the issue more elaborately.

My input data is like:
col1 col2 col3
------------------------------
123 , 6/24/08 , 237
124 , 6/24/08 , 238
123 , 6/24/08 , 239
124 , 6/23/08 , 240

Right now I am selecting all the records and writing them to the hashed file. In addition to this, my client asked me: if the values in col1 and col2 are the same (both fields have duplicate entries), then I have to check column 3; further, I have to create a new column in the hashed file and send "xx" to that field for the greater value in col3.

The output should be like:
col1 col2 col3 Col4
------------------------------
123 , 6/24/08 , 237 , 000
124 , 6/24/08 , 238 , 000
123 , 6/24/08 , 239 , 45
124 , 6/23/08 , 240 , 000

And here in my hashed file I have around 70 other columns, of which around 6 are unique (key) columns.

Hope you understand my point. Can we achieve this in DataStage server jobs? Please let me know.
Appreciate your help
Thanks
Kr
Exactly as I posted earlier. Use stage variables to check for "changed" (Col1 and Col2) and "larger" (Col3) in data sorted by Col1, Col2 and Col3.
Derive your output columns accordingly.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
- Contact:
Re: Thanks for your reply
What is the 45 value? Is it just an indicator that the row is a duplicate key and that the col3 value is greater? Is it indicating that the col3 value is the highest for that col1/col2 key pair? What if this was your input:

aladap12 wrote: My input data is like
col1 col2 col3
------------------------------
123 , 6/24/08 , 237
124 , 6/24/08 , 238
123 , 6/24/08 , 239
124 , 6/23/08 , 240
Out put should be like
col1 col2 col3 Col4
------------------------------
123 , 6/24/08 , 237 , 000
124 , 6/24/08 , 238 , 000
123 , 6/24/08 , 239 , 45
124 , 6/23/08 , 240 , 000
col1 col2 col3
------------------------------
123 , 6/24/08 , 237
123 , 6/24/08 , 239
123 , 6/24/08 , 241
Would you want the first two occurrences of 123,6/24/08 to both have 000 in the output and the third have 45? If so, you could process your data once to find the maximum value for each key (sort it and write to a hashed file, for example), and then do a lookup to check whether the current record is the highest value for the key. Alternatively, sort it by the key and by descending col3, and on the first occurrence of a key, write out the record and the 45 value to indicate the highest value, and write zero for subsequent occurrences of the same key.
Key change logic is implemented by having three Stage Variables like this:
svKey = inputlink.col1 : inputlink.col2
svKeyChanged = COMPARE( svKey, svOldKey ) <> 0
svOldKey = svKey
This works by comparing the new value of svKey with the old value of svOldKey, because the derivations are performed in the order that they are in the Stage Variables box, so svOldKey will have the value "left over" from the previous record. Make sure you init your SV's! The COMPARE function is important since "0" and "000" compare as equal when using the = operator.
In order to do key change logic, your data needs to be sorted on the key, which your example is not. So, whatever your solution, you need to process the data twice, even if the first is just a sort on the key.
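As a rough Python analogue of those three stage variables (an illustration only, not DataStage BASIC; the variable names mirror the stage variables above, and the "|" separator is my addition to avoid ambiguous concatenation of the two key columns):

```python
# Rows must already be sorted on the key (col1, col2), as the post requires.
rows = [
    (123, "6/24/08", 237),
    (123, "6/24/08", 239),
    (123, "6/24/08", 241),
    (124, "6/23/08", 240),
]

sv_old_key = ""                        # initialise your stage variables!
flags = []
for col1, col2, col3 in rows:
    sv_key = f"{col1}|{col2}"          # svKey = inputlink.col1 : inputlink.col2
    sv_key_changed = sv_key != sv_old_key
    sv_old_key = sv_key                # holds the "left over" value for the next row
    flags.append(sv_key_changed)

print(flags)  # True on the first row of each key group
```

Because the comparison happens before sv_old_key is overwritten, each row is compared against the previous row's key, just as the derivation order in the Stage Variables box guarantees.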
Phil Hibbs | Capgemini
Technical Consultant
Thanks, Ray

ray.wurlod wrote: Exactly as I posted earlier. Use stage variables to check for "changed" (Col1 and Col2) and "larger" (Col3) in data sorted by Col1, Col2 and Col3.
Derive your output columns accordingly. ...

Thanks, Ray. Could you please explain in detail what I have to do to achieve this?
Appreciate your time and help
Thanks
Kr
Thanks for your reply
What is the 45 value? Is this just an indicator that that row is a duplicate key and that the col3 value is greater? Is it indicating that the col3 value is the highest for that col1/col2 key pair? What if this was your input:
col1 col2 col3
------------------------------
123 , 6/24/08 , 237
123 , 6/24/08 , 239
123 , 6/24/08 , 241
Would you want the first two occurrences of 123,6/24/08 to both have 000 in the output and the third have 45? If so, you could process your data once to find the maximum value for each key (sort it and write to a hashed file, for example), and then do a lookup to check whether the current record is the highest value for the key. Alternatively, sort it by the key and by descending col3, and on the first occurrence of a key, write out the record and the 45 value to indicate the highest value, and write zero for subsequent occurrences of the same key.
Let me answer your question:
The value 45 is an indicator that tells the user this record has the highest seq number.
If the values in col1 and col2 have duplicate entries, then I have to check col3 for the higher base_row_id, and pass an indicator with a value like 45 in the output file (a new column in the output file).
Secondly:
col1 col2 col3
------------------------------
123 , 6/24/08 , 237
123 , 6/24/08 , 239
123 , 6/24/08 , 241
If my input is like the above, then my output should be:
col1 col2 col3 col4
------------------------------
123 , 6/24/08 , 237 000
123 , 6/24/08 , 239 000
123 , 6/24/08 , 241 45
Hope you understand my problem.
Thanks
Kr
OK. The answer is the same: either you process the file once and keep a record (in a hashed file) of the highest col3 value for every key pair, or you process the file in descending order of col1, col2, col3 and use key change logic to detect a new key pair (which will always be the highest value for that key due to the descending sort).
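A sketch of the first approach in Python (illustration only: a plain dict stands in for the hashed file of highest col3 values per key pair; the guard that rows with a non-duplicated key get 000 rather than 45 is my reading of the poster's first example, where 238 and 240 both get 000):

```python
from collections import Counter

rows = [
    (123, "6/24/08", 237),
    (124, "6/24/08", 238),
    (123, "6/24/08", 239),
    (124, "6/23/08", 240),
]

# Pass 1: build the "hashed file" of the highest col3 per (col1, col2) key,
# and count occurrences so singleton keys are not flagged.
highest = {}
counts = Counter()
for col1, col2, col3 in rows:
    key = (col1, col2)
    highest[key] = max(highest.get(key, col3), col3)
    counts[key] += 1

# Pass 2: lookup -- flag the row holding the highest col3 of a duplicated key.
output = []
for col1, col2, col3 in rows:
    key = (col1, col2)
    col4 = "45" if counts[key] > 1 and col3 == highest[key] else "000"
    output.append((col1, col2, col3, col4))

for row in output:
    print(row)
```

This reproduces the poster's expected output: only row (123, 6/24/08, 239) is flagged with 45. The second approach (descending sort plus key-change logic) avoids the lookup but needs the extra duplicate check to keep singleton keys at 000.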
Phil Hibbs | Capgemini
Technical Consultant