Hello All,
My source table have 10 columns and I have to add a new column count in my trasformation to load into the target table which have 11 columns.
I need some idea in implementing the logic below.
I have to compare the first 5 columns with the previous rows and if they are equal loop through the 6th column which is code and check the conditions. the rest of the columns will load straight into the target without any logic.
if code = 20 and 30 then delete row of code = 20 and count should equal to 1
example
if
a,b,c,d,e,20
a,b,c,d,e,30
then delete record of code =20 (or take only record code = 30)
output: a,b,c,d,e,30
some more conditions
if code =4 then delete records of code = 0 and 2
if code <> 4 and code = 0 and 2 then take record of code = 02.
IF code =8 occured for the same person in more than one time during the year count only the first occurence.
I desperately need your help.
thanks,
Loop Conditions
Moderators: chulett, rschirm, roy
You will need to do two passes thru the source data.
The first pass will establish the highest importance code encountered for a group. You will want to write that to a hash file with a group key and the highest importance code encountered. Your job will reference the same hash file it is writing to, first to check if the row is already in the hash file and then to write to the hash file if a more important code is found. Be sure to use separate hash file icons, and manually create the hash file first.
Then, in the second pass at the source data, reference the hash file using the group key and with a constraint throw away the appropriate rows and let pass thru the others.
In your example, a code 30, if found, means throw away all rows that aren't the 30. In the first pass you can establish that for one group, the most important code found is a 30, whereas another group only 20's were found. In another group, you found a 0, but then found a 2, which overwrites the 0, but then in the next row you found a 4, which is allowed to overwrite the 2. When the first pass is done, your hash file has the greatest importance code for each group.
The first pass will establish the highest importance code encountered for a group. You will want to write that to a hash file with a group key and the highest importance code encountered. Your job will reference the same hash file it is writing to, first to check if the row is already in the hash file and then to write to the hash file if a more important code is found. Be sure to use separate hash file icons, and manually create the hash file first.
Then, in the second pass at the source data, reference the hash file using the group key and with a constraint throw away the appropriate rows and let pass thru the others.
In your example, a code 30, if found, means throw away all rows that aren't the 30. In the first pass you can establish that for one group, the most important code found is a 30, whereas another group only 20's were found. In another group, you found a 0, but then found a 2, which overwrites the 0, but then in the next row you found a 4, which is allowed to overwrite the 2. When the first pass is done, your hash file has the greatest importance code for each group.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Thanks for the quick reply. As I am new to this can you please briefly explain what you exactly mean by highest importance code encountered
thanks
and how we compare with the previous rows (only 5 columns out of 10 and code is not part of these columns)kcbland wrote:You will need to do two passes thru the source data.
The first pass will establish the highest importance code encountered for a group. You will want to write that to a hash file with a group key and the highest importance code encountered.
thanks
Look back at your examples in your post. You refer to "code" as the column that indicates what should be done with the rows.
You gave some sketchy rules, such as a code 30 means throw away all rows with a code 20. That tells me that 30 is important. You gave more rules such as a code 4 means throw away rows with code 0 and 2.
So, you have to determine which code is the one of most importance for a group of rows.
Your last statement about code is not part of the columns, well, where is it? You know we can only be helpful if you clearly articulate your issue and use real examples. Your a,b,c,d,e,20 example has probably not helped if it truly doesn't represent the situation. To me, I would group all rows with "a,b,c,d,e" and take the code value of highest importance according to the rules you haven't even posted. What constitutes importance you have to decide.
You gave some sketchy rules, such as a code 30 means throw away all rows with a code 20. That tells me that 30 is important. You gave more rules such as a code 4 means throw away rows with code 0 and 2.
So, you have to determine which code is the one of most importance for a group of rows.
Your last statement about code is not part of the columns, well, where is it? You know we can only be helpful if you clearly articulate your issue and use real examples. Your a,b,c,d,e,20 example has probably not helped if it truly doesn't represent the situation. To me, I would group all rows with "a,b,c,d,e" and take the code value of highest importance according to the rules you haven't even posted. What constitutes importance you have to decide.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Sorry for the inconvinence. Here is the example
Names of the 10 columns
SR_ID,PAT_ID,TYP_CD,CNT_NBR,MB_NBR, EFF_DT,PR_CD,AGE, GRP_NBR,DIV_NBR,PR_CODE
If first six columns have the same values with different PR_CODE then based on the rules delete the complete records from the list.
If PR_CODE = 4 then do not count PR_CODE = 0 or 2
example:
Some of the rules:
If pr_code <> 4 and pr_code = 0 and 2 then count 2( meaning taking only the row of pr_code =2)
if code =8 and 6 then count 8
if code 32 and 10 then count 10
If PR_CODE = 8 occured for the same person more than one time during the reporting year count only the first occurence.
If PR_CODE = 32 and pr_code=10 occured for the same person more than one time during the reporting year count only the first occurence.
Is it a good idea to take only the six columns into a hash file with the key columns CNT_NBR,MB_NBR.
Need help in performing the conditions.
thanks
Names of the 10 columns
SR_ID,PAT_ID,TYP_CD,CNT_NBR,MB_NBR, EFF_DT,PR_CD,AGE, GRP_NBR,DIV_NBR,PR_CODE
If first six columns have the same values with different PR_CODE then based on the rules delete the complete records from the list.
If PR_CODE = 4 then do not count PR_CODE = 0 or 2
example:
Code: Select all
SR_ID,PAT_ID,TYP_CD,CNT_NBR,MB_NBR, EFF_DT,PR_CODE
123,H,CD,5986,666622,2004-12-31, 4
123,H,CD,5986,666622,2004-12-31, 0
123,H,CD,5986,666622,2004-12-31, 2
Output file should contain only the record below
123,H,CD,5986,666622,2004-12-31, (remaining columns), 4
If pr_code <> 4 and pr_code = 0 and 2 then count 2( meaning taking only the row of pr_code =2)
if code =8 and 6 then count 8
if code 32 and 10 then count 10
If PR_CODE = 8 occured for the same person more than one time during the reporting year count only the first occurence.
If PR_CODE = 32 and pr_code=10 occured for the same person more than one time during the reporting year count only the first occurence.
Is it a good idea to take only the six columns into a hash file with the key columns CNT_NBR,MB_NBR.
Code: Select all
SELECT SR_ID,PAT_ID,TYP_CD,CNT_NBR,MB_NBR, EFF_DT, PR_CODE
where PR_CODE in ('4','2','8','10'....)
group by 1,2,3,4,5,6
thanks
You need to construct a truth table. One of your codes is the highest pecking order value. Another code is the second, another the third.
For example:
A > B > C > D > E > F
If you read a row for a group, and its code is E, then lookup the current value in your hash file. If the current value is less important than E (like F), then overwrite the row in the hash lookup with the E value. On the next row, the same comparison will be performed. If the incoming value is more important than the current reference value, then overwrite it, otherwise you can throw it away.
You need to write a Function that establishes this truth table. It will be a big case statement, like:
Now in your transformation job, reference that hash file and reference the hash file. Depending on the value of the code in the hash file, act accordingly to the row.
For example:
A > B > C > D > E > F
If you read a row for a group, and its code is E, then lookup the current value in your hash file. If the current value is less important than E (like F), then overwrite the row in the hash lookup with the E value. On the next row, the same comparison will be performed. If the incoming value is more important than the current reference value, then overwrite it, otherwise you can throw it away.
You need to write a Function that establishes this truth table. It will be a big case statement, like:
Code: Select all
IncomingValue = Arg1
CurrentValue = Arg2
BEGIN CASE
Case IncomingValue = 'A' and (CurrentValue = 'B' OR CurrentValue = 'C' OR CurrentValue = 'D' OR CurrentValue = 'E' OR CurrentValue = 'F')
Ans = IncomingValue
Case IncomingValue = 'B' and (CurrentValue = 'C' OR CurrentValue = 'D' OR CurrentValue = 'E' OR CurrentValue = 'F')
Ans = IncomingValue
Case IncomingValue = 'C' and (CurrentValue = 'D' OR CurrentValue = 'E' OR CurrentValue = 'F')
Ans = IncomingValue
Case IncomingValue = 'D' and (CurrentValue = 'E' OR CurrentValue = 'F')
Ans = IncomingValue
Case IncomingValue = 'E' and (CurrentValue = 'F')
Ans = IncomingValue
Case @TRUE
Ans = CurrentValue
END CASE
Now in your transformation job, reference that hash file and reference the hash file. Depending on the value of the code in the hash file, act accordingly to the row.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Could that be done with input sorted by PR_CODE (ascending) and running through an Aggregator stage, grouping by the other columns (which seem to be constant for variations in PR_CODE), and using Last as the aggregate function on the PR_CODE column?
If my assumption holds, this should offer a simpler solution than writing a routine.
If my assumption holds, this should offer a simpler solution than writing a routine.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.