I am trying to work on the logic for manipulating data and here is the scenario
Input data
Code: Select all
row no flag YEAR Code m_Cost_a m_Value_a n_cost_a n_Value_a
1 M 2000 1 10 20 '' ''
2 N 2000 2 '' '' 13 14
3 M 2000 3 15 16 '' ''
4 M 2001 3 19 20 '' ''
5 M 2001 4 23 40 '' ''
(please note that when the flag is M the columns that begin with n_ are blank and same applies for N )
My output requirement is to bring the data in a format that
I have to generate additional columns
m_cost_1_a, m_Value_1_a, n_cost_1_a, n_Value_1_a,
m_cost_2_a, m_value_2_a, n_cost_2_a, n_value_2_a
so on upto n_value_4_a (as there are 4 codes) for each of the code values and flag types
In the above example the output column set per record becomes
row_no,flag,YEAR,Code,
m_cost_1_a, m_Value_1_a, n_cost_1_a, Value_1_a,
m_cost_2_a, m_value_2_a, n_cost_2_a, n_value_2_a,
m_cost_3_a, m_value_3_a, n_cost_3_a, n_value_3_a,
m_cost_4_a, m_value_4_a, n_cost_4_a, n_value_4_a
so basically all I am looking is to bring the respective costs and values basing on the code value and leave the rest of the code and value fields null (or empty) for records belonging to a particular code type
I have generated a job to test this logic (please note my actual requirement deals with 40 plus code types and over a million records)
I successfully tested it and all i am doing is defining 16 stage variables one for each new column (in output) for example
stgvar_m_cost_1_a will be
Code: Select all
If inp.SOURCE_FLAG = 'M' Then
(If inp.CODE = '1' then inp.m_Cost_a Else SetNull())
Else SetNull()
and so on and so forth
This works well for my test data but when I actually run it across all my real time data, I see the following issue
for most of my rows I see that even though for say Code 1 and flag M in our first row of sample input data , I get the correct value for output column "m_cost_1_a" (10 in our example above ) , I also get values in the output column
"m_cost_4_a" values that actually belong to some other row which has code of 4 .
But actually I am expecting a Null value for "m_cost_4_a" field as each row from input should have 'values' for only one of the new columns (in output) and rest of the new columns should be Null (as per my logic)
I hope I have been clear in explaining the exact scenario.
I appreciate your time to go through this and point me to any possible reasons as why this might be happening.
Thank you again