Data manipulation logic

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

devsonali
Premium Member
Premium Member
Posts: 155
Joined: Sun Mar 18, 2007 2:31 pm

Data manipulation logic

Post by devsonali »

Hello everybody.

I am trying to work on the logic for manipulating data and here is the scenario

Input data

Code: Select all

 row no         flag             YEAR       Code  m_Cost_a  m_Value_a n_cost_a n_Value_a
 1               M               2000         1       10          20       ''          ''
 2               N               2000         2       ''           ''         13           14
 3               M               2000         3       15          16       ''           ''
 4               M               2001         3       19          20       ''           ''
 5               M               2001         4       23          40       ''           ''
and so on an so forth
(please note that when the flag is M the columns that begin with n_ are blank and same applies for N )

My output requirement is to bring the data in a format that

I have to generate additional columns
m_cost_1_a, m_Value_1_a, n_cost_1_a, n_Value_1_a,
m_cost_2_a, m_value_2_a, n_cost_2_a, n_value_2_a
so on upto n_value_4_a (as there are 4 codes) for each of the code values and flag types

In the above example the output column set per record becomes

row_no,flag,YEAR,Code,
m_cost_1_a, m_Value_1_a, n_cost_1_a, Value_1_a,
m_cost_2_a, m_value_2_a, n_cost_2_a, n_value_2_a,
m_cost_3_a, m_value_3_a, n_cost_3_a, n_value_3_a,
m_cost_4_a, m_value_4_a, n_cost_4_a, n_value_4_a

so basically all I am looking is to bring the respective costs and values basing on the code value and leave the rest of the code and value fields null (or empty) for records belonging to a particular code type

I have generated a job to test this logic (please note my actual requirement deals with 40 plus code types and over a million records)

I successfully tested it and all i am doing is defining 16 stage variables one for each new column (in output) for example
stgvar_m_cost_1_a will be

Code: Select all

If inp.SOURCE_FLAG  = 'M' Then   
   (If inp.CODE = '1' then inp.m_Cost_a Else SetNull())
       Else SetNull()
and assign it to the new column (in output) m_Cost_a.
and so on and so forth

This works well for my test data but when I actually run it across all my real time data, I see the following issue

for most of my rows I see that even though for say Code 1 and flag M in our first row of sample input data , I get the correct value for output column "m_cost_1_a" (10 in our example above ) , I also get values in the output column
"m_cost_4_a" values that actually belong to some other row which has code of 4 .
But actually I am expecting a Null value for "m_cost_4_a" field as each row from input should have 'values' for only one of the new columns (in output) and rest of the new columns should be Null (as per my logic)

I hope I have been clear in explaining the exact scenario.

I appreciate your time to go through this and point me to any possible reasons as why this might be happening.

Thank you again
Last edited by devsonali on Fri Mar 15, 2013 10:08 pm, edited 1 time in total.
devsonali
Premium Member
Premium Member
Posts: 155
Joined: Sun Mar 18, 2007 2:31 pm

Post by devsonali »

Update -
I retested my code with real data but say 20 records but first sorting the data with code as key.

So here is what it looks like its going wrong

if the first row has a code of 1 with Flag as 'M' then all the columns of the output are populating correctly , which means the OUTPUT columns
row_no ,
flag,
YEAR,
Code
m_cost_1_a . m_Value_1_a , n_cost_1_a ,n_Value_1_a,
m_cost_2_a,m_value_2_a,n_cost_2_a,n_value_2_a,
m_cost_3_a,m_value_3_a,n_cost_3_a,n_value_3_a,
m_cost_4_a,m_value_4_a,n_cost_4_a,n_value_4_a

are populating correctly for first record (in the input data sample)

as 1, M, 2000, 1,10, 20,NULL,NULL,NULL,NULL,NULL (till the last column)

But the third record from input
3 M 2000 3 15 16 '' ''

is populating as
3, M, 2000,3,
10 20, null,null,
null,null,null,null,
3,15,null,null,
null,null,null,null

instead of
3, M, 2000,3,
null,null, null,null,
null,null,null,null,
3,15,null,null,
null,null,null,null

Hope this example clarifies the problem more in detail
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Please try the derivation as like below and check whether blank is getting populated.

Code: Select all

If inp.SOURCE_FLAG  = 'M' Then   
   (If inp.CODE = '1' then inp.m_Cost_a Else ' ')
       Else ' '
Thanks,
Prasanna
devsonali
Premium Member
Premium Member
Posts: 155
Joined: Sun Mar 18, 2007 2:31 pm

Post by devsonali »

Thank you Prasanna
But are we not substituting Null with a blank in here , for me Null or blank doesn't matter , its just that I am getting a value for new output columns based on previous records on top of my required code field value.

Do I make sense here ?
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

stage variables dont handle null values properly in px transformer. So give it as blank. When it works, then in the column expression you can use SetNull() for blank.

Just give a try. I read the entire logic i see a gap only here.


Or else try this - in the stage variables property intialize the stage variable to some value like -1. Check whether for the first record itself whether null is getting overridden for -1 in m_value_4_a
Last edited by prasannakumarkk on Sat Mar 16, 2013 9:10 am, edited 1 time in total.
Thanks,
Prasanna
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

reading this post will make you understand more.
Thanks,
Prasanna
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Your post is making me to explore more. :D
From 8.5 null is handled in stage variables.
http://www-01.ibm.com/support/docview.w ... wg21514921
Thanks,
Prasanna
devsonali
Premium Member
Premium Member
Posts: 155
Joined: Sun Mar 18, 2007 2:31 pm

Post by devsonali »

I am in 8.5 version now , however , i have started testing using spaces instead of null and it seems to be working.

So I think even if the field is null, assigning a stage variable to null will not work in my above logic.

I am still not convinced as why Setting this to Null will NOT work but setting it to space does (As I just mentioned i carried out few tests and it seems to work with space )

I will update once I test the whole data.
I appreciate your inputs but I will still like to know why setting it to Null is not working .
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

If you are in 8.5 and still null is not handled by stage variable then, as IBM says
Backward Compatibility
Set APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING=1, as a user-defined environment variable, to use the previous null handling mechanism at the project level
OR
Use the check box per Transformer Stage to use previous null handling for that transformer
please check these environment variable. they have specifically mentioned a stage variable can be null in 8.5. So it should
Thanks,
Prasanna
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

prasannakumarkk wrote:If you are in 8.5 and still null is not handled by stage variable
It's not the stage variables are "not handling" the nulls in 8.5, the logic being used isn't. Do us a favor a post ALL of your derivations for the columns in question. Seems like that should be 4 in your test job?

I'd also ask you to please be a little more careful when posting example data. I already cleaned up the initial post a little with proper column delimiters and such but I'm not going to do that every time you post.

And a question on your code:

Code: Select all

If inp.SOURCE_FLAG  = 'M' Then
  If inp.CODE = '1' then 
    inp.m_Cost_a 
  Else 
    SetNull()
Else 
  SetNull()
Could be rephrased as:

Code: Select all

If inp.SOURCE_FLAG  = 'M'
  and inp.CODE = '1' then 
    inp.m_Cost_a 
Else 
    SetNull()
As they are equivalent. Is that what you had in mind? Bring the value to the column for "M" and "1" and for any other combination of the two columns set it to NULL.
-craig

"You can never have too many knives" -- Logan Nine Fingers
devsonali
Premium Member
Premium Member
Posts: 155
Joined: Sun Mar 18, 2007 2:31 pm

Post by devsonali »

prasannakumarkk wrote:stage variables dont handle null values properly in px transformer. So give it as blank. When it works, then in the column expression you can use SetNull() for blank.
Yes I tried this one before posting the question suspecting the same but initialized with 0 . This still did not work .

However as you mentioned the space does work for my test records.I currently do not have access to entire data and would be testing the whole data and updating the test results.
Thanks again
devsonali
Premium Member
Premium Member
Posts: 155
Joined: Sun Mar 18, 2007 2:31 pm

Post by devsonali »

-Craig, Thanks for the clean up and I will try to be more careful with column delimiters next time.

To answer your question on the code - Yes this is precisely what I am trying to do.

Each row of the output should have only one new column value populated (which is corresponding to the flag and code value) the rest of the new columns should be all null or empty
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Am not sure whether you checked this environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING and replied.
Thanks,
Prasanna
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't see anything wrong with the derivation you posted, it should work exactly as you say you need it to. And if you are careful with spreading the derivation to the other columns and ensure you're using the correct input in the check to match the output column I don't see any reason why you wouldn't get the output you need.

Setting the output column column to either a space or a null shouldn't change the behaviour of the job unless there's some other critical piece of information you haven't passed along to us. Why don't we clarify a few things? First you've never specified what your "output" actually is - a record in what - a database table? A flat file? If it is the latter, detail for us the type of file. I'd also be curious what your complete job design is - source, target, intermediate stages, etc. Lastly, let us know how you are testing that this to determine if it works or not...

Obviously we're missing something here because on the surface this looks like a fairly simple problem to solve... just one of a tedious nature to ensure everything is correct. Don't worry about giving us too much information, I'd rather wade through that than make multiple posts pulling little tidbits of information out each time. :wink:

Lastly, I'd like to ask about your 8.5 version - are you current on your fixpacks? You could be doing everything correctly and be the victim of a known issue, hence the question.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

prasannakumarkk wrote:Am not sure whether you checked this environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING and replied.
Seems to me all that would do is preclude the use of SetNull() in the stage variable derivations. However, it can certainly be tested... more grist for the mill. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply