Selecting distinct values and creating a list/array

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Selecting distinct values and creating a list/array

Post by pdntsap »

Hello,

The input is a flat file that has 'n' columns. I need to create an array/list to store the distinct values of one of the columns. I then need to execute a sequence of steps for each distinct value in the array/list. My questions are:

1. How can I create an array/list based on the distinct values of a column?
2. Can I then use a Start Loop End Loop acitvity and execute the sequence of steps for each value in the array/list?

Thanks for the help.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

1. Vertical pivot.
2. Yes. Use a "list of things" loop.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

I have a sequential file input, then a Transformer Stage that uses stage variables to implement the pivot logic. Stage variables will be processed and output for each input row.
As an example:

Input Stage
column Variable
  • 1 1
    1 1
    2 1,2
    2 1,2
    3 1,2,3
    4 1,2,3,4
Howcan I just store the final Stage variable value (in this case 1,2,3,4) into a project level varaible so that it can be accessed in future jobs?

Thanks.
    Last edited by pdntsap on Wed Oct 12, 2011 12:51 pm, edited 2 times in total.
    arunkumarmm
    Participant
    Posts: 246
    Joined: Mon Jun 30, 2008 3:22 am
    Location: New York
    Contact:

    Post by arunkumarmm »

    Set the stage variable as user status of the job.
    Arun
    pdntsap
    Premium Member
    Premium Member
    Posts: 107
    Joined: Mon Jul 04, 2011 5:38 pm

    Post by pdntsap »

    Can user status be set for a parallel job? I understand that user status can be set only for server jobs.

    Thanks.
    arunkumarmm
    Participant
    Posts: 246
    Joined: Mon Jun 30, 2008 3:22 am
    Location: New York
    Contact:

    Post by arunkumarmm »

    I never tried it before but cant you use a BASIC transformer and set the status?
    Arun
    pdntsap
    Premium Member
    Premium Member
    Posts: 107
    Joined: Mon Jul 04, 2011 5:38 pm

    Post by pdntsap »

    Yes Arun. You are right.

    I created the following routine

    Code: Select all

    FUNCTION UserStatus(Arg) 
      Call DSSetUserStatus((Arg)) 
      Ans=Arg 
    RETURN(Ans) 
    and assigned UserStatus(StageVarn) to StageVarm. Now the problem is the routine gets called for every input row but I need it to be called only once at the very end.
    The thread below: viewtopic.php?t=100809 talks about calling it once either before or after the transform stage. I am looking into doing this but any help will be great.

    Thanks.
    pdntsap
    Premium Member
    Premium Member
    Posts: 107
    Joined: Mon Jul 04, 2011 5:38 pm

    Post by pdntsap »

    Hello,

    I have the following routine

    Code: Select all

    FUNCTION UserStatus(Arg) 
      Call DSSetUserStatus((Arg)) 
      Ans=Arg 
    RETURN(Ans) 
    I would like to set the UserStatus to the final value of a Stage Variable used in the transformer. I believe I can use the After-stage subroutine in the Transformer stage. My question is how what should I choose from the drop-down list of After-stage subroutine so that

    Code: Select all

    UserStatus(StageVariable)

    gets called.

    Thanks for the help.[/code]
    chulett
    Charter Member
    Charter Member
    Posts: 43085
    Joined: Tue Nov 12, 2002 4:34 pm
    Location: Denver, CO

    Post by chulett »

    If you have created it as a 'before/after' routine, it will show up in the drop down.
    -craig

    "You can never have too many knives" -- Logan Nine Fingers
    pdntsap
    Premium Member
    Premium Member
    Posts: 107
    Joined: Mon Jul 04, 2011 5:38 pm

    Post by pdntsap »

    But from the thread below:
    viewtopic.php?t=107023
    I thought that the routine must be created of type 'transform function'.

    I changed it to type 'before/after routine' but it does not return any value.

    Thanks.
    chulett
    Charter Member
    Charter Member
    Posts: 43085
    Joined: Tue Nov 12, 2002 4:34 pm
    Location: Denver, CO

    Post by chulett »

    For one used in a field derivation, yes. For use before/after use you must create it as such... and it doesn't need to "return" anything other than success, it just needs to put a value in USERSTATUS.

    Also realize that it uses a hashed structure to store it, so it can only store one value and it's the normal 'last one in wins' rule as with hashed files, so no matter how many values you send all you'll end up with is the last one.
    -craig

    "You can never have too many knives" -- Logan Nine Fingers
    pdntsap
    Premium Member
    Premium Member
    Posts: 107
    Joined: Mon Jul 04, 2011 5:38 pm

    Post by pdntsap »

    I created the routine below as before/after subroutine

    Code: Select all

    SUBROUTINE UserStatus(InputArg,ErrorCode) 
      Call DSSetUserStatus((InputArg)) 
      ErrorCode = 0
      Return
    
    Now UserStatus shows in the before/after stage routine of the transform stage. I seLECT UserStatus in the After-stage routine and pass input values as StageVarn, Errornum. But the value of StageVarn is not being passeD to the subroutine but the literal text StageVarn. Please let me know what I am doing wrong here.


    I remember reading in the forum about the 'last one in wins' issue and to overcome that I believe I need to store the values using a delimiter and then read the individual values using the filed() function. But that is a challenege is for another day I guess.
    chulett
    Charter Member
    Charter Member
    Posts: 43085
    Joined: Tue Nov 12, 2002 4:34 pm
    Location: Denver, CO

    Post by chulett »

    What is "StageVarn"? Realize that after stage (or after job) all you'll have access to are job parameters, anything else is out of scope. If you need to pass out processing or derived data, you'll need to go back to a transform function inside a Transformer.
    -craig

    "You can never have too many knives" -- Logan Nine Fingers
    pdntsap
    Premium Member
    Premium Member
    Posts: 107
    Joined: Mon Jul 04, 2011 5:38 pm

    Post by pdntsap »

    StageVarn was supposed to represent Stage variables, for example, StageVar1, StageVar2. Sorry that I was not more clear.

    So, like you said,if I need the stage variables to be stored and passed onto subsequent jobs/stages I need to go back to a Transform function and assign the transform function to a stage variable in a Transformer stage which will result in the transform function being exected for each input row.

    Thanks.
    chulett
    Charter Member
    Charter Member
    Posts: 43085
    Joined: Tue Nov 12, 2002 4:34 pm
    Location: Denver, CO

    Post by chulett »

    Correct... unless you can conditionally make the call. I've done something similar in the past where I call it once when the first record comes through and then never again as I know it would be a waste of time. To do this on the last record, however, would be a little more problematic unless you are running 8.5 which (from what I've read) has a new system variable (@EOD?) that is set to true on the last record.
    -craig

    "You can never have too many knives" -- Logan Nine Fingers
    Post Reply