Seeking a best solution in DataStage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DWS
Premium Member
Premium Member
Posts: 23
Joined: Thu Oct 13, 2011 11:47 am

Seeking a best solution in DataStage

Post by DWS »

Hi There,
I have a request of produing outputs from inputs both listed below. The key columns for source is the first 3 columns and the target is first 2 columns.

Indeed, I don't know how Seq_Num could be, some cases it is over a thousand. I think the solution for this case is to use Loop in Transformer Stage. But I have not used for this case.
I am asking if any of you have had this experience.

Any feedback is great appreciated.

Thank you very much in advance.
--------------------------------------------------------------


Input:
----------
Emp_ID Name Seq_Num Desc(Varchar(10)
100 Tiger 1 I am
100 Tiger 2 en
100 Tiger 3 ETL
100 Tiger 4 developer
100 Tiger 5 working fr
100 Tiger 6 om
100 Tiger 7 home
100 Tiger 8 everyday
100 Tiger 9 from 10 am
100 Tiger 10 to 10 pm.

Output:
----------
Emp_ID Name Desc(LongVarchar(10000))
100 Tiger I am an ETL developer working from home everyday from 10 am to 10 pm.
-----------------------------------------------------------------------------
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Using the transformer's loop functionality would be one way, although you could also use simple key break detection (keyChange from Sort stage and LastRowInGroup() in Transformer) as well. Looping and key break detection are both well-documented in the Transformer stage documentation in the Parallel Job Developer's Guide.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
DWS
Premium Member
Premium Member
Posts: 23
Joined: Thu Oct 13, 2011 11:47 am

Post by DWS »

Thank you, James.

You are right.
Both ways are good for this case.

I have done key break detection method and also Loop for horizontal pivot but not Vertical pivot.
Do you have any sample of this vertical looping?

Thanks,
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

I don't have specific examples of vertical pivot loops. You should be able to queue a key's worth of rows, then simply build your final output string by concatenating the pieces from each row as you loop through (very similar to what you would do with keybreak detection).

The <a href="http://publib.boulder.ibm.com/infocente ... umentation </a> for transformer loops includes an example for data aggregation, which is what this basically is.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Use KeyBreak detection logic using stage variables to concatenate DESC field and use lastrowingroup function to get the last row.
Please let me know if you need more info
DWS
Premium Member
Premium Member
Posts: 23
Joined: Thu Oct 13, 2011 11:47 am

Post by DWS »

Thank you very much for your reply.
Do you have any sample of
Use KeyBreak detection logic using stage variables to concatenate DESC field and use lastrowingroup function to get the last row.
Thanks,
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Hi,
StageVaribale:
svconcat
svconcat:Inputlink.desc

Output derivation
Map all columns to output except number as per your requirement but in place of desc column use the stage variable svconcat.

Constraint:
LastRowInGroup(Inputlink.id)

Sort incoming data as it is a necessary for LastRowInGroup() else job will abort.

From Documentation
LastRowInGroup(column) returns TRUE if the value of the specified column is about to change
Thanks,
Chetan.C
DWS
Premium Member
Premium Member
Posts: 23
Joined: Thu Oct 13, 2011 11:47 am

Post by DWS »

Thanks, Chetan.

This issue has been solved. It has taken too long to solve this case. Because I have never realized that it needs only ONE node for the job.
I also added more stage variables.

Is it possible to to use mutiple nodes instead of one node?

Does anyone know why can not use mutiple nodes?



Thanks in advance.

:D
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

You should be able to use multiple nodes provided that you properly partition your data so that all data for a key value remains in the same partition--a Core Concept of DataStage parallel processing!!!!!

By running in a single node, you have forced all of the data into a single partition, which meets the above requirement but limits the throughput of the job.

Based on your decriptions of the rules, the expected results and the example data provided, you could probably partition your data on:
a) EmpID alone or
b) EmpID and Name
and meet the requirements of the necessary job logic. This of course means not using "Auto" partitioning :)

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Ok.. Here is the logic to do key break

Define four stage variables in the exact order.
svNewKey input.Emp_ID
svKeyChange svNewKey <> svLastKey or svLastKey = ""
svLastKey input.Emp_ID
svDesc if svKeyChange then input.Desc else svCounter : input.Desc

Make sure to Partition and Sort on Emp_ID and Sort Only on Seq_Num in Transformer Input

Use LastRowInGroup(input.Emp_ID) as constraint to pass out the last row per employee id.

Output columns will be mapped as follows:

Emp_ID = input.Emp_ID
Name = input.Name
Desc = svDesc

Hope this will help you as it will work on multiple nodes
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

typo correction:

Define four stage variables in the exact order.
svNewKey = input.Emp_ID
svKeyChange = svNewKey <> svLastKey or svLastKey = ""
svLastKey = input.Emp_ID
svDesc = if svKeyChange then input.Desc else svDesc : input.Desc
[/list]
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

In 8.5+, it's a little bit simpler:

svDesc: if svNewKey=1 then input.Desc else svDesc : input.Desc
svNewKey (InitialValue=1): if LastRowInGroup(input.EmpID) then 1 else 0

Output Link Constraint: LastRowInGroup(input.EmpID)

Be sure to sort prior to the transformer (a requirement of the transformer's key detection routines).

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
DWS
Premium Member
Premium Member
Posts: 23
Joined: Thu Oct 13, 2011 11:47 am

Post by DWS »

Thank you, everyone for your input.

This case has been solved in single node only which means the input data has not properly partitioning in term of leting all grouped data in same partition.

I have to modify the job and give the proper partition set up.

Anyway, thanks for everyone's help and will let you the know final result.

Regards,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

mobashshar wrote:typo correction:
:idea: FYI - there's no need to create a 'typo correction' post, simply go back and edit your post, correct it there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DWS
Premium Member
Premium Member
Posts: 23
Joined: Thu Oct 13, 2011 11:47 am

Post by DWS »

Finally it works perfectly now.
Thanks for everyone who had input for this topic.

If anyone in future wants to know the exactlly steps, I could post those in the forum.

Thanks,
Post Reply