Seeking a best solution in DataStage
Moderators: chulett, rschirm, roy
Hi There,
I have a request to produce the output from the input, both listed below. The key columns for the source are the first 3 columns; for the target, the first 2 columns.
I don't know how large Seq_Num can get; in some cases it is over a thousand. I think the solution for this case is to use the Loop functionality in the Transformer stage, but I have not used it for a case like this.
I am asking if any of you have had this experience.
Any feedback is greatly appreciated.
Thank you very much in advance.
--------------------------------------------------------------
Input:
----------
Emp_ID Name Seq_Num Desc (Varchar(10))
100 Tiger 1 I am
100 Tiger 2 an
100 Tiger 3 ETL
100 Tiger 4 developer
100 Tiger 5 working fr
100 Tiger 6 om
100 Tiger 7 home
100 Tiger 8 everyday
100 Tiger 9 from 10 am
100 Tiger 10 to 10 pm.
Output:
----------
Emp_ID Name Desc (LongVarchar(10000))
100 Tiger I am an ETL developer working from home everyday from 10 am to 10 pm.
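To make the requirement concrete, the desired vertical pivot can be sketched outside DataStage (a minimal Python sketch; the column and key names are taken from the example above, and the sample rows are abbreviated):

```python
from itertools import groupby

# Example rows: (Emp_ID, Name, Seq_Num, Desc), as in the post above.
# Desc pieces arrive in 10-character chunks and include their own spaces.
rows = [
    (100, "Tiger", 1, "I am "), (100, "Tiger", 2, "an "),
    (100, "Tiger", 3, "ETL "), (100, "Tiger", 4, "developer"),
]

def vertical_pivot(rows):
    # Rows must already be sorted by the key (Emp_ID, Name) and by
    # Seq_Num, just as the Transformer-based solutions require.
    out = []
    for (emp_id, name), grp in groupby(rows, key=lambda r: (r[0], r[1])):
        desc = "".join(r[3] for r in sorted(grp, key=lambda r: r[2]))
        out.append((emp_id, name, desc))
    return out

print(vertical_pivot(rows))
# → [(100, 'Tiger', 'I am an ETL developer')]
```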
-----------------------------------------------------------------------------
Using the Transformer's loop functionality would be one way; you could also use simple key-break detection (keyChange from the Sort stage and LastRowInGroup() in the Transformer). Looping and key-break detection are both well documented in the Transformer stage section of the Parallel Job Developer's Guide.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
I don't have specific examples of vertical pivot loops. You should be able to queue a key's worth of rows, then simply build your final output string by concatenating the pieces from each row as you loop through (very similar to what you would do with keybreak detection).
The documentation for transformer loops (http://publib.boulder.ibm.com/infocente ...) includes an example for data aggregation, which is basically what this is.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Hi,
Stage variable:
svconcat: svconcat : Inputlink.desc
Output derivation:
Map all columns to the output as per your requirement, except Seq_Num; in place of the Desc column, use the stage variable svconcat.
Constraint:
LastRowInGroup(Inputlink.id)
Sort the incoming data, as it is required for LastRowInGroup(); otherwise the job will abort.
From the documentation: "LastRowInGroup(column) returns TRUE if the value of the specified column is about to change."
Thanks,
Chetan.C
Thanks, Chetan.
This issue has been solved. It took too long to solve because I never realized that the job needs only ONE node.
I also added more stage variables.
Is it possible to use multiple nodes instead of one node?
Does anyone know why multiple nodes cannot be used?
Thanks in advance.
:D
You should be able to use multiple nodes provided that you properly partition your data so that all data for a key value remains in the same partition: a core concept of DataStage parallel processing!
By running in a single node, you have forced all of the data into a single partition, which meets the above requirement but limits the throughput of the job.
Based on your descriptions of the rules, the expected results, and the example data provided, you could probably partition your data on:
a) EmpID alone, or
b) EmpID and Name
and meet the requirements of the necessary job logic. This of course means not using "Auto" partitioning.
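To illustrate why key-based partitioning makes multi-node runs safe here, this is a hedged Python sketch (not DataStage's actual partitioner; the 4-node count and the hash function are illustrative assumptions). The essential property is that every row sharing a key value lands on the same node:

```python
# Assign each row to a node by hashing the key column, so all rows
# sharing a key value land in the same partition (hypothetical 4 nodes).
NODES = 4

def partition(emp_id):
    # Any deterministic hash of the key gives the same guarantee:
    # equal keys always map to equal node numbers.
    return hash(emp_id) % NODES

rows = [(100, "Tiger", 1), (100, "Tiger", 2), (200, "Lion", 1)]
nodes = [partition(emp_id) for emp_id, _, _ in rows]

# Both Seq_Num 1 and 2 for Emp_ID 100 land on the same node, so the
# Transformer on that node sees the complete group.
assert nodes[0] == nodes[1]
```

Round-robin or "Auto" partitioning gives no such guarantee, which is why the single-node run worked (everything in one partition) while a naive multi-node run would split a key's rows across nodes.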
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Ok.. Here is the logic to do key break
Define four stage variables in the exact order.
svNewKey input.Emp_ID
svKeyChange svNewKey <> svLastKey or svLastKey = ""
svLastKey input.Emp_ID
svDesc if svKeyChange then input.Desc else svDesc : input.Desc
Make sure to Partition and Sort on Emp_ID and Sort Only on Seq_Num in Transformer Input
Use LastRowInGroup(input.Emp_ID) as constraint to pass out the last row per employee id.
Output columns will be mapped as follows:
Emp_ID = input.Emp_ID
Name = input.Name
Desc = svDesc
Hope this will help you as it will work on multiple nodes
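The stage-variable evaluation above can be mimicked row by row (a minimal Python sketch, assuming input pre-sorted by Emp_ID and Seq_Num; LastRowInGroup is emulated by peeking at the next row, since the sketch runs outside DataStage):

```python
def key_break_concat(rows):
    # rows: (Emp_ID, Name, Seq_Num, Desc), pre-sorted by Emp_ID, Seq_Num.
    # The four variables mirror the stage variables described above,
    # evaluated in the same order on every row.
    out = []
    sv_last_key = None
    sv_desc = ""
    for i, (emp_id, name, seq, desc) in enumerate(rows):
        sv_new_key = emp_id                                   # svNewKey
        sv_key_change = sv_new_key != sv_last_key             # svKeyChange
        sv_last_key = emp_id                                  # svLastKey
        sv_desc = desc if sv_key_change else sv_desc + desc   # svDesc
        # LastRowInGroup(Emp_ID): true when the next row starts a new key.
        last_in_group = i + 1 == len(rows) or rows[i + 1][0] != emp_id
        if last_in_group:
            out.append((emp_id, name, sv_desc))
    return out

rows = [(100, "Tiger", 1, "I am "), (100, "Tiger", 2, "here"),
        (200, "Lion", 1, "hello")]
print(key_break_concat(rows))
# → [(100, 'Tiger', 'I am here'), (200, 'Lion', 'hello')]
```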
In 8.5+, it's a little bit simpler:
svDesc: if svNewKey=1 then input.Desc else svDesc : input.Desc
svNewKey (InitialValue=1): if LastRowInGroup(input.EmpID) then 1 else 0
Output Link Constraint: LastRowInGroup(input.EmpID)
Be sure to sort prior to the transformer (a requirement of the transformer's key detection routines).
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Thank you, everyone for your input.
This case was solved on a single node only, which means the input data was not properly partitioned so that all grouped data stays in the same partition.
I have to modify the job and set up proper partitioning.
Anyway, thanks for everyone's help; I will let you know the final result.
Regards,