Seeking a best solution in DataStage
Moderators: chulett, rschirm, roy
Hi There,
I have a request to produce the output from the input, both listed below. The key columns for the source are the first 3 columns; for the target, the first 2 columns.
I don't know how large Seq_Num can get; in some cases it is over a thousand. I think the solution for this case is to use the Loop functionality in the Transformer stage, but I have not used it for a case like this.
I am asking if any of you have had this experience.
Any feedback is greatly appreciated.
Thank you very much in advance.
--------------------------------------------------------------
Input:
----------
Emp_ID Name Seq_Num Desc (Varchar(10))
100 Tiger 1 I am
100 Tiger 2 an
100 Tiger 3 ETL
100 Tiger 4 developer
100 Tiger 5 working fr
100 Tiger 6 om
100 Tiger 7 home
100 Tiger 8 everyday
100 Tiger 9 from 10 am
100 Tiger 10 to 10 pm.
Output:
----------
Emp_ID Name Desc (LongVarchar(10000))
100 Tiger I am an ETL developer working from home everyday from 10 am to 10 pm.
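To make the requirement concrete, the desired vertical pivot can be sketched outside DataStage (a minimal Python sketch; the column and key names are taken from the example above, and the sample rows are abbreviated):

```python
from itertools import groupby

# Example rows: (Emp_ID, Name, Seq_Num, Desc), as in the post above.
# Desc pieces arrive in 10-character chunks and include their own spaces.
rows = [
    (100, "Tiger", 1, "I am "), (100, "Tiger", 2, "an "),
    (100, "Tiger", 3, "ETL "), (100, "Tiger", 4, "developer"),
]

def vertical_pivot(rows):
    # Rows must already be sorted by the key (Emp_ID, Name) and by
    # Seq_Num, just as the Transformer-based solutions require.
    out = []
    for (emp_id, name), grp in groupby(rows, key=lambda r: (r[0], r[1])):
        desc = "".join(r[3] for r in sorted(grp, key=lambda r: r[2]))
        out.append((emp_id, name, desc))
    return out

print(vertical_pivot(rows))
# → [(100, 'Tiger', 'I am an ETL developer')]
```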
-----------------------------------------------------------------------------
Using the Transformer's loop functionality would be one way; you could also use simple key-break detection (keyChange from the Sort stage and LastRowInGroup() in the Transformer). Looping and key-break detection are both well documented in the Transformer stage section of the Parallel Job Developer's Guide.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
I don't have specific examples of vertical pivot loops. You should be able to queue a key's worth of rows, then simply build your final output string by concatenating the pieces from each row as you loop through (very similar to what you would do with keybreak detection).
The documentation for transformer loops (http://publib.boulder.ibm.com/infocente ...) includes an example for data aggregation, which is basically what this is.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Hi,
Stage variable:
svconcat: svconcat : Inputlink.desc
Output derivation:
Map all columns to the output as per your requirement, except Seq_Num; in place of the Desc column, use the stage variable svconcat.
Constraint:
LastRowInGroup(Inputlink.id)
Sort the incoming data, as it is required for LastRowInGroup(); otherwise the job will abort.
From the documentation: "LastRowInGroup(column) returns TRUE if the value of the specified column is about to change."
Thanks,
Chetan.C
Thanks, Chetan.
This issue has been solved. It took too long to solve because I never realized that the job needs only ONE node.
I also added more stage variables.
Is it possible to use multiple nodes instead of one node?
Does anyone know why multiple nodes cannot be used?
Thanks in advance.
:D
You should be able to use multiple nodes provided that you properly partition your data so that all data for a key value remains in the same partition: a core concept of DataStage parallel processing!
By running in a single node, you have forced all of the data into a single partition, which meets the above requirement but limits the throughput of the job.
Based on your descriptions of the rules, the expected results, and the example data provided, you could probably partition your data on:
a) EmpID alone, or
b) EmpID and Name
and meet the requirements of the necessary job logic. This of course means not using "Auto" partitioning.
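To illustrate why key-based partitioning makes multi-node runs safe here, this is a hedged Python sketch (not DataStage's actual partitioner; the 4-node count and the hash function are illustrative assumptions). The essential property is that every row sharing a key value lands on the same node:

```python
# Assign each row to a node by hashing the key column, so all rows
# sharing a key value land in the same partition (hypothetical 4 nodes).
NODES = 4

def partition(emp_id):
    # Any deterministic hash of the key gives the same guarantee:
    # equal keys always map to equal node numbers.
    return hash(emp_id) % NODES

rows = [(100, "Tiger", 1), (100, "Tiger", 2), (200, "Lion", 1)]
nodes = [partition(emp_id) for emp_id, _, _ in rows]

# Both Seq_Num 1 and 2 for Emp_ID 100 land on the same node, so the
# Transformer on that node sees the complete group.
assert nodes[0] == nodes[1]
```

Round-robin or "Auto" partitioning gives no such guarantee, which is why the single-node run worked (everything in one partition) while a naive multi-node run would split a key's rows across nodes.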
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Ok.. Here is the logic to do key break
Define four stage variables in the exact order.
svNewKey input.Emp_ID
svKeyChange svNewKey <> svLastKey or svLastKey = ""
svLastKey input.Emp_ID
svDesc if svKeyChange then input.Desc else svDesc : input.Desc
Make sure to Partition and Sort on Emp_ID and Sort Only on Seq_Num in Transformer Input
Use LastRowInGroup(input.Emp_ID) as constraint to pass out the last row per employee id.
Output columns will be mapped as follows:
Emp_ID = input.Emp_ID
Name = input.Name
Desc = svDesc
Hope this will help you as it will work on multiple nodes
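The stage-variable evaluation above can be mimicked row by row (a minimal Python sketch, assuming input pre-sorted by Emp_ID and Seq_Num; LastRowInGroup is emulated by peeking at the next row, since the sketch runs outside DataStage):

```python
def key_break_concat(rows):
    # rows: (Emp_ID, Name, Seq_Num, Desc), pre-sorted by Emp_ID, Seq_Num.
    # The four variables mirror the stage variables described above,
    # evaluated in the same order on every row.
    out = []
    sv_last_key = None
    sv_desc = ""
    for i, (emp_id, name, seq, desc) in enumerate(rows):
        sv_new_key = emp_id                                   # svNewKey
        sv_key_change = sv_new_key != sv_last_key             # svKeyChange
        sv_last_key = emp_id                                  # svLastKey
        sv_desc = desc if sv_key_change else sv_desc + desc   # svDesc
        # LastRowInGroup(Emp_ID): true when the next row starts a new key.
        last_in_group = i + 1 == len(rows) or rows[i + 1][0] != emp_id
        if last_in_group:
            out.append((emp_id, name, sv_desc))
    return out

rows = [(100, "Tiger", 1, "I am "), (100, "Tiger", 2, "here"),
        (200, "Lion", 1, "hello")]
print(key_break_concat(rows))
# → [(100, 'Tiger', 'I am here'), (200, 'Lion', 'hello')]
```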
In 8.5+, it's a little bit simpler:
svDesc: if svNewKey=1 then input.Desc else svDesc : input.Desc
svNewKey (InitialValue=1): if LastRowInGroup(input.EmpID) then 1 else 0
Output Link Constraint: LastRowInGroup(input.EmpID)
Be sure to sort prior to the transformer (a requirement of the transformer's key detection routines).
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Thank you, everyone for your input.
This case was solved on a single node only, which means the input data was not properly partitioned so that all grouped data stays in the same partition.
I have to modify the job and set up proper partitioning.
Anyway, thanks for everyone's help; I will let you know the final result.
Regards,