Alternate logic required ?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Alternate logic required ?

Post by chandra.shekhar@tcs.com »

Hi All,

I am preparing a job for a new requirement in our Project.
I have to find out the Top 20 Customers for each Branch.
Now my source is a Seq File(~150 million rows) and it contains 4 Amount columns, Customer No and Branch No.
I have to add these Amount columns then find out the customers with most Amount balance.
So I have to use two Transformers :( 1st for addition of 4 Amount columns and nullability check and 2nd for filtering out top 20 customers.

In the 2nd transformer, I am doing hash partition on Branch No and then sorting on Branch asc, Total Amount desc.
Then using a stage variable where I generate a counter and finally only those rows will go the output where the count is <= 20.

Any alternate logics will be appreciated.
Thanx and Regards,
ETL User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An alternate logic would be to use a ranking function in the presentation layer (e.g. a query) rather than using an ETL tool to perform a non-ETL task.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Thanks Ray,
But I cant use a rank function since my source is a Seq File.
Is there any way where I can eliminate a transformer ?
Or is addition of columns possible in some other stage ??
Thanx and Regards,
ETL User
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Is it working for you right now? I don't see any burning reason to try to eliminate anything if it is.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

No Craig, the job is running fine.
Just wanted to know if there's anything which I may be missing.
Additionally I wanted tune the job for performance.
(As I said my source will have 150+ million records)
Also is addition of amount columns possible in transformer only ?
Or Modify stage can also do it ?
Last edited by chandra.shekhar@tcs.com on Wed Jun 05, 2013 5:52 am, edited 1 time in total.
Thanx and Regards,
ETL User
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do you have a 'performance problem' right now? Sure, a Modify stage can be used to 'add' columns. And FYI - you haven't stated what your target is.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Yeah Craig, My job completes in around 1.5 hrs, and I need to make it finish with in 30 min.
Can you give me the logic that can be used in Modify stage ? I have checked but didnt get the solution.
And my target is a DB2 table.
Thanx and Regards,
ETL User
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

A UNIX script may be? And then you can use the "External Source' to read the data.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not sure what logic you'd need that's not in the documentation:
Modify stage: Options category: Specification

This is a statement with one of the following the forms:
* DROP columnname [, columnname]
* KEEP columnname [, columnname]
* new_columnname [:new_type] = [explicit_conversion_function] old_columnname
Note the last one.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

One options is to use Unix script in before job subroutine and get the total of all 4 amount column.
For this you can use something like awk '$6+=$5+=$4+=$3' where $n is the column number with amount fields.
This will remove your first transformer. You will end up using one transformer to filter out top 20 customer per branch.
Hope this helps.
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

@Craig,
I also found the same but I think the last logic is for conversion only, not for addition.
found the list of conversions logic from the same document but there was nothing for addition.

@mobashshar
I will try that, but what if any of the columns has null value.
Then may be total amount value value will be null.
I dont know how the addition works in UNIX.
Thanx and Regards,
ETL User
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Check it. Your total amount will not be null in case if one or more amount fields are null. It will only add the numeric values.
Post Reply