Header & Footer


karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Re: Header & Footer

Post by karthi_gana »

jwiles,

Thanks for your detailed reply. I will try it tomorrow morning. Time is 11 PM...going to sleep now...:)
Karthik
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I have a single data source and must create two output files, both needing a header and trailer. I do it all in one job with no intermediate files.

For each output file job stream, I split the data into the three record types (I must use the original input to set some of the columns in the header and trailer). Each split ends with a Transformer that concatenates all the columns into one fixed-length Char column. The final stage before creating the file is a Funnel with Funnel Type set to Sequence, and I set the link order to header-details-trailer. It was quite simple once I understood the coding requirements at each point.

In case anyone thinks to ask: My source is in xml format. It has two sets of repeating sub-tags, which must be split into separate destinations. My output is fixed-length as required by the next process that uses it.
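
The header-details-trailer layout described above can be pictured outside DataStage with a small Python sketch. It is only an illustration, not the job itself: the record length, the H/D/T type prefixes, and the field names (batch, id, amount) are assumptions invented for the example.

```python
from itertools import chain

RECORD_LEN = 20  # hypothetical fixed record length


def fixed(*fields: str) -> str:
    """Concatenate fields into one record and pad it to RECORD_LEN."""
    record = "".join(fields)
    if len(record) > RECORD_LEN:
        raise ValueError(f"record longer than {RECORD_LEN} chars: {record!r}")
    return record.ljust(RECORD_LEN)


def build_file(rows):
    """Build header, detail and trailer records from one non-empty input."""
    # Header and trailer both take values from the original input rows.
    header = [fixed("H", rows[0]["batch"])]
    details = [
        fixed("D", r["id"].ljust(6), r["amount"].rjust(8, "0")) for r in rows
    ]
    trailer = [fixed("T", str(len(rows)).rjust(6, "0"))]
    # Plays the role of a Sequence funnel: header, then details, then trailer.
    return list(chain(header, details, trailer))


rows = [
    {"batch": "B001", "id": "A1", "amount": "125"},
    {"batch": "B001", "id": "A2", "amount": "40"},
]
output = build_file(rows)
for line in output:
    print(repr(line))
```

Each of the three lists stands in for one split of the job stream, and the final chain keeps them in link order.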
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

karthi_gana wrote: Actually, I am creating a parallel job for the first time, so I have a lot of doubts and questions.
Fair enough, but again... anyone there that can help you with the basics?
-craig

"You can never have too many knives" -- Logan Nine Fingers
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Re: Header & Footer

Post by karthi_gana »

karthi_gana wrote:jwiles,

Thanks for your detailed reply. I will try it tomorrow morning. Time is 11 PM...going to sleep now...:)
I tried your method. It works almost fine, except for the Aggregator stage.

Because 'Properties --> Grouping Keys --> Group' requires at least one column, I am getting a row count for each group (which I don't want).

I just want to show the total number of rows in the footer.

I set the below properties:

Aggregation Type = Count Rows
Count Output Column = row_cnt (which I created in the output columns)

I would like to get a single row from the Aggregator, i.e. only the total row count.
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Re: Header & Footer

Post by karthi_gana »

I now send a single non-null field (just a hardcoded value) to the Aggregator as the grouping key, and it gives the total row count.

Is there any other way to do this? Am I using a kind of shortcut to get the total row count?
Karthik
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What you're doing is correct. It's not a short cut. The "count rows" aggregation method is one of three available.
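
Why a hardcoded key yields a single total can be seen in a small Python sketch. It is not DataStage code, only an analogy: the function name count_rows_by_group and the sample columns (region, amount) are hypothetical.

```python
from collections import Counter


def count_rows_by_group(rows, key):
    """Mimic a 'Count Rows' aggregation: one output row per distinct key."""
    return Counter(key(row) for row in rows)


rows = [
    {"region": "EAST", "amount": 10},
    {"region": "WEST", "amount": 25},
    {"region": "EAST", "amount": 5},
]

# Grouping on a real column gives one count per group.
per_group = count_rows_by_group(rows, key=lambda r: r["region"])

# Grouping on a hardcoded constant puts every row in the same group,
# so exactly one output row carries the total.
total = count_rows_by_group(rows, key=lambda r: 1)

print(dict(per_group))
print(dict(total))
```

Because every row carries the same key value, there is only one group, and its count is the row count of the whole input.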
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

I would like to know how well this method performs. Can anybody tell me something about that?
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

For Header
Column_Export Stage
Execution Mode --> Sequential
Preserve partitioning --> Propagate

For Footer
Column_Export Stage
Execution Mode --> Sequential
Preserve partitioning --> Propagate

For Detail
Column_Export Stage
Execution Mode --> Parallel
Preserve partitioning --> Set

Is this correct?
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Those settings seem appropriate for the logic I suggested, so long as the Funnel Type is Sequence (processing input links in order) and Execution Mode is sequential (not parallel). That will be necessary in order to maintain the correct Header-Detail-Trailer order in your output file.
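
To make the ordering point concrete, here is a rough Python analogy. It is an assumption-laden sketch, not how the engine is implemented: sequence_funnel drains each input link completely before moving to the next, while interleaved_funnel is a hypothetical stand-in for any funnel that takes rows as they arrive.

```python
from itertools import chain, zip_longest


def sequence_funnel(*links):
    """Read every row of the first link, then the second, and so on."""
    return list(chain(*links))


def interleaved_funnel(*links):
    """Take one row from each link in turn; order across links is lost."""
    missing = object()
    return [
        row
        for group in zip_longest(*links, fillvalue=missing)
        for row in group
        if row is not missing
    ]


header = ["HDR"]
details = ["D1", "D2", "D3"]
trailer = ["TRL"]

ordered = sequence_funnel(header, details, trailer)
mixed = interleaved_funnel(header, details, trailer)

print(ordered)
print(mixed)
```

Only the first version keeps the trailer after the last detail row, which is why the link order and the sequential execution mode both matter.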

Actual performance of a job design is hard to predict, especially in a parallel environment. There are many factors that affect performance, covering everything from job design, data sources and targets to system configuration and processing load. If properly implemented, the design I proposed should perform relatively well on a well-managed and well-configured server. Keep in mind that this is only one method of accomplishing what you need; there are other methods that would work.

If you haven't already, you should invest some time in training for parallel job development. Your employer should be willing to offer something, either directly or through a third party. Also, following Ray's question, co-workers or others you know who already have experience are a great resource.

Good luck!

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

Jwiles,

Thanks for all your input. I appreciate your help.

Funnel Type = Sequence
Execution Type = Sequential
Preserve partitioning --> Propagate
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

In my team, I am the one and only person working on DataStage. There is one admin team and some developers in another business group. They are all supporting other projects, so it is very hard to get input from them.

I have started to read the Parallel Job guide and the other posts here. I hope I can get some good input from here.
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

karthi_gana wrote: In my team, I am the one and only person working on DataStage. There is one admin team and some developers in another business group. They are all supporting other projects, so it is very hard to get input from them.
Very understandable, if unfortunate. Everyone has their own job to do!
karthi_gana also wrote: I have started to read the Parallel Job guide and the other posts here. I hope I can get some good input from here.
There's a lot of good info available here; it has helped me a lot over the years just searching for information.

There is a parallel job tutorial that is included with the documentation. Also, IBM has a Redbook or two about parallel job design. There are links available in a recent post (just search for redbook in this forum), and/or you can Google for parallel job design redbook. The PDFs are free to download and are an excellent resource.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

"Using aggregator in the job will hit performance and also involves more job design. If you expect to have this functionality in more number of jobs, it will be easier to use routines."
I received the email above from my ADMIN team yesterday.
Is it true?
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

karthi_gana wrote:
Using aggregator in the job will hit performance and also involves more job design. If you expect to have this functionality in more number of jobs, it will be easier to use routines.
I received an email from my ADMIN team as above yesterday.
is it true?
This is a more or less generic statement based on limited knowledge of how to use the aggregator stage and of proper parallel job design, and the statement accomplishes nothing but spreading FUD: Fear, Uncertainty and Doubt.

An aggregator, when improperly used, can hurt performance. So can the sort, transformer, join and many other stages--again when improperly used. As to the comment "if you expect to have....it will be easier to use routines": is the admin referring to Basic routines? While the Basic transformer can run in parallel, using it will also hurt performance in a parallel job.

Regarding the comment "involves more job design", I'm certain several of us here on the forum can present examples of how using an aggregator greatly simplifies parallel job design. The design that has been suggested to you is one where the aggregator simplifies the design and the particular aggregation types specified (Count Rows and Sum) would not hurt performance. What was suggested was designed to be an efficient method to accomplish what you requested.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

karthi_gana wrote:I received an email from my ADMIN team as above yesterday.
is it true?
We believe you. So it is true (for some value of true anyway) that you received an email from your ADMIN team yesterday.

As to its assertion, I would be asking for proof. Surely it would depend on who wrote the routine. A well-constructed Aggregator working with properly sorted and partitioned data is a very efficient beast indeed.
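
One way to see why sorted input helps is a Python analogy. This is a sketch under assumptions, not the Aggregator's actual implementation: when rows arrive already sorted on the grouping key, each group can be finished and emitted as soon as the key changes, so only one running count and sum is held at a time. The function and column names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sorted_count_and_sum(rows, key, value):
    """Stream over rows pre-sorted by `key`, yielding (key, count, sum)."""
    for k, group in groupby(rows, key=itemgetter(key)):
        count = 0
        total = 0
        for row in group:
            count += 1
            total += row[value]
        # The group is complete once the key changes; emit and move on.
        yield k, count, total


# Input must already be sorted on the grouping key for this to be correct.
rows = [
    {"region": "EAST", "amount": 10},
    {"region": "EAST", "amount": 5},
    {"region": "WEST", "amount": 25},
]
result = list(sorted_count_and_sum(rows, "region", "amount"))
print(result)
```

If the input were not sorted, the same key could show up in several separate runs and produce several partial rows, which is the kind of misuse that gives the stage a bad reputation.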

Never generalize.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.