DSXchange

Posted: **Thu Jan 27, 2011 11:19 am**

jwiles,

Thanks for your detailed reply. I will try it tomorrow morning. Time is 11 PM...going to sleep now...

Posted: **Thu Jan 27, 2011 11:37 am**

I have a single data source and must create two output files, both needing a header and trailer. I do it all in one job with no intermediate files.

For each output file job stream, I split the data into the three record types (I must use the original input to set some of the columns in the header and trailer). Each split ends with a transformer that concatenates all the columns into one fixed-length Char column. The final stage before creating the file is a Funnel with Funnel Type set to Sequence, and I set the link order to header-details-trailer. It was quite simple once I understood the coding requirements at each point.

In case anyone thinks to ask: my source is in XML format. It has two sets of repeating sub-tags, which must be split into separate destinations. My output is fixed-length, as required by the next process that uses it.
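The header-details-trailer layout described above can be illustrated outside DataStage. The following is only a rough Python sketch of the idea, not the actual job: the 20-character record width, the column names (`id`, `name`), and the record-type codes `H`, `D`, `T` are all assumptions made up for the example.

```python
# Sketch only: widths, column names and record-type codes are hypothetical.

def fixed(value, width):
    """Left-justify and pad (or truncate) a value to exactly `width` characters."""
    return str(value)[:width].ljust(width)

def build_header(run_date):
    # Record type plus a run-date column, 20 characters in total.
    return fixed("H", 1) + fixed(run_date, 19)

def build_detail(row):
    # Each detail row is concatenated into one fixed-length string,
    # mirroring the "one fixed-length Char column" step.
    return fixed("D", 1) + fixed(row["id"], 9) + fixed(row["name"], 10)

def build_trailer(row_count):
    # The trailer carries the zero-padded detail row count.
    return fixed("T", 1) + str(row_count).zfill(9) + fixed("", 10)

rows = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]

header = [build_header("2011-01-27")]
details = [build_detail(r) for r in rows]
trailer = [build_trailer(len(rows))]

# Concatenating the three streams in a fixed order plays the role of the
# Funnel with link order header-details-trailer.
records = header + details + trailer
for record in records:
    print(repr(record))
```

Every record has the same length, so the consuming process can parse the file by position alone.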

Posted: **Thu Jan 27, 2011 6:46 pm**

karthi_gana wrote: Actually, I am creating a parallel job for the first time, so I have a lot of doubts and questions.

Fair enough, but again... anyone there that can help you with the basics?

Posted: **Fri Jan 28, 2011 12:44 am**

karthi_gana wrote:jwiles,

Thanks for your detailed reply. I will try it tomorrow morning. Time is 11 PM...going to sleep now...

I tried your method. It works almost fine, except for the Aggregator stage.

Because at least one column must be specified under 'Properties --> Grouping Keys --> GROUP', I am getting a row count for each group (which I don't want).

I just want to show the total number of rows in the footer.

I set the following properties:

Aggregation Type = Count Rows
Count Output Column = row_cnt (which I created as an output column)

I would like to get a single row from the Aggregator, i.e. only the total row count.

Posted: **Fri Jan 28, 2011 1:00 am**

I send a single non-null field (just a hardcoded value) to the Aggregator to get the total row count.

It gives the total row count. Is there any other way to do this? Am I using a kind of shortcut to get the total row count?

Posted: **Fri Jan 28, 2011 1:25 am**

What you're doing is correct. It's not a short cut. The "count rows" aggregation method is one of three available.
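Why the hardcoded field works can be seen in a small Python sketch. This is an illustration of the grouping idea only, not of the Aggregator's internals; the function name `count_rows_by_key` and the `dept` column are hypothetical.

```python
from collections import Counter

def count_rows_by_key(rows, key_func):
    """Count rows per grouping key, one output entry per distinct key."""
    counts = Counter()
    for row in rows:
        counts[key_func(row)] += 1
    return dict(counts)

# Hypothetical sample data with a real grouping column.
rows = [{"dept": "A"}, {"dept": "B"}, {"dept": "A"}]

# Grouping on a real column yields one count per group.
per_group = count_rows_by_key(rows, lambda r: r["dept"])

# Grouping on a constant puts every row in the same group,
# so the single output entry is the total row count.
total = count_rows_by_key(rows, lambda r: 1)

print(per_group)
print(total)
```

Because every row carries the same key value, there is exactly one group, and its count is the total.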

Posted: **Fri Jan 28, 2011 6:32 am**

I would like to know how this method performs. Can anybody tell me something about that?

Posted: **Fri Jan 28, 2011 6:42 am**

For Header
Column_Export Stage
Execution Mode --> Sequential
Preserve partitioning --> Propagate

For Footer
Column_Export Stage
Execution Mode --> Sequential
Preserve partitioning --> Propagate

For Detail
Column_Export Stage
Execution Mode --> Parallel
Preserve partitioning --> Set

Is this correct?

Posted: **Fri Jan 28, 2011 7:50 am**

Those settings seem appropriate for the logic I suggested, so long as the Funnel Type is Sequence (processing input links in order) and Execution Mode is sequential (not parallel). That will be necessary in order to maintain the correct Header-Detail-Trailer order in your output file.
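The ordering requirement can be pictured with a short Python sketch. It is only an analogy for a Sequence-type Funnel running sequentially, under the assumption that each input link is drained completely before the next one starts; the record strings and the two detail partitions are made up.

```python
from itertools import chain

def sequence_funnel(*links):
    """Emit every record of the first link, then the second, and so on."""
    return chain.from_iterable(links)

header = ["H|2011-01-28"]

# Hypothetical detail records produced by two parallel partitions.
detail_partitions = [["D|1", "D|3"], ["D|2", "D|4"]]
details = chain.from_iterable(detail_partitions)

trailer = ["T|4"]

# Link order header-details-trailer: the header is always first and the
# trailer always last, whatever order the detail partitions arrive in.
output = list(sequence_funnel(header, details, trailer))
print(output)
```

If the three inputs were interleaved instead of drained one after another, the header and trailer could land anywhere among the detail rows.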

Actual performance of a job design is hard to predict, especially in a parallel environment. There are many factors that affect performance, covering everything from job design, data sources and targets to system configuration and processing load. If properly implemented, the design I proposed should perform relatively well on a well-managed and configured server. Keep in mind that this is only one method of accomplishing what you need; there are other methods that would work.

If you haven't already, you should invest some time in training for parallel job development. Your employer should be willing to offer something, either directly or through a third party. Also, following Ray's question, co-workers or others you know who already have experience are a great resource.

Good luck!

Regards,

Posted: **Fri Jan 28, 2011 8:06 am**

Jwiles,

Thanks for all your input. I appreciate your help.

Funnel Type = Sequence
Execution Mode = Sequential
Preserve partitioning --> Propagate

Posted: **Fri Jan 28, 2011 8:14 am**

In my team, I am the one and only person working on DataStage. There is one admin team and some developers in another business group. They are all supporting other projects. It is very hard to get input from them.

I have started to read the Parallel Job guide and the other posts here. I hope I can get some good input from here.

Posted: **Fri Jan 28, 2011 8:31 am**

karthi_gana wrote: In my team, I am the one and only person working on DataStage. There is one admin team and some developers in another business group. They are all supporting other projects. It is very hard to get input from them.

Very understandable, if unfortunate. Everyone has their own job to do!

karthi_gana also wrote: I have started to read the Parallel Job guide and the other posts here. I hope I can get some good input from here.

There's a lot of good info available here; it has helped me a lot over the years just searching for information.

There is a parallel job tutorial that is included with the documentation. Also, IBM has a Redbook or two about parallel job design. There are links available in a recent post (just search for redbook in this forum), and/or you can Google for parallel job design redbook. The PDFs are free for download and are an excellent resource.

Regards,

Posted: **Wed Feb 02, 2011 12:32 am**

Using aggregator in the job will hit performance and also involves more job design. If you expect to have this functionality in more number of jobs, it will be easier to use routines.

I received an email from my ADMIN team as above yesterday.
is it true?

Posted: **Wed Feb 02, 2011 1:57 am**

karthi_gana wrote:
Using aggregator in the job will hit performance and also involves more job design. If you expect to have this functionality in more number of jobs, it will be easier to use routines.
I received an email from my ADMIN team as above yesterday.
is it true?

This is a more or less generic statement based on limited knowledge of how to use the aggregator stage and of proper parallel job design, and the statement accomplishes nothing but spreading FUD: Fear, Uncertainty and Doubt.

An aggregator, when improperly used, can hurt performance. So can the sort, transformer, join and many other stages--again when improperly used. As to the comment "if you expect to have....it will be easier to use routines": is the admin referring to Basic routines? While the Basic transformer can run in parallel, using it will also hurt performance in a parallel job.

Regarding the comment "involves more job design", I'm certain several of us here on the forum can present examples of how using an aggregator greatly simplifies parallel job design. The design that has been suggested to you is one where the aggregator simplifies the design and the particular aggregation types specified (Count Rows and Sum) would not hurt performance. What was suggested was designed to be an efficient method to accomplish what you requested.

Regards,

Posted: **Wed Feb 02, 2011 3:37 am**

karthi_gana wrote:I received an email from my ADMIN team as above yesterday.
is it true?

We believe you. So it is true (for some value of true anyway) that you received an email from your ADMIN team yesterday.

As to its assertion, I would be asking for proof. Surely it would depend on who wrote the routine. A well-constructed Aggregator working with properly sorted and partitioned data is a very efficient beast indeed.

Never generalize.
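The remark about properly sorted data can be illustrated in Python. This sketch is an analogy and makes no claim about how the Aggregator stage is implemented: it assumes the rows already arrive sorted on the grouping key, and the names `count_sorted` and `k` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def count_sorted(rows, key):
    """Count rows per key, assuming rows already arrive sorted on that key.

    Only the current group is tracked at any time, so memory use does not
    grow with the number of distinct keys.
    """
    for k, group in groupby(rows, key=itemgetter(key)):
        yield k, sum(1 for _ in group)

# Hypothetical input, sorted up front to satisfy the assumption.
rows = sorted([{"k": "b"}, {"k": "a"}, {"k": "b"}], key=itemgetter("k"))

result = list(count_sorted(rows, "k"))
print(result)
```

With unsorted input, the same function would emit a key more than once, which is why the sorting and partitioning upstream matter.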

Header & Footer