Job running long: Changed Seq file to Dataset

Moderators: chulett, rschirm, roy

vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Hi,
Here's the scenario.
I designed a job with a Sequential File as input, and it runs in about a minute.
The input now arrives as a Dataset, so I replaced the Sequential File stage with a Dataset, and the job now takes 20 minutes.
[PS: Suspecting that reading the Dataset was the slow part, I tried a simple job that copies the Dataset to a Sequential File, but it ran in a few seconds.]
No partitioning is involved. All runs use the same configuration.

I tried deleting the Sequential File stage and its link and creating a Dataset with a new link, but that didn't help.

Any light on this would be appreciated.
Thanks,
Vijay
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

How many nodes in your configuration file? Do the data files reside on the same disk volume as your sequential file?
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

ArndW wrote:How many nodes in your configuration file? Do the data files reside on the same disk volume as your sequential file? ...
Thanks.
[1] 4 nodes, and
[2] No. The Datasets and the Sequential File reside on different mount points [but on the same disk volume].
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What stage are you writing to in this job? Perhaps the dataset is repartitioning to match the output and therefore is taking longer.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

ArndW wrote:What stage are you writing to in this job? Perhaps the dataset is repartitioning to match the output and therefore is taking longer. ...
I have a Dataset as input, passed through Transformer, Filter, Sort and Aggregator stages, and finally funnelled into an output Dataset.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The test you ran from Dataset to Sequential File: does it use the same input and output directories as your normal jobs?
As Arnd suggested, keep 'Same' partitioning on as many stages as possible (ignore the warning for this test).
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

kumar_s wrote:The test you ran from Dataset to Sequential File: does it use the same input and output directories as your normal jobs?
As Arnd suggested, keep 'Same' partitioning on as many stages as possible (ignore the warning for this test).
Kumar, yes, the Datasets and the Sequential File use the same input and output directories. I have changed the partitioning to Same, and I still see the same effect. I'm now trying with RCP off on a few stages and will keep you posted.
Thanks, Vijay
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I am fairly certain that your job is doing a significant amount of I/O for sorting and repartitioning, and that the slowdown is due to these stages rather than to the Dataset. Can you enable APT_DUMP_SCORE to see which processes you are actually running?
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

Add the environment variable APT_NO_SORT_INSERTION=True as a job parameter and test your job, with partitioning set to "Same" through all the stages.
Shantanu Choudhary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why?

You have not seen a score, so you cannot assert that sorts have been inserted, and you do not know what partitioning has been used. In this job design, (Auto) should use the same partitioning right through, so forcing it to be Same achieves nothing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

Your job: Dataset as input, passed through Transformer, Filter, Sort and Aggregator.

A few assumptions:
1. It's a linear flow, just one stream, as you have described.
2. In the Transformer you have some derivations, and in the next stage you are dropping/selecting some records.
3. The Sort is used just to group the rows before aggregating.

I would design it as:
Step 1: While creating the dataset in JOBA, I would sort the records on the keys that I will use in the Aggregator for grouping, and hash partition on the highest-level grouping key.

Step 2: In JOB-B I would have
Dataset > Transformer > Aggregator > output stage
1. I would use "Same" partitioning throughout, so the hash partitioning is kept throughout.
2. Combine the logic of the Transformer and Filter stages. This would eliminate the need for an extra stage (Filter).
3. Since the dataset is already sorted and partitioned on the Aggregator key, I don't have to insert a Sort stage before the Aggregator.
4. I would use APT_NO_SORT_INSERTION, as it's possible that DataStage inserts a sort before the Aggregator stage. You can check the score (APT_DUMP_SCORE) before adding this.
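
To illustrate the idea behind the two steps above outside of DataStage, here is a minimal Python sketch. It is not DataStage code: the function names, the sample rows and the column names (cust, amt) are all hypothetical, and CRC32 merely stands in for whatever hash function the engine really uses. It only shows why data that is hash partitioned and sorted on the grouping key can be aggregated in one pass per partition, with no further sort or repartition.

```python
from itertools import groupby
from operator import itemgetter
from zlib import crc32


def hash_partition(rows, key, nodes):
    """Route each row to a partition by hashing its key,
    so that all rows with the same key land in the same partition."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % nodes].append(row)
    return parts


def aggregate_sorted(partition, key, value):
    """Sum `value` per key in a single pass.
    Assumes the partition is already sorted on `key`."""
    return {
        k: sum(r[value] for r in group)
        for k, group in groupby(partition, key=itemgetter(key))
    }


# Hypothetical sample data.
rows = [
    {"cust": "A", "amt": 10},
    {"cust": "B", "amt": 5},
    {"cust": "A", "amt": 7},
    {"cust": "C", "amt": 1},
    {"cust": "B", "amt": 2},
]

# Step 1 (the "JOBA" role): hash partition on the key, then sort each partition.
parts = [
    sorted(p, key=itemgetter("cust"))
    for p in hash_partition(rows, "cust", nodes=4)
]

# Step 2 (the "JOB-B" role): keep the partitions as they are and aggregate
# each one independently, without sorting or repartitioning again.
totals = {}
for p in parts:
    totals.update(aggregate_sorted(p, "cust", "amt"))

print(totals)
```

Because equal keys never span two partitions, the per-partition results can simply be merged. Whether the real job behaves this way still has to be confirmed from the score, as suggested earlier in the thread.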
Shantanu Choudhary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please write in English, otherwise we will send Borat to your site.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

ray.wurlod wrote:Please write in English, otherwise we will send Borat to your site.
There is no word in English called "Borat" :shock: If it's slang, then please correct your English.

Secondly, we are here not to correct anybody's English but to correct their understanding of DataStage. If the language somebody uses is abusive or insulting, then we should raise a concern.

Thirdly, if you are against abbreviations, then protest and avoid using every abbreviation in this world. Even "won't", as "won't" is a contraction of "will not". **You never know, the words I am using now may become part of the dictionary tomorrow.**
Shantanu Choudhary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Borat, since it has a capital letter, is a proper noun (in both senses). It is a person's name, albeit a fictitious person, an alter ego of Sacha Baron Cohen. I leave the remaining research as an exercise for the reader.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.