Job running long: Changed Seq file to Dataset
Hi,
Here's a scenario.
I designed a job with a Seq file as input, and the job runs in a minute.
The input now arrives as a Dataset, so I changed the Seq File stage to a Dataset stage, and the job takes 20 minutes.
[PS: Thinking that reading the Dataset was what took so long, I tried a simple job that copies the Dataset to a Seq file, but it ran in a few seconds.]
No partitioning is involved, and all jobs run with the same configuration.
I tried deleting the Seq File stage and its link and creating a Dataset stage with a new link, but that didn't help.
Any light on this would be appreciated.
Thanks,
Vijay
In the test you ran from Dataset to sequential file, did it use the same input and output directories as your normal jobs?
As Arnd suggested, keep 'Same' partitioning on as many stages as possible (ignore the warning for the purposes of this test).
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kumar_s wrote: In the test you ran from Dataset to sequential file, did it use the same input and output directories as your normal jobs?
As Arnd suggested, keep 'Same' partitioning on as many stages as possible (ignore the warning for the purposes of this test).

Kumar, yes, the Datasets and Seq file use the same input and output directories. I have changed the partitioning to Same, and the effect is still the same. I'm now trying with RCP off on a few stages and will keep you posted.
Thanks, Vijay
Why?
You have not seen a score, so you cannot assert that sorts have been inserted, and you do not know what partitioning has been used. In this job design, (Auto) should use the same partitioning right through, so forcing it to be Same achieves nothing.
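One way to get a score into the job log is to run the job with the APT_DUMP_SCORE environment variable set. The sketch below only builds a `dsjob` command line and does not run it. It assumes that `$APT_DUMP_SCORE` has already been added to the job as an environment-variable parameter; the project and job names are hypothetical placeholders.

```python
import shlex

def build_dsjob_command(project, job, dump_score=True):
    """Build (but do not execute) a dsjob command that runs one job.

    Assumption: the job exposes $APT_DUMP_SCORE as a job parameter,
    so it can be overridden with -param for this run only.
    """
    cmd = ["dsjob", "-run", "-jobstatus"]
    if dump_score:
        # Ask the parallel framework to write the score to the job log.
        cmd += ["-param", "$APT_DUMP_SCORE=True"]
    cmd += [project, job]
    return cmd

# "MyProject" and "JobB" are placeholder names, not taken from this thread.
command = build_dsjob_command("MyProject", "JobB")
print(shlex.join(command))  # shell-quoted, ready to paste into a terminal
```

The score in the log then shows which partitioners and sorts the framework actually inserted, which is the evidence needed before changing partitioning settings.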
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Your job: Dataset as input, passed through Transformer, Filter, Sort and Aggregator.
A few assumptions:
1. It is a linear flow, just one stream, as you have described.
2. In the Transformer you have some derivations, and in the next stage you are dropping or selecting some records.
3. The Sort is used only to group the rows before aggregating.
I would design it as follows.
Step 1: While creating the Dataset in Job A, I would sort the records on the keys that the Aggregator will use for grouping, and hash partition on the highest-level grouping key.
Step 2: In Job B I would have
Dataset > Transformer > Aggregator > output stage
1. I would use "Same" partitioning throughout, so the hash partitioning carries through the whole job.
2. Combine the logic of the Transformer and Filter stages. This eliminates the need for an extra stage (Filter).
3. Since the Dataset is already sorted and partitioned on the Aggregator key, I don't have to insert a Sort stage before the Aggregator.
4. I would set APT_NO_SORT_INSERTION, as it is possible that DataStage inserts a sort before the Aggregator stage. You can check the score (APT_DUMP_SCORE) before adding this.
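The idea behind this design can be sketched outside DataStage. The Python below is only a conceptual illustration, not DataStage code: it hash-partitions rows on the grouping key and sorts each partition once (the "Job A" step), then aggregates each partition in a single pass without sorting again (the "Job B" step). The column names `cust` and `amt` and the sample rows are made up for the example.

```python
import zlib
from itertools import groupby
from operator import itemgetter

def hash_partition(rows, key, n_partitions):
    """Send each row to a partition chosen by a hash of its key value,
    so all rows with the same key land in the same partition."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode()) % n_partitions
        partitions[bucket].append(row)
    return partitions

def write_sorted_partitions(rows, key, n_partitions):
    """'Job A': hash-partition on the grouping key, then sort each partition."""
    return [sorted(part, key=itemgetter(key))
            for part in hash_partition(rows, key, n_partitions)]

def aggregate_presorted(partition, key, value):
    """'Job B': one pass over rows that are already sorted on the key.
    groupby relies on equal keys being adjacent, so no sort happens here."""
    return {k: sum(r[value] for r in group)
            for k, group in groupby(partition, key=itemgetter(key))}

# Hypothetical sample data.
rows = [
    {"cust": "A", "amt": 10}, {"cust": "B", "amt": 5},
    {"cust": "A", "amt": 7}, {"cust": "C", "amt": 1},
    {"cust": "B", "amt": 2},
]

partitions = write_sorted_partitions(rows, "cust", n_partitions=2)

totals = {}
for part in partitions:
    # Each key lives in exactly one partition, so results never collide.
    totals.update(aggregate_presorted(part, "cust", "amt"))
```

Because the sort cost is paid once when the data is written, every later job that groups on the same key can skip it, provided nothing in between repartitions or reorders the rows.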
Shantanu Choudhary
ray.wurlod wrote: Please write in English, otherwise we will send Borat to your site.

There is no word in English called "Borat" :shock:
Secondly, we are here not to correct anybody's English but to correct their understanding of DataStage. If the language somebody uses is abusive or insulting, then we should raise a concern.
Thirdly, if you are against abbreviations, then protest and avoid using all the abbreviations in this world, even "won't", since "won't" is a contraction of "will not". **You never know, the words I am using now may become part of the dictionary tomorrow.**
Shantanu Choudhary
Borat, since it has a capital letter, is a proper noun (in both senses). It is a person's name, albeit a fictitious person, an alter ego of Sacha Baron Cohen. I leave the remaining research as an exercise for the reader.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.