Job running long: Changed Seq file to Dataset
Hi,
Here's a scenario.
I designed a job with a Seq file as input, and the job runs in a minute.
Now I'm getting the input as a Dataset, so I changed the Seq File stage to a Dataset stage, and the job takes 20 minutes.
[PS: I tried a simple job copying the Dataset to a Seq file, thinking that reading the dataset was what took so long, but it ran in a few seconds.]
No partitioning involved... all jobs run with the same configuration.
I tried deleting the Seq File stage and its link and created a Dataset stage with a new link, but it didn't help.
Any light on this is appreciated.
Thanks,
Vijay
How many nodes in your configuration file? Do the data files reside on the same disk volume as your sequential file?
What stage are you writing to in this job? Perhaps the dataset is repartitioning to match the output and therefore is taking longer.
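To make the cost of a repartition concrete, here is a minimal Python sketch. It is not DataStage's partitioner: the crc32-modulo hash, the 4-partition setup and the `cust…` keys are hypothetical. It counts how many rows would have to move if data is re-hashed on a key. Data already hash partitioned on that key moves nothing, which is what keeping the existing partitioning buys you; data spread round-robin has to move most of its rows.

```python
from zlib import crc32

def hash_target(key: str, n: int) -> int:
    """Partition a key lands in under a simple modulo-hash scheme (illustrative only)."""
    return crc32(key.encode()) % n

def rows_to_move(placement, n):
    """Count rows whose current partition differs from their hash target.

    placement: list of (key, current_partition) pairs.
    """
    return sum(1 for key, part in placement if part != hash_target(key, n))

N = 4                                        # hypothetical node count
keys = [f"cust{i}" for i in range(1000)]     # hypothetical key values

# Data already hash partitioned on the key: nothing has to move.
already_hashed = [(k, hash_target(k, N)) for k in keys]
# Same keys spread round-robin (e.g. written by a job that partitioned on no key).
round_robin = [(k, i % N) for i, k in enumerate(keys)]

print(rows_to_move(already_hashed, N))   # 0
print(rows_to_move(round_robin, N))      # usually most of the rows
```

In a real job the moved rows cross process (and possibly node) boundaries, which is why an unexpected repartition can show up as a large slowdown; the score is the place to confirm whether one is happening.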
The test you made for Dataset to sequential file: does it use the same input and output directories as your normal jobs?
As Arnd suggested, maintain 'Same' partitioning on all the stages as far as possible (ignore the warning for the case study).
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kumar_s wrote: The test you made for Dataset to sequential file: does it use the same input and output directories as your normal jobs?
As Arnd suggested, maintain 'Same' partitioning on all the stages as far as possible (ignore the warning for the case study).
Kumar, yes, the Datasets and the Seq file use the same input and output directories. I have changed the partitioning to Same and still see the same effect. I'm trying with RCP off on a few stages and will keep you posted.
Thanks, Vijay
I am fairly certain that your job is doing a significant amount of I/O, sorting and repartitioning, and that the slowdown is due to these stages rather than to the dataset. Can you set APT_DUMP_SCORE to see which processes your job is actually running?
-
- Charter Member
- Posts: 199
- Joined: Tue Jan 18, 2005 2:50 am
- Location: India
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Why?
You have not seen a score, so you cannot assert that sorts have been inserted, and you do not know what partitioning has been used. In this job design, (Auto) should keep the same partitioning right through, so forcing it to be Same achieves nothing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ur job: Dataset as input, passed thru Transformer, Filter, Sort and Aggregator.
A few assumptions:
1. It's a linear flow, just one stream, as u have given.
2. In the Transformer u have some derivations, and in the next stage u are dropping/selecting some records.
3. Sort is used just to group the rows before aggregating.
I would design it as:
Step 1: While creating the dataset in Job A, I will sort the records on the keys I will be using in the Aggregator for grouping, and hash partition on the highest-level grouping key.
Step 2: In Job B I will have
Dataset > Transformer > Aggregator > o/p stage
1. I will use "Same" partitioning throughout, so it wd stay hashed thru out.
2. Combine the logic of the Transformer and Filter stages. This would eliminate the need for an extra stage (Filter).
3. Since the dataset is already sorted and partitioned on the Aggregator key, I don't have to insert a Sort stage before the Aggregator.
4. I will set APT_NO_SORT_INSERTION, as it's possible that DS inserts a sort before the Aggregator stage. You can check the score (APT_DUMP_SCORE) before adding this.
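The design above relies on one property: if the data is hash partitioned on the highest-level grouping key and sorted within each partition, every group sits entirely inside one partition with its rows adjacent, so each partition can be aggregated on its own with no re-sort or repartition. Here is a minimal Python sketch of that idea, not how DataStage implements it; the row layout (region, product, amount), the crc32-modulo hash and the partition count are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from zlib import crc32

# Hypothetical rows: (region, product, amount). The real schema is not given in the thread.
ROWS = [
    ("east", "a", 10), ("west", "b", 5), ("east", "b", 7),
    ("west", "b", 3), ("east", "a", 2), ("north", "c", 4),
]
NUM_PARTITIONS = 3  # hypothetical node count

def hash_partition(rows, key_index, n):
    """'Job A': hash partition on the highest-level grouping key,
    then sort each partition on the full grouping key (region, product)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's hash() for str.
        parts[crc32(row[key_index].encode()) % n].append(row)
    return [sorted(p, key=itemgetter(0, 1)) for p in parts]

def aggregate_sorted(partition):
    """'Job B': group adjacent rows; the input is already sorted, so no sort step here."""
    return {key: sum(r[2] for r in grp)
            for key, grp in groupby(partition, key=itemgetter(0, 1))}

partitions = hash_partition(ROWS, key_index=0, n=NUM_PARTITIONS)
result = {}
for part in partitions:
    # Each partition is aggregated independently; no group spans two partitions.
    result.update(aggregate_sorted(part))
print(sorted(result.items()))
```

If Job B instead re-hashed on a different key, or lost the sort order, `groupby` would split groups and the per-partition results would no longer be complete, which is the situation a tsort or repartition inserted by the engine is there to prevent.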
Shantanu Choudhary
ray.wurlod wrote: Please write in English, otherwise we will send Borat to your site.
There is no word in English called "Borat". If it's slang, then please correct your English.
Secondly, we are not here to correct anybody's English but to correct DataStage understanding. If the language used by somebody is abusive or insulting, then we should raise a concern.
Thirdly, if you are against abbreviations, then protest and avoid using all the abbreviations in this world, even "won't", as "won't" is a contraction of "will not". **You never know, the words I am using now may become part of the dictionary tomorrow.**
Borat, since it has a capital letter, is a proper noun (in both senses). It is a person's name, albeit a fictitious person, an alter ego of Sacha Baron Cohen. I leave the remaining research as an exercise for the reader.