Help in Job design
Moderators: chulett, rschirm, roy
Hi,
I have 2 datasets.
The source dataset contains around 22,215,332 records.
The reference dataset contains 21,532 records.
I am using the Lookup stage and loading into a Teradata table using the Teradata Connector stage.
Is this the correct design? It currently takes more than 45 minutes to complete.
Is there any way to improve this?
Would a Join be faster than a Lookup in this case?
For loading large data volumes, would the Teradata Enterprise stage be a better choice?
Please help me.
Thanks
pandeeswaran
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Lookup stage seems reasonable, if the smaller Data Set is the reference data and its rows are not exceedingly wide. That is, the reference Data Set can be loaded into memory.
Replace the target stage with a Peek stage temporarily, to isolate which part of the job (lookup or Teradata) is the slow part.
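To make the "reference fits in memory" point concrete, here is a minimal sketch (plain Python, not DataStage code) of what a normal Lookup stage does conceptually: the small reference dataset is indexed in memory once, and the large source stream probes it row by row. The field names (`cust_id`, `region`, `amt`) are made up for illustration.

```python
# Illustrative sketch of Lookup-stage semantics, NOT DataStage code.
# The small reference dataset is loaded fully into memory as a hash
# table; the large source is streamed and probed row by row.

def build_reference(reference_rows):
    """Index the small reference dataset by its lookup key (one pass)."""
    return {row["cust_id"]: row for row in reference_rows}

def lookup(source_rows, ref_index):
    """Stream the large source; each probe is an O(1) dict hit."""
    for row in source_rows:
        match = ref_index.get(row["cust_id"])
        if match is not None:
            yield {**row, "region": match["region"]}

reference = [{"cust_id": 1, "region": "APAC"},
             {"cust_id": 2, "region": "EMEA"}]
source = [{"cust_id": 1, "amt": 10},
          {"cust_id": 3, "amt": 5}]

ref_index = build_reference(reference)
result = list(lookup(source, ref_index))
# cust_id 1 finds a match; cust_id 3 has no reference row and is dropped,
# analogous to a lookup failure/reject.
```

This is why the reference side must be small: 21,532 rows fit comfortably in memory, so the 22 million source rows never need to be sorted or repartitioned for the lookup itself.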
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
What is the source? If it's a Data Set and there's no repartitioning, it should be very fast. On the other hand if it's a database view that's based on a correlated subquery with joins to huge tables, then it's going to be very slow.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 251
- Joined: Mon Jun 09, 2008 5:52 am
That data set was generated in the previous job.
In the previous job, 2 data sets are joined using hash partitioning and the result is written to this data set. In the target Data Set stage I have used Auto partitioning only.
But in the previous Join stage I have used hash partitioning.
Since I have used Auto in the target Data Set stage, here I am also using Auto on the source data set.
Should I use hash or Auto partitioning in the current job?
Thanks
pandeeswaran
If you want to know if your job is re-partitioning then you can dump the score into the log - or check the row counts on each partition in the monitor.
The key you used for the hash partition in the previous job - is that the same key you need to use for your join in the current job?
If yes, then you most likely could avoid both a re-partition and a sort in the current job by explicitly defining this in the job.
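The reason the same hash key avoids a re-partition can be shown with a small sketch (illustrative Python, not DataStage; the key name and partition count are assumptions): a hash partitioner sends every row with a given key value to the same partition, so if job 1 wrote the data set hashed on the join key and job 2 reads it with Same partitioning across the same number of partitions, the rows are already where the join needs them.

```python
# Sketch of hash partitioning, NOT DataStage code.
# All rows sharing a key value land in exactly one partition, which is
# what makes "keep the same hash key across jobs" safe: no row needs to
# move before the next join on that key.

NUM_PARTITIONS = 4  # assumed degree of parallelism

def partition_of(key):
    """Hash partitioner: same key value -> same partition, always."""
    return hash(key) % NUM_PARTITIONS

rows = [{"cust_id": k} for k in ("A", "B", "A", "C", "B", "A")]

# Collect the set of partitions each key value was routed to.
routed = {}
for row in rows:
    routed.setdefault(row["cust_id"], set()).add(partition_of(row["cust_id"]))
```

Every key's set contains exactly one partition number, so a downstream join keyed on `cust_id` over the same partition count needs no re-partition and, if the data was also sorted on write, no re-sort.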
_________________
- Susanne
-
- Participant
- Posts: 5
- Joined: Wed Nov 02, 2011 9:08 pm
- Location: Hyderabad
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Arun Reddy wrote:Hi pandeesh,
U said in previous job u used hash partition..after that for target dataset u use same partion it wil give best performance then auto ..and continue that dataset as a source dataset in current job u said .. i think it will work..

U is one of our posters. U has had no involvement in this thread.
The second person personal pronoun in English is spelled "you".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
This is my first job:

Code:
Dataset1
--------join------->target data set
Dataset2

In data set 1 and data set 2, hash partitioning is used.
But in the target data set I have used Auto partitioning.
In the second job I am using this target data set as the source.
There also I have used Auto partitioning.
In this way it works fine.
When I tried changing the partitioning to hash on the target data set in the first job, the first job still works fine.
In the second job I also used hash partitioning on the source data set.
But the second job failed due to a lookup failure.

Second job design:

Code:
Source Data set -------->lkp stage------>teradata connector
                           /
                          /
                         /
          Reference data set

Any thoughts on this?

pandeeswaran