Hi,
I have 3 jobs.... 2 parallel and 1 sequence job.
In First parallel job,
It reads Employee data from I/P Source file Emp1.txt which has 3940 records. These are inputted to Surr_Key_Gen stage where Surrogate key Emp_ID is generated starting from value 1. Then o/p of Surr_Key_Gen stage is inserted into DB2 Table 'Employee_Surrogate'.
Second parallel job is exactly similar as the first job, only the difference being that i/p source file is Emp2.txt. Here the target table is same i.e 'Employee_Surrogate'.
Source file Emp2.txt has 6060 records. Surrogate key Emp_ID is generated starting from the last highest value previously generated.
Here in both the above jobs, Surrogate Key Generator stages are referring to the same surrogate key source file.
The third job, is a Sequence Job. In this the parallel jobs 1 and 2 are executed parallelly.
Observed Output:
Total no of records inserted into the target table Employee_Surrogate is 10,000 (i.e 3940+6060).
Generated Surrogate key range 1 to 10,030.
Hole in surrogate key 3971 to 4000(i.e 30)
Q:Why the holes are created in surrogate key range?
Please suggest the solution to remove these holes, if any..
Thank You.
Gaps(Holes) in surr_key generated by Surr_Key_Gen stage
Moderators: chulett, rschirm, roy
Gaps(Holes) in surr_key generated by Surr_Key_Gen stage
I never let school to interfere in my education
This is most likely due to the block size that is being used. In order to speed up execution, surrogate keys are generated in blocks instead of going to the file/database for each and every key. It is possible that holes can occur when the program is stopped where the rest of the keys in the block are left unused.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 147
- Joined: Sat Apr 30, 2005 1:23 am
- Location: Bangalore,India
If the partitions are not balanced properly, then surrogate key stage will generate the values with holes in the range.
If you want to generate the keys without any holes you should achieve perfect balance of records in the partitioning or you can use round robin partitioning while reading the data from the input (provided this partition will not affect your functionality)
If you want to generate the keys without any holes you should achieve perfect balance of records in the partitioning or you can use round robin partitioning while reading the data from the input (provided this partition will not affect your functionality)
Raghavendra
Dare to dream and care to achieve ...
Dare to dream and care to achieve ...
It's unnecessary to fill the holes. Surrogate keys inherently have no meaning other than to uniquely identify a row. The fact that they can be non-contiguous does not cause a single data discrepancy or issue. The largest concern is probably running out of numbers when exceeding the datatype.
This is a very common topic and an indication of ignorance in the "data architects" who design data warehouses. These "experts" often set the integer datatypes too small and then gaps become a concern because the maximum number is achieved.
The solution is to use a consistent integer datatype everywhere and then educate your architects. Suggest they read Kimball's articles and sections from his books where he more eloquently deflates this non-issue.
This is a very common topic and an indication of ignorance in the "data architects" who design data warehouses. These "experts" often set the integer datatypes too small and then gaps become a concern because the maximum number is achieved.
The solution is to use a consistent integer datatype everywhere and then educate your architects. Suggest they read Kimball's articles and sections from his books where he more eloquently deflates this non-issue.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
One solution would be to force the entire job to execute sequentially. Which defeats the purpose of having this parallel engine and is, as others have noted, unnecessary. Then again, for only 10,000 rows it would probably perform adequately.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.