Surrogate key issue

Devendrudu · Post by **Devendrudu** » Fri Mar 30, 2012 3:40 am

Hi,

In my job I am using the Surrogate_Key_Generator stage. On the first run the sequence values are generated correctly (1,2,3,4), but on the second run, with one new insert record, I get 1,2,3,4,6: the new record gets 6 instead of 5.

Every run skips one sequence value.

first run:
source
-------
cid,name
1,xx
2,yy
3,ww
4,bb

output
----------

seq_num,cid,name
1,1,xx
2,2,yy
3,3,ww
4,4,bb

second run:
source
-------
cid,name
1,xx
2,yy
3,ww
4,bb
5,cc

output
----------

seq_num,cid,name
1,1,xx
2,2,yy
3,3,ww
4,4,bb
6,5,cc

Note: Surrogate Source_type = flat file
User specified block size = 1
Every run is on a single node only.

Please suggest which options I need to select to get consecutive sequence numbers.

priyadarshikunal · Post by **priyadarshikunal** » Tue Apr 03, 2012 9:49 am

I think we are missing something here. Key generation follows a regular pattern, so it should not simply skip one value.

Are you generating the sequence first and then filtering out records from the stream?
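To illustrate why the order matters, here is a minimal Python sketch (not DataStage code). The rows, the `assign_keys` and `drop_rejected` helpers, and the starting value 5 are all hypothetical, chosen only to show that filtering after key assignment leaves gaps while filtering before it does not.

```python
from itertools import count

def assign_keys(rows, start):
    """Attach the next sequence value to every incoming row (hypothetical helper)."""
    seq = count(start)
    return [(next(seq), cid, name) for cid, name in rows]

def drop_rejected(rows, rejected_cids, cid_index):
    """Remove rows whose cid is in rejected_cids (hypothetical filter)."""
    return [r for r in rows if r[cid_index] not in rejected_cids]

# Hypothetical incoming rows; cid 6 is dropped by a downstream filter.
incoming = [(5, "cc"), (6, "dd"), (7, "ee")]
rejected = {6}

# Keys assigned first, then filtered: the key given to cid 6 is lost.
keys_then_filter = drop_rejected(assign_keys(incoming, start=5), rejected, cid_index=1)

# Filtered first, then keys assigned: the surviving rows get consecutive keys.
filter_then_keys = assign_keys(drop_rejected(incoming, rejected, cid_index=0), start=5)

print([r[0] for r in keys_then_filter])
print([r[0] for r in filter_then_keys])
```

If the job does something similar, moving the filter upstream of the key generation would be one thing to try.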

ray.wurlod · Post by **ray.wurlod** » Tue Apr 03, 2012 5:14 pm

How are you partitioning these data? Is each partition processing exactly the same number of rows? If not, expect gaps.
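A rough Python sketch of the effect, assuming an interleaved scheme in which partition p of N emits keys p+1, p+1+N, p+1+2N, and so on. This scheme is an assumption made for illustration, not a description of what the Surrogate_Key_Generator stage does internally; `partition_keys` and `find_gaps` are hypothetical helpers.

```python
from typing import List

def partition_keys(partition: int, num_partitions: int, row_count: int) -> List[int]:
    """Keys one partition emits under the assumed interleaved scheme."""
    return [partition + 1 + i * num_partitions for i in range(row_count)]

def find_gaps(keys: List[int]) -> List[int]:
    """Values between 1 and max(keys) that were never issued."""
    present = set(keys)
    return [k for k in range(1, max(present) + 1) if k not in present]

# Two partitions, five rows split unevenly: four rows on partition 0, one on partition 1.
uneven = partition_keys(0, 2, 4) + partition_keys(1, 2, 1)

# Two partitions, six rows split evenly: three rows each.
even = partition_keys(0, 2, 3) + partition_keys(1, 2, 3)

print(sorted(uneven), find_gaps(uneven))
print(sorted(even), find_gaps(even))
```

Under this assumption, only the even split yields a gap-free sequence.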