Surrogate Key - wrong generation

dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Surrogate Key - wrong generation

Post by dsuser_cai »

Hi

I have a Server job that extracts data from a sequential file and loads it into an Oracle table using the Oracle bulk loader. The text file contains 3.8 million records in total. As the data is loaded we generate a surrogate key in the transformer (using a stage variable that keeps adding 1 to the previous value).

Now the problem: the job loads 2.5 million records without any issue, but from the 2,500,001st record onward it keeps using the same key value (2,500,001).

For example, if I had 10 records in total, the job would create the values 1, 2, 3, 4, 5 correctly, but for records 6 through 10 it would keep repeating the last value: 1, 2, 3, 4, 5, 5, 5, 5, 5, 5.
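To be clear about the technique: it is the usual running-counter pattern in the transformer. A simplified sketch only (svKey is a made-up stage variable name with an initial value of 0; it is not our exact derivation):

Code: Select all

* stage variable svKey, initial value 0
* referencing svKey in its own derivation returns the previous row's value,
* so this counts 1, 2, 3, ... across the rows
svKey + 1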

Also, I'm not able to check the logs in Director; when I open the log, the screen freezes.

Can anybody help me out with this?
Thanks
Karthick
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

What is the data type of your stage variable?
Nag
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For starters, your log issue is going to be from the zillions of warnings your failed sqlldr load is slamming into the log's hashed file. Best to just clear it. Do you know the job number for this job? From the Administrator, issue a CLEAR.FILE RT_LOGnnn where nnn is the job number in question.

If you aren't sure, from memory the query to get the number would be:

SELECT JOBNO FROM DS_JOBS WHERE NAME = 'YourJobName';
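Putting the two together, both are run from the Administrator client's Command window; nnn below is just a placeholder for whatever JOBNO the query returns:

Code: Select all

SELECT JOBNO FROM DS_JOBS WHERE NAME = 'YourJobName';
CLEAR.FILE RT_LOGnnn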

What data type is your surrogate key field? The target, not any stage variable - Server stage variables are not typed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

The data type is Double, length 15.
Thanks
Karthick
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

I used the following to find my job number:

LIST DS_JOBS JOBNO WITH NAME = CustLdCup2OdsCustRef

and this gave me the number 2235.

So my command to clear the log should be:

CLEAR.FILE RT_LOG2235

Please correct me if I'm wrong. Or can I create a copy of this job and try loading it in the meantime?
Thanks
Karthick
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

A small update: this is the stage variable derivation that I use:

If not(isnull(KEY_COLUMN)) then LAST_KEY+1 else 0

This works fine for all of the jobs that we have.
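Spelled out, the intent of the derivation is this (comments added only to describe the two branches; nothing else is different):

Code: Select all

* previous key + 1 when the lookup returned a key,
* otherwise (no previous key found) start the sequence at 0
If Not(IsNull(KEY_COLUMN)) Then LAST_KEY + 1 Else 0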
Thanks
Karthick
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

LAST_KEY is what, the stage variable that this is the derivation for?

Yes, that would be the correct command to completely clear the log entries for that job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

LAST_KEY is the previous value (I mean the previous surrogate key that was generated).
Thanks
Karthick
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

As you said, I cleared the log, and now I have hardcoded all the parameters. I'm going to start the job again.
Thanks
Karthick
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... if your surrogate values are still wrong in the source file (stuck at 2.5M+1) I wouldn't bother to simply restart it. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

Hi chulett,

That was a good idea. I tried to verify the issue, but I really couldn't understand why the key was not being generated properly. However, I found a workaround to get this loaded. These are the steps I followed:

1) Increased the rows-per-transaction setting to a higher number (initially it was 0).
2) When the job aborted at the 2.5 millionth row, I manually updated the surrogate key table to hold the next key (a sketch follows below).
3) Divided the staging file into two halves: the first half contains the data that was already loaded, the second half the new records.
4) Used the second half to load the remaining records, with corrected keys.

After this the table was loaded with no issues. If you have any suggestions, please share them with me.
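For step 2, the update was just a one-off statement along these lines; the table and column names below are made up for illustration (ours are different), and the value is simply the last key that actually made it into the target:

Code: Select all

-- hypothetical names, shown only as a sketch
UPDATE KEY_SOURCE
   SET LAST_KEY     = 2500000
 WHERE TARGET_TABLE = 'CUST_REF';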
Thanks
Karthick
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Unfortunately, I have no idea what would cause the generation of your surrogate key to get "stuck" like that. :(
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

But thanks, Craig, for your idea.
Thanks
Karthick
dsuser_cai
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

Hi,

I'm trying to analyze why this problem exists, and only today I was able to get at the Director log, which says:

Code: Select all

Read caching disabled, file size exceeds cache size

I searched the forum for replies and read in a post that I could increase the "Read Cache Size" (mine was 128 MB), but the same post says that increasing it might bring some adverse changes/issues, so I'm afraid to do that. I also checked online and found this manual:
http://71.18.93.139/uploads/upload_book ... dskche.pdf

I read it but got a little confused about what I'm supposed to do. Can anybody help me with this? Also, I checked the size of the hashed file; the DATA.30 file is only 16 MB.

Note: I'm reading from and updating the same hashed file. Any help or suggestions would be a great help.
Thanks
Karthick
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A couple of points. First, that message is not an error or even a warning; it is strictly informational and means you told it to cache a hashed file lookup and it couldn't fit into memory. It simply reads it from disk in that case, as if you hadn't asked it to be cached. Not a problem. And increasing the cache size can cause other problems, so I would not necessarily rush to change that.

When you read from and write to the same hashed file in a single transformer you need to not cache the lookup. Either that, or cache and 'lock for update'. Is this hashed file the source for your surrogate key? If so, this could be part of your issue.
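If it helps, the setting I'm referring to is the 'Pre-load file to memory' option on the Hashed File stage's reference output link. From memory (the exact wording may vary by version) the choices behave roughly like this:

Code: Select all

Pre-load file to memory:
  Disabled                    <- no read cache; the lookup always sees your writes
  Enabled                     <- cached; writes made by the same job may not be seen
  Enabled, Lock for updates   <- cached, but records are locked and refreshed for update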
-craig

"You can never have too many knives" -- Logan Nine Fingers