surrogate key file SDKSequences corrupt

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

frankzguo
Participant
Posts: 4
Joined: Fri Dec 16, 2005 9:49 pm

surrogate key file SDKSequences corrupt

Post by frankzguo »

We use the KeyMgtGetNextValueConcurrent routine to get the surrogate key (SID) for each dimension. Today we found that the SDKSequences file, which stores the latest SID for each dimension, has numbers that don't match the existing SIDs in the dimensions.

If you have had the same problem, please share how you solved it.

Thanks for the help.

Frank
aditya
Charter Member
Posts: 41
Joined: Sat May 28, 2005 7:32 am

Post by aditya »

Did you try restoring the file using any backup dsx?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hashed files would not be in any 'export dsx'; you'd need a filesystem-level backup. Or you'll need to identify which records are incorrect and update them with their correct values.

The syntax to do so has been posted here multiple times, a search for SDKSequences should turn it up...
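For example, from the Administrator client's Command window you can inspect and then hand-edit an entry. A sketch, with 'MySequence' standing in for whichever sequence name is wrong:

   CT SDKSequences MySequence   <- prints the record, i.e. the stored next value
   ED SDKSequences MySequence   <- line editor: replace the value, then FI to file it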
-craig

"You can never have too many knives" -- Logan Nine Fingers
frankzguo
Participant
Posts: 4
Joined: Fri Dec 16, 2005 9:49 pm

Post by frankzguo »

The first time it was corrupted, I manually corrected the problem. Since it has happened again, I want to know how it happened.

I looked at KeyMgtGetNextValue; this routine has an open statement, but does not have a close or flush statement. Does this cause any problem?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

frankzguo wrote:The first time it was corrupted, I manually corrected the problem. Since it has happened again, I want to know how it happened.
Let us know what you find. :wink:

You'd have to be more specific about the nature of the problem. Note that it is not 'corrupted' per se but 'out of sync' to me - and you'd need to explain the exact nature of the out-of-sync-ness before anyone could do anything but guess as to the cause.
frankzguo also wrote:I looked at KeyMgtGetNextValue; this routine has an open statement, but does not have a close or flush statement. Does this cause any problem?
Not that I've seen - it closes when the calling job ends, as far as I know. Since you mentioned 'PeopleSoft EPM8.9' I'm guessing these are delivered jobs as opposed to jobs you all have developed? If that's the case, contact Support - perhaps they are aware of the issue. Any idea if the jobs are using KeyMgtGetNextValue or actually KeyMgtGetNextValueConcurrent? The latter would be more appropriate in an environment where multiple jobs may be pulling numbers at the same time for the same table.
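For reference, when one of these routines is used in a Transformer stage derivation, the call is just the routine name with the sequence name as its argument, something like:

   KeyMgtGetNextValueConcurrent("MySequence")

where 'MySequence' is whatever key your jobs use in SDKSequences.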
-craig

"You can never have too many knives" -- Logan Nine Fingers
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

Just curious.

Is anyone else using the same file... maybe for testing or just by mistake?
frankzguo
Participant
Posts: 4
Joined: Fri Dec 16, 2005 9:49 pm

Post by frankzguo »

We use PS EPM 8.9, but most of the Ascential jobs were developed for us by Oracle consultants.

To be specific, we have one dimension table, d_program_fdm. After running all the dimension jobs, the max dimension SID for d_program_fdm is 1,000,124, but in the SDKSequences file d_program_fdm has a value of 30.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

We preach that hashed files should be rebuilt every ETL run. This is a hashed file, and I practice what I preach. With the new EtlStats I included a job named ETL_Reseed_SDKSequences, which should rebuild this hashed file. Modify it to your needs. It basically assumes that the key to SDKSequences has the same name as the column in your dimension table, like CustomerId. If your naming convention does not match, change the job or pass the key in as a parameter. Everything else is a parameter. Very flexible.
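The core of the idea, sketched in DataStage BASIC - this is not the actual EtlStats job; CustomerId and MaxKey are example names, and fetching MAX(CustomerId) from the dimension table is assumed to happen in an earlier step:

      * Sketch: reseed one SDKSequences entry from the current table maximum.
      SeqName = "CustomerId"                    ;* example: key matches the column name
      Open "SDKSequences" To SeqFile Then
         Write MaxKey + 1 On SeqFile, SeqName   ;* store the next value to hand out
         Close SeqFile
      End Else
         Call DSLogWarn("Cannot open SDKSequences", "ReseedSketch")
      End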
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

frankzguo wrote:We use PS EPM 8.9, but most of the Ascential jobs were developed for us by Oracle consultants.

To be specific, we have one dimension table, d_program_fdm. After running all the dimension jobs, the max dimension SID for d_program_fdm is 1,000,124, but in the SDKSequences file d_program_fdm has a value of 30.
SDKSequences, being a hashed file, does not have data types and would not be fazed by a number such as 1000124.

Search through your application to see whether there is a function or job that resets the sequence in SDKSequences.

How many sequence names do you have (SELECT COUNT(*) FROM SDKSequences;)? It might be worthwhile to resize SDKSequences if it's overflowed.
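If it has overflowed, the check and fix from the command window might look like this - a sketch; type 30 makes the file dynamic, so it resizes itself from then on:

   ANALYZE.FILE SDKSequences   <- reports modulus, separation and overflow
   RESIZE SDKSequences 30      <- convert to a dynamic (self-resizing) hashed file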
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
frankzguo
Participant
Posts: 4
Joined: Fri Dec 16, 2005 9:49 pm

Post by frankzguo »

[quote="chulett"][quote="frankzguo"]The first time when it corrupted, I did manually correct the problem. Since it happened again, I want to know how it happened.[/quote]
Let us know what you find. :wink:

You'd have to be more specific about the nature of the problem. Note that it is not 'corrupted' per se but 'out of sync' to me - and you'd need to explain the exact nature of the out-of-sync-ness before anyone could do anything but guess as to the cause.

[quote="frankzguo also"]I looked KeyMgtGetNextValue, this routine has open statement, but does not has close file or flush statement. Does this cause any problem?[/quote]
Not that I've seen, it closes when the calling job ends as far as I know. Since you mentioned 'PeopleSoft EPM8.9' I'm guessing these are delivered jobs as apposed to jobs you all have developed? If that's the case, contact Support - perhaps they are aware of the issue. Any idea if the jobs are using [b]KeyMgtGetNextValue[/b] or actually [b]KeyMgtGetNextValueConcurrent[/b]? The latter of which would be more appropriate to an environment where multiple jobs may be pulling numbers at the same time for the same table.[/quote]

Based on my C programming knowledge, I need to flush (commit, in BASIC) the buffer and/or close the file to get the written data back onto the hard drive. When I have 7,000+ rows in a dimension table, the job makes 7,000+ routine calls to KeyMgtGetNextValueConcurrent. Couldn't that exhaust all the file handles? Does the Ascential manual say that open files get closed automatically when the routine call exits? I'm new to Ascential BASIC programming, but in C I need to close a file explicitly, just as allocating memory without freeing it causes a memory leak.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm curious why you'd think Ascential would have provided a core routine for several years with such an obvious fatal flaw in it. And while the routine will certainly get called several thousand times in a normal run, you'll see if you look at the code that the hashed file it leverages is only opened once, on the initial call... so no worries about 'exhausting all file handles'.

My statements on the close are from what I've experienced over the years, not something I've read in a manual somewhere. Perhaps one of the grognards could confirm or deny this, or perhaps post something 'official'.

As Kim notes, using hashed files as persistent data stores is risky business. We prefer not to, but sometimes we do. When we do, there should be a mechanism to rebuild them from the current data automatically. You really don't want to have to go figure out which ones are bad and correct them manually each time, do ya?

Nothing in the 'normal' operation of that routine would cause what you are seeing that I am aware of. Are there any processes in your application that reset those values? Anything that does an UPDATE SDKSequences perhaps? Errant running of that could cause your issue...
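One quick check, assuming you have a full .dsx export of the project on hand to search (the path is an example):

   grep -i sdksequences /backup/project_export.dsx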
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

More to the point, do you take any steps to synchronize SDKSequences with the table? A simple job can SELECT MAX(keycol)+1 FROM table and load this into the SDKSequences hashed file, giving the name of the sequence as the key value (a constant in a Transformer stage).
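As a sketch, that job is just three stages - d_program_fdm is from this thread, and the SID column name is a guess:

   OCI/ODBC stage :  SELECT MAX(program_sid) + 1 FROM d_program_fdm
   Transformer    :  key column  <- 'd_program_fdm' (a constant)
                     data column <- the selected MAX value
   Hashed File    :  SDKSequences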

The key management routines do not run out of file handles. They use precisely one file handle. Reads and writes to hashed files do not generate additional file unit demands. The hashed file is held open by virtue of the fact that its associated handle (= file variable) is declared to be COMMON (in C the closest analogue is a static variable).
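The pattern looks roughly like this - a sketch of the idea, not the shipped routine's actual source; SeqName stands for the routine's argument:

      COMMON /KeyMgt/ KeyInit, SeqFile      ;* survives across calls, like a C static
      Ans = -1
      If NOT(KeyInit) Then
         Open "SDKSequences" To SeqFile Then KeyInit = @TRUE   ;* opened once per run
      End
      If KeyInit Then
         Readu NextVal From SeqFile, SeqName Else NextVal = 1  ;* READU takes an update lock
         Write NextVal + 1 On SeqFile, SeqName                 ;* the WRITE releases it
         Ans = NextVal                                         ;* return the value just used
      End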
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.