How to get JobHandle in PX job

tacitus
Participant
Posts: 13
Joined: Fri Feb 25, 2005 5:46 pm
Location: Boston, MA

Post by tacitus »

Hi,

I am trying to retrieve the PID for a job in a C++ function used in a routine. Unfortunately, the built-in functions of the API require that you know the job handle, and there is no DSJ_ME that I can find. How do we retrieve this information?

I would settle for passing the value into the routine; however, I am not certain how to get the handle in a PX job either.

Your help is greatly appreciated!
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

What do you expect the PID to tell you? The controlling OSH script or the current process running your function?

Are you attempting to log messages to the job log? What's the underlying goal?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle

Post by tacitus »

Thanks for the quick response!

We have jobs that will be writing records to tables that may only be unique based on the rowid, and the rowid must be known to the job due to business requirements. Additionally, many job instances could be running at the same time. We need a solution that can be easily and consistently applied across jobs to create the rowid.

With this in mind, I was thinking of generating a rowid based on timestamp : PID : Partition : inputrownum using a routine. The timestamp will take care of some of the uniqueness issues, and the PID will make sure multiple-instance jobs (or other jobs) are not stepping on each other (I think).
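
A minimal sketch of that composition, assuming all four values are simply passed into the routine (how to obtain the job-level PID is the open question in this thread). The function name `make_row_id`, the field widths, and the timestamp format are hypothetical choices for illustration, not part of any DataStage API.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: joins timestamp : PID : partition : input row number.
// Each numeric field is zero-padded to a fixed width and separated by ':',
// so two different tuples cannot collapse into the same string.
std::string make_row_id(const std::string& timestamp,  // e.g. "20050225174600" (assumed format)
                        std::uint32_t pid,
                        std::uint32_t partition,
                        std::uint64_t input_row_num) {
    std::ostringstream out;
    out << std::setfill('0')
        << timestamp << ':'
        << std::setw(10) << pid << ':'
        << std::setw(4) << partition << ':'
        << std::setw(12) << input_row_num;
    return out.str();
}
```

For example, `make_row_id("20050225174600", 4242, 3, 17)` yields `20050225174600:0000004242:0003:000000000017`. Note that a PID is only unique on one machine, so on a cluster this string alone would not rule out collisions between nodes.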

So the PID is for the Job.

Any thoughts?

Post by kcbland »

Are you on a single SMP or a cluster? How are you resolving duplicate PIDs across nodes, because of Partition? You're going to have a big integer there because each field has to be fixed width and then concatenated to be the larger integer, very messy.

Why not use something like a batch number instead of a timestamp? Each load process is assigned a batch number at startup via a serialized and locking stored procedure or even a simple sequence. Your PID then is irrelevant, so your timestamp and PID are gone, leaving partition and rownumber. So, using a job parameter BatchNumber plus the inherent surrogate key generator stage you should be fine because it takes care of uniqueness across parallel partitions. Just pad the BatchNumber big enough to accommodate your high watermark and add that to the generated key.
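
To illustrate how keys can stay unique across parallel partitions, here is a sketch of one common interleaving scheme. It is an assumption for illustration only and not necessarily how the surrogate key generator stage works internally; the name `partition_key` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: key for the n-th row (1-based) seen by a given
// partition (0-based). Partitions take turns through the key space, so no
// two (partition, row) pairs map to the same key.
std::uint64_t partition_key(std::uint64_t row_in_partition,
                            std::uint32_t partition,
                            std::uint32_t num_partitions) {
    return (row_in_partition - 1) * num_partitions + partition + 1;
}
```

With four partitions, the first row on partitions 0 through 3 gets keys 1 through 4, and the second row on partition 0 gets key 5.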

What do you think?

Post by tacitus »

I think that's a great idea! I was a bit uncomfortable with the timestamp as part of the key.

Can you elaborate on how to define the stored procedure or point me to a resource that might have a good explanation for DB2?

PS. I would still love to know how to access the handle information for future use. The API documentation is not stellar.

Post by kcbland »

I'm a Horracle person, not a DB2 one, but either way you want a serialized assigning process that guarantees a unique identifier. In Horracle, you'd use a Sequence. If you go the stored procedure route, select the max from the batch table, add 1 to it, and try to get a lock on that row. Keep incrementing and trying until you get the lock and don't find the row. Then insert your row, giving it all kinds of nice information like the process name, what time you started, etc. Maybe even come back later and update it with a status and end timestamp.
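
The serialization idea can be sketched in C++ with an in-memory stand-in for the batch table. This is only an illustration of "one caller at a time computes max + 1 and inserts"; a real implementation would be a database sequence or a stored procedure with row locking, and the class name `BatchTable` is hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// In-memory stand-in for the batch table. The mutex plays the role of the
// database lock: only one caller at a time can compute max + 1 and insert.
class BatchTable {
public:
    // Returns a batch number that no other caller has received.
    std::uint64_t allocate(const std::string& process_name) {
        std::lock_guard<std::mutex> guard(mutex_);
        // std::map is ordered by key, so rbegin() is the current maximum.
        std::uint64_t next = rows_.empty() ? 1 : rows_.rbegin()->first + 1;
        rows_.emplace(next, process_name);  // record who owns this batch
        return next;
    }

private:
    std::mutex mutex_;
    std::map<std::uint64_t, std::string> rows_;
};
```

Each load process would call `allocate` once at startup and pass the result to the job as the BatchNumber parameter.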

Take that number, add something like 10000000000 to it and use that as the seed to the generator. Make sure you've left plenty of room underneath the batch number to fill in the gaps. Big trouble if you overrun, because you won't be unique.
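
A sketch of the seed arithmetic, reading "add something like 10000000000" as padding the batch number with ten digits of room (that is, multiplying by 10^10) and then adding the generated key. That reading is an assumption, as are the names `kRowsPerBatch` and `make_key`. The checks make the "big trouble if you overrun" case fail loudly instead of silently producing duplicates.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Room reserved under each batch number: 10^10 rows (the figure from the post).
constexpr std::uint64_t kRowsPerBatch = 10000000000ULL;

// Hypothetical helper: combines a batch number and a per-batch row number
// into one key. Throws rather than returning a key that could collide.
std::uint64_t make_key(std::uint64_t batch_number, std::uint64_t row_number) {
    if (row_number >= kRowsPerBatch) {
        throw std::out_of_range("row number overruns the room reserved for the batch");
    }
    // Strictly below max / kRowsPerBatch so batch * kRowsPerBatch + row cannot wrap.
    if (batch_number >= std::numeric_limits<std::uint64_t>::max() / kRowsPerBatch) {
        throw std::out_of_range("batch number too large for a 64-bit key");
    }
    return batch_number * kRowsPerBatch + row_number;
}
```

For example, batch 7 and row 42 give the key 70000000042. If the target column is a signed 64-bit integer, the usable batch range is roughly half of what this unsigned sketch allows.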

As for the APIs, I can't help you from the C side. I prefer to use Batch jobs with native DS BASIC calls to the APIs, which is much easier and cleaner.