Issue generating multiple surrogate key state files

Posted: Thu Dec 20, 2012 1:29 am
by yabhinav
Hi,

We are currently generating 150 surrogate key state files to be used in our jobs.

Our current design is

Job1: Gets the file name and surrogate key value from the database and writes all the data to a sequential file.

Job2 (sequence): Reads each record from that file and passes the file name and key value as parameters to Job3.

Job3: Creates the state file for the values passed to it from the sequence above.

This design worked fine when we had 10 files, but now we are creating 150 state files and the run takes close to 2 hours to finish, which is hurting our overall performance.

Would appreciate it if you can help me with a better design approach.

Thanks,
Abhinav

Posted: Thu Dec 20, 2012 4:16 am
by ray.wurlod
How many times are you proposing to run this? If only once then two hours ought not to be a problem.

You have provided no information about how you are creating the state files. For example, are their initial values obtained from a database query? That will take some time - to establish the connection, to run the query, to get the results and to free the connection. You're doing one of these every 48 seconds or so (two hours spread over 150 files) - including, I'd imagine, the startup time of the parallel jobs. Does that really sound so unreasonable?

Posted: Thu Dec 20, 2012 12:23 pm
by jwiles
Does Job3 create only one state file per run? If so, you could make it multi-instance and run several copies at the same time, such as 5 or 10, by modifying the sequence job.
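
As a rough illustration of the multi-instance idea, here is a Python sketch that drives the instances from a script with the dsjob command line instead of from the sequence job. It is only a sketch under assumptions: the Job1 output is taken to be comma-delimited (file name, key value), the parameter names StateFileName and KeyValue and the invocation IDs slot0, slot1, ... are hypothetical, and Job3 must already have multiple instances enabled. Each slot runs its share of the records one after another, so the same invocation ID is never running twice at once.

```python
import csv
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_records(path):
    """Read (state_file, key_value) pairs from the comma-delimited Job1 output."""
    with open(path, newline="") as handle:
        return [(row[0], row[1]) for row in csv.reader(handle) if row]


def split_round_robin(records, slots):
    """Deal the records into `slots` lists, one list per concurrent instance."""
    return [records[i::slots] for i in range(slots)]


def build_command(project, job, slot, state_file, key_value):
    """Build the dsjob call for one run of the multi-instance job."""
    return [
        "dsjob", "-run", "-mode", "NORMAL", "-jobstatus",
        "-param", f"StateFileName={state_file}",  # hypothetical parameter name
        "-param", f"KeyValue={key_value}",        # hypothetical parameter name
        project,
        f"{job}.slot{slot}",                      # job name plus invocation ID
    ]


def run_slot(project, job, slot, records, runner=subprocess.call):
    """Run one slot's records sequentially and return the dsjob exit codes."""
    return [runner(build_command(project, job, slot, f, k)) for f, k in records]


def run_all(project, job, records, slots=5, runner=subprocess.call):
    """Run `slots` instances side by side, each working through its own list."""
    chunks = split_round_robin(records, slots)
    with ThreadPoolExecutor(max_workers=slots) as pool:
        futures = [
            pool.submit(run_slot, project, job, i, chunk, runner)
            for i, chunk in enumerate(chunks)
        ]
        return [future.result() for future in futures]
```

A call such as run_all("MyProject", "Job3", read_records("keys.txt"), slots=5) would then keep five instances busy (the project and file names here are placeholders). Note that with -jobstatus the exit code reflects the job's finishing status rather than the usual zero-for-success convention, so check the codes against the dsjob documentation for your version before treating any of them as a failure.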

Regards,