I need to tap into the bounty of knowledge that is DSXchange :D yet again!
In my current project I am transferring data from a transactional system to a copy of the same system on a different server, iteratively throughout the day. Normally, I would rather do this with replication or MQ Series, but neither is available to me.
Here are the ground rules and system stuff:
1. Source and target are DB2 UDB running on AIX Unix
2. No federated servers allowed
3. DataStage 7.5.1.A Server Edition on AIX Unix is my flavor. One of my main design paradigms is to use as little database-specific stuff as possible (avoiding stored procedures, external code and the like). The reason for this is that there is some question as to whether the target will stay on DB2 UDB as planned (wasn't my choice) or go back to SQL Server 2000 (2005 eventually). Hence, I also chose to sacrifice a little speed and used nothing but ODBC instead of the DB2 CLI drivers/bulk loaders.
4. Using a separate schema in the same database to capture history/changes
5. Using triggers to populate the history/changes, adding TRIGGER_DT, TRIGGER_USERID, TRIGGER_ACTION (U-update lists all fields, I-insert lists all fields, D-delete lists only key fields), and TRIGGER_STATUS (default = 2 = in process; the target table also has 1 = current row, 0 = historical row). A sketch of such a trigger follows this list.
6. Target tables generally follow slowly changing dimension type 2 rules - therefore, in my case, everything is an INSERT.
7. There are 412 source tables to process, currently with five steps (jobs) per table, giving a total of 2,060 jobs, plus one job to get the batch info and one job to perform cleanup when done.
8. My goal is to optimize the cycle so it can run every 15 minutes. That should mean no more than 2,000-3,000 rows per table per batch. Over time, I will also be able to move tables that don't change often down to fewer scheduled runs.
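To make point 5 concrete, here is a minimal sketch of the update-case trigger I'm describing, in DB2 UDB syntax. The SRC source schema, the CHG change schema, and the business columns (SEQ_NUM, STATUS_FLAG) are placeholders; the real tables carry the full field list:

Code:
-- Hypothetical sketch of one change-capture trigger (update case).
-- SRC/CHG schema names and the business columns are placeholders.
CREATE TRIGGER CHG.ADM_ACCT_PERIOD_U
  AFTER UPDATE ON SRC.ADM_ACCT_PERIOD
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  INSERT INTO CHG.ADM_ACCT_PERIOD
    (SEQ_NUM, STATUS_FLAG, TRIGGER_DT, TRIGGER_USERID, TRIGGER_ACTION, TRIGGER_STATUS)
  VALUES
    (N.SEQ_NUM, N.STATUS_FLAG, CURRENT TIMESTAMP, USER, 'U', '2')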
My jobs are structured as follows, landing all data for processing:
Step 1: src_<tablename>
Creates a seq file of all fields, a hashed file of natural key values plus TRIGGER_DT (keyhash), and a hashed file of distinct natural key values plus the maximum TRIGGER_DT per key (hash_max).
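However the job actually builds them, in SQL terms the extract and the hash_max content are equivalent to roughly this (CHG is a placeholder for the change schema; table and key names are reused from the example query further down):

Code:
-- Everything sitting in the change table feeds the seq file and keyhash:
SELECT * FROM CHG.ADM_ACCT_PERIOD

-- hash_max is effectively the distinct keys with their latest change:
SELECT SEQ_NUM, MAX(TRIGGER_DT) AS TRIGGER_DT
FROM CHG.ADM_ACCT_PERIOD
GROUP BY SEQ_NUM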
Step 2: delsrc_<tablename>
Uses keyhash to remove the extracted rows from the source-side change tables - this is necessary because the triggers may put rows in while DS is processing the existing ones. This effectively gets me row-level locking without preventing the triggers from adding more data during processing (at least, that's my thought).
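Roughly, the delete is keyed on the natural key plus TRIGGER_DT and bound once per keyhash row, so anything the triggers add after the extract carries a later TRIGGER_DT and survives for the next run:

Code:
-- Executed once per keyhash row (placeholder schema/table names);
-- rows inserted after the extract have a newer TRIGGER_DT and are kept.
DELETE FROM CHG.ADM_ACCT_PERIOD
WHERE SEQ_NUM = ? AND TRIGGER_DT = ?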
Step 3: tgt_<tablename>
This step inserts all records into the target table from the seq file. While inserting, the process uses hash_max to change TRIGGER_STATUS from 2 to 5 when the natural keys and the TRIGGER_DT match, identifying the most current row for later processing.
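The update side of this step, driven once per hash_max row, amounts to something like the following; only the row carrying the maximum TRIGGER_DT for each natural key gets flagged:

Code:
-- Flags the newest inserted row per natural key as pending-current (5).
-- Schema, table, and key names reused from the example query below.
UPDATE UCWS.ADM_ACCT_PERIOD
SET TRIGGER_STATUS = '5'
WHERE SEQ_NUM = ? AND TRIGGER_DT = ?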
Step 4: updtgt_<tablename>
This step uses the distinct natural keys in hash_max to change the TRIGGER_STATUS of any already-inserted rows from 1 or 2 to 0, marking them as historical (the example queries under question 1 below come from this step).
Step 5: upd_current_tgt_<tablename>
This step is meant to take any rows with TRIGGER_STATUS = 5 and update them to 1 as the most current row. Using the above methodology, there should be no way for it to be off (famous last words).
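Per table, this step is conceptually nothing more than:

Code:
-- Promote the rows flagged 5 in step 3 to current (1).
UPDATE UCWS.ADM_ACCT_PERIOD
SET TRIGGER_STATUS = '1'
WHERE TRIGGER_STATUS = '5'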
I will have follow-up processes nightly and on weekends to clean up, verify, and validate/balance, depending on the level of necessity.
Now, here are my problems and the sources of my questions to you all:
1. In steps 4 & 5, I cannot seem to use parameters in the tablename portion of the user-defined query. For example, the working hard-coded query for one of my tables is:
Code:
UPDATE UCWS.ADM_ACCT_PERIOD SET TRIGGER_STATUS = '0' WHERE UCWS.ADM_ACCT_PERIOD.SEQ_NUM = ? AND UCWS.ADM_ACCT_PERIOD.STATUS_FLAG IN (1,2)
What I want to run instead, with the schema and table names supplied as job parameters, is:
Code:
UPDATE #SchemaName#.#TableName# AS X SET X.TRIGGER_STATUS = '0' WHERE X.SEQ_NUM = ? AND X.STATUS_FLAG IN (1,2)
2. Using hard-coded fields that are not part of the stream coming in from hash_max also seems to cause problems (see the example above). Specifically, the TRIGGER_STATUS field is not part of the hash_max file, so it doesn't come down the link. Could I just use a stage variable to get around that, hard coding the value to (1,2)?
If I can figure out the answers to the two questions above, I hope to be able to pass the tablename into a single job that updates all TRIGGER_STATUS fields equal to 5 to 1, marking the most current rows (a sketch of the statement I'm after follows below). That would knock 411 jobs out of the cycle. I know I could do it with a stored procedure, but once again, since the final target db is in flux and could be anywhere (or even multiple platforms), I'm trying to do as much in DataStage as possible, sometimes beyond what makes sense to do.
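For reference, the generic statement I'm aiming for in that single job would be nothing more than this, with #SchemaName# and #TableName# as job parameters:

Code:
UPDATE #SchemaName#.#TableName#
SET TRIGGER_STATUS = '1'
WHERE TRIGGER_STATUS = '5'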
Am I just making this too hard?
As always, your input is greatly appreciated.
Bestest!