Inline/embeded SQL
Moderators: chulett, rschirm, roy
Inline/embeded SQL
Hi All,
I need to pull one column from a database table for which I have the following query
SELECT FORM_EDITION_DT
INTO :PB90H-FORM-EDITION-DT
FROM FORM_TAB_V A
WHERE A.FRM_ST_CD = :PB90H-FRM-ST-CD
AND A.FORM_ID = :PB90H-FORM-ID
AND A.FORM_EDITION_DT =
(SELECT MAX(FORM_EDITION_DT)
FROM FORM_TAB_V
WHERE FRM_ST_CD = A.FRM_ST_CD
AND FORM_ID = A.FORM_ID
AND FORM_EDITION_DT <= :PB90H-FORM-EDITION-DT
AND POL_SYMBOL_CD = A.POL_SYMBOL_CD)
AND DOCUMENT_CD =
(SELECT MIN(DOCUMENT_CD)
FROM FORM_TAB_V
WHERE FRM_ST_CD = A.FRM_ST_CD
AND FORM_ID = A.FORM_ID
AND FORM_EDITION_DT = A.FORM_EDITION_DT
AND POL_SYMBOL_CD = A.POL_SYMBOL_CD)
AND A.POL_SYMBOL_CD = :PB90H-POL-SYMBOL-CD
Now the Variables are with : and have a value. But the question is how do I take care this one in a relational stage or is there any work around using multiple stages. This is tricky one.
Regards,
Rupesh
I need to pull one column from a database table for which I have the following query
SELECT FORM_EDITION_DT
INTO :PB90H-FORM-EDITION-DT
FROM FORM_TAB_V A
WHERE A.FRM_ST_CD = :PB90H-FRM-ST-CD
AND A.FORM_ID = :PB90H-FORM-ID
AND A.FORM_EDITION_DT =
(SELECT MAX(FORM_EDITION_DT)
FROM FORM_TAB_V
WHERE FRM_ST_CD = A.FRM_ST_CD
AND FORM_ID = A.FORM_ID
AND FORM_EDITION_DT <= :PB90H-FORM-EDITION-DT
AND POL_SYMBOL_CD = A.POL_SYMBOL_CD)
AND DOCUMENT_CD =
(SELECT MIN(DOCUMENT_CD)
FROM FORM_TAB_V
WHERE FRM_ST_CD = A.FRM_ST_CD
AND FORM_ID = A.FORM_ID
AND FORM_EDITION_DT = A.FORM_EDITION_DT
AND POL_SYMBOL_CD = A.POL_SYMBOL_CD)
AND A.POL_SYMBOL_CD = :PB90H-POL-SYMBOL-CD
Now the Variables are with : and have a value. But the question is how do I take care this one in a relational stage or is there any work around using multiple stages. This is tricky one.
Regards,
Rupesh
Mike,
Truly speaking I did not give more clearity on what exactly my problem is?
Yes, there are many rows in Form_Tab table.
But the issue is, that in the query one of the variable accepts value from the input which is being read. For example variables :PB90H-FORM-ID and :PB90H-POL-SYMBOL-CD would have a value from input stream.
In this case if I define them as job parameter I wont be able to set them dynamically for each row coming in.
To give you a picture, consider a flat file as input (which has column A) followed by a lookup stage which looks up a relational stage. In the relational stage I put sql query which should accept this Column A value in where clause. I guess for this I cannot use job parameter as i need to set them depending on each row.
Hope I didnt confuse you. To tell you in plain words, I was worried about Inline SQL but now its more complicated. Let me know if you need more information.
Regards,
Rupesh
Truly speaking I did not give more clearity on what exactly my problem is?
Yes, there are many rows in Form_Tab table.
But the issue is, that in the query one of the variable accepts value from the input which is being read. For example variables :PB90H-FORM-ID and :PB90H-POL-SYMBOL-CD would have a value from input stream.
In this case if I define them as job parameter I wont be able to set them dynamically for each row coming in.
To give you a picture, consider a flat file as input (which has column A) followed by a lookup stage which looks up a relational stage. In the relational stage I put sql query which should accept this Column A value in where clause. I guess for this I cannot use job parameter as i need to set them depending on each row.
Hope I didnt confuse you. To tell you in plain words, I was worried about Inline SQL but now its more complicated. Let me know if you need more information.
Regards,
Rupesh
More info needed:
Are we talking 100s, 1000s, millions, billions?
You've now mentioned a flat file input source that contains values for at least two of your host variables. How many rows in this flat file?
Where are the values for your other two host variables (PB90H-FRM-ST-CD and PB90H-FORM-EDITION-DT) coming from?
Mike
doesn't tell me anything.Yes, there are many rows
Are we talking 100s, 1000s, millions, billions?
You've now mentioned a flat file input source that contains values for at least two of your host variables. How many rows in this flat file?
Where are the values for your other two host variables (PB90H-FRM-ST-CD and PB90H-FORM-EDITION-DT) coming from?
Mike
Ok. We're making progress on the specification. Though I still don't have a clear enough picture to recommend some design alternatives.
More clarification still needed:
Is it the relational table, FORM_TAB_V, the input source with the billion rows? Or is that the input source with thousands of rows?
Is the flat file containing POL-SYMBOL-CD and FORM-ID the input source with a billion rows? Or is that the input source with thousands of rows.
What about the other host variables? You state that FRM-ST-CD and FORM-EDITION-DT are set to defaults ('UK' and CURRENT DATE). How/where are non-default values obtained? Or does each job run have a different set of defaults?
Mike
More clarification still needed:
Is it the relational table, FORM_TAB_V, the input source with the billion rows? Or is that the input source with thousands of rows?
Is the flat file containing POL-SYMBOL-CD and FORM-ID the input source with a billion rows? Or is that the input source with thousands of rows.
What about the other host variables? You state that FRM-ST-CD and FORM-EDITION-DT are set to defaults ('UK' and CURRENT DATE). How/where are non-default values obtained? Or does each job run have a different set of defaults?
Mike
Mike,
As I mentioned early,
"billion of rows coming from Input and 1000s of rows in lookup table"this should expalin it all.
But for your clarification
Is it the relational table, FORM_TAB_V, the input source with the billion rows? Or is that the input source with thousands of rows?
Answer: Form_Tab_V is a relational table, and has 1000s of rows.
Is the flat file containing POL-SYMBOL-CD and FORM-ID the input source with a billion rows? Or is that the input source with thousands of rows.
Answer: Input Flat File has Billions of rows.
What about the other host variables? You state that FRM-ST-CD and FORM-EDITION-DT are set to defaults ('UK' and CURRENT DATE). How/where are non-default values obtained? Or does each job run have a different set of defaults?
Answer: To make it more clear, I would keep these two defaults hardcoded in the SQL query. As they are standard for each row incoming. In short I won't keep them as variables.
I think I gave more clearity now, so I would appreciate your valuable input on this one.
Regards,
Rupesh
As I mentioned early,
"billion of rows coming from Input and 1000s of rows in lookup table"this should expalin it all.
But for your clarification
Is it the relational table, FORM_TAB_V, the input source with the billion rows? Or is that the input source with thousands of rows?
Answer: Form_Tab_V is a relational table, and has 1000s of rows.
Is the flat file containing POL-SYMBOL-CD and FORM-ID the input source with a billion rows? Or is that the input source with thousands of rows.
Answer: Input Flat File has Billions of rows.
What about the other host variables? You state that FRM-ST-CD and FORM-EDITION-DT are set to defaults ('UK' and CURRENT DATE). How/where are non-default values obtained? Or does each job run have a different set of defaults?
Answer: To make it more clear, I would keep these two defaults hardcoded in the SQL query. As they are standard for each row incoming. In short I won't keep them as variables.
I think I gave more clearity now, so I would appreciate your valuable input on this one.
Regards,
Rupesh
Thank you Rupesh.
Sorry for forcing you to spell things out for me. I just wanted to make sure I understood your specifications clearly. With an input source containing "billions" of rows, a poor design decision will have devastating performance implications.
First, a couple of points:
1) You do not want to do a direct lookup to the relational table. Issuing "billions" of SELECT statements is something to be avoided.
2) You do not want to sort a file with "billions" of rows. A design that avoids the need to sort will be a good thing.
Now for a design option:
I would suggest creating 2 jobs for this task:
1) A simple job that will extract the relational lookup data into a flat file. This flat file will be used as the reference input into a hash lookup in the second job.
Three stages:
2) A job that reads the input and does a conditional hash lookup and writes to the output.
Six Stages:
The hash lookup (make sure you specify a hash lookup) will cause the lookup file to be read into memory at the start of the job. Lookups will take place at memory i/o speed instead of disk i/o speed. The conditional lookup can be used to skip the lookup operation when the lookup keys from the current row are the same as the keys from the previous input row. Normally I wouldn't bother with the conditional lookup on a hash lookup, but multiplying even a millisecond in savings by billions can be significant.
Mike
Sorry for forcing you to spell things out for me. I just wanted to make sure I understood your specifications clearly. With an input source containing "billions" of rows, a poor design decision will have devastating performance implications.
First, a couple of points:
1) You do not want to do a direct lookup to the relational table. Issuing "billions" of SELECT statements is something to be avoided.
2) You do not want to sort a file with "billions" of rows. A design that avoids the need to sort will be a good thing.
Now for a design option:
I would suggest creating 2 jobs for this task:
1) A simple job that will extract the relational lookup data into a flat file. This flat file will be used as the reference input into a hash lookup in the second job.
Three stages:
Code: Select all
relational stage -> transformer -> flat file
Code: Select all
SELECT A.FORM_ID,
A.POL_SYMBOL_CD,
A.FORM_EDITION_DT
FROM FORM_TAB_V A
WHERE A.FRM_ST_CD = 'UK'
AND A.FORM_EDITION_DT = (SELECT MAX(B.FORM_EDITION_DT)
FROM FORM_TAB_V B
WHERE B.FRM_ST_CD = A.FRM_ST_CD
AND B.FORM_ID = A.FORM_ID
AND B.FORM_EDITION_DT <= CURRENT DATE
AND B.POL_SYMBOL_CD = A.POL_SYMBOL_CD)
AND A.DOCUMENT_CD = (SELECT MIN(C.DOCUMENT_CD)
FROM FORM_TAB_V C
WHERE C.FRM_ST_CD = A.FRM_ST_CD
AND C.FORM_ID = A.FORM_ID
AND C.FORM_EDITION_DT = A.FORM_EDITION_DT
AND C.POL_SYMBOL_CD = A.POL_SYMBOL_CD)
Six Stages:
Code: Select all
Flat File (from job 1)
|
|
\|/
Flat file (billions) -> Transformer -> Lookup -> Transformer -> Flat file
Mike
Thanks a million Mike.
That really helped. So far I was following the same approach, but did not think of spliting it first into a file and than looking up.
Definately there would be a performance hinderance if not splited.
Just out of curiosity, what if one of the variable was inside an inline query?
Regards,
Rupesh
That really helped. So far I was following the same approach, but did not think of spliting it first into a file and than looking up.
Code: Select all
DB2 table with query
|
|
V
input----->Transformer---->lookup----->Transformer------>DB2 Loadready File.
Just out of curiosity, what if one of the variable was inside an inline query?
Regards,
Rupesh
For performance reasons, you'll want to extract the lookup data to a flat file first. Attaching the relational stage directly to the lookup stage results in those "billions" of SELECT statements that you want to avoid.
With a hash lookup, it won't make much difference whether you are matching on one key or two keys.
Note: I've made the assumption that when you said the lookup file contains thousands of rows, that that was the total volume for the entire lookup table (and not the volume for a single combination of form id and symbol code). If the lookup file volume will be significantly higher than about 100,000 rows, we will likely have to revisit the design options.
Mike
This isn't possible with your query. Host variables can not be directly coded into the SQL statement. The best that you can hope for is that the code optimizer will push a transformer constraint back into the SQL WHERE clause. The optimizer probably won't even attempt that with user-defined SQL (which is required due to the subselect clauses).what if one of the variable was inside an inline query
With a hash lookup, it won't make much difference whether you are matching on one key or two keys.
Note: I've made the assumption that when you said the lookup file contains thousands of rows, that that was the total volume for the entire lookup table (and not the volume for a single combination of form id and symbol code). If the lookup file volume will be significantly higher than about 100,000 rows, we will likely have to revisit the design options.
Mike
Thanks Mike.
Actually I didnt understand the part in the following quote.
And if yes, how are the sub select query's parameter taken care?
What Code Optimizer are we talking here?
Hope you understand my question.
Regards,
Rupesh
Actually I didnt understand the part in the following quote.
Do you mean to say that one of the constraint in WHERE clause can be put into transformer Constraint?The best that you can hope for is that the code optimizer will push a transformer constraint back into the SQL WHERE clause.
And if yes, how are the sub select query's parameter taken care?
What Code Optimizer are we talking here?
Hope you understand my question.
Regards,
Rupesh
The code generation engine generates COBOL code and then performs an optimization of the code. For example columns that aren't referenced in a downsteam stage won't be passed on. The SQL for a relational stage will be optimized as well. If you select a column that isn't used downstream, the SQL that is passed to DB2 won't include that column. If there is a simple constraint in a transformer immediately after the relational stage, then that constraint may be converted to a SQL WHERE clause during optimization.
In the relational stage, you can't code something like:
You can code this:
After the relational stage, you can put a constraint in the transformer as something like KEY1 = JOBPARM.
The generated code should include "WHERE KEY1 = :JOBPARM" in the generated SQL statement.
Mke
In the relational stage, you can't code something like:
Code: Select all
SELECT KEY1, COL1
FROM TBL
WHERE KEY1 = :JOBPARM
Code: Select all
SELECT KEY1, COL1
FROM TBL
The generated code should include "WHERE KEY1 = :JOBPARM" in the generated SQL statement.
Mke