Writing SQL Query as Source in DataStage

Debbbie · Post by **Debbbie** » Wed Feb 27, 2013 2:47 pm

Hi.
We are migrating our application from BusinessObject DataServices 4.0 to Data Stage 8.7. Can anyone plz help me to understand how to write SQL query in DataStage? in DataServices we can add SQL object as source instead of DB2 table. Is there any such option in DataStage? Our Database is DB2.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 27, 2013 2:53 pm

Welcome aboard.

In the DB2 Connector stage (or such other DB2 stage as you might use) you have the option to have the SQL generated by the stage, built using SQL builder (Connector only) or using user-defined SQL.

In the Connector stage set the Generate SQL property to "No", and provide a value for the SQL Statement property (the actual query).

chulett · Post by **chulett** » Wed Feb 27, 2013 3:31 pm

I would suggest that, if at all possible, you let the stage generate the SQL whenever possible. Only supply "user defined" SQL for those situations when you have no other choice, for something that is not... generatable.

VineelPrem · Post by **VineelPrem** » Wed Feb 27, 2013 3:46 pm

Hi Chulett .
Any specific reasons for this ? I suppose it has got nothing to do with performance.I generally build my sql manually and supply it to the stage . Is this a wrong practice ??

chulett · Post by **chulett** » Wed Feb 27, 2013 4:53 pm

You are right, it's not any kind of a performance issue. To me it's a Best Practice that I enforce as more of a maintenance issue for the people that come after us. Say there's a table change and new columns are added to a table, when someone goes into the stage and adds the new columns, the generated SQL will automatically adjust and include them. I've seen too many developers bitten by them forgetting to add the new columns to the SQL as well, or adding them in the wrong order... something that isn't an issue if the stage does the work.

To me it's a best practice regardless of tool, heck it's even part of Informatica's "Velocity" methodology.

eostic · Post by **eostic** » Wed Feb 27, 2013 7:29 pm

Absolutely. Let the tool (whatever tool) do the work....it also streamlines everything else, from portability (easier to change stage types), data lineage, and more. There will always be those times when you need your own, from special techniques to a need for special dialects, or just because the complex SQL exists somewhere else and you are copying it.......but save those for the exceptions and not the norm.

Ernie

ray.wurlod · Post by **ray.wurlod** » Wed Feb 27, 2013 7:40 pm

Some of the cases where I need user-defined SQL (though often I prototype the same with SQL Builder):

need to refer to parameters that are in Parameter Sets, for example in WHERE clause

need to use a UNION or INTERSECTION etc.

need to specify fully parameterised table identifiers

chulett · Post by **chulett** » Thu Feb 28, 2013 7:36 am

Exactly. Do it when you need to otherwise let the stage do the work.

BI-RMA · Post by **BI-RMA** » Thu Feb 28, 2013 10:59 am

I am afraid I'll have to state an outsider opinion here:

The DataStage-SQL-builder is far too inflexible to be of any use whatsoever. It is not even able to handle a parameter on the table-schema. We use the first character of the table-schema to differentiate between Development- and Test-environment. So we need to generally parameterize every table-name in every job we create.

Also - there are lots of idiotic default settings built into the SQL-Builder. Just one example: The minimum of a date-column is what datatype? Right: a double.

And once your query gets a little complicated it will just take ages to put it together in the QueryBuilder. And to test the query you had better copy the generated statement back to your SQL-tool, because it is simply impossible to verify the results using view-data.

Actually, the option available on the old DB2-API-Stage to generate the Select-clause from the column list and enter other clauses was by far the most useful approach in this respect. It was possible to code complex expressions on the derivation-column and to parameterize table-names, join-clauses and whatever you liked in the From-, Where- and Other-clauses sections of the stage. The connector stages do not have this option any longer, but they are still by far preferable to the old API-stage. So we use custom-SQL in Connector-stages. Always.

Concerning the possibility to forget adding new columns to the SQL after adding the columns to the column-list: The first time you try to run the job - hopefully still in the development environment - you'll get a reminder in the job log. Of course: why DataStage does not try to validate the query with the database at compile-time is something I never understood.

chulett · Post by **chulett** » Thu Feb 28, 2013 11:24 am

I never mentioned the SQL Builder in my post, I found it generally awful. 'Generate' meant from the metadata on the fly which I believe all stages support. Parallel jobs are probably more picky on the whole 'missing columns in user-defined SQL' issue but Server jobs? Not so much. I've seen column mismatches - too few, too many, wrong order - be blissfully ignored by the job until finally someone notices that hey, this new column is always null - what's up with that?

BI-RMA · Post by **BI-RMA** » Thu Feb 28, 2013 11:53 am

OK, see that. But the new Connector-stages do not provide that many options to generate SQL. No ability to generate statements using joins that I know of, no ability to use SQL-functions, case-expressions and the like. Just column-names.

And You are able to add a where-clause if you dare to misuse the table-name prompt on the stage.

Yes, we do use this option when appropriate, every once in a while, that is...

I do prefer to make most of the transformations necessary within DataStage instead of writing lengthy or extremly complex SQL. But using joins to reduce the number of rows to be fetched or to be able to make use of existing indexes is a minimum requirement.

stuartjvnorton · Post by **stuartjvnorton** » Thu Feb 28, 2013 6:09 pm

Yeah, have to agree here.
I'm all for readability and keeping the logic in the job design, but you have to be pragmatic too.

If I'm joining 5 tables within the same DB and having to rename the foreign keys so a join stage can even see them, then I'll write (and document) my own SQL every time.

ray.wurlod · Post by **ray.wurlod** » Thu Feb 28, 2013 7:19 pm

Even with generated SQL I place a simplified version of the SQL statement as an annotation on the design canvas itself (Courier new, colour royal blue, top, left, transparent, no border are my standards for these particular annoatations). Makes life so much easier when doing a walkthrough with DBAs.