Writing SQL Query as Source in DataStage

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
Debbbie
Participant
Posts: 1
Joined: Wed Feb 27, 2013 2:35 pm

Writing SQL Query as Source in DataStage

Post by Debbbie »

Hi.
We are migrating our application from BusinessObject DataServices 4.0 to Data Stage 8.7. Can anyone plz help me to understand how to write SQL query in DataStage? in DataServices we can add SQL object as source instead of DB2 table. Is there any such option in DataStage? Our Database is DB2.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

In the DB2 Connector stage (or such other DB2 stage as you might use) you have the option to have the SQL generated by the stage, built using SQL builder (Connector only) or using user-defined SQL.

In the Connector stage set the Generate SQL property to "No", and provide a value for the SQL Statement property (the actual query).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would suggest that, if at all possible, you let the stage generate the SQL whenever possible. Only supply "user defined" SQL for those situations when you have no other choice, for something that is not... generatable.
-craig

"You can never have too many knives" -- Logan Nine Fingers
VineelPrem
Participant
Posts: 19
Joined: Tue Jun 26, 2012 11:14 am
Location: Houston

Post by VineelPrem »

Hi Chulett .
Any specific reasons for this ? I suppose it has got nothing to do with performance.I generally build my sql manually and supply it to the stage . Is this a wrong practice ??
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You are right, it's not any kind of a performance issue. To me it's a Best Practice that I enforce as more of a maintenance issue for the people that come after us. Say there's a table change and new columns are added to a table, when someone goes into the stage and adds the new columns, the generated SQL will automatically adjust and include them. I've seen too many developers bitten by them forgetting to add the new columns to the SQL as well, or adding them in the wrong order... something that isn't an issue if the stage does the work.

To me it's a best practice regardless of tool, heck it's even part of Informatica's "Velocity" methodology. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Absolutely. Let the tool (whatever tool) do the work....it also streamlines everything else, from portability (easier to change stage types), data lineage, and more. There will always be those times when you need your own, from special techniques to a need for special dialects, or just because the complex SQL exists somewhere else and you are copying it.......but save those for the exceptions and not the norm.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Some of the cases where I need user-defined SQL (though often I prototype the same with SQL Builder):
  • need to refer to parameters that are in Parameter Sets, for example in WHERE clause

    need to use a UNION or INTERSECTION etc.

    need to specify fully parameterised table identifiers
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Exactly. Do it when you need to otherwise let the stage do the work.
-craig

"You can never have too many knives" -- Logan Nine Fingers
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

I am afraid I'll have to state an outsider opinion here:

The DataStage-SQL-builder is far too inflexible to be of any use whatsoever. It is not even able to handle a parameter on the table-schema. We use the first character of the table-schema to differentiate between Development- and Test-environment. So we need to generally parameterize every table-name in every job we create.

Also - there are lots of idiotic default settings built into the SQL-Builder. Just one example: The minimum of a date-column is what datatype? Right: a double.

And once your query gets a little complicated it will just take ages to put it together in the QueryBuilder. And to test the query you had better copy the generated statement back to your SQL-tool, because it is simply impossible to verify the results using view-data.

Actually, the option available on the old DB2-API-Stage to generate the Select-clause from the column list and enter other clauses was by far the most useful approach in this respect. It was possible to code complex expressions on the derivation-column and to parameterize table-names, join-clauses and whatever you liked in the From-, Where- and Other-clauses sections of the stage. The connector stages do not have this option any longer, but they are still by far preferable to the old API-stage. So we use custom-SQL in Connector-stages. Always.

Concerning the possibility to forget adding new columns to the SQL after adding the columns to the column-list: The first time you try to run the job - hopefully still in the development environment - you'll get a reminder in the job log. Of course: why DataStage does not try to validate the query with the database at compile-time is something I never understood.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I never mentioned the SQL Builder in my post, I found it generally awful. 'Generate' meant from the metadata on the fly which I believe all stages support. Parallel jobs are probably more picky on the whole 'missing columns in user-defined SQL' issue but Server jobs? Not so much. I've seen column mismatches - too few, too many, wrong order - be blissfully ignored by the job until finally someone notices that hey, this new column is always null - what's up with that?
-craig

"You can never have too many knives" -- Logan Nine Fingers
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

OK, see that. But the new Connector-stages do not provide that many options to generate SQL. No ability to generate statements using joins that I know of, no ability to use SQL-functions, case-expressions and the like. Just column-names.

And You are able to add a where-clause if you dare to misuse the table-name prompt on the stage.

Yes, we do use this option when appropriate, every once in a while, that is...

I do prefer to make most of the transformations necessary within DataStage instead of writing lengthy or extremly complex SQL. But using joins to reduce the number of rows to be fetched or to be able to make use of existing indexes is a minimum requirement.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Yeah, have to agree here.
I'm all for readability and keeping the logic in the job design, but you have to be pragmatic too.

If I'm joining 5 tables within the same DB and having to rename the foreign keys so a join stage can even see them, then I'll write (and document) my own SQL every time.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Even with generated SQL I place a simplified version of the SQL statement as an annotation on the design canvas itself (Courier new, colour royal blue, top, left, transparent, no border are my standards for these particular annoatations). Makes life so much easier when doing a walkthrough with DBAs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply