Never ending SQL Query

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Never ending SQL Query

Post by TonyInFrance »

I am trying to extract 2 fields from a parent Oracle table using an Oracle OCI in Datastage 8.5. Firstly something new I have noticed in this stage in DSEE 8.5 is that the 'View Data' option is not available on right clicking the link that exits the stage unlike in v7.5.x. This is available on opening the stage description. I wonder why.

Now, my SQL query is somewhat like the following:

SELECT A.field01, A.field02 FROM TABLE01 A, TABLE02 B, TABLE03 C, TABLE04 D
WHERE
A.fieldXX = B.fieldYY and A.fieldXX = C.fieldYY AND
B.fieldZZ like 'CH1%'

When I count the number of lines in the parent table, i.e. TABLE01 I get 204 million. However when I launch this record in a DSEE job the records transferred crosses this number and keeps going on.

A colleague advised me that since I had defined TABLE04 without defining a join condition, a cartesian product was occurring. thus I removed the table TABLE04 from the SQL query which made my query look like this:

Now, my SQL query is somewhat like the following:

SELECT A.field01, A.field02 FROM TABLE01 A, TABLE02 B, TABLE03 C
WHERE
A.fieldXX = B.fieldYY and A.fieldXX = C.fieldYY AND
B.fieldZZ like 'CH1%'

However the job still seems to go on without stopping whereas I know for a fact that the parent table TABLE01 contains no more than 204 million lines.

Anywhere I am going obviously wrong / making some stupid mistakes?

Please advise.
Tony
BI Consultant - Datastage
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Pet Peeve Alert:

"my SQL query is somewhat like the following"

It irks me when we get examples that are "somewhat like" reality. We have gone down many a rabbit hole because of crappy examples people have given us. If all you've changed are the names of the tables and fields but otherwise the sql is identical to what you are using, say so. Or show us the actual sql, are your table names secret? :?

That being said, your first example was obviously bogus as your colleague rightly pointed out. Your second one looks "fine" but no-one here can tell you how well it will perform, not knowing anything about the tables or their structures. For example, are there indexes over the columns you are joining or filtering on?

As a piece of advice, check the explain plan for the query. If you don't know how to do that or read one, ask for help from a colleague or your DBA. You could also have your DBA monitor the query running in the database, he would be able to tell what it is doing.
-craig

"You can never have too many knives" -- Logan Nine Fingers
macromathi1
Participant
Posts: 4
Joined: Tue Sep 18, 2007 3:40 pm
Location: Riyadh, Saudi Arabia

Re: Never ending SQL Query

Post by macromathi1 »

Query looks fine, you may need to improve performance at table level. as craig said go for table index with the columns used in the filter condition (even you can go for index force in the query).

with the given count of table A that too with a 'LIKE' filter, definitely the query will take more time to result. if possible try to use sub query to avoid 'LIKE'.

Refer your DBA to ANALYZE your tables, and for the mentioned count you can for 'INDEX ANALYZE' even.
Thanks,
Mathi Sang
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The LIKE should be fine and if that column is indexed it could actually be performant since it is matching the 'leading edge' of the data. Not sure a sub-query would help there but if push comes to shove, the explain plan should help make that decision.
-craig

"You can never have too many knives" -- Logan Nine Fingers
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Thanks Craig. I'll be more specific. Sorry for the confusion.

Yes my query is the same as above only with the table names changed since I do not remember the exact names. I wanted to copy the query before leaving work yesterday, Friday. Since I did not, the same is stored on my work machine which I'll get to log into only on Monday. I didn't want to wait till then to post on this forum. There might be a few more like conditions than in the example I have put in here but the structure is the same.

That said I am at a client's office and am using SQL Developer as my RDBMS tool. Those of you who have used this would know that once logged in, the left column has '+Tables' which on expanding should normally show the tables in the database. In my case however this list is empty and I am told that the DBA has regulated the user account (we use to log in to Oracle) so that we do not get to see the table definition other than by running a 'desc' command. The reasons for this is unknown to us as we are external consultants and thus not supposed to ask too many questions on their security policies. Thus to even ascertain the exact names and definitions of the tables I am forced to run desc commands to ensure that a particular column exists before using it in my where command or any other command for that matter.

This restriction makes it impossible for me thus to figure out whether there are indexes defined on the table, or any other constraints for that matter. All I am given is a parent table (on which I've run a count * from command and found out that it contains 204 million lines). I know that the tables I am joining on also exist as do the columns I am using in my join.

Now given all the above and since my query is correct I still don't understand why and how my Datastage job can go on running and showing that its transferring more lines than those that exist in my parent table.

I will try to look at the explain plan and also get in touch with the DBA on Monday but as I understand these efforts would result in reducing the run time of the extraction job. I'm not sure that the DBA would be able to solve a Datastage problem.
Tony
BI Consultant - Datastage
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wow, that sounds like a pretty crappy work environment. :?

One thing to keep in mind...
BASU wrote:I'm not sure that the DBA would be able to solve a Datastage problem.
You do not have a "Datastage problem". You have a SQL problem. DataStage is another SQL client, just like SQL Developer - both send standard SQL to the database and the database executes it. Nothing more exciting than that going on. Run the same query in SQL Developer and you'd see the same results.

I'd suggest you get to know the Oracle views that catalog all objects in the database. You should be able to cobble together a standard set of queries to get all of the object information you need - columns in a table, constraints, indexes, etc. For example, see:

ALL_TABLES
ALL_TAB_COLUMNS
ALL_INDEXES
ALL_CONSTRAINTS
ALL_CONS_COLUMNS

Etc etc. All easily joined to get what you need. Long list here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

You are right there. Not the most conducive environment. But then we are external consultants and I don't know how it is in other parts of the world but in France we are really looked upon with suspicion. That said the situation is sure to improve since last week was my first on the job and with time I will get to know my clients including who the DBA is and who are the different people in charge of the various applications I am extracting / loading data to.

You are also right and it seems more probable that mine is more of a SQL problem. I'm sure I will get the same never ending result if I run my query in SQL Developer. Thanks for the advice till now. Let me get back to you guys later this week.

Cheers
Tony
BI Consultant - Datastage
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Guys
This is my query:

Code: Select all

SELECT
  DCH.NUMR_DOSS,
  DCH.CODE_ORGN_FINN
FROM
  DCLI_CREDIT DCH,
  DCLI_CREDIT_CONVENTION DCC, 
  DCLI_CREDIT_GARANTIE DCG
WHERE
  (
  (
  DCH.NUMR_CRED = DCC.NUMR_CRED AND
  DCH.CODE_ORGN_FINN = DCC.CODE_ORGN_FINN AND
  DCH.NUMR_CRED = DCG.NUMR_CRED AND
  DCH.CODE_ORGN_FINN = DCG.CODE_ORGN_FINN AND
  DCG.CODE_NATR_GARN = '03' OR
  (
      DCC.CODE_TYPE_CONV = 'CA' AND
      (
          DCC.LIBL_LONG_CONV LIKE ('%CEF%') OR
          DCC.LIBL_LONG_CONV LIKE ('%C E F %') OR
          DCC.LIBL_LONG_CONV LIKE ('%NATIXIS%') OR
          DCC.LIBL_LONG_CONV LIKE ('%CEGI%') OR
          DCC.LIBL_LONG_CONV LIKE ('%GCE GARANTIE%')
      )
  )
  )
  --OR
  --DCH.CODE_PRDT = RHDC.DECI AND RHDC.CODE_DEUX_REGR LIKE '3HI%'  
  ) 
  AND ((TO_CHAR(SYSDATE, 'YY') - TO_CHAR(DCH.DATE_FIN_CONT_CRDT, 'YY'))*12 + (TO_CHAR(SYSDATE, 'MM') - TO_CHAR(DCH.DATE_FIN_CONT_CRDT, 'MM'))) <= 36
Now the weird thing is when I run a simple query to count the number of lines in the parent table DCLI_CREDIT I get 6456682. However when I run the above query through a DSEE job my performance statistic at last view showed 137946001 and still counting. Is my SQL doing some kind of Cartesian product which is giving me more lines than in my parent table?
Tony
BI Consultant - Datastage
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Looks like you join it up ok for the first part of the WHERE, and then go and OR it to a cartesian product. Unless you meant to put parentheses before the "DCG.CODE_NATR_GAIN ='03' OR"

Surely you would have had an answer in 5 minutes on a SQL forum?
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Thanks Stuart.
I see where I went wrong. Thus for the moment my problem is solved.
Thanks all...:-)
Tony
BI Consultant - Datastage
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

See what happens when you post a real example? :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

I see what you mean :D
Tony
BI Consultant - Datastage
Post Reply