Getting different number of records in sql server and ODBC

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Getting different number of records in sql server and ODBC

Post by shivan »

Hi All,
I am getting different number of rows count when i run the query in sql server and when i run the same query in ODBC connection in datastage job.
The way it works :
First i tested the query in sql server which brings 36000 rows.
Then when i inserted the same query in ODBC connection in datastage, i created the same number of columns. There are two calculated columns in the query, In the ODBC connection , i left it blank just assign the same name but nothing in the derivation.
Then when i run the job i get more rows like 36100 rows.

Thanks
shivan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Shivan,

could you post the SQL statement please?
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

select
T7.LF_ID,T7.inv_id,T7.INV_TOTAL_NET_DUE,T7.INV_NEXT_STEP,T7.INV_STATUS,T7.INV_FEE_TOT,T7.INV_EXP_TOT,
T7.INV_CUR_FEE_TOT,T7.INV_CUR_EXP_TOT,T7.INV_ORIG_NET_AMT,T7.INV_SUB_DT,T7.CL_MATTER_ID, T7.MATTER_DIVISION,
T7.LF_NAME, T7.CLAIM_NO,Y2.LVL_DATE,Y2.LVL_EMP_ID, Y2.LVL_EMP_NAME,T7.INV_DT, T7.INV_DUE_DATE,
T7.INV_START_DATE,T7.INV_END_DATE,T7.UPDATE_DT, T7.INV_UNDO_DT,T7.INV_AP_DT,T7.inv_id_esc,T7.inv_id_adj, T7.Flag,
case when Y2.LVL_DATE is null then 'N' else case when DatePart(wk,(Y2.LVL_DATE))=DatePart(wk,getdate())and
DatePart(Mm,(Y2.LVL_DATE)) = DatePart(Mm,getdate())
and DatePart(Yy,(Y2.LVL_DATE)) = DatePart(Yy,getdate()) then 'W' else
case when DatePart(Mm,(Y2.LVL_DATE)) = DatePart(Mm,getdate()) and
DatePart(Yy,(Y2.LVL_DATE)) = DatePart(Yy,getdate()) then 'M' else
case when DatePart(Yy,(Y2.LVL_DATE))=DatePart(Yy,getdate()) then 'Y' else 'P' end end end end APPROVAL_IND
from (select
t5.LF_ID,T5.inv_id,T5.INV_TOTAL_NET_DUE,T5.INV_NEXT_STEP,T5.INV_STATUS,T5.INV_FEE_TOT,T5.INV_EXP_TOT,
T5.INV_CUR_FEE_TOT,T5.INV_CUR_EXP_TOT,T5.INV_ORIG_NET_AMT,T5.INV_SUB_DT,T5.CL_MATTER_ID, T5.MATTER_DIVISION,
T5.LF_NAME, T5.CLAIM_NO,T5.INV_DT, T5.INV_DUE_DATE,
T5.INV_START_DATE,T5.INV_END_DATE,T5.UPDATE_DT, T5.INV_UNDO_DT,T5.INV_AP_DT,T5.inv_id_esc,T5.inv_id_adj, case when T6.rule_data = 'Y' then 'Y' else 'N' end Flag
from (select
t4.LF_ID,t4.inv_id,t4.INV_TOTAL_NET_DUE,t4.INV_NEXT_STEP,t4.INV_STATUS,t4.INV_FEE_TOT,t4.INV_EXP_TOT,
t4.INV_CUR_FEE_TOT,t4.INV_CUR_EXP_TOT,t4.INV_ORIG_NET_AMT,t4.INV_SUB_DT,t4.CL_MATTER_ID, t4.MATTER_DIVISION,
t4.LF_NAME, t4.CLAIM_NO,T4.INV_DT, T4.INV_DUE_DATE,T4.INV_START_DATE,T4.INV_END_DATE,
T4.UPDATE_DT, T4.INV_UNDO_DT,T4.INV_AP_DT,t4.inv_id_esc,t3.inv_id as
inv_id_adj
from(select
t1.LF_ID,t1.inv_id,t1.INV_TOTAL_NET_DUE,t1.INV_NEXT_STEP,t1.INV_STATUS,t1.INV_FEE_TOT,t1.INV_EXP_TOT,
t1.INV_CUR_FEE_TOT,t1.INV_CUR_EXP_TOT,t1.INV_ORIG_NET_AMT,t1.INV_SUB_DT, t1.CL_MATTER_ID, t1.MATTER_DIVISION,
t1.LF_NAME,t1.CLAIM_NO, T1.INV_DT, T1.INV_DUE_DATE,T1.INV_START_DATE,
T1.INV_END_DATE,T1.UPDATE_DT, T1.INV_UNDO_DT, T1.INV_AP_DT, t2.inv_id as
inv_id_esc
from(SELECT INVOICE.LF_ID,
INVOICE.INV_ID,INVOICE.INV_TOTAL_NET_DUE,INVOICE.INV_NEXT_STEP,INVOICE.INV_STATUS,INVOICE.INV_FEE_TOT,INVOICE.INV_EXP_TOT,INVOICE.INV_CUR_FEE_TOT,INVOICE.INV_CUR_EXP_TOT,INVOICE.INV_ORIG_NET_AMT,
INVOICE.INV_SUB_DT,MATTER.CL_MATTER_ID, MATTER.MATTER_DIVISION, LAW_FIRM.LF_NAME,
USER_DEFINED_FIELD.CLAIM_NO,INVOICE.INV_DT, INVOICE.INV_DUE_DATE,INVOICE.INV_START_DATE,INVOICE.INV_END_DATE,
INVOICE.UPDATE_DT, INVOICE.INV_UNDO_DT,INVOICE.INV_AP_DT
FROM INVOICE, MATTER, LAW_FIRM, USER_DEFINED_FIELD
WHERE INVOICE.LF_ID = MATTER.LF_ID
AND INVOICE.INV_ID = MATTER.INV_ID
AND INVOICE.LF_ID = LAW_FIRM.LF_ID
AND INVOICE.LF_ID = USER_DEFINED_FIELD.LF_ID
AND INVOICE.INV_ID = USER_DEFINED_FIELD.INV_ID
AND USER_DEFINED_FIELD.CLAIM_NO not in('#410-59361')
AND INVOICE.LF_ID <> ''
AND INVOICE.LF_ID <> ' '
group by
INVOICE.LF_ID,
INVOICE.INV_ID,INVOICE.INV_TOTAL_NET_DUE,INVOICE.INV_NEXT_STEP,INVOICE.INV_STATUS,INVOICE.INV_FEE_TOT,INVOICE.INV_EXP_TOT,INVOICE.INV_CUR_FEE_TOT,INVOICE.INV_CUR_EXP_TOT,INVOICE.INV_ORIG_NET_AMT,
INVOICE.INV_SUB_DT,MATTER.CL_MATTER_ID, MATTER.MATTER_DIVISION, LAW_FIRM.LF_NAME,
USER_DEFINED_FIELD.CLAIM_NO, INVOICE.INV_DT, INVOICE.INV_DUE_DATE,INVOICE.INV_START_DATE,INVOICE.INV_END_DATE,
INVOICE.UPDATE_DT, INVOICE.INV_UNDO_DT,INVOICE.INV_AP_DT
)
as T1 LEFT JOIN (select distinct(inv_id) from escalation) as T2 on T1.inv_id = T2.inv_id )
as T4 LEFT JOIN (select distinct(inv_id) from invoice_adjustment ) as T3 on T4.INV_ID = T3.INV_ID )
as T5 LEFT JOIN (select distinct INV_ID,CL_MATTER_ID,LF_ID,'Y' rule_data from rules
where rule_action <> 0 and rule_type='invoice' and rule_action is not null ) as T6 on
T5.INV_ID = T6.INV_ID and
T5.LF_ID = T6.LF_ID and
T5.CL_MATTER_ID = T6.CL_MATTER_ID)
as T7 LEFT JOIN (select APPROVAL_LEVEL.INV_ID,APPROVAL_LEVEL.LF_ID,APPROVAL_LEVEL.CL_MATTER_ID,APPROVAL_LEVEL.LVL_DATE,APPROVAL_LEVEL.LVL_EMP_ID,APPROVAL_LEVEL.LVL_EMP_NAME
from APPROVAL_LEVEL, (SELECT INV_ID,LF_ID,CL_MATTER_ID,MAX(LVL_DATE) AS MAX_LVL_DATE
from APPROVAL_LEVEL
where APPROVAL_LEVEL.GRP_NUMBER = 1
AND APPROVAL_LEVEL.LVL_NUMBER = 1
AND APPROVAL_LEVEL.APP_TYPE = 'invoice'
AND APPROVAL_LEVEL.LI_LINUM = '0' group by INV_ID,LF_ID,CL_MATTER_ID ) AS Y1
where APPROVAL_LEVEL.INV_ID = Y1.INV_ID
AND APPROVAL_LEVEL.LF_ID = Y1.LF_ID
AND APPROVAL_LEVEL.CL_MATTER_ID = Y1.CL_MATTER_ID
AND APPROVAL_LEVEL.LVL_DATE = Y1.MAX_LVL_DATE
AND APPROVAL_LEVEL.GRP_NUMBER = 1
AND APPROVAL_LEVEL.LVL_NUMBER = 1
AND APPROVAL_LEVEL.APP_TYPE = 'invoice'
AND APPROVAL_LEVEL.LI_LINUM = '0') AS Y2
ON T7.INV_ID = Y2.INV_ID
AND T7.LF_ID = Y2.LF_ID
AND T7.CL_MATTER_ID = Y2.CL_MATTER_ID


Thanks
Shivan
ArndW wrote:Shivan,

could you post the SQL statement please?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

OK, I shouldn't have asked :) I think you have no choice but to take each of the internal selects singly until you get to the part that has a different number of resulting records. Either that or get the list of additional rows and see if you can come up with a rule.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Did you verify that the ODBC points to the correct SQL Server DSN?
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

It is pointing to correct database. There are like 36000 rows, it wont be easy to compare all of them. Is there any easy way.

thanks
shivan
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Best will be to write into a seq file order by the keys and do a diff.

Did you try using the DB stage?
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

ok! thanks i will try to do that

shivan
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Shivan

Surely there is a better way to extract this data. This SQL is UGLY. Explain why it needs to be this UGLY. DataStage can be better utilized by not doing so much in a source query. You could probably take 36,000 records and load them into hash files and look this up one SQL statement at a time. Rethink your process and use hash files please.

Write it both ways and see which is faster.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you certain that no other process adds 100 rows between when you run the query in SQL server and when you run it in DataStage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

No, i m pretty sure. It is one to one relationship in the transformer. I found out the rows which are added. But i cant figure out anything from them. As the data looks fine in them. I never used hased file before. Can you explain this process in detail. It really important. I appreciate all your help.
thanks
shivan
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Each SELECT can be output into a hash file. The hash file needs a key so you need know what fields make it unique but it should be whatever you used to join these tables together. It is also easy to figure it out based on your group by clause. If you want you can remove one SELECT at a time.

After your hash file is built then go into the stage and saved the metadata. Next when you want to use this in a lookup then load this metadata into the lookup. So in the SQL below all these GROUP BY fields are the key. I am not too sure this helps you because you are doing select col... from (select col... from (select col... from x)). The idea is if you break your SQL down into smaller pieces then it is easier figure out what is broken. Simplify it so you brain can grasp it. You are doing most of your work on your source system. DataStage has many transformations to help you figure out weeks and months. You need to learn these instead what you are used to. Challenge yourself to learn the cool stuf in DataStage.

Code: Select all

Select 
	INVOICE.LF_ID,
	INVOICE.INV_ID,
	INVOICE.INV_TOTAL_NET_DUE,
	INVOICE.INV_NEXT_STEP,
	INVOICE.INV_STATUS,
	INVOICE.INV_FEE_TOT,
	INVOICE.INV_EXP_TOT,
	INVOICE.INV_CUR_FEE_TOT,
	INVOICE.INV_CUR_EXP_TOT,
	INVOICE.INV_ORIG_NET_AMT,
	INVOICE.INV_SUB_DT,
	MATTER.CL_MATTER_ID,
	MATTER.MATTER_DIVISION,
	LAW_FIRM.LF_NAME,
	USER_DEFINED_FIELD.CLAIM_NO,
	INVOICE.INV_DT,
	INVOICE.INV_DUE_DATE,
	INVOICE.INV_START_DATE,
	INVOICE.INV_END_DATE,
	INVOICE.UPDATE_DT,
	INVOICE.INV_UNDO_DT,
	INVOICE.INV_AP_DT
From 
	INVOICE,
	MATTER,
	LAW_FIRM,
	USER_DEFINED_FIELD
Where 
	INVOICE.LF_ID = MATTER.LF_ID
	And INVOICE.INV_ID = MATTER.INV_ID
	And INVOICE.LF_ID = LAW_FIRM.LF_ID
	And INVOICE.LF_ID = USER_DEFINED_FIELD.LF_ID
	And INVOICE.INV_ID = USER_DEFINED_FIELD.INV_ID
	And USER_DEFINED_FIELD.CLAIM_NO not in
	(
	'#410-59361'
	)
	And INVOICE.LF_ID <> ''
	And INVOICE.LF_ID <> ' '
Group By 
	INVOICE.LF_ID,
	INVOICE.INV_ID,
	INVOICE.INV_TOTAL_NET_DUE,
	INVOICE.INV_NEXT_STEP,
	INVOICE.INV_STATUS,
	INVOICE.INV_FEE_TOT,
	INVOICE.INV_EXP_TOT,
	INVOICE.INV_CUR_FEE_TOT,
	INVOICE.INV_CUR_EXP_TOT,
	INVOICE.INV_ORIG_NET_AMT,
	INVOICE.INV_SUB_DT,
	MATTER.CL_MATTER_ID,
	MATTER.MATTER_DIVISION,
	LAW_FIRM.LF_NAME,
	USER_DEFINED_FIELD.CLAIM_NO,
	INVOICE.INV_DT,
	INVOICE.INV_DUE_DATE,
	INVOICE.INV_START_DATE,
	INVOICE.INV_END_DATE,
	INVOICE.UPDATE_DT,
	INVOICE.INV_UNDO_DT,
	INVOICE.INV_AP_DT 
;
Mamu Kim
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

But i dont understand why it gives different rows. It is a same sql query which has different impact on sql server and datastage. In datastage it is returning 100 more rows. Who is right data stage or sql server.

shivan
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Write the link from the source stage to a file.

There can be lots of reasons for having less rows.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Just making sure...
Can you confirm all 4786 characters of your query reached the DB server?
I've had witnessed in the past a limited buffer cutting long queries and oddly enough it might be that the short version is valid but different where clauses cause differnet in result sets returned from both queries.

In general I would also try to think of a more DS oriented solution rather then this very complex SQL if I could.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Post Reply