Compare differences in row counts after loading

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Compare differences in row counts after loading

Post by iq_etl »

I'm creating a server job in a sequence that needs to compare the row counts on a table before and after loading. If the post load count is, say, 20% fewer than the pre load count, I want to be sent an email (will use DSSendMail). Two issues:

1) I'm trying to do this comparison in a Transformer stage passing out a value of 'Y' if the condition is met, but I'm not having any success.

Is there another stage I should use?


2) Executing function DSSendMail is set up in the Job Control of the job properties, correct? How do I kick it off? My thought is I would check to see if the variable value is 'Y' coming out of the transformer stage, but what stage would I have this evaluation in?

Thank you!
Last edited by iq_etl on Fri Aug 23, 2013 7:36 am, edited 1 time in total.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

1. In a Server job, all of the work happens in a Transformer so you should be able to do what you need there. We'd need more details to provide any help, though.

2. Use the Notification Activity stage in the Seqence to send the email. I'd suggest leveraging the USERSTATUS area of the job doing the check, the Sequence can easily test a value there and then trigger the email if needed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

So I assume your job will either clear the table or do deletes as well as updates and inserts?

I agree with Craig, its really easy to do in a Job sequence. Write one small job to do a SELECT COUNT(*) from the table and pass that back in the UserStatus. Run the small job before and after the update. Compare the two UserStatus values, and use the Notification Activity if it meets your criteria.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

Yes, you guys are correct, the jobs will generally do a delete then an insert. We're basically wanting to be notified if there's a large difference in the row count before the delete and after the load.

So, I'll use the Transformer stage in one small stage to check the two counts, then pass a value which I will evaluate to decide whether to kick off the Notification Activity or not. Sound correct?
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

Ok, I've got two other questions.

1) In order to set a value in the UserStatus I use DSSetUserStatus(), but do I call that in the Transformer?

2) If, in my Transformer, I have a Stage Variable called 'CountDiff' that's value is the difference between the pre and post row counts and I use it in the constraint as follows:

If countDiff > 20 then DSSetUserStatus('Y')

I can then evaluate the value of UserStatus as 'Y' or ' ', correct?

Perhaps I don't even need the constraint and can set the UserStatus with the value of the difference and evaluate that difference as a trigger instead of looking for 'Y'.

Did that make any sense?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

1. Been discussed quite a bit, an exact search for "call dssetuserstatus" should turn up the pertinent ones.

2. Call it regardless and set it to 'Y' or 'N' as appropriate and then check in your trigger.
-craig

"You can never have too many knives" -- Logan Nine Fingers
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

Yes, it does appear to have been discussed frequently.

However, I'm not 100% on if I need to create the function or not. From reading the suggestion in the link below it sounds like I need to define, or initialize, UserStatus before I can call it in the Transformer. Would I do that in a InterProcess stage? I'm not familiar with where an interlude routine is written.

viewtopic.php?t=107023&highlight=call+dssetuserstatus

Thanks!
Mayurgupta7744
Participant
Posts: 8
Joined: Wed Jul 10, 2013 11:30 pm
Location: Nagpur

How to Implement Same logic in Parallel Job

Post by Mayurgupta7744 »

chulett wrote:1. Been discussed quite a bit, an exact search for "call dssetuserstatus" should turn up the pertinent ones.

2. Call it regardless and set it to 'Y' or 'N' as appropriate and then check in your tri ...
Hi,

I am anxious to know, how we can implement the same logic in parallel job. As in parallel job we don't have functionality to SetUserStatus value...
Best Regards,
Mayur
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Did you Search? One recent discussion is to be found here
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

iq_etl wrote:However, I'm not 100% on if I need to create the function or not.
Yes, you do. Here's how to create a Server routine if that's new to you. The entirety of the code needed is in the post that you linked to, there's no need to 'initialize' anything.
-craig

"You can never have too many knives" -- Logan Nine Fingers
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

Sorry for the delay, but I got sidetracked with other projects. Does the fact that I'm running a parallel edition of DS have any bearing on this? We used to be in server, but going to 9.1 we switched over to parallel and hope to harness some of its capabilities.

Should I create a parallel routine for this instead of a server routine?

Thanks!
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

If you are doing delete and insert then checking row counts on the job will not necessarily give you what you expect. Most people assume the delete and insert is done on key values. This is not necessarily true. So one delete could delete 100 rows. We don't know without looking at the table keys.

You need to do your counts in a separate job store in a table with todays date. Do a comparison to the max and the next to max date. This will compare last run to this run. If you don't know how to do this then let me know and I can post the sql. There is sql like this in EtlStats.
Mamu Kim
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

kduke wrote:You need to do your counts in a separate job store in a table with todays date. Do a comparison to the max and the next to max date. This will compare last run to this run. If you don't know how to do this then let me know and I can post the sql. There is sql like this in EtlStats.
If you've got that to share it would be fantastic!

Thanks!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In Kim's signature line on all of his posts there is a link to his website... it's all there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Here is one.

Code: Select all

select 
   NEXT2LAST_RUN.PROJECT_NAME,
   NEXT2LAST_RUN.JOB_NAME,
   NEXT2LAST_RUN.LINK_NAME,
   NEXT2LAST_RUN.START_TS,
   NEXT2LAST_RUN.END_TS,
   NEXT2LAST_RUN.ELAPSED_TIME,
   NEXT2LAST_RUN.ELAPSED_SEC,
   NEXT2LAST_RUN.ROW_COUNT,
   NEXT2LAST_RUN.ROWS_PER_SEC
from 
   ETL_ROW_HIST NEXT2LAST_RUN,
   (
      select 
         ROWS_NEXT2LAST.PROJECT_NAME,
         ROWS_NEXT2LAST.JOB_NAME,
         ROWS_NEXT2LAST.LINK_NAME,
         max(ROWS_NEXT2LAST.START_TS) as MAX_START_TS
      from 
         ETL_ROW_HIST ROWS_NEXT2LAST,
         (
            select 
               PROJECT_NAME,
               JOB_NAME,
               LINK_NAME,
               max(START_TS) as MAX_START_TS
            from 
               ETL_ROW_HIST
            group by
               PROJECT_NAME,
               JOB_NAME,
               LINK_NAME
         ) ROWS_LAST
      where
         ROWS_NEXT2LAST.PROJECT_NAME = ROWS_LAST.PROJECT_NAME and
         ROWS_NEXT2LAST.JOB_NAME = ROWS_LAST.JOB_NAME and
         ROWS_NEXT2LAST.LINK_NAME = ROWS_LAST.LINK_NAME and
         ROWS_NEXT2LAST.START_TS < ROWS_LAST.MAX_START_TS
      group by
         ROWS_NEXT2LAST.PROJECT_NAME,
         ROWS_NEXT2LAST.JOB_NAME,
         ROWS_NEXT2LAST.LINK_NAME
   ) ROWS_MAX
where
   NEXT2LAST_RUN.PROJECT_NAME = ROWS_MAX.PROJECT_NAME and
   NEXT2LAST_RUN.JOB_NAME = ROWS_MAX.JOB_NAME and
   NEXT2LAST_RUN.LINK_NAME = ROWS_MAX.LINK_NAME and
   NEXT2LAST_RUN.START_TS = ROWS_MAX.MAX_START_TS
order by
   NEXT2LAST_RUN.PROJECT_NAME,
   NEXT2LAST_RUN.JOB_NAME,
   NEXT2LAST_RUN.LINK_NAME,
   NEXT2LAST_RUN.START_TS
Mamu Kim
Post Reply