Compare differences in row counts after loading

iq_etl · Post by **iq_etl** » Thu Aug 22, 2013 2:01 pm

I'm creating a server job in a sequence that needs to compare the row counts on a table before and after loading. If the post load count is, say, 20% fewer than the pre load count, I want to be sent an email (will use DSSendMail). Two issues:

1) I'm trying to do this comparison in a Transformer stage passing out a value of 'Y' if the condition is met, but I'm not having any success.

Is there another stage I should use?

2) Executing function DSSendMail is set up in the Job Control of the job properties, correct? How do I kick it off? My thought is I would check to see if the variable value is 'Y' coming out of the transformer stage, but what stage would I have this evaluation in?

Thank you!

chulett · Post by **chulett** » Thu Aug 22, 2013 2:13 pm

1. In a Server job, all of the work happens in a Transformer so you should be able to do what you need there. We'd need more details to provide any help, though.

2. Use the Notification Activity stage in the Seqence to send the email. I'd suggest leveraging the USERSTATUS area of the job doing the check, the Sequence can easily test a value there and then trigger the email if needed.

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Thu Aug 22, 2013 2:28 pm

So I assume your job will either clear the table or do deletes as well as updates and inserts?

I agree with Craig, its really easy to do in a Job sequence. Write one small job to do a SELECT COUNT(*) from the table and pass that back in the UserStatus. Run the small job before and after the update. Compare the two UserStatus values, and use the Notification Activity if it meets your criteria.

iq_etl · Post by **iq_etl** » Fri Aug 23, 2013 7:45 am

Yes, you guys are correct, the jobs will generally do a delete then an insert. We're basically wanting to be notified if there's a large difference in the row count before the delete and after the load.

So, I'll use the Transformer stage in one small stage to check the two counts, then pass a value which I will evaluate to decide whether to kick off the Notification Activity or not. Sound correct?

iq_etl · Post by **iq_etl** » Wed Aug 28, 2013 7:57 am

Ok, I've got two other questions.

1) In order to set a value in the UserStatus I use DSSetUserStatus(), but do I call that in the Transformer?

2) If, in my Transformer, I have a Stage Variable called 'CountDiff' that's value is the difference between the pre and post row counts and I use it in the constraint as follows:

If countDiff > 20 then DSSetUserStatus('Y')

I can then evaluate the value of UserStatus as 'Y' or ' ', correct?

Perhaps I don't even need the constraint and can set the UserStatus with the value of the difference and evaluate that difference as a trigger instead of looking for 'Y'.

Did that make any sense?

chulett · Post by **chulett** » Wed Aug 28, 2013 8:21 am

1. Been discussed quite a bit, an exact search for "call dssetuserstatus" should turn up the pertinent ones.

2. Call it regardless and set it to 'Y' or 'N' as appropriate and then check in your trigger.

iq_etl · Post by **iq_etl** » Wed Aug 28, 2013 12:00 pm

Yes, it does appear to have been discussed frequently.

However, I'm not 100% on if I need to create the function or not. From reading the suggestion in the link below it sounds like I need to define, or initialize, UserStatus before I can call it in the Transformer. Would I do that in a InterProcess stage? I'm not familiar with where an interlude routine is written.

viewtopic.php?t=107023&highlight=call+dssetuserstatus

Thanks!

Mayurgupta7744 · Post by **Mayurgupta7744** » Mon Sep 02, 2013 10:19 pm

chulett wrote:1. Been discussed quite a bit, an exact search for "call dssetuserstatus" should turn up the pertinent ones.

2. Call it regardless and set it to 'Y' or 'N' as appropriate and then check in your tri ...

Hi,

I am anxious to know, how we can implement the same logic in parallel job. As in parallel job we don't have functionality to SetUserStatus value...

ray.wurlod · Post by **ray.wurlod** » Mon Sep 02, 2013 11:23 pm

Did you Search? One recent discussion is to be found here

chulett · Post by **chulett** » Tue Sep 03, 2013 7:22 am

iq_etl wrote:However, I'm not 100% on if I need to create the function or not.

Yes, you do. Here's how to create a Server routine if that's new to you. The entirety of the code needed is in the post that you linked to, there's no need to 'initialize' anything.

iq_etl · Post by **iq_etl** » Tue Sep 24, 2013 1:49 pm

Sorry for the delay, but I got sidetracked with other projects. Does the fact that I'm running a parallel edition of DS have any bearing on this? We used to be in server, but going to 9.1 we switched over to parallel and hope to harness some of its capabilities.

Should I create a parallel routine for this instead of a server routine?

Thanks!

kduke · Post by **kduke** » Thu Sep 26, 2013 7:56 pm

If you are doing delete and insert then checking row counts on the job will not necessarily give you what you expect. Most people assume the delete and insert is done on key values. This is not necessarily true. So one delete could delete 100 rows. We don't know without looking at the table keys.

You need to do your counts in a separate job store in a table with todays date. Do a comparison to the max and the next to max date. This will compare last run to this run. If you don't know how to do this then let me know and I can post the sql. There is sql like this in EtlStats.

iq_etl · Post by **iq_etl** » Tue Oct 08, 2013 8:03 am

kduke wrote:You need to do your counts in a separate job store in a table with todays date. Do a comparison to the max and the next to max date. This will compare last run to this run. If you don't know how to do this then let me know and I can post the sql. There is sql like this in EtlStats.

If you've got that to share it would be fantastic!

Thanks!

chulett · Post by **chulett** » Tue Oct 08, 2013 9:06 am

In Kim's signature line on all of his posts there is a link to his website... it's all there.

kduke · Post by **kduke** » Fri Oct 11, 2013 8:50 am

Here is one.

Code: Select all

select 
   NEXT2LAST_RUN.PROJECT_NAME,
   NEXT2LAST_RUN.JOB_NAME,
   NEXT2LAST_RUN.LINK_NAME,
   NEXT2LAST_RUN.START_TS,
   NEXT2LAST_RUN.END_TS,
   NEXT2LAST_RUN.ELAPSED_TIME,
   NEXT2LAST_RUN.ELAPSED_SEC,
   NEXT2LAST_RUN.ROW_COUNT,
   NEXT2LAST_RUN.ROWS_PER_SEC
from 
   ETL_ROW_HIST NEXT2LAST_RUN,
   (
      select 
         ROWS_NEXT2LAST.PROJECT_NAME,
         ROWS_NEXT2LAST.JOB_NAME,
         ROWS_NEXT2LAST.LINK_NAME,
         max(ROWS_NEXT2LAST.START_TS) as MAX_START_TS
      from 
         ETL_ROW_HIST ROWS_NEXT2LAST,
         (
            select 
               PROJECT_NAME,
               JOB_NAME,
               LINK_NAME,
               max(START_TS) as MAX_START_TS
            from 
               ETL_ROW_HIST
            group by
               PROJECT_NAME,
               JOB_NAME,
               LINK_NAME
         ) ROWS_LAST
      where
         ROWS_NEXT2LAST.PROJECT_NAME = ROWS_LAST.PROJECT_NAME and
         ROWS_NEXT2LAST.JOB_NAME = ROWS_LAST.JOB_NAME and
         ROWS_NEXT2LAST.LINK_NAME = ROWS_LAST.LINK_NAME and
         ROWS_NEXT2LAST.START_TS < ROWS_LAST.MAX_START_TS
      group by
         ROWS_NEXT2LAST.PROJECT_NAME,
         ROWS_NEXT2LAST.JOB_NAME,
         ROWS_NEXT2LAST.LINK_NAME
   ) ROWS_MAX
where
   NEXT2LAST_RUN.PROJECT_NAME = ROWS_MAX.PROJECT_NAME and
   NEXT2LAST_RUN.JOB_NAME = ROWS_MAX.JOB_NAME and
   NEXT2LAST_RUN.LINK_NAME = ROWS_MAX.LINK_NAME and
   NEXT2LAST_RUN.START_TS = ROWS_MAX.MAX_START_TS
order by
   NEXT2LAST_RUN.PROJECT_NAME,
   NEXT2LAST_RUN.JOB_NAME,
   NEXT2LAST_RUN.LINK_NAME,
   NEXT2LAST_RUN.START_TS

DSXchange

Compare differences in row counts after loading

Compare differences in row counts after loading

How to Implement Same logic in Parallel Job