Matching Row Count

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Matching Row Count

Post by pravin1581 »

Hi All,

I need to develop a job that will give me the number of rows pulled from the source tables and the number of rows loaded into the final table and whether they are equal or not.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Excellent. Noting your use of the first person personal pronoun, I say "go for it".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

ray.wurlod wrote:Excellent. Noting your use of the first person personal pronoun, I say "go for it".
Any sort of help will be appreciated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Create an after-job subroutine that uses DSGetLinkInfo() to retrieve the row counts, then compares them. What you need to do with that number appears not to be in your specification.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sanath1234
Participant
Posts: 24
Joined: Wed Oct 17, 2007 1:28 am
Location: hyderabad

Re: Matching Row Count

Post by sanath1234 »

pravin1581 wrote:Hi All,

I need to develop a job that will give me the number of rows pulled from the source tables and the number of rows loaded into the final table and whether they are equal or not.
Use a Peek stage, or the environment variable APT_DUMP_SCORE, which will report the number of records collected; with this you can solve the problem.
say cheers always................
bikan
Premium Member
Posts: 128
Joined: Thu Jun 08, 2006 5:27 am

Post by bikan »

You might be interested in getting the counts from the command-line utility, if you want to use them for file balancing.

Use dsjob -linkinfo <project> <job> <stage> <link> to get the link count
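Building on the dsjob command above, the comparison itself can be scripted in a small shell wrapper. This is a minimal sketch only: the project, job, stage, and link names (MyProject, MyJob, and so on) are placeholders, and the parsing inside get_link_count is an assumption to adapt to the exact output layout of your dsjob version.

```shell
#!/bin/sh
# Minimal sketch. The dsjob invocations at the bottom are commented out
# because MyProject, MyJob, and the stage/link names are placeholders.

get_link_count() {
    # Assumption: the row count appears on an output line containing "count";
    # adapt the grep/tr pipeline to your dsjob version's actual layout.
    dsjob -linkinfo "$1" "$2" "$3" "$4" | grep -i 'count' | tr -dc '0-9'
}

compare_counts() {
    # Compare two integer row counts and report the result.
    if [ "$1" -eq "$2" ]; then
        echo "MATCH: $1 rows on both links"
    else
        echo "MISMATCH: source=$1 target=$2 (difference $(( $1 - $2 )))"
    fi
}

# src=$(get_link_count MyProject MyJob SourceStage SrcLink_OUT)
# tgt=$(get_link_count MyProject MyJob TargetStage TgtLink_INP)
# compare_counts "$src" "$tgt"
```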
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

bikan wrote:You might probably be interested in getting the count from the command line utility, if you want to use the counts for file balancing.

Use dsjob -linkinfo <project> <job> <stage> <link> to get the link count
But dsjob gives the record count only from a particular link.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

So? Using a consistent stage and link naming convention in your jobs can ease that difficulty immensely. For example contemplate suffixes such as _INP, _OUT, _REJ, _LKP (or prefixes if you prefer) on your links that connect to the "outside world".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
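With such a naming convention in place, a wrapper script can pick out the boundary links mechanically. A sketch assuming the suffixes suggested above (the project, job, and stage names are placeholders; `dsjob -llinks` lists a stage's links):

```shell
#!/bin/sh
# Sketch assuming the _INP/_OUT/_REJ/_LKP suffix convention from this thread.
# MyProject, MyJob, and SomeStage are placeholders.

is_boundary_link() {
    # True when the link name carries one of the agreed boundary suffixes.
    case "$1" in
        *_INP|*_OUT|*_REJ|*_LKP) return 0 ;;
        *)                       return 1 ;;
    esac
}

# Report counts only for links that connect to the "outside world":
# for link in $(dsjob -llinks MyProject MyJob SomeStage); do
#     if is_boundary_link "$link"; then
#         dsjob -linkinfo MyProject MyJob SomeStage "$link"
#     fi
# done
```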
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

ray.wurlod wrote:So? Using a consistent stage and link naming convention in your jobs can ease that difficulty immensely. For example contemplate suffixes such as _INP, _OUT, _REJ, _LKP (or prefixes if you prefer) on your links that connect to the "outside world".
I think I misinterpreted the problem. The situation is that I need to verify the summation of a column across two tables. I have approached it this way: we take the required column from each table and use an Aggregator to compute its sum; now we need to carry that information into a file and compare the two sums, along with the difference between them.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Ah, that's rather a different problem. It is solved using a "fork join" technique - split your data into two streams using a Copy stage, run one stream through the Aggregator stage, and re-unite them using a Join stage. The join key and the grouping key are probably the same.

Code: Select all

------> Copy stage ---------------> Join Stage ------>
          |                               ^
          |                               |
          +------> Aggregator Stage ------+
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

ray.wurlod wrote:Ah, that's rather a different problem. It is solved using a "fork join" technique - split your data into two streams using a Copy stage, run one stream through the Aggregator stage, and re-unite them using a Join stage. The join key and the grouping key are probably the same.

Code: Select all

------> Copy stage ---------------> Join Stage ------>
          |                               ^
          |                               |
          +------> Aggregator Stage ------+
The data comes in two different streams from two different tables, each summed through an Aggregator. How can I use the Join stage, given that I will not get the values when they differ? I have combined the streams using a Funnel, but then the sums end up in one column and I am unable to calculate the difference between them or split the columns apart. The design we have implemented is as follows:

Code: Select all


 Table 1------Aggregator


 Table 2------Aggregator


 I have not depicted the flow after the Aggregators, as I am not sure this is the correct way. As mentioned above, we have implemented a Funnel after the Aggregators and then a Transformer.

 Any sort of help is solicited.

Thanks
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

I have managed to get the summed values and their difference into separate columns in a file, but the design is becoming very cumbersome.

Code: Select all


 Table 1 ------> Aggregator ------>+
                                   |
                                   Join ------> Transformer ------> File
                                   |
 Table 2 ------> Aggregator ------>+

The Transformer is used to calculate the difference between the two sums. Another problem is that we will not get any data in the file when the join keys (dates) don't match, so the Join is not doing the job in every case.
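For what it's worth, the reconciliation arithmetic itself is small enough to sanity-check outside DataStage. A sketch, assuming the two table extracts have been landed as comma-delimited files (the file names, sample rows, and column position are hypothetical); because each file is totalled independently, no join key is involved and mismatched dates cannot drop rows:

```shell
#!/bin/sh
# Hypothetical sample extracts standing in for the two table outputs.
printf '2008-01-01,100\n2008-01-02,250\n' > table1_extract.csv
printf '2008-01-01,100\n2008-01-03,240\n' > table2_extract.csv

# Sum the second (amount) column of each file independently.
sum1=$(awk -F, '{ s += $2 } END { print s }' table1_extract.csv)
sum2=$(awk -F, '{ s += $2 } END { print s }' table2_extract.csv)

echo "sum1=$sum1 sum2=$sum2 difference=$(( sum1 - sum2 ))"
# prints: sum1=350 sum2=340 difference=10
```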