Hi, I am using DS V 8.1
In this, there is a requirement wherein i am using the output of Sort Stage to be provided as the Right Link for Join Stage for an INNER JOIN. When the job is being run, i see that in JOB MONITOR, the output link from Sort is producing 800K rows where as the Inner Join Rec Link (Same link) is having only 700K odd rows. Any clue as to why this difference?
Eg : Sort Stage Output Link Name : lnk_srt_Join_RecCount
The record count during run time when viewed from Monitor: 800K
Join Stage Right Link Name (Inner Join) : lnk_srt_Join_RecCount
The record count during run time when viewed from Monitor: 750K.
Regards
Difference in Record Count in Sort/Join Stages
Moderators: chulett, rschirm, roy
Difference in Record Count in Sort/Join Stages
Subbu
TCS BAngalore
TCS BAngalore
An inner join, by definition, only outputs rows when a match is encountered between the left and other input links, and is typically driven by the left input. If there is no match to a row in the left input, there will not be a record output.
Typical reasons for what you are seeing with an inner join include:
1) Not as many actual matches as you are expecting
2) The partitioning is incorrect (matching records don't appear in the same partition)
How many rows are present in your left input link?
Is your data partitioned correctly so that all rows for a given key value will be placed into the same partition?
Regards,
Typical reasons for what you are seeing with an inner join include:
1) Not as many actual matches as you are expecting
2) The partitioning is incorrect (matching records don't appear in the same partition)
How many rows are present in your left input link?
Is your data partitioned correctly so that all rows for a given key value will be placed into the same partition?
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
@James: Thanks for the quick response.
The Left Link has around 10K Records. (Comparitively very less to the reference link) in this particular case. There is no problem with the output as all the Source records have found valid matches in the reference. The partitioning also is as per requirement (HASH) for the specific columns.
However, the question remains, why there is a discrepancy in Record Count Display (Job Monitor) on the same link.
Sort Stage Output Link Name : lnk_srt_Join_RecCount
Join Stage Right Link Name (Inner Join) : lnk_srt_Join_RecCount
Since this is the same link thats connecting the Sort Stage and Join Stage.
The Left Link has around 10K Records. (Comparitively very less to the reference link) in this particular case. There is no problem with the output as all the Source records have found valid matches in the reference. The partitioning also is as per requirement (HASH) for the specific columns.
However, the question remains, why there is a discrepancy in Record Count Display (Job Monitor) on the same link.
Sort Stage Output Link Name : lnk_srt_Join_RecCount
Join Stage Right Link Name (Inner Join) : lnk_srt_Join_RecCount
Since this is the same link thats connecting the Sort Stage and Join Stage.
Subbu
TCS BAngalore
TCS BAngalore
Thank you...I misunderstood the question. Have you set the APT_RECORD_COUNTS environment variable to show input/output counts for the various operators in the job logs?
One other possibility in your situation is:
Once all Left Input records have been processed, the remaining records on the right input may have been abandoned...there is no need to process them as there's nothing more to match to. As you only have 10,000 records on the left input, this is a distinct possibility.
Regards,
One other possibility in your situation is:
Once all Left Input records have been processed, the remaining records on the right input may have been abandoned...there is no need to process them as there's nothing more to match to. As you only have 10,000 records on the left input, this is a distinct possibility.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.