Getting the source file & table count

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dssiddu
Participant
Posts: 66
Joined: Mon Nov 07, 2005 10:28 pm
Contact:

Getting the source file & table count

Post by dssiddu »

Hi All

I searched a lot but could not find the right path.
so posting here
question is: how to get the count if file is source or if the source is table.
And also i need to get the target count.
I want to use these counts in the onther job.

Need your help on this..

thanks a lot..
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Re: Getting the source file & table count

Post by kris007 »

If the source is a file you can use something like this in an Execute Command stage within a Job Sequence and pass it as a parameter to the job you would like.

Code: Select all

 ls -1t filename | xargs wc -l | awk '{ printf ("%d", ($1-1))}'
If the source is a table, the way I have done it in the past is to develop a shell script that connects to the database and get the count from the table and returns the value at the end. The shell script was executed through Execute Command Stage and the Command Output was passed as a parameter to the job you need.

Hope that helps.
Kris

Where's the "Any" key?-Homer Simpson
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I really believe all ETL projects should have completeness reports. I wrote a blog about it. I have web pages on how to do it. Basically you need a report comparing source counts to target counts for each target table.

Code: Select all

target table name |      Target Count |     Source Count |  Percent |
---------------------------------------------------------------------
CustomerDim       |               997 |             1000 |    99.70 |
OrderTypeDim      |                 3 |                3 |   100.00 |
---------------------------------------------------------------------
I give the jobs away to build this. It is part of EtlStats. The reports come out very nice. The SQL to count the source and target is stored in a table. The results in another table and report definition in another table.

http://www.duke-consulting.com/Setting_ ... eports.htm
http://it.toolbox.com/blogs/object-orie ... orts-35391
Mamu Kim
dssiddu
Participant
Posts: 66
Joined: Mon Nov 07, 2005 10:28 pm
Contact:

Re: Getting the source file & table count

Post by dssiddu »

Hello Kris

Thanks a lot for your info

But just 1 question when we execute the shell script from the Execute command stage we will get the connection information also in the command output how to remove that please...I want only count in the output command.


kris007 wrote:If the source is a file you can use something like this in an Execute Command stage within a Job Sequence and pass it as a parameter to the job you would like.

Code: Select all

 ls -1t filename | xargs wc -l | awk '{ printf ("%d", ($1-1))}'
If the source is a table, the way I have done it in the past is to develop a shell script that connects to the database and get the count from the table and returns the value at the end. The shell script was executed through Execute Command Stage and the Command Output was passed as a parameter to the job you need.

Hope that helps.
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

What do you mean by connection information?

Above command mentioned by Kris will give only the number of lines in a file.

This command also gives just the number of lines

Code: Select all

printf $(wc -l <filename)
You are the creator of your destiny - Swami Vivekananda
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Re: Getting the source file & table count

Post by kris007 »

dssiddu wrote:Hello Kris
But just 1 question when we execute the shell script from the Execute command stage we will get the connection information also in the command output how to remove that please...I want only count in the output command.
There is no shell script involved. You just have to use the command within the Execute Command stage. Make sure you give the file path correctly within the Execute Command Stage.
Kris

Where's the "Any" key?-Homer Simpson
dssiddu
Participant
Posts: 66
Joined: Mon Nov 07, 2005 10:28 pm
Contact:

Re: Getting the source file & table count

Post by dssiddu »

HI Kris

thanks, But if my source is table then i need to call 1 script in that i need to connect db then i will get some record count as the output command along with the db connection information.It does't give the only number to output command.
Hope its clear..

Thanks
kris007 wrote:
dssiddu wrote:Hello Kris
But just 1 question when we execute the shell script from the Execute command stage we will get the connection information also in the command output how to remove that please...I want only count in the output command.
There is no shell script involved. You just have to use the command within the Execute Command stage. Make sure you give the file path correctly within the Execute Command Stage.
dssiddu
Participant
Posts: 66
Joined: Mon Nov 07, 2005 10:28 pm
Contact:

Re: Getting the source file & table count

Post by dssiddu »

dssiddu wrote:HI Kris

thanks, But if my source is table then i need to call 1 script in that i need to connect db then i will get some record count as the output command along with the db connection information.It does't give the only number to output command. Basically i need to use the output of the script in the execute command in the next stage in the seq.

Hope its clear..

Thanks
kris007 wrote:
dssiddu wrote:Hello Kris
But just 1 question when we execute the shell script from the Execute command stage we will get the connection information also in the command output how to remove that please...I want only count in the output command.
There is no shell script involved. You just have to use the command within the Execute Command stage. Make sure you give the file path correctly within the Execute Command Stage.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Re: Getting the source file & table count

Post by kris007 »

Yes it will. As long as you echo only the COUNT at the end of the script. All other informational messages should be written to the log file. Something along these lines:

Code: Select all

connection=`db2 connect to Database user user using password`

        if [[ $? -ne 0 ]]; then
                echo "\n Error Connecting to the Database \n" >> $logfilename
                exit 3
        fi

COUNT=`db2 -x "SELECT COUNT(*) FROM TABLE"`
echo $COUNT
Kris

Where's the "Any" key?-Homer Simpson
Post Reply