compare records in file with rows in DB

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
vpauls
Premium Member
Premium Member
Posts: 37
Joined: Mon May 09, 2005 2:26 am
Location: Oslo

compare records in file with rows in DB

Post by vpauls »

I have a complex flat file that starts with a header that includes the number of records in the file. I want to use this information to check both if the number of records in the file matches the header and when readeing these records into a database, see if I get all records i.e if the number of rows mathes the number of records in the header.

If any of these conditions fail I want to abort and alert. Is this possible? Is there any "smart" way of doing it?

I'm something of a newbie using datastage so please try to explain without using to many strange two letters ;)

Best Regards
/Elin
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

There are several ways of doing this. You can use ExecuteCommand activity to read the Header information, and pass it as an parameter to the job which read the Database. The job can read the database and count using Aggregator or simple count using using Database function and pass it to transformer.
In transformer, you can pass the row if Input parameter <> count of database records. And you can use "Abort after n rows" option to abort the job.
For file check, you can do the same or you can directly do (wc -l <filename>)-1 and compare with the header.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vpauls
Premium Member
Premium Member
Posts: 37
Joined: Mon May 09, 2005 2:26 am
Location: Oslo

Post by vpauls »

thanks, Execute command sounds like a good option. I am not sure how to use this though. Do you have any example?

Br
/Elin
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

ExecuteCommand that available in JobSequence can be used to run OS commands. You can either call any shell script, or directly run simple comands like this.
Under the ExecComand Tab, you'll find two options.
1. Command - The actual command that you going to use.
2. Parameters - The list of parameters that need to be passed to the command.

For geting use to the stage, just type

Code: Select all

wc -l {filename with fully qualified path}
and check the log. You ll find the command output.
Then try with parameteres passes for file name.
Command will be

Code: Select all

wc -l
and the file name parameter will be placed in Parameters options as #Filename#

Try this, go thorugh the given manual, do a search. And get back if you have any issues. And get back with the logic that you have chosed to follow.[/code]
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vpauls
Premium Member
Premium Member
Posts: 37
Joined: Mon May 09, 2005 2:26 am
Location: Oslo

Post by vpauls »

My tought was to first read the first line from the file using dataStage, and load it into a new file that I then sould use to compare with the whole set. Is this a bad option, the reason is that I have no idea how to read the file using a command. How do I write the command file? What programming language and so on?
Thanks again!
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Sorry, just now I noticed that you are on Windows. I was giving you unix commands. If you have MKS toolkit installed, you could even execute Unix commands.
For you case, the options that you have choosed will be simple to construct using Datastage.
If you wish to use ExecuteCommand activity for passing parameters by reading the value of file, you can use TYPE command in that. Its basically DOS commands.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vpauls
Premium Member
Premium Member
Posts: 37
Joined: Mon May 09, 2005 2:26 am
Location: Oslo

Post by vpauls »

sounds good that my way is possible, but is it an ugly way of solving it? I want it to be as generic as possible. For the time being we are only to use the number of rows but the header contains a lot more information that could be useful.

The line I want to read looks something like this:

1010LA 200701012007013120070201000000233

and the number I want to use are the last ones 233 in this case. But also the other numbers contain information. My thougth is to put ever distinct information in each column and then continue with the rest of the file into another database.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You ll have a lay out for the header. Read the header using that layout and restrict just the first row through Transformer (@INROWNUM =1) and store it in a database or in a File with the field name given to it.
The other two job will read this file, along with that regular file or Table. It will compare the values of the total number of rows present in the file as well as value that present in the header file. IF both are same, you can proceed with you flow, else you can raise the abort signal.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vpauls
Premium Member
Premium Member
Posts: 37
Joined: Mon May 09, 2005 2:26 am
Location: Oslo

Post by vpauls »

I'm terribly sorry to bother you again but I just realaised that althoug I am using a windows client for running datastage. The server is actually a unix server and the batch program should therefore be for UNIX.

A quick question therefore, can't find it on google: How do I write the syntax for grepping a whole line, if the first letter on the line is a 1 ? If it is a 2 I want to skip the line.

Br
/Elin
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

To fetch the first line alone in unix, use

Code: Select all

 head -1 <file name>
.
For you question, use

Code: Select all

grep "1" <filename>.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

grep ^1 <filename>

the carat means the beginning of a line
$ means the end of a line.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

trobinson wrote:grep ^1 <filename>
Note that, the given regular expression can handle only character basis and not a whole string. If you need to grep two or three character to gether, you can try,
grep ^[1][2][3] <filename>.
where it looks for the line which starts with 123
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vpauls
Premium Member
Premium Member
Posts: 37
Joined: Mon May 09, 2005 2:26 am
Location: Oslo

Post by vpauls »

Thank ypui all very much for your answers! I think I have solved the issue for the moment with your help.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Who is ypui? I dont think he/she has contributed something to this post. :wink:
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply