Page 1 of 1

compare records in file with rows in DB

Posted: Tue Mar 13, 2007 2:31 am
by vpauls
I have a complex flat file that starts with a header that includes the number of records in the file. I want to use this information to check both if the number of records in the file matches the header and when readeing these records into a database, see if I get all records i.e if the number of rows mathes the number of records in the header.

If any of these conditions fail I want to abort and alert. Is this possible? Is there any "smart" way of doing it?

I'm something of a newbie using datastage so please try to explain without using to many strange two letters ;)

Best Regards
/Elin

Posted: Tue Mar 13, 2007 2:40 am
by kumar_s
There are several ways of doing this. You can use ExecuteCommand activity to read the Header information, and pass it as an parameter to the job which read the Database. The job can read the database and count using Aggregator or simple count using using Database function and pass it to transformer.
In transformer, you can pass the row if Input parameter <> count of database records. And you can use "Abort after n rows" option to abort the job.
For file check, you can do the same or you can directly do (wc -l <filename>)-1 and compare with the header.

Posted: Tue Mar 13, 2007 4:52 am
by vpauls
thanks, Execute command sounds like a good option. I am not sure how to use this though. Do you have any example?

Br
/Elin

Posted: Tue Mar 13, 2007 4:59 am
by kumar_s
ExecuteCommand that available in JobSequence can be used to run OS commands. You can either call any shell script, or directly run simple comands like this.
Under the ExecComand Tab, you'll find two options.
1. Command - The actual command that you going to use.
2. Parameters - The list of parameters that need to be passed to the command.

For geting use to the stage, just type

Code: Select all

wc -l {filename with fully qualified path}
and check the log. You ll find the command output.
Then try with parameteres passes for file name.
Command will be

Code: Select all

wc -l
and the file name parameter will be placed in Parameters options as #Filename#

Try this, go thorugh the given manual, do a search. And get back if you have any issues. And get back with the logic that you have chosed to follow.[/code]

Posted: Tue Mar 13, 2007 5:07 am
by vpauls
My tought was to first read the first line from the file using dataStage, and load it into a new file that I then sould use to compare with the whole set. Is this a bad option, the reason is that I have no idea how to read the file using a command. How do I write the command file? What programming language and so on?
Thanks again!

Posted: Tue Mar 13, 2007 5:16 am
by kumar_s
Sorry, just now I noticed that you are on Windows. I was giving you unix commands. If you have MKS toolkit installed, you could even execute Unix commands.
For you case, the options that you have choosed will be simple to construct using Datastage.
If you wish to use ExecuteCommand activity for passing parameters by reading the value of file, you can use TYPE command in that. Its basically DOS commands.

Posted: Tue Mar 13, 2007 5:36 am
by vpauls
sounds good that my way is possible, but is it an ugly way of solving it? I want it to be as generic as possible. For the time being we are only to use the number of rows but the header contains a lot more information that could be useful.

The line I want to read looks something like this:

1010LA 200701012007013120070201000000233

and the number I want to use are the last ones 233 in this case. But also the other numbers contain information. My thougth is to put ever distinct information in each column and then continue with the rest of the file into another database.

Posted: Tue Mar 13, 2007 5:43 am
by kumar_s
You ll have a lay out for the header. Read the header using that layout and restrict just the first row through Transformer (@INROWNUM =1) and store it in a database or in a File with the field name given to it.
The other two job will read this file, along with that regular file or Table. It will compare the values of the total number of rows present in the file as well as value that present in the header file. IF both are same, you can proceed with you flow, else you can raise the abort signal.

Posted: Tue Mar 13, 2007 6:19 am
by vpauls
I'm terribly sorry to bother you again but I just realaised that althoug I am using a windows client for running datastage. The server is actually a unix server and the batch program should therefore be for UNIX.

A quick question therefore, can't find it on google: How do I write the syntax for grepping a whole line, if the first letter on the line is a 1 ? If it is a 2 I want to skip the line.

Br
/Elin

Posted: Tue Mar 13, 2007 6:24 am
by kumar_s
To fetch the first line alone in unix, use

Code: Select all

 head -1 <file name>
.
For you question, use

Code: Select all

grep "1" <filename>.

Posted: Tue Mar 13, 2007 6:29 am
by trobinson
grep ^1 <filename>

the carat means the beginning of a line
$ means the end of a line.

Posted: Tue Mar 13, 2007 6:32 am
by kumar_s
trobinson wrote:grep ^1 <filename>
Note that, the given regular expression can handle only character basis and not a whole string. If you need to grep two or three character to gether, you can try,
grep ^[1][2][3] <filename>.
where it looks for the line which starts with 123

Posted: Wed Mar 14, 2007 12:22 am
by vpauls
Thank ypui all very much for your answers! I think I have solved the issue for the moment with your help.

Posted: Wed Mar 14, 2007 2:39 am
by kumar_s
Who is ypui? I dont think he/she has contributed something to this post. :wink: