Page 1 of 1

Validating the file name and column names

Posted: Mon Aug 19, 2013 2:58 pm
by rahul8055
Hello everyone

I need to read a CSV file as source and need to validate whether the file is in CSV format.
The columns should be validated whether they are in the required order or not.
The column names with should be validated with certain names.

Please suggest a way to make it doable.

Thanks

Posted: Mon Aug 19, 2013 4:57 pm
by ray.wurlod
Assuming the first line is column names, I guess you could (a) validate that all the column names are valid and (b) establish the expected order of fields in the data records when processing that first line.

The expected number of fields is also established from the first line, but counting the comma delimiters. If this is non-zero you don't have a CSV file (or, perhaps, you have a CSV file with one field - do you need to be able to handle this possibility?).

If your requirement that columns may be in arbitrary order applies within the file, then there is no generic solution. Indeed there is probably no solution at all unless there is some kind of per-row metadata (such as XML tags, for example).

Re: Validating the file name and column names

Posted: Mon Aug 19, 2013 5:42 pm
by SURA
You can design like below also.

Code: Select all

                                                                 MasterFile2ChkCol                         
                                                                       |
                                                                       |
                                         --Col headerAlone-- --------LKUP----REJ-----TFM (Abort after 1 row) 
SeqFile (Read including header) -->TFM|
                                         --Data Alone -------PROCESS AS YOU WISH   

You can use either TFM | Filter for the first one.

Posted: Tue Aug 20, 2013 9:30 am
by rahul8055
Thank you Ray and SURA!

The below given solution answers my second question which is checking for the field order and field names.

Is there any way where I can check whether the file is CSV or not in datastage? or is it something to be done in Unix before reading the file

Posted: Tue Aug 20, 2013 9:54 am
by eph
Hi,

You can count the number of separators for each line and determine if it is ok or not.

Eric

Posted: Tue Aug 20, 2013 9:55 am
by chulett
Since CSV stands for Comma Separated Values, the first test would be for the presence of commas in the record. Hopefully a count returns the correct number of commas.

Posted: Tue Aug 20, 2013 5:51 pm
by SURA
rahul8055 wrote:Is there any way where I can check whether the file is CSV or not in datastage?
use Reject link in the Sequential file.

Code: Select all

SeqFile----(RejLink)---TFM-(Abort after 1 row)--peek

Posted: Tue Aug 20, 2013 7:36 pm
by ray.wurlod
Existence of the file is more easily checked using the test command in an Execute Command activity in a sequence.

Validity as CSV can be done in any of the ways we've already suggested, as well as operating system commands (such as grep for any lines that do not contain any commas, piping the output of that through wc -l command and verifying that the count ($CommandOutput) is 0).

Posted: Wed Aug 21, 2013 1:56 pm
by rahul8055
Thank you all very much!

I will use one of the above replies depending on my job design.