Validating the file name and column names

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rahul8055
Participant
Posts: 95
Joined: Wed Jun 23, 2010 7:41 pm

Validating the file name and column names

Post by rahul8055 »

Hello everyone

I need to read a CSV file as source and need to validate whether the file is in CSV format.
The columns should be validated whether they are in the required order or not.
The column names with should be validated with certain names.

Please suggest a way to make it doable.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Assuming the first line is column names, I guess you could (a) validate that all the column names are valid and (b) establish the expected order of fields in the data records when processing that first line.

The expected number of fields is also established from the first line, but counting the comma delimiters. If this is non-zero you don't have a CSV file (or, perhaps, you have a CSV file with one field - do you need to be able to handle this possibility?).

If your requirement that columns may be in arbitrary order applies within the file, then there is no generic solution. Indeed there is probably no solution at all unless there is some kind of per-row metadata (such as XML tags, for example).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Validating the file name and column names

Post by SURA »

You can design like below also.

Code: Select all

                                                                 MasterFile2ChkCol                         
                                                                       |
                                                                       |
                                         --Col headerAlone-- --------LKUP----REJ-----TFM (Abort after 1 row) 
SeqFile (Read including header) -->TFM|
                                         --Data Alone -------PROCESS AS YOU WISH   

You can use either TFM | Filter for the first one.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
rahul8055
Participant
Posts: 95
Joined: Wed Jun 23, 2010 7:41 pm

Post by rahul8055 »

Thank you Ray and SURA!

The below given solution answers my second question which is checking for the field order and field names.

Is there any way where I can check whether the file is CSV or not in datastage? or is it something to be done in Unix before reading the file
eph
Premium Member
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

Hi,

You can count the number of separators for each line and determine if it is ok or not.

Eric
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Since CSV stands for Comma Separated Values, the first test would be for the presence of commas in the record. Hopefully a count returns the correct number of commas.
-craig

"You can never have too many knives" -- Logan Nine Fingers
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

rahul8055 wrote:Is there any way where I can check whether the file is CSV or not in datastage?
use Reject link in the Sequential file.

Code: Select all

SeqFile----(RejLink)---TFM-(Abort after 1 row)--peek
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Existence of the file is more easily checked using the test command in an Execute Command activity in a sequence.

Validity as CSV can be done in any of the ways we've already suggested, as well as operating system commands (such as grep for any lines that do not contain any commas, piping the output of that through wc -l command and verifying that the count ($CommandOutput) is 0).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rahul8055
Participant
Posts: 95
Joined: Wed Jun 23, 2010 7:41 pm

Post by rahul8055 »

Thank you all very much!

I will use one of the above replies depending on my job design.
Post Reply