Page 1 of 1

Sequential(CSV) file structure validation?

Posted: Thu Jun 20, 2019 1:33 pm
by jackson.eyton
Hi,
I have a potential issue I am hoping to circumvent before it happens (matter of time). We have a few external sources that we have to bring in from sequential files. The issue is that a few of these have to be manually generated by that department. It is only a matter of time before someone incorrectly generates one of these files in that the structure is different, therefore the file definition that we have in the datastage job doesn't match. In our setup, unfortunately, if that were to happen, our staging process would stop and we would need to manually review and fix the issue to continue nightly processing. Also unfortunately, datastage's Validate, doesn't check if the incoming source file itself will work as its defined. I was hoping someone had some suggestions on this.

Thanks,
Jackson

Posted: Thu Jun 20, 2019 6:08 pm
by chulett
So, you're looking for something to validate the file structure so you can skip loading any that could jam up your staging process? I had to build something several years ago where the sequential files we had issues with were csv files from a spreadsheet source that people would sometimes... "adjust". That or mess up the formatting or add a ton of blank lines at the end from improper deletions. From what I recall, we read each record in as a single string and then did some simple validations:

Validate the delimiter count
Validate the header column names are there in order
Validate data types / conversion to type
Validate any columns with "valid value" lists

Caught most of the issues from what I recall but not all. I could pull the code out of archives and see what (if anything else) we did if that would help.

I wonder if there is any kind of third party validation utility that could be leveraged? Did a quick search for "csv file format checker" and seems like that may be an option for you.

Posted: Thu Jun 27, 2019 5:12 pm
by jackson.eyton
Looks like CSV Lint might work. I was considering writing my own, may still do that but I will research some other tools that already exist first.

https://csvlint.io/

Posted: Fri Jun 28, 2019 9:03 am
by chulett
8)

Posted: Sun Jun 30, 2019 3:59 pm
by kumar_s
I have done the Craig's suggestion in my past. Especially the top 2.
Compare the header of the incoming file against the predefined list. And.. thats it.
Assumption is that, header would promptly represent the all Data column and its values.

Posted: Wed Jul 03, 2019 10:21 pm
by ray.wurlod
Sending threatening messages to the providers works wonders.