Validation of a flat file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
mmkhan
Participant
Posts: 12
Joined: Tue Nov 04, 2003 9:44 am

Validation of a flat file

Post by mmkhan »

Hi all,
I have a flat file of fix lenth. I have to do couple of valiation and insert that into teradata or again flat file. Basically i have to do some clenansing work on the data.
1. If the record lenth should be 63 or 65 others are rejected
2. Check for not null
3. No duplicate records(Applying the primaray key constrain)

What i am doing

1.Imported the file defination in the repositry accoring to the given schema.
2. associted the above file defination to a sequential file(stage) and in its properties i define which fleids are nullable and which are not and the one which are primary keys

The result i am getting
I am getting records which are of size 65.
Its not applying any define constrain on the colums
Ba

What i need
I need records of size 65 and 63 and i should be able to append 2 blanks to the 63 record size which will be good records
I need to move good records to good file and bad or rejected records to bad file
A good record should be checked for not null and should follow primary key constain to get in good file.

Can any one suggest me how i can achieve it.
Thanks in advance
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Look in your documentation on the sequential stage. You can define in the metadata for the columns the action to take if a column is blank. Open your sequential stage, look at the columns metadata data and scroll to the right. You'll see the column for what action to take if the data is missing. You'll want to define it to pad spaces or whatever.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
mmkhan
Participant
Posts: 12
Joined: Tue Nov 04, 2003 9:44 am

Tnanks Kenneth

Post by mmkhan »

Kenneth i need a suggestion from you. What if i load the sequentail file into UniVers table (DATASTAGE DBMS). That way i will have the querying capabilities on the data and then i can put the clean data into tera data

One more thing Kenneth
Do i have to go through ODBC only to connect to TERADATA or can we install a TERADATA driver for DATASTAGE , that way we won't we hindering the speed
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If you are a beginner to DataStage, and/or new to ETL and data warehousing, then let me first recommend a couple of books. The most import is Ralph Kimball's Data Warehouse Lifecycle Toolkit. The second book for Oracle people is Tom Kyte's Expert one-on-one Oracle. But Tom has very profound commentaries on data warehousing that apply to all database and ETL technologies.

The most import chapter from the Toolkit, IMO, cover ETL guidelines. IMO, data staging is paramount in an ETL application. You will want to prepare load-ready files of pure inserts and update dates. Anytime you do this, you have a restart point, audit capability, go/no-go point, and reload vs. reprocessing capability.

Specifically responding to your question, you should use DataStage for ETL. Putting into a Universe table simply to query is not IMO beneficial. You would get the same overall performance by simply scanning the sequential source data and using a transformer to filter/constrain your data. In addition, you can achieve parallel performance improvements through job instantiation to utilize more cpu's. Anytime you're waiting on loading a table, then querying it back, you're doubling the work you have to do, when scanning the file the first time was sufficient.

As for Teradata, call Ascential Support and find out for your product version and release of Teradata what you're options are.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
mmkhan
Participant
Posts: 12
Joined: Tue Nov 04, 2003 9:44 am

Post by mmkhan »

Thank You so much Kenneth
I will look into those books you suggested. Mean while as u said i will try to do my cleanings process on the flat file itself and then load into teradata using some multi load process.
Post Reply