
Where oh Where to do the Job....

Posted: Wed Sep 03, 2003 12:14 pm
by JDionne
OK, I have just built and finished a job that pumps data into a table in my SQL Server. Now I need to do some work on that table. I need to derive some info using SQL code, and I need to maintain referential integrity before I promote the info from the stage table to the production table. The question is, where do I do that? I was going to put the SQL needed to derive the info in a routine... but where do I use that routine? Should I have inserted it in the first job that moved the info into the stage table? And where/how do I do the checks to verify that the reference tables are updated?
Any ideas or comments would be greatly appreciated.
Regards, Jim

Posted: Wed Sep 03, 2003 12:28 pm
by kduke
Jim

Normally the first step is to put the foreign keys and tables into hash files. Do lookups to make sure there are no referential integrity constraint problems. As you insert or update, make sure you have a reject link to capture records which would have failed, or did fail, because of these and other issues. Normally this is a sequential file. You may want to fix the issue and reprocess these records.
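The lookup-and-reject pattern above can be sketched outside DataStage, here in plain Python as an illustration only: a dict stands in for the hash file of foreign keys, and the table, column, and key names are all invented for the example.

```python
# Sketch of the lookup-and-reject pattern (not DataStage code).
# The reference table's keys are loaded into a set, playing the
# role of the hash file; stage rows whose foreign key misses the
# lookup go to a reject file for later fixing and reprocessing.

customer_keys = {"C001", "C002", "C003"}  # "hash file" of valid FK values

stage_rows = [
    {"order_id": 1, "customer_id": "C001", "amount": 10.0},
    {"order_id": 2, "customer_id": "C999", "amount": 25.0},  # bad FK
]

loaded, rejects = [], []
for row in stage_rows:
    if row["customer_id"] in customer_keys:
        loaded.append(row)    # RI check passed: safe to load to target
    else:
        rejects.append(row)   # would have failed the RI constraint

print(len(loaded), len(rejects))  # 1 1
```

In a real job the reject list would be written to a sequential file so the bad rows can be corrected and re-run.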

I am not sure what you mean by "derive some info using SQL code", but you can put almost any SQL in the User Defined Query portion of a source stage and push the result into any table in the data warehouse. You do not need a routine. Otherwise, most of the time the job itself can derive anything you may need to put in a target table.
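As a rough sketch of "deriving info with SQL as the rows are read", here is the same idea run through SQLite from Python; the table and column names are made up for illustration, and in DataStage the SELECT would simply go in the source stage's User Defined Query box.

```python
# Sketch: a user-defined source query computes a derived column
# (extended price) as rows are read, so no separate routine is needed.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stage_sales (qty INTEGER, unit_price REAL)")
con.executemany("INSERT INTO stage_sales VALUES (?, ?)",
                [(2, 5.0), (3, 4.0)])

rows = con.execute(
    "SELECT qty, unit_price, qty * unit_price AS ext_price "
    "FROM stage_sales").fetchall()
print(rows)  # [(2, 5.0, 10.0), (3, 4.0, 12.0)]
```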

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com

Posted: Wed Sep 03, 2003 12:38 pm
by JDionne
Oh, how I love user forums!
I will have to go over the hash file stuff again; we only touched on it in the class. As for the SQL stuff: a user-defined query as part of any source stage? I'm kind of lost there. What keywords should I look under for more info in Help?
Jim


Posted: Wed Sep 03, 2003 2:23 pm
by JDionne
I'm looking into using hash files for the referential checking, but I am having a problem in the transformer. The way I see it, to find what's missing from the reference table I need to compare two files: the newly created hash file of all the values in the data versus the reference table, which may not have all the data. The output of that comparison would then be added to the reference data. The only way I know how to do that would be a "NOT IN" statement, and I was thinking I would put something similar in the derivation column of the transformer. I can't seem to get the logic right, though. Am I even in the right place to do this, and if so, how should I do it in DataStage?
Jim


Posted: Wed Sep 03, 2003 3:53 pm
by kduke
Jim

To figure out what is new, take the target table and dump just the key fields into a hash file. In the next job, read the source table and look up the keys in the hash file. If any field returned by the hash lookup is null, the record is new. Do an insert on new records, and update all the others.
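The insert-versus-update decision above can be sketched in Python (again only an illustration: a set stands in for the hash file of target keys, and the key and value names are invented).

```python
# Sketch of the insert-vs-update split: target-table keys are dumped
# into a set (standing in for the hash file); a failed lookup means
# the source row is new, so insert it; otherwise update it.

target_keys = {"K1", "K2"}           # keys already in the target table

source_rows = [{"key": "K2", "val": 7}, {"key": "K9", "val": 3}]

inserts, updates = [], []
for row in source_rows:
    if row["key"] in target_keys:    # lookup succeeded: existing row
        updates.append(row)
    else:                            # lookup "came back null": new row
        inserts.append(row)

print([r["key"] for r in inserts], [r["key"] for r in updates])
# ['K9'] ['K2']
```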

Who taught your class, and where? This should have been covered.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com

Posted: Thu Sep 04, 2003 7:02 am
by JDionne
Looking up the keys in the hash file is where I am lost. I'll e-mail you a zip file with a Word doc that contains a screen shot of my transformer. I think I am trying to think one way when the tool needs me to think another way. If you get a chance, could you take a look and tell me what my problem is?
Thanks,
Jim


Posted: Thu Sep 04, 2003 4:50 pm
by ray.wurlod
A Transformer stage has one stream input, and zero or more reference inputs. (You can't draw it any other way.) The reference input connects to a Hashed File stage.
The column metadata for the hashed file identifies one or more columns as being Key.

In the Transformer stage, for each reference input link column defined as Key, you must add a "reference key expression". The Transformer stage evaluates this expression and uses it to request a "get row by key" operation on the reference input link.

If you had an ODBC stage, or some other SQL-using stage type, on the other end of the reference input link, the SQL would have the form SELECT col1, col2, col3 FROM table WHERE keycol = '?';

However, hashed files do not use SQL, they use a faster mechanism called record location via hashing.

If the record from the reference input is not found (the key does not exist), every column on the reference input link is set to NULL by DataStage, and the link variable NOTFOUND is set to 1 (= true). If the record is found, the columns contain the returned values and the link variable NOTFOUND is set to 0 (= false). By either of these means you can determine whether the lookup succeeded.
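That found/not-found behaviour can be mimicked in Python, purely as an illustration: a missed lookup yields NULL (None) for every reference column and sets a NOTFOUND flag, and either signal can drive the derivation logic.

```python
# Sketch of the NOTFOUND semantics: on a missed key lookup, every
# reference column comes back NULL (None) and NOTFOUND is 1 (true);
# on a hit, the columns are populated and NOTFOUND is 0 (false).
reference = {"A": {"desc": "alpha"}}     # keyed reference data

def lookup(key):
    rec = reference.get(key)
    if rec is None:
        return {"desc": None}, 1         # all columns NULL, NOTFOUND = 1
    return rec, 0                        # columns populated, NOTFOUND = 0

print(lookup("A"))   # ({'desc': 'alpha'}, 0)
print(lookup("Z"))   # ({'desc': None}, 1)
```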


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518