Script To A Server Job

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

Script To A Server Job

Post by vzmz »

Hi friends,
I have a BTEQ script here (something like PL/SQL) for Teradata.
All I have to do is convert this script into a server job.
As you can see, there are three tables used, plus a fourth table where I will do the final insert. How can I do this? What stages do I need, and which stage should be associated with which?
Thanks in advance

Code: Select all

INSERT INTO CBSS_ETL.ACCOUNT
SELECT CAST(ACT_SVC_ORD_EFFV_DT AS TIMESTAMP)
       ,ACT_CUST_ACCT_NO        
       ,ACT_BILL_PHONE_NBR      
       ,ACT_CUSTOMER_NAME       
       ,ACT_S_FULLCITY	        
       ,ACT_S_STATE_CODE        
       ,ACT_S_ZIP_CD_PLUS_4     
       ,SUBSTR((CASE
       WHEN B.LOC_ID IS NULL
       THEN ACT_B_FULLCITY
       ELSE B.B_CITY
       END),1,13) AS ACT_B_FULLCITY 	        
       ,(CASE
       WHEN B.LOC_ID IS NULL
       THEN ACT_B_STATE_CODE
       ELSE B_STATE
       END) AS ACT_B_STATE     
       ,(CASE
       WHEN B.LOC_ID IS NULL
       THEN ACT_B_ZIP_CD_PLUS_4
       ELSE B_ZIP
       END) AS ACT_B_ZIP_CD_PLUS_4   
       ,CAST(ACT_OLD_CRB_MSD AS TIMESTAMP)
       ,ACT_TYPE_OF_ACCOUNT     
       ,ACT_GRD_OF_SERVICE     
       ,ACT_SLS_AREA_CD         
       ,ACT_SIC_CODE            
       ,ACT_CRED_RISK_CD        
       ,ACT_NON_PUB_LIST_CD     
       ,ACT_AMT_SIGN            
       ,ACT_CUR_BILL_CHG         
       ,CAST(ACT_EXTRACT_DATE AS TIMESTAMP)
       ,ACT_SERVICE_ADDR        
       ,SUBSTR((CASE
       WHEN B.LOC_ID IS NULL
       THEN ACT_BILLING_ADDR
       ELSE B.B_ADDR
       END),1,65) AS ACT_BILLING_ADDR  
       ,LEGACY_SYSTEM_ID 
       ,CAST(CURRENT_TIMESTAMP AS TIMESTAMP)
       ,COALESCE(LOSS_TYPE,' ') LOSS_TYPE             
FROM   CBSS_ETL.WB_ACCOUNT_FEEDER A
       LEFT OUTER JOIN CBSS_ETL.WB_LOCATION_LOOKUP B
ON     A.ACT_CUST_ACCT_NO = B.LEGCAN
WHERE  A.ACT_CUST_ACCT_NO NOT IN (SELECT DISTINCT ACT_CUST_ACCT_NO FROM CBSS_ETL.ACCOUNT);
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Code: Select all

Extract(Teradata) --->  Transformer  --->  Load(Teradata)
Use user-defined SQL in the Extract stage to extract from the three joined tables. This is MUCH easier - especially since you already have the SQL - than loading hashed files from the same database instance.

The Load stage can use a generated SQL statement to perform the INSERT operations.
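For example, the user-defined SQL in the Extract stage can be just the SELECT portion of the statement you already have, i.e. your query with the INSERT INTO line removed (abridged sketch below, untested; the remaining columns follow the same pattern as your original):

Code: Select all

SELECT CAST(ACT_SVC_ORD_EFFV_DT AS TIMESTAMP)
       ,ACT_CUST_ACCT_NO
       ,SUBSTR((CASE
        WHEN B.LOC_ID IS NULL
        THEN ACT_B_FULLCITY
        ELSE B.B_CITY
        END),1,13) AS ACT_B_FULLCITY
       ,COALESCE(LOSS_TYPE,' ') LOSS_TYPE
FROM   CBSS_ETL.WB_ACCOUNT_FEEDER A
       LEFT OUTER JOIN CBSS_ETL.WB_LOCATION_LOOKUP B
ON     A.ACT_CUST_ACCT_NO = B.LEGCAN
WHERE  A.ACT_CUST_ACCT_NO NOT IN (SELECT DISTINCT ACT_CUST_ACCT_NO FROM CBSS_ETL.ACCOUNT)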
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

Post by vzmz »

ray.wurlod wrote:

Code: Select all

Extract(Teradata) --->  Transformer  --->  Load(Teradata)
Use user-defined SQL in the Extract stage to extract from the three joined tables. This is MUCH easier - especially since you already have the SQL - than loading hashed files from the same database instance.

The Load stage can use a generated SQL statement to perform the INSERT operations.
I tried doing that, but when I ran the process exactly as you said, it stopped with a "process abnormally aborted" message. I was wondering whether it might be because of the complex query. Is there a way I can break the query down into smaller pieces?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Write a job that extracts table A to a sequential file using the WHERE clause

Code: Select all

A.ACT_CUST_ACCT_NO NOT IN (SELECT DISTINCT ACT_CUST_ACCT_NO FROM CBSS_ETL.ACCOUNT);
so that you only get the rows you need to process.

At the same time, have a job extracting table B to a hash file.

Lastly, have a third job read the sequential file from the first job, reference the hash file from the second job, and produce your load file. During transformation mappings, replace current CASE SQL with IF-THEN-ELSE derivations. I'm not sure what COALESCE does, but you can figure out the equivalent BASIC logic.
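For example, the COALESCE(LOSS_TYPE,' ') in your SQL is just a null substitution, so the equivalent Transformer derivation could be something along these lines (a sketch; "InLink" stands for whatever your input link is actually named):

Code: Select all

If IsNull(InLink.LOSS_TYPE) Then ' ' Else InLink.LOSS_TYPE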
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

Post by vzmz »

kcbland wrote:Write a job that extracts table A to a sequential file using the WHERE clause

Code: Select all

A.ACT_CUST_ACCT_NO NOT IN (SELECT DISTINCT ACT_CUST_ACCT_NO FROM CBSS_ETL.ACCOUNT);
so that you only get the rows you need to process.

At the same time, have a job extracting table B to a hash file.

Lastly, have a third job read the sequential file from the first job, reference the hash file from the second job, and produce your load file. During transformation mappings, replace current CASE SQL with IF-THEN-ELSE derivations. I'm not sure what COALESCE does, but you can figure out the equivalent BASIC logic.
Great.
I am doing it the way you told me, but I am stuck here.
In my Transformer I drag and drop all the fields to be inserted. Now, for the CASE statement, I check that field, e.g.:
If IsNull(DSLink.field) Then "how do I reference the values from the hash file here?" Else DSLink.field
Thanks in advance
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Let's do some simple link naming to make things easy. Your job looks like this:

Code: Select all

         hash
           |
           |
           V
seq --->  xfm --->  seq

If your link into the xfm is called "SOURCE" and your reference link from the hash is called "REFERENCE", then construct If-Then-Else statements like this:

Code: Select all

If SOURCE.columnA > 50 Then SOURCE.columnA Else REFERENCE.columnX

If NOT(ISNULL(REFERENCE.columnX)) Then REFERENCE.columnX Else SOURCE.columnA 

If REFERENCE.columnX <> SOURCE.columnA Then REFERENCE.columnX  Else SOURCE.columnA 
Available columns can be picked from the right-click menu in the derivation field, which displays all columns available for derivations. Another method is to type the link name (REFERENCE.) and, as soon as you press the dot ".", a drop-down list of columns appears.
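Applied to the CASE expressions in your original SQL, the derivations could look roughly like this (a sketch; it assumes the hash file carries LOC_ID, B_CITY, B_STATE and B_ZIP, and it mirrors the SQL's B.LOC_ID IS NULL test - if a failed lookup returns something other than nulls in your job, test the reference link's NOTFOUND variable instead):

Code: Select all

If IsNull(REFERENCE.LOC_ID) Then SOURCE.ACT_B_FULLCITY[1,13] Else REFERENCE.B_CITY[1,13]

If IsNull(REFERENCE.LOC_ID) Then SOURCE.ACT_B_STATE_CODE Else REFERENCE.B_STATE

If IsNull(REFERENCE.LOC_ID) Then SOURCE.ACT_B_ZIP_CD_PLUS_4 Else REFERENCE.B_ZIP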
Last edited by kcbland on Mon Jan 19, 2004 8:04 pm, edited 1 time in total.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

Post by vzmz »

kcbland wrote:Let's do some simple link naming to make things easy. Your job looks like this:

hash
|
|
V
seq ---> xfm ---> seq


If your link into the xfm is called "SOURCE" and your reference link from the hash is called "REFERENCE", then construct If-Then-Else statements like this:

Code: Select all

If SOURCE.columnA > 50 Then SOURCE.columnA Else REFERENCE.columnX

If NOT(ISNULL(REFERENCE.columnX)) Then REFERENCE.columnX Else SOURCE.columnA 

If REFERENCE.columnX <> SOURCE.columnA Then REFERENCE.columnX  Else SOURCE.columnA 
Available columns can be picked from the right-click menu in the derivation field, which displays all columns available for derivations. Another method is to type the link name (REFERENCE.) and, as soon as you press the dot ".", a drop-down list of columns appears.

I am fairly sure the hash link should go into the xfm rather than into the seq stage; if I am right, please acknowledge.
Thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Yeah, yeah, yeah, my wife called me to dinner and I forgot to touch it up. It's done now.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

Post by vzmz »

kcbland wrote:Yeah, yeah, yeah, my wife called me to dinner and I forgot to touch it up. It's done now.
Is it doable to avoid the intermediate seq file steps?
teradata Reference table
|
|
V
Hash File

|
|
V
TeraData---> xfm ---> TeraData
*note the upper link is shifted so assume the upper link going into xfm
Thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Well, if you've read anything I've ever posted about audit trails, job instantiation capability, and modularity, you would avoid such an All-In-One design. Your suggested design cannot be instantiated as it sits, because the lookup hash file is being prepared inside the same job.

In addition, if you broke out this part of your job:

Code: Select all

Teradata --> hash
it cannot be instantiated without moving the hash file creation/clearing into a predecessor job. Then, you could range/hash/mod select the source data into the same hash file concurrently, except you're going to see a bottleneck around the hash file as N instances are all piling into the hash file. It's going to be doubling and resizing so often that you'll impede the flow of data out of a database.
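A mod-select for one of the N concurrent extract instances might look something like this (a sketch only; #PartCount# and #PartNum# are hypothetical job parameters, and it assumes the key column is numeric):

Code: Select all

SELECT LEGCAN, LOC_ID, B_CITY, B_STATE, B_ZIP, B_ADDR
FROM   CBSS_ETL.WB_LOCATION_LOOKUP
WHERE  LEGCAN MOD #PartCount# = #PartNum#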

The best design is to spool at the speed of the database into sequential files. Then, you concatenate them together and in one job run it into the hash file. You'll see that a

Code: Select all

seq --> xfm --> hash 
job screams, and will optimally resize when necessary.

If you're using Teradata, it's because you have high volumes. These are techniques to utilize as many CPUs as possible through divide-and-conquer processing.
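Put together, the divide-and-conquer approach described above looks something like this:

Code: Select all

Job 1..N :  Teradata ---> xfm ---> seq    (N instances, each with its own range/mod select)
Job N+1  :  concatenate the N sequential files into one file
Job N+2  :  seq ---> xfm ---> hash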
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vzmz wrote:*note the upper link is shifted so assume the upper link going into xfm
Use 'Code' and 'Preview' to line things up properly:

Code: Select all

    teradata Reference table 
              | 
              V 
         Hash File 
              | 
              V 
TeraData---> xfm ---> TeraData 
Yes, this is 'doable' but the intermediate sequential files aren't really something to be 'avoided'. This way you lose things like restartability / recovery in the event of a problem and (generally speaking) modularity. I'll let Ken give you the official lecture (and probably some links to other posts) as he's got it down pat. :)
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

chulett wrote:I'll let Ken give you the official lecture (and probably some links to other posts) as he's got it down pat. :)
Tends to get repetitious, doesn't it? Break down your jobs, instantiate the subcomponents, coalesce the load data, bulk load.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle