Header & Footer


karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

Jwiles,

Thanks for your detailed explanation. As I am new to parallel jobs, when I saw that email I was confused and was not able to reply to them that this would not hurt performance. I will bring all the points mentioned here to their notice. I am standing like a 'cat on the wall'.
:(
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

The routine below is being used to add the footer.

Code:

$INCLUDE DSINCLUDE JOBCONTROL.H

* The input argument (StageLinkName) is a comma-separated triplet:
* stage name, link name, output file name
StageName = Field(StageLinkName, ",", 1)
LinkName  = Field(StageLinkName, ",", 2)
FileName  = Field(StageLinkName, ",", 3)

* Get the row count of the named link in the current job;
* add 2 to account for the header and footer records
Row_Count_Plus = DSGetLinkInfo(DSJ.ME, StageName, LinkName, DSJ.LINKROWCOUNT) + 2
ROW_COUNT = "UTRL":" ":Fmt(Row_Count_Plus, "10'0'R")

* Uncomment the line below for debugging
*CALL DSLogInfo("FileName = ":FileName:" ROW_COUNT = ":ROW_COUNT, "JobControl")

* Open the sequential file
OPENSEQ FileName TO FILE ELSE CALL DSLogFatal("Abort: Cannot create/open file", "JobControl")

* Seek to the end of the file
SEEK FILE, 0, 2 ELSE CALL DSLogFatal("Abort: Cannot SEEK to end of file", "JobControl")

* Append the footer record and close the file
WRITESEQ ROW_COUNT TO FILE ELSE CALL DSLogFatal("Abort: Cannot write to file", "JobControl")
CLOSESEQ FILE

ErrorCode = 0      ;* set this to non-zero to stop the stage/job
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

That routine will work fine. I assume it's being called as an AfterJob routine. The only concern I would have with using it is that whoever maintains the job knows that if they change the link names within the job they will also need to ensure that the correct link names are still passed to the routine. Shouldn't be an issue, but you may wish to add an annotation to the job design.
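For example (names here are placeholders), if the input value passed to the routine is:

Code:

Transformer_1,DSLink3,/data/out/extract.txt

then renaming DSLink3 inside the job without updating that input value would leave DSGetLinkInfo unable to find the link.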

There are certainly several ways to meet your requirements, as you can see. I would suggest that you create a second job that uses the aggregator as suggested and compare its overall performance to the method you've been asked to use. They may be close or they may be quite different, but you won't know for certain until you compare them, and you will also expand your knowledge of parallel job design.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

I will compare the performance and come back... Meanwhile, there is a small change in the job design:

I have to run one more query in the job.

Existing Structure:

Code:

rowgenerator --> transformer --> column export ---------+
                                                        |
                                                        v
ODBC --> transformer --> column export ------------> funnel --> file
           |                                            ^
           v                                            |
         aggregator --> transformer --> column export --+
I have to run one more SQL query and need to concatenate both results.
I could use UNION in a single query, but the problem is that, because we pull historical records, the query takes a long time when I put UNION between the two queries. Note that the structure of both queries is the same.

So I have planned to add the branch below to the existing design:

ODBC --> transformer --> column export --> funnel

I hope it will work fine, but the problem is getting the row count for the footer.

Can you please help me resolve this issue?
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Assuming the outputs of the two queries have the same layout--and they should, based on your description:

Use a funnel to combine (union) the two query outputs before entering the transformer.
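(Incidentally, the funnel behaves like a SQL UNION ALL: it simply concatenates the rows without the duplicate-removal sort that makes UNION expensive. If duplicates were not a concern and you wanted to keep it in one query, a sketch with placeholder table and column names would be:)

Code:

SELECT col1, col2, col3 FROM current_table
UNION ALL
SELECT col1, col2, col3 FROM history_table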

Where your job looks like:

Code:

ODBC --> transformer --> column export 
it will look like this:

Code:

ODBC ---> funnel --> transformer --> column export
            ^
ODBC -------+
The rest of the job can remain as is: because both queries now feed the same transformer, the link row count your footer routine reads will automatically include the rows from both sources.
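If you ever do need to total two links that are counted separately (rather than one link after the funnel), a small variation of your footer routine could sum the two counts. This is only a sketch: the five-value input convention and all names are assumptions, and DSGetLinkInfo/DSJ.LINKROWCOUNT are used exactly as in your routine.

Code:

* Sketch only: assumed input format Stage1,Link1,Stage2,Link2,FileName
StageName1 = Field(InputArg, ",", 1)
LinkName1  = Field(InputArg, ",", 2)
StageName2 = Field(InputArg, ",", 3)
LinkName2  = Field(InputArg, ",", 4)
FileName   = Field(InputArg, ",", 5)

* Sum the row counts of both links
Count1 = DSGetLinkInfo(DSJ.ME, StageName1, LinkName1, DSJ.LINKROWCOUNT)
Count2 = DSGetLinkInfo(DSJ.ME, StageName2, LinkName2, DSJ.LINKROWCOUNT)

* +2 accounts for the header and footer records, as in the original routine
ROW_COUNT = "UTRL":" ":Fmt(Count1 + Count2 + 2, "10'0'R")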

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

I thought the Funnel stage couldn't take data directly from an ODBC stage, i.e. that we needed at least one transformer in between... bad assumption :(

I will try and get back to you...
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

Is there any other way to do a fast load, other than the ODBC stage?

I saw some posts about the Red Brick stage. As I am not familiar with that stage, I had the thought that I might use the Red Brick stage to do a fast load.

I still have to read about that stage... but is there any other stage available for a fast load?

I am using Sybase.
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

Oops... the footer part is not working properly...

The source query reports:

(1621260 rows affected)

But in the file I see:

UTRL|3.62126

I don't understand this...
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

For the count value: define the output count column from the Aggregator as a BigInt. It is probably a double-precision floating point right now (the default output type for the Aggregator).
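Roughly, in the Aggregator's output column grid that would look like this (the column name is a placeholder):

Code:

Aggregator stage --> Output --> Columns
  Column name : row_count   (placeholder)
  SQL type    : BigInt      (instead of the default Double)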

For the 'rows affected' message: since you didn't include any other messages, we can't determine what the actual problem is. There should be more messages from that operator in the log.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Red Brick is another database product, as are Oracle, Teradata, Netezza, SQL Server and so on.

For Sybase, the Sybase Enterprise stage (only available for parallel jobs) may help increase performance. It requires that the Sybase Open Client software be installed on the DataStage server.
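(If you try it, the Open Client environment usually has to be visible to the engine, for example via dsenv. The sketch below uses placeholder paths and versions; check them against your own Open Client install:)

Code:

# Sketch only: placeholder paths/versions for Sybase Open Client
SYBASE=/opt/sybase ; export SYBASE
SYBASE_OCS=OCS-15_0 ; export SYBASE_OCS
LD_LIBRARY_PATH=$SYBASE/$SYBASE_OCS/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH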

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.