Header & Footer


karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

Jwiles,

Thanks for your detailed explanation. As I am new to parallel jobs, when I saw that email I was confused and was not able to reply to them that this would not hurt performance. I will bring all the points mentioned here to their notice. I am standing like a 'cat on the wall'.
:(
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

The routine below is being used to add the footer.

Code:

$INCLUDE DSINCLUDE JOBCONTROL.H

* The input argument (StageLinkName) is a comma-separated triplet:
* stage name, link name, output file name
StageName = Field(StageLinkName, ",", 1)
LinkName  = Field(StageLinkName, ",", 2)
FileName  = Field(StageLinkName, ",", 3)

* Get the row count of the named link in the current job;
* add 2 to account for the header and footer records
Row_Count_Plus = DSGetLinkInfo(DSJ.ME, StageName, LinkName, DSJ.LINKROWCOUNT) + 2
ROW_COUNT = "UTRL":" ":Fmt(Row_Count_Plus, "10'0'R")

* Uncomment the line below for debugging
*CALL DSLogInfo("FileName = ":FileName:" ROW_COUNT = ":ROW_COUNT, "JobControl")

* Open the sequential file
OPENSEQ FileName TO FILE ELSE CALL DSLogFatal("Abort: Cannot create/open file", "JobControl")

* Seek to the end of the file
SEEK FILE, 0, 2 ELSE CALL DSLogFatal("Abort: Cannot SEEK to end of file", "JobControl")

* Append the footer record and close the file
WRITESEQ ROW_COUNT TO FILE ELSE CALL DSLogFatal("Abort: Cannot write to file", "JobControl")
CLOSESEQ FILE

ErrorCode = 0      ;* set this to non-zero to stop the stage/job
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

That routine will work fine. I assume it's being called as an AfterJob routine. The only concern I would have with using it is that whoever maintains the job knows that if they change the link names within the job they will also need to ensure that the correct link names are still passed to the routine. Shouldn't be an issue, but you may wish to add an annotation to the job design.
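For example (names here are placeholders), if the input value passed to the routine is:

Code:

Transformer_1,DSLink3,/data/out/extract.txt

then renaming DSLink3 inside the job without updating that input value would leave DSGetLinkInfo unable to find the link.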

There are certainly several ways to meet your requirements, as you can see. I would suggest that you create a second job that uses the aggregator as suggested and compare its overall performance to the method you've been asked to use. They may be close or they may be quite different, but you won't know for certain until you compare them, and you will also expand your knowledge of parallel job design.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

I will compare the performance and come back... Meanwhile, there is a small change in the job design:

I have to run one more query in the job.

Existing Structure:

Code:

rowgenerator --> transformer --> column export ---------+
                                                        |
                                                        v
ODBC --> transformer --> column export ------------> funnel --> file
           |                                            ^
           v                                            |
         aggregator --> transformer --> column export --+
I have to run one more SQL query and need to concatenate both results.
I could use UNION in a single query, but the problem is that, because we pull historical records, the query takes a long time when I put UNION between the two queries. Note that the structure of both queries is the same.

So I have planned to add the branch below to the existing design:

ODBC --> transformer --> column export --> funnel

I hope it will work fine, but the problem is getting the row count for the footer.

Can you please help me resolve this issue?
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Assuming the outputs of the two queries have the same layout--and they should, based on your description:

Use a funnel to combine (union) the two query outputs before entering the transformer.
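(Incidentally, the funnel behaves like a SQL UNION ALL: it simply concatenates the rows without the duplicate-removal sort that makes UNION expensive. If duplicates were not a concern and you wanted to keep it in one query, a sketch with placeholder table and column names would be:)

Code:

SELECT col1, col2, col3 FROM current_table
UNION ALL
SELECT col1, col2, col3 FROM history_table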

Where your job looks like:

Code:

ODBC --> transformer --> column export 
it will look like this:

Code:

ODBC ---> funnel --> transformer --> column export
            ^
ODBC -------+
The rest of the job can remain as is: because both queries now feed the same transformer, the link row count your footer routine reads will automatically include the rows from both sources.
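If you ever do need to total two links that are counted separately (rather than one link after the funnel), a small variation of your footer routine could sum the two counts. This is only a sketch: the five-value input convention and all names are assumptions, and DSGetLinkInfo/DSJ.LINKROWCOUNT are used exactly as in your routine.

Code:

* Sketch only: assumed input format Stage1,Link1,Stage2,Link2,FileName
StageName1 = Field(InputArg, ",", 1)
LinkName1  = Field(InputArg, ",", 2)
StageName2 = Field(InputArg, ",", 3)
LinkName2  = Field(InputArg, ",", 4)
FileName   = Field(InputArg, ",", 5)

* Sum the row counts of both links
Count1 = DSGetLinkInfo(DSJ.ME, StageName1, LinkName1, DSJ.LINKROWCOUNT)
Count2 = DSGetLinkInfo(DSJ.ME, StageName2, LinkName2, DSJ.LINKROWCOUNT)

* +2 accounts for the header and footer records, as in the original routine
ROW_COUNT = "UTRL":" ":Fmt(Count1 + Count2 + 2, "10'0'R")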

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

I thought the Funnel stage couldn't take data directly from an ODBC stage, i.e. that we needed at least one transformer in between... bad assumption :(

I will try and get back to you...
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

Is there any other way to do a fast load, other than the ODBC stage?

I saw some posts about the Red Brick stage. As I am not familiar with that stage, I had the thought that I might use the Red Brick stage to do a fast load.

I still have to read about that stage... but is there any other stage available for a fast load?

I am using Sybase.
Karthik
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

Oops... the footer part is not working properly...

The source query reports:

(1621260 rows affected)

But in the file I see:

UTRL|3.62126

I don't understand this...
Karthik
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

For the count value: define the output count column from the Aggregator as a BigInt. It is probably a double-precision floating point right now (the default output type for the Aggregator).
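Roughly, in the Aggregator's output column grid that would look like this (the column name is a placeholder):

Code:

Aggregator stage --> Output --> Columns
  Column name : row_count   (placeholder)
  SQL type    : BigInt      (instead of the default Double)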

For the 'rows affected' message: since you didn't include any other messages, we can't determine what the actual problem is. There should be more messages from that operator in the log.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Red Brick is another database product, as are Oracle, Teradata, Netezza, SQL Server and so on.

For Sybase, the Sybase Enterprise stage (only available for parallel jobs) may help increase performance. It requires that the Sybase Open Client software be installed on the DataStage server.
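(If you try it, the Open Client environment usually has to be visible to the engine, for example via dsenv. The sketch below uses placeholder paths and versions; check them against your own Open Client install:)

Code:

# Sketch only: placeholder paths/versions for Sybase Open Client
SYBASE=/opt/sybase ; export SYBASE
SYBASE_OCS=OCS-15_0 ; export SYBASE_OCS
LD_LIBRARY_PATH=$SYBASE/$SYBASE_OCS/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH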

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.