Fatal Error: APT_SYSselect returned error status -1

bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

I am getting an error that I have not seen before:

<funnel,0> Fatal Error: APT_SYSselect returned error status -1 and no inputs reached EOF.

The exact same code runs on our production environment and does not have any issues. The job is very simple:

dataset1 -> hash on acctnum
                            > funnel -> db2write
dataset2 -> hash on acctnum

Can anyone give any insight into what is going on?

Thanks!

Brad.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

A little more log info:
##F TFIO 000153 18:58:22(003) <input repartition(1),0> Fatal Error: Unable to allocate communication resources
##E TFPM 000000 18:58:22(001) <node_edwdev2> operator [{natural="/u001/DataStage_work/at/at_dmnd_dep_tran_final.ds", synthetic="input repartition(1)"}], partition 0 of 8, processID 8,212,494 on edwdev2, player 2 terminated unexpectedly.
##E TFPM 000338 18:58:22(001) <main_program> Unexpected exit status 1
##E TOFN 000001 18:58:22(000) <funnel,2> Failure during execution of operator logic.
##I TOFN 000163 18:58:22(001) <funnel,2> Input 0 consumed 63525 records.
##I TOFN 000163 18:58:22(002) <funnel,2> Input 1 consumed 86123 records.
##I TOFN 000094 18:58:22(003) <funnel,2> Output 0 produced 149648 records.
##F TFOR 000151 18:58:22(004) <funnel,2> Fatal Error: APT_SYSselect returned error status -1 and no inputs reached EOF.

Also, one of the input datasets is pretty large: DS1 is 48 million records and DS2 is about 500,000. However, I ran against a much smaller set (about 3.9 million and 0 records) and got the same error. This is a generic process that handles input datasets of any size and writes to a parameterized DB2 table via db2write.
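
For anyone triaging a log like the excerpt above, here is a minimal Python sketch that pulls out the fatal entries. It assumes every entry is a single line of the form `##<severity> <module> <number> <time>(<seq>) <origin> message`, which is inferred only from the lines quoted in this post, and it assumes `F`, `E` and `I` mean fatal, error and informational. Entries wrapped across several lines would need to be joined first. `LOG_LINE` and `parse_log` are illustrative names, not part of any DataStage tooling.

```python
import re

# Layout inferred from the quoted excerpt only; treat it as an assumption.
LOG_LINE = re.compile(
    r"^##(?P<severity>[A-Z]) (?P<module>[A-Z]{4}) (?P<number>\d{6}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})\((?P<seq>\d{3})\) "
    r"<(?P<origin>[^>]*)> (?P<message>.*)$"
)


def parse_log(lines):
    """Return one dict per line that matches the assumed layout."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


# Four lines copied from the excerpt above.
SAMPLE = [
    "##F TFIO 000153 18:58:22(003) <input repartition(1),0> Fatal Error: Unable to allocate communication resources",
    "##E TOFN 000001 18:58:22(000) <funnel,2> Failure during execution of operator logic.",
    "##I TOFN 000163 18:58:22(001) <funnel,2> Input 0 consumed 63525 records.",
    "##F TFOR 000151 18:58:22(004) <funnel,2> Fatal Error: APT_SYSselect returned error status -1 and no inputs reached EOF.",
]

records = parse_log(SAMPLE)
fatal = [r for r in records if r["severity"] == "F"]  # assumed: F = fatal

for r in fatal:
    print(r["time"], r["origin"], "->", r["message"])
```

Listing the fatal entries in log order makes it easier to see that the `input repartition(1)` operator reported a problem alongside the funnel's `APT_SYSselect` error.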

Brad.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This may not be related, but why hash on acctnum only to repartition using DB2 after the Funnel stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Hmmm, good point. The datasets that it is using are already partitioned, so we really don't need the hash in front of each. Let me try it without the hash and see what happens.
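
To illustrate why the second hash is redundant, here is a minimal Python sketch. It assumes the datasets were originally written with a hash on the same key (acctnum) and the same partition count. The CRC32-based `partition_of` is a stand-in chosen for determinism, not DataStage's actual hash function, and the records are made up.

```python
import zlib


def partition_of(acctnum: str, partitions: int) -> int:
    """Map a key to a partition number (CRC32 is only a stand-in hash)."""
    return zlib.crc32(acctnum.encode("utf-8")) % partitions


def hash_partition(records, partitions):
    """Distribute records into buckets by hashing their acctnum."""
    buckets = [[] for _ in range(partitions)]
    for record in records:
        buckets[partition_of(record["acctnum"], partitions)].append(record)
    return buckets


# Made-up records; 8 partitions matches "partition 0 of 8" in the log.
records = [{"acctnum": f"{n:06d}", "amount": n} for n in range(1000)]
first_pass = hash_partition(records, 8)

# Re-hash every record and count how many would change partition.
moved = 0
for index, bucket in enumerate(first_pass):
    for record in bucket:
        if partition_of(record["acctnum"], 8) != index:
            moved += 1

print("records that would move on re-hash:", moved)  # prints 0
```

With the same key and partition count, the second hash sends every record back to the partition it is already in, so it adds a repartition step without changing where the data lives. That would not hold if the partition count or key differed between the two passes.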

On one hand, hopefully it will work and I can move on. On the other hand, then we still don't really know why it failed in the first place.

Still pondering...

Brad.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Okay, I updated the program to not re-hash the datasets going into the funnel stage and that eliminated the issue. However, the fact that it works does not give me warm fuzzy feelings when I don't know or understand why it failed to begin with.

The exact same code works in our production environment just fine. Why does it fail in dev? I am guessing that it is something environmental (not necessarily code related). I have tried the process with varying input sizes and concluded that input record count does not matter: it fails with both large and small volumes.

Does anyone have suggestions about what might be causing the error? This is a generic process used by dozens of production jobs. I am loath to update a production process without a true understanding of why this failure is occurring, especially when the production process is running just fine.

Any help would be greatly appreciated!

Brad.

P.S. I am NOT flagging this with a workaround. I don't know about the rest of you, but I tend to ignore entries that are resolved or marked as workarounds. :wink:
uegodawa
Participant
Posts: 71
Joined: Thu Apr 27, 2006 12:46 pm

Post by uegodawa »

Just out of curiosity:
1. Are you running the same server version in both the Production and Development environments?
2. Are you running the same job with the same node configuration on both systems?
3. Are the environment variables under the Parallel branch exactly the same on both systems?
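
One way to answer the third question systematically is to export the settings from each environment and diff them. The sketch below assumes the settings have already been collected into two dictionaries; how they are exported is not shown. `diff_settings` is an illustrative helper, and every value below (plus the `EXAMPLE_ONLY_SETTING` name) is hypothetical.

```python
def diff_settings(dev: dict, prod: dict) -> dict:
    """Return {name: (dev_value, prod_value)} for every setting that differs.

    A setting missing on one side is reported with None for that side.
    """
    differences = {}
    for name in sorted(set(dev) | set(prod)):
        dev_value = dev.get(name)
        prod_value = prod.get(name)
        if dev_value != prod_value:
            differences[name] = (dev_value, prod_value)
    return differences


# Hypothetical values for illustration only.
dev_settings = {
    "APT_CONFIG_FILE": "/hypothetical/dev.apt",
    "APT_DUMP_SCORE": "1",
}
prod_settings = {
    "APT_CONFIG_FILE": "/hypothetical/prod.apt",
    "APT_DUMP_SCORE": "1",
    "EXAMPLE_ONLY_SETTING": "on",  # made-up name, present on one side only
}

for name, (dev_value, prod_value) in diff_settings(dev_settings, prod_settings).items():
    print(f"{name}: dev={dev_value!r} prod={prod_value!r}")
```

A diff like this only covers what was exported. Installed patch levels, for example, would need to be compared separately.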
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

DataStage version is the same on dev and prod, as are environment variables/settings. The node configuration is different, but only in terms of the number of nodes. The way the nodes are configured is the same.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Continuous, sort/merge or sequential Funnel? I suspect the second, and that the process that watches all inputs simultaneously to figure out which is next to preserve sorted order has taken some kind of abort. It's not totally clear why - but that's the line of investigation I'd be following, even unto re-instating the original partitioning to see whether it's reproducible. Maybe production has the "preserve partitioning" flag set to Clear but the development has "Set" or "Propagate"?
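
For readers unfamiliar with the distinction, here is a rough Python sketch of the difference between a sort funnel and a sequence funnel. It is a conceptual model only, not how the parallel engine implements the stage. It assumes each input is already sorted on the key, and the records are made up.

```python
import heapq
from itertools import chain


def sort_funnel(*inputs, key):
    """Merge inputs that are each already sorted on `key` into one sorted stream.

    The merge has to look at the head of every input to decide which
    record comes next, which is the "watching all inputs" behaviour.
    """
    return heapq.merge(*inputs, key=key)


def sequence_funnel(*inputs):
    """Emit all of the first input, then all of the second, and so on."""
    return chain(*inputs)


# Made-up records, each input already sorted on acctnum.
input0 = [{"acctnum": "0001"}, {"acctnum": "0004"}]
input1 = [{"acctnum": "0002"}, {"acctnum": "0003"}]


def by_acctnum(record):
    return record["acctnum"]


merged = [r["acctnum"] for r in sort_funnel(input0, input1, key=by_acctnum)]
sequenced = [r["acctnum"] for r in sequence_funnel(input0, input1)]

print(merged)     # ['0001', '0002', '0003', '0004']
print(sequenced)  # ['0001', '0004', '0002', '0003']
```

The sort funnel cannot emit anything until it has seen the next record from every input, so a failure on any one input link stalls or aborts the whole merge.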
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Just a quick update. Turns out it was probably an IBM DataStage patch that caused our problems.

On development, this patch was installed in May. However, the job we are working with is run very rarely in dev so we never ran into the error until very recently. We therefore did not put together the connection between the patch and our error.

On the other hand, this patch was just installed into our production environment and lo and behold the same job failed instantly with the same error message.

The patch has been backed out and we are now awaiting a fix from IBM to resolve the issue.

When I get the patch identification information and the related fix, I will post as much info as I can.

Until then...

Brad.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

There were 2 patches applied at the same time: 102853 & 117136. We are not sure which one causes the error. Again, we are awaiting a fix from IBM.

Brad.
shin0066
Premium Member
Posts: 69
Joined: Tue Jun 12, 2007 8:42 am

Post by shin0066 »

Hi bcarlson,

We are having the same issue, and our environment also has the patches you mentioned installed. You mentioned in this topic that you have a workaround. Could you please describe it, so that we can use it until we get a fix for the patches?

Much appreciated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The workaround was not to partition the inputs to the Funnel stage (but, perhaps, to do it upstream of there).