Migration 7.5x2 to 8.0

Raftsman · Post by **Raftsman** » Fri Mar 16, 2007 6:30 am

We migrated to the new version of DataStage and ran into a few issues that we can't seem to resolve.

1. After the migration, none of our DS datasets that we created in 7.5 could be found or read by 8.0. Is there a migration routine for datasets?

2. Smaller jobs consisting of 5 to 10 stages work fine in both 7.5 and 8.0, but jobs that contain more stages than this and worked in 7.5 abort in 8.0. We receive node failure errors. Are there configuration parameters that need modifying? The error messages are very vague and really don't tell us where to start looking. Has anyone experienced this?

Thanks

ray.wurlod · Post by **ray.wurlod** » Fri Mar 16, 2007 1:39 pm

Data Sets don't migrate. Why would you want them to? You normally overwrite them each run anyway. There was a recent post by kali on the question of how to migrate Data Sets, but it received no useful reply. I don't know of a reliable way to move them.

Can't help with your second point, as I still have not played with version 8.0.

DSguru2B · Post by **DSguru2B** » Fri Mar 16, 2007 2:07 pm

Best to get in touch with support, as 8.0 is still new on the market and does not have many users yet.

lstsaur · Post by **lstsaur** » Fri Mar 16, 2007 2:57 pm

How much RAM and how many CPUs do you have on the machine? I was told by IBM support during my Hawk Beta 2 testing that for large jobs or too many jobs (1,000 or more), you need "at least" 4GB of RAM. Otherwise, your jobs might get aborted.

ray.wurlod · Post by **ray.wurlod** » Fri Mar 16, 2007 6:31 pm

One of the presenters at IOD 2006 also mentioned that your client machine needs 2GB minimum memory.

How much is needed on the server side depends on how you distribute the various server components. For example, you can have the Application Server, the Domain Server and the database server all on one (heavy duty) machine or on multiple machines. Such is the flexibility of service-oriented architecture.

vmcburney · Post by **vmcburney** » Fri Mar 16, 2007 6:43 pm

Use the new performance monitoring reports and graphs to monitor jobs that fail. Do they fail when they are run individually or when they are run with a lot of other jobs? Compare your current environment and project variables to the values you had pre-migration to find out if any important settings have been changed. Check that your temp paths are the same.
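The comparison of pre- and post-migration settings suggested above could be sketched roughly as follows. This is only an illustration: it assumes you have dumped the variables from each environment into plain `KEY=VALUE` text by whatever means you have available. That dump format, and the function names `parse_env_dump` and `diff_env`, are hypothetical and not part of DataStage.

```python
def parse_env_dump(text: str) -> dict:
    """Parse KEY=VALUE lines (a hypothetical dump format) into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def diff_env(before: dict, after: dict) -> dict:
    """Return {name: (old, new)} for every variable whose value differs.

    A value of None means the variable is absent on that side.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes
```

Running `diff_env` over the two parsed dumps lists only the settings that were added, removed, or changed, which is usually a much shorter list to review than the full set.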

Agree with Ray on the datasets. If you have persistent datasets that need to be migrated then your design is wrong. Datasets should be treated as temporary tables.

kumar_s · Post by **kumar_s** » Sat Mar 17, 2007 2:58 am

By the way, does it mean the version of Dataset has been changed in Ver 8?

vmcburney · Post by **vmcburney** » Sat Mar 17, 2007 3:15 am

It doesn't need to change. While the metadata repository has changed, the parallel engine hasn't. Datasets should be the same. It's the cataloguing of datasets that is causing your problems. If you have upgraded, the datasets should still be available through the dataset manager. If they aren't, get in touch with Ascential support.

ray.wurlod · Post by **ray.wurlod** » Sat Mar 17, 2007 7:55 am

kumar_s wrote: By the way, does it mean the version of Dataset has been changed in Ver 8?

Someone with version 8.0 might use the Data Set Management tool (under Tools menu in DataStage/QualityStage Designer, since there is no longer a Manager client) and let us know what the Data Set version number is. In any case, they should be upwards compatible.

Raftsman · Post by **Raftsman** » Mon Mar 19, 2007 11:18 am

Here is more detail on the error I receive from the job.

Here's a brief overview of what I have tried in order to pin down the problem.

I took the job that keeps aborting and started removing stages to see if I could narrow down the problem. What is left is two Aggregator streams being joined into one dataset. This aborts. I removed the join and wrote each stream to its own dataset. The job ran fine. I took those datasets, created a new parallel job and joined them together, creating a new dataset. This worked fine.

So in summary, my initial job will not work if I join the aggregates into one dataset. I get the following error:

buffer(1),7: Failure during execution of operator logic.
buffer(1),7: Input 0 consumed 0 records.
buffer(1),7: Output 0 produced 0 records.
buffer(1),7: Fatal Error: Cannot find protocol entry for tcp protocol

Can anyone please interpret what these messages mean?

Thanks

ray.wurlod · Post by **ray.wurlod** » Mon Mar 19, 2007 7:05 pm

Take a look at the score. This will show you the buffer operators that were inserted to avoid data flow deadlock situations. There are at least two of these (buffer(1) is the second). Based on the virtual Data Sets these are using, you might discern what is happening.

My guess is that the TCP port numbers (by default 10000 and 11000) used by conductor, section leader and player processes to communicate with each other are blocked by your firewall.
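One rough way to test the blocked-port theory is to try a plain TCP connection between the machines involved. The sketch below is generic Python and not a DataStage tool. The helper name `can_connect` is made up for illustration, and it assumes something is actually listening on the target port (for example, a temporary listener you start yourself on the other node), since a port with no listener refuses connections whether or not a firewall is present.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

With a listener in place on the remote node, `can_connect("remote-node", 10000)` returning `False` would point toward filtering or routing rather than the job design. The host name here is a placeholder.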

Another possibility is that you're in a multi-machine configuration, and that some form of repartitioning is required. This would employ TCP/IP sockets, but for whatever reason, the TCP protocol has not been set up (or has been disabled or blocked). There are environment variables that specify the default port numbers used by the APT_Communicator class; these are described in Chapter 6 of the Parallel Job Advanced Developer's Guide.
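The wording "Cannot find protocol entry for tcp protocol" resembles what a failed protocol-name lookup (the C library call `getprotobyname("tcp")`) would report. That reading is an assumption about how the engine produces the message, not something confirmed in this thread. A quick way to check whether the operating system can resolve the name at all is the standard-library sketch below; `lookup_protocol` is a helper name invented for this example.

```python
import socket
from typing import Optional


def lookup_protocol(name: str) -> Optional[int]:
    """Return the protocol number for name, or None if the OS cannot resolve it."""
    try:
        return socket.getprotobyname(name)
    except OSError:
        # Raised when the system's protocol database has no such entry.
        return None
```

If `lookup_protocol("tcp")` returns `None` on a node, the protocol database on that machine is worth inspecting. On Windows this is typically the `protocol` file under `%SystemRoot%\System32\drivers\etc`.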

Raftsman · Post by **Raftsman** » Tue Mar 20, 2007 12:20 pm

We do not have a firewall, and during the installation we accepted the installation defaults. We are running an 8-node Windows server configuration with plenty of memory. We are opening tickets with IBM in order to solve this issue.

After reading through the information, we are still unclear on what is causing the error. The message states a TCP protocol error, but we think it's more than that. We are having trouble with numerous jobs, and not every job has the same error.

We are contemplating moving back to 7.5x2. At least we could move forward.

Raftsman · Post by **Raftsman** » Thu Mar 22, 2007 7:48 am

More information on the issue.

I have been dissecting the job into smaller chunks to help debug the problem.

I have narrowed it down to the following. Within the job there are two joins and 5 aggregate stages. If I remove the final join and create two datasets, the job runs fine. As soon as I put the join back in, so that its two inputs are aggregate stages and it creates one dataset, the job aborts on TCP protocol issues.

Please remember, this job ran fine in 7.5x2. There must be some config setting we have overlooked.

Does anyone have any feedback on this?

Thanks

ray.wurlod · Post by **ray.wurlod** » Thu Mar 22, 2007 7:15 pm

7.5x2 was a whole heap of compromises held together by duct tape. Fortunately, duct tape is a very versatile substance.

Version 8.0 is far more likely to be rigorous about such things as correct partitioning and sorting of input links when required. Try inserting Copy stages on the links between the Aggregator stages and the Join stage. You may even benefit from Sort stages set to "don't sort (already sorted)".
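To illustrate why partitioning matters for a join, here is a toy model in plain Python. It is not DataStage code and says nothing about how the engine is implemented. Each "node" only sees its own partition, so a key finds its partner only when both inputs send that key to the same partition. All function names are invented for the example.

```python
from collections import defaultdict


def hash_partition(rows, n):
    """Send each (key, value) row to partition key % n."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[key % n].append((key, value))
    return parts


def round_robin_partition(rows, n):
    """Deal rows out in arrival order, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def join_per_partition(left_parts, right_parts):
    """Inner-join each pair of partitions independently, as parallel nodes would."""
    out = []
    for left, right in zip(left_parts, right_parts):
        lookup = defaultdict(list)
        for key, value in right:
            lookup[key].append(value)
        for key, value in left:
            for other in lookup.get(key, []):
                out.append((key, value, other))
    return sorted(out)


left = [(k, f"L{k}") for k in range(8)]
right = [(k, f"R{k}") for k in reversed(range(8))]  # same keys, different arrival order

matched = join_per_partition(hash_partition(left, 4), hash_partition(right, 4))
mismatched = join_per_partition(hash_partition(left, 4), round_robin_partition(right, 4))
```

With both inputs hash partitioned on the key, all eight keys are matched. With one input dealt out round robin, matching keys land on different partitions and rows silently drop out of the join. On a single partition the two schemes coincide, which is one reason a partitioning problem can stay hidden at low node counts.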

Raftsman · Post by **Raftsman** » Wed Mar 28, 2007 2:19 pm

Hi all,

Here's an update on my current situation. The problem with the jobs that worked in 7.5x2 but not in 8.0 has been identified as a partitioning error. The jobs will work using 1 or 2 nodes. Sometimes they work with 3. As soon as we change the configuration to 4 or more nodes, the jobs abort. The issue is in IBM tech support's hands. It looks like a bug in version 8.0.
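For anyone wanting to reproduce the node-count threshold, a small generator for parallel configuration files with a varying number of logical nodes might look like the sketch below. The layout follows the usual `node` / `fastname` / `resource disk` / `resource scratchdisk` structure of a configuration file. The host name and directory paths passed in are placeholders you would replace with your own, and `make_config` is a name invented for this example.

```python
def make_config(node_count: int, fastname: str, disk: str, scratch: str) -> str:
    """Build the text of a parallel configuration file with node_count logical nodes.

    fastname, disk and scratch are caller-supplied placeholders, not defaults
    taken from any real installation.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'\tnode "node{i}"',
            "\t{",
            f'\t\tfastname "{fastname}"',
            '\t\tpools ""',
            f'\t\tresource disk "{disk}" {{pools ""}}',
            f'\t\tresource scratchdisk "{scratch}" {{pools ""}}',
            "\t}",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Writing out files for 1 through 8 nodes and pointing `APT_CONFIG_FILE` at each in turn gives a repeatable record of where the aborts begin, which may be useful to attach to the support ticket.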

I will let you know when this issue gets resolved.

Thanks