
Read and write from a single dataset in a job

Posted: Tue Aug 05, 2008 7:11 am
by shaimil
I'm trying to use a single dataset (once as a lookup and again as the target) in the same job. Can someone please let me know if this is possible, as I keep getting the following error.

Operator initialization: A link between two operators should be named with a .v; insert a copy operator to save a persistent copy of the data

Thanks

Posted: Tue Aug 05, 2008 7:59 am
by sanjumsm
In a job, two datasets cannot have the same name. You have to split the job.

Posted: Tue Aug 05, 2008 9:02 am
by shaimil
So is it not possible to implement the equivalent of a hash file r/w in a single job using PX?

Posted: Tue Aug 05, 2008 4:21 pm
by ray.wurlod
It is not. It's what's called a "blocking operation" and would interfere with pipeline parallelism, so is forbidden.

Posted: Tue Aug 05, 2008 4:36 pm
by shaimil
Ray,

I guess the Lookup File Set doesn't suffer from the same restriction, as that seemed to work, although I'm still testing to establish whether the Lookup File Set does the equivalent of a truncate-and-load rather than an append.

What, in your mind, would be a reasonable approach to replicating the server hashed file lookup/write?

Posted: Tue Aug 05, 2008 5:32 pm
by ray.wurlod
There isn't one. Stop thinking like a server job developer. Start envisaging virtual Data Sets, and learn how an index to a virtual Data Set is built "on the fly" on a reference input link to a Lookup stage, except when that link is serviced by a Lookup File Set or when sparse lookup is specified.

Posted: Thu Sep 11, 2014 8:01 am
by olgc
This is not a question of server job versus enterprise job; it's a problem of how the enterprise edition can handle it. Take the surrogate key case: you generate a surrogate key for a new customer. In the same run, the new customer can be present with more than one service provider, and so can have more than one instance. When you generate a key for the first instance and the second instance comes in, you need to find his/her key in the dataset, or another key will be generated for him/her.

So how do you handle this scenario?
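As a rough illustration of that requirement (Python used only to sketch the logic, not DataStage; all names here are hypothetical), what is needed is a key cache that survives from row to row within the run:

# Illustrative only: a surrogate-key cache shared by all rows of a
# single run, so a repeated customer reuses the key generated for
# its first instance.
def assign_keys(rows, start_key=1):
    cache = {}                        # natural key -> surrogate key
    next_key = start_key
    for row in rows:
        nk = row["customer_id"]       # hypothetical natural key
        if nk not in cache:
            cache[nk] = next_key      # first instance: generate a key
            next_key += 1
        yield cache[nk], row          # later instances reuse it

rows = [
    {"customer_id": "C1", "provider": "A"},
    {"customer_id": "C2", "provider": "A"},
    {"customer_id": "C1", "provider": "B"},   # same customer again
]
for key, row in assign_keys(rows):
    print(key, row)                   # C1 -> 1 (twice), C2 -> 2

With that cache, the second instance of C1 finds the key generated for the first instance instead of receiving a new one.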

There are other situations that fit this scenario, such as in a web service, where a session token from a previous request is used in subsequent requests, ...

I tried looking up directly from a Transformer; DataStage states that a cycle operation is not allowed. I tried it with a sequential file; that is allowed, but the lookup returns nothing even when a match is there.

Thanks,

Posted: Thu Sep 11, 2014 8:18 am
by chulett
olgc - let's keep your question in one place rather than repeating it when you find a similar topic from the past. This one is much more 'on topic' so I removed your post in the other thread.

Thanks.

Posted: Thu Sep 11, 2014 9:17 am
by qt_ky
olgc wrote:I tried looking up directly from a Transformer; DataStage states that a cycle operation is not allowed. I tried it with a sequential file; that is allowed, but the lookup returns nothing even when a match is there.
If you are doing a normal lookup, then it reads all the reference data into memory before processing the main input data.

If you are able to do a sparse lookup, then it does not read any reference data into memory; rather, it looks up each record one by one. A sparse lookup should handle your scenarios.
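Outside of DataStage, the difference can be sketched like this (Python with sqlite3, purely illustrative; the table and column names are made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref (nk TEXT PRIMARY KEY, sk INTEGER)")
conn.executemany("INSERT INTO ref VALUES (?, ?)", [("C1", 1), ("C2", 2)])

# "Normal" lookup: the entire reference table is read into memory
# once, before the first input row is processed.
ref_cache = dict(conn.execute("SELECT nk, sk FROM ref"))
def normal_lookup(nk):
    return ref_cache.get(nk)

# "Sparse" lookup: one query per incoming row, so reference rows
# written mid-run are visible to later lookups.
def sparse_lookup(nk):
    row = conn.execute("SELECT sk FROM ref WHERE nk = ?", (nk,)).fetchone()
    return row[0] if row else None

print(normal_lookup("C1"), sparse_lookup("C2"))   # 1 2

The point for the surrogate-key scenario is the mid-run visibility: a normal lookup's in-memory snapshot never sees keys generated after it was loaded, while a sparse lookup queries the live table every time.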

Posted: Thu Sep 11, 2014 12:44 pm
by olgc
Thanks, Eric, for the good suggestion. But only a database table lookup can be configured as a sparse lookup, per the statement below. For a lookup into a sequential file, no sparse lookup is possible. Thanks,
Configuring sparse lookup operations
Data that is read by a database stage can serve as reference data to a Lookup stage. By default, this reference data is loaded into memory like any other reference link. When directly connected as the reference link to a Lookup stage, you can configure the Lookup Type property of the DB2 connector to Sparse and send individual SQL statements to the database for each incoming Lookup row.

Posted: Thu Sep 11, 2014 7:38 pm
by qt_ky
Perhaps you could remove duplicates prior to assigning surrogate keys.

Posted: Fri Sep 12, 2014 7:20 am
by olgc
Even though they are the same customer, they come from different service providers, so each is a different account; you cannot remove them. The business needs both of them in the system.

Posted: Fri Sep 12, 2014 9:38 am
by qt_ky
I guess I don't understand what you are asking.
olgc wrote:In the same run, the new customer can be present with more than one service provider, and so can have more than one instance. When you generate a key for the first instance and the second instance comes in, you need to find his/her key in the dataset, or another key will be generated for him/her.

So how do you handle this scenario?
If both of them are needed in the system, then shouldn't each of them have unique keys generated?

Posted: Mon Sep 15, 2014 4:24 am
by priyadarshikunal
What you can do is hold the records and not write them until you have de-duplicated them, and then insert only one customer record into the target.
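That approach might look like this in outline (Python, illustrative only; the de-duplication key is hypothetical):

# Sketch of "hold, de-duplicate, then key": collect the run's
# records, keep one per customer, and only then assign surrogate keys.
rows = [
    {"customer_id": "C1", "provider": "A"},
    {"customer_id": "C2", "provider": "A"},
    {"customer_id": "C1", "provider": "B"},   # duplicate customer
]

held = {}
for row in rows:                       # hold everything first
    held.setdefault(row["customer_id"], row)

for sk, (nk, row) in enumerate(held.items(), start=1):
    print(sk, nk, row)                 # one keyed record per customer

The provider-level records could then be loaded in a second pass, looking their keys up against the now-complete customer table.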