Reading Multiple DataSets using File Pattern

Nagin · Post by **Nagin** » Thu Jan 27, 2011 10:55 pm

Hi,
Is there a way to read multiple datasets whose names follow a similar pattern in a single stage?

For example, I have multiple datasets with the same metadata and the same partitioning.

TestData_1.ds
TestData_2.ds
TestData_3.ds

I would like to pick up all of these datasets with a pattern such as TestData_*.ds, as we can for flat files.

I don't see a file pattern option in the Data Set stage. Is there any way to achieve this?

Thanks.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 27, 2011 11:39 pm

No. Use separate Data Set stages and run them into a Funnel stage.

Nagin · Post by **Nagin** » Fri Jan 28, 2011 1:04 am

ray.wurlod wrote: No. Use separate Data Set stages and run them into a Funnel stage.

The problem is that I won't know how many datasets there will be. They are generated dynamically: today I may have 10 datasets, tomorrow it could be 20.

ray.wurlod · Post by **ray.wurlod** » Fri Jan 28, 2011 1:26 am

The answer is still no.

gssr · Post by **gssr** » Fri Jan 28, 2011 2:26 am

Nagin wrote:
ray.wurlod wrote: No. Use separate Data Set stages and run them into a Funnel stage.
The problem is that I won't know how many datasets there will be. They are generated dynamically: today I may have 10 datasets, tomorrow it could be 20.

In the job that creates the (dynamically named) datasets, replace the Data Set target stage with a Sequential File stage. The resulting flat files can then be read with a file pattern, as you mentioned.

meet_deb85 · Post by **meet_deb85** » Fri Jan 28, 2011 2:51 am

Well, I faced the same challenge, and I solved it in the following way.

You will need one sequence job and one parallel job for this.

Parallel job
Dataset1 --------------------> Dataset2
Parameterize the dataset name in Dataset1 and give Dataset2 any name of your choice.

Sequence Job
I am describing only the first few stages; I expect you will be able to figure out the rest.
Stage 1 - Execute Command. Put this command in the stage:
orchadmin truncate #Name of the Dataset used in Dataset2 of the parallel job#

Stage 2 - Execute Command. Put this command in the stage:
ls #The pattern of your Datasets#

Stage 3 - Start Loop
Run the loop once for each dataset listed by Stage 2, passing each dataset name to the parallel job as the Dataset1 parameter.

Don't forget to set append mode on Dataset2 in the parallel job.
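
For anyone who prefers to see the sequence logic written out, here is a rough shell sketch of the same idea: truncate the target once, then run the parallel job once per matching dataset so that each run appends to the target. It only prints the commands (a dry run) rather than executing them. The project name, job name, parameter name, and paths are placeholders I made up for illustration, not anything from the jobs described above, and orchadmin still needs the usual parallel engine environment to be set up before the printed commands are run.

```shell
#!/bin/sh
# Dry-run sketch of the sequence logic. Every name here is a hypothetical
# placeholder: adjust project, job, and parameter names to your own setup.
PROJECT=${PROJECT:-MyProject}           # assumed DataStage project name
JOB=${JOB:-AppendOneDataset}            # assumed parallel job (Dataset1 -> Dataset2)
PARAM=${PARAM:-SourceDataset}           # assumed job parameter holding Dataset1's name

# Print one dataset descriptor path per line for a directory and a glob pattern.
list_datasets() {
    ld_dir=$1
    ld_pattern=$2
    # $ld_pattern is deliberately unquoted so the shell expands the glob.
    for ld_file in "$ld_dir"/$ld_pattern; do
        # Skip the literal pattern that is left behind when nothing matches.
        if [ -e "$ld_file" ]; then
            printf '%s\n' "$ld_file"
        fi
    done
}

# Print the commands the sequence would run:
#   1. one truncate of the target dataset (Stage 1),
#   2. one job run per source dataset found by the pattern (Stages 2 and 3).
plan_runs() {
    pr_dir=$1
    pr_pattern=$2
    pr_target=$3
    printf 'orchadmin truncate %s\n' "$pr_target"
    list_datasets "$pr_dir" "$pr_pattern" | while IFS= read -r pr_ds; do
        printf 'dsjob -run -jobstatus -param %s=%s %s %s\n' \
            "$PARAM" "$pr_ds" "$PROJECT" "$JOB"
    done
}
```

Calling something like `plan_runs /data/ds 'TestData_*.ds' /data/ds/Combined.ds` (paths again made up) prints the truncate command followed by one dsjob line per dataset. Review the output first, then pipe it to `sh` if it looks right. Make sure the target dataset name does not itself match the pattern, or it will be read as one of its own sources.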