Extract and Load many tables - Short and fast

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

martinkirouac
Participant
Posts: 9
Joined: Wed Nov 14, 2007 11:00 am
Location: Montreal

Extract and Load many tables - Short and fast

Post by martinkirouac »

Hi everyone. Hope you are doing well.

I am on a migration project at the moment. I have to ETL legacy application data into the new applications' tables, but I also have to move some legacy tables unmodified, for backward compatibility.

There are not that many tables, but I was wondering if there is a better way to do this. Right now, I have 3 simple jobs moving data from DB A to DB B. These jobs each contain around 10 ODBC stage (in) -> ODBC stage (out) pairs with runtime column propagation.

It works, but even with parameter sets it's tedious and ugly. Using SSIS, I know I can create a loop (reading a file for table names) and automatically pass table names to the DB stages. It is also easy to create SQL scripts to do that kind of thing... but in a perfect world, when using a centralized ETL tool, you'd want that centralized tool to control the data transfers and give you a centralized view of them as well.

What would you do in a situation like that? Any tricks? What do you think?

Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can put the loop in a sequence, perhaps driving the loop from a query against the source system's system tables to derive a list of table names to be transferred.
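Deriving the table list from the source system's catalog might look like the following minimal Python sketch. This uses SQLite's `sqlite_master` purely as a stand-in; the actual catalog view depends on the source DBMS (for example `INFORMATION_SCHEMA.TABLES` on SQL Server, `ALL_TABLES` on Oracle, `SYSCAT.TABLES` on DB2), and the table names here are invented for illustration:

```python
import sqlite3

# In-memory database standing in for the legacy source system.
conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE cust (id INT); CREATE TABLE orders (id INT);")

# Query the system catalog for user table names. sqlite_master is
# SQLite-specific; substitute the catalog view of your actual DBMS.
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

# Join the names into one delimited string, the shape a Start Loop
# activity's "list of things" expects.
table_list = ",".join(name for (name,) in rows)
print(table_list)  # cust,orders
```

The sequence then iterates over that delimited list, passing each name into the extract/load job as a parameter.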
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
martinkirouac
Participant
Posts: 9
Joined: Wed Nov 14, 2007 11:00 am
Location: Montreal

Post by martinkirouac »

Hi Ray! Indeed, that's exactly what I used to do with SSIS, but I am having difficulty passing that list to the Start Loop stage in DataStage. I normally pass values using $UserStatus, but I can't find a clean, correct way to pass a whole list.

I am sure there is a neat way to do this.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Code: Select all

Execute Command  --->  User Variables  --->  Start Loop  <-------  End Loop
                                                  |                  ^
                                                  |                  |
                                                  +----->  Job  -----+
In the Execute Command activity list the file using cat or type command.

In the User Variables activity convert any @FM characters (line terminators) in the command output into some useful delimiter character that does not occur in the data. Also remove any empty final line (for example by taking all but the final character from the string).

Use this delimiter in the Start Loop activity and pass the user variable as the "list of things".

Pass the Start Loop activity's counter variable to the Job activity.
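The @FM-to-delimiter conversion in the User Variables step can be sketched outside DataStage like this (a Python stand-in; the field-mark value 0xFE is per Ray's later post, and the simulated command output is invented for illustration):

```python
# @FM is DataStage's field mark, hex 0xFE. DataStage converts line
# terminators in ingested command output into field marks.
FM = "\xfe"

# Simulated $CommandOutput from listing the file: three table names,
# plus a trailing field mark left by the final line terminator.
command_output = "cust" + FM + "orders" + FM + "invoices" + FM

# User Variables step: swap field marks for a delimiter that cannot
# occur in the data, then drop the empty final entry by taking all
# but the final character, as suggested above.
converted = command_output.replace(FM, ",")
table_list = converted[:-1] if converted.endswith(",") else converted
print(table_list)  # cust,orders,invoices
```

That comma then becomes the delimiter configured on the Start Loop activity.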
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
martinkirouac
Participant
Posts: 9
Joined: Wed Nov 14, 2007 11:00 am
Location: Montreal

Post by martinkirouac »

ray.wurlod wrote:

Code: Select all

Execute Command  --->  User Variables  --->  Start Loop  <-------  End Loop
                                                  |                  ^
                                     ...

I assume you mean using Execute Command for the SQL query to get the list, which is then placed in a user variable to drive the loop. That's what I've done now.
I still have that annoying Controller problem: Error calling DSSetParam(currentLoopValue), code=-4 [ParamValue/Limitvalue is not appropriate]. But I feel I am getting close to the answer.

I am debugging... I'll keep you posted.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

An exact search for "ParamValue/Limitvalue is not appropriate" will tell you what is causing that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
martinkirouac
Participant
Posts: 9
Joined: Wed Nov 14, 2007 11:00 am
Location: Montreal

Post by martinkirouac »

I am putting in as much detail as I can in case someone can use this eventually... Sorry for my English; it is not my first language. Feel free to contact me if you need more details. Anyway...

I had to parse the string to get rid of the code=-4 [ParamValue/Limitvalue is not appropriate] error. There were some garbage characters in the query result from the temp table containing the table names. I cleaned them up using
field(convert(@FM,',', Execute_Command_0.$CommandOutput),',',1)
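In Python terms (an illustrative stand-in for the DataStage BASIC expression, with an invented sample string), that Convert/Field combination does this:

```python
FM = "\xfe"  # DataStage field mark, hex 0xFE

def field_convert_first(command_output: str) -> str:
    """Mimic field(convert(@FM, ',', output), ',', 1): replace every
    field mark with a comma, then take the first comma-delimited field."""
    return command_output.replace(FM, ",").split(",")[0]

print(field_convert_first("my_table" + FM + "trailing junk"))  # my_table
```

The field marks are what the error message called "garbage": they are what DataStage turned the line terminators into.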

But in the end, to achieve the purpose of that job, I used a sequential file, removing all delimiters except the end-of-record marker and using a comma instead. That created a file with a single comma-delimited row. Then I used the Execute Command activity to do a cat C:\myfile.txt and shipped the output directly (no bad-character parsing needed) to the Start Loop activity. Then I used the $Counter variable to fill a job parameter of the job inside that loop.

Everything works fine... well, nearly :-). I had to change APT_TRANSPORT_BLOCK_SIZE since the legacy app had crazy tables with columns like varchar(8000).

Thanks for your help everyone.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ummm... not "garbage" characters but Field Marks, as in the @FM character you used Field() to strip from the strings. It's important to understand both what they are and why they are there; all of those @ "marks" are among the fundamental underpinnings of DataStage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

When DataStage ingests a string containing line terminator characters it converts them into field marks (this allows DataStage to be operating system agnostic). The field mark character is available through the @FM system variable - its value in hex is 0xFE.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.