My company is in the midst of a philosophical debate over how best to break up multi-step processes. Here are two ETL examples:
1. Download and ingest data:
1.a. Check to see whether the file exists on the remote site.
1.b. Run an FTP job to transfer the file locally.
1.c. Load the data from the file into a staging table.
1.d. Run validation against the data.
1.e. Transfer the data to the production database.
1.f. Clean out the staging table.
2. Transfer data from one database to another.
2.a. SQL trigger watches for a new record in DB1.
2.b. SQL job inserts the new record into a staging table in DB2.
2.c. Perform validation against the data in the stagng table.
2.d. Transfer the data to the production tables in DB2.
2.e. Clear out the staging table.
Historically, we've often done this by putting all the steps in a single program or perl script. Everyone agrees that this is a terrible way to go. If 1.c fails, the only way to move forward after you've fixed whatever environmental issue caused the failure is to make a copy of the program and rewrite it to skip 1.a and 1.b..
Our initial thought was that we would simply break each job down into its constituent parts, and each stage would get its own Autosys job. The jobs would go into a box, and that box would replace the original job. Now if 1.3 fails, you fix the environmental issue, force_start job #3 and let the process continue on its way.
Others have said that Autosys doesn't do a particularly good job at this sort of thing and they would prefer that we use a dedicated ETL tool like SSIS or Informatica that can be set up to break each "task" into "sub-tasks." If 1.3 fails, you fix the environmental issue and then just re-run the SSIS task #1 as a whole. It's smart enough to realize that 1.a and 1.b have already run and can be skipped.
I can see the argument - if job 2.c failed and box #2 were force_started, then 2.a and 2.b would run a second time, with potentially catastrophic results if the staging table had never been cleared out (though in theory it's the developers' job to make sure that any error situation returns to the pre-transaction state). My concern is visibility: if the job fails, the only way to identify which step failed might be to launch a separate SSIS console, which strikes me as a less supportable model (but maybe SSIS can be configured to write to stdout after each sub-task was completed; problem solved).
I'd like to open this up and see what the community thinks.