Monday, March 12, 2012

Heterogeneous Data *Target*

I want to create a custom read-only pull subscriber to populate a non-RDBMS
database, for which I can't use any of the existing subscriber types (e.g.
SQL, ODBC/OLE).
I understand enough about how the replication process and tables work (SQL2K
at the moment) to do this. So, for example, I can do the following:
1. Set up a local (i.e. same SQL Server as publisher) SQL subscriber to
replicate the data set I want into a database on the same SQL Server. E.g.
replicate from myserver.sourcedb to myserver.targetdb. This works well; i.e.
I can make updates in sourcedb and see them propagate through to targetdb,
and when sp_MSdistribution_cleanup runs it cleans up
distribution..msrepl_commands correctly.
2. I can then disable / remove the Replication Distribution job, and
commands will begin to queue up in distribution..msrepl_commands.
3. I can then service these commands myself using my custom distributor,
parsing the commands to update my target database. The rest of the process
remains the same; i.e. the log reader and the standard cleanups, so I'm only
replacing the Replication Distribution process / job.
My questions are:
1. As this specific activity is (apparently) undocumented, can I rely on MS
continuing to use a similar model for replication in future SQL Server
versions? I suppose we can tell this for 2005 already although I don't know
the answer to that yet. I wouldn't find a small amount of change a problem
as long as the basic model remained the same.
2. I am parsing the SQL commands in msrepl_commands.command. All the
commands I've seen so far are 'simple'; i.e. they are of only a handful of
forms such as:
'update <table> set <column> = <value>'
, or
'delete <table> where <primary key col 1> = <value 1> and <primary key col
2> = <value 2>'
, etc.
Then there are a few snapshot-related commands which I can recognize.
Again, is this something I can rely on? This may not be worth the effort if
it transpires a sophisticated parsing function is actually required. Note: I
turn off SP replication and ignore schema changes for now.
3. Can I be reasonably sure that some routine or ad-hoc maintenance process
will not - as a goodwill gesture - 'fix' the replication for me and reinstate
the standard Replication Distribution job?
Thanks.
this is not a good path to go down. You are making yourself vulnerable to
the next sp, patch or version of SQL Server MS releases. Also if things go
south you cannot rely on MS for support. Also you are locking your company
into a solution which only you understand.
This being said, you should be safe for the immediate future. SQL 2005 is
similar to SQL 2000 tranny repl.
I take it you are using sp_browsereplcmds to correctly format everything for
you. You can also use sp_dumpparam (I think that's the name of the command)
for more info.
Hilary
"sellotape" <sellotape@.discussions.microsoft.com> wrote in message
news:3116B0C9-967D-48A4-9B65-60394E33E181@.microsoft.com...
>I want to create a custom read-only pull subscriber to populate a non-RDBMS
> database, for which I can't use any of the existing subscriber types (e.g.
> SQL, ODBC/OLE).
> I understand enough about how the replication process and tables work
> (SQL2K
> at the moment) to do this. So, for example, I can do the following:
> 1. Set up a local (i.e. same SQL Server as publisher) SQL subscriber to
> replicate the data set I want into a database on the same SQL Server.
> E.g.
> replicate from myserver.sourcedb to myserver.targetdb. This works well;
> i.e.
> I can make updates in sourcedb and see them propagate through to targetdb,
> and when sp_MSdistribution_cleanup runs it cleans up
> distribution..msrepl_commands correctly.
> 2. I can then disable / remove the Replication Distribution job, and
> commands will begin to queue up in distribution..msrepl_commands.
> 3. I can then service these commands myself using my custom distributor,
> parsing the commands to update my target database. The rest of the
> process
> remains the same; i.e. the log reader and the standard cleanups, so I'm
> only
> replacing the Replication Distribution process / job.
> My questions are:
> 1. As this specific activity is (apparently) undocumented, can I rely on
> MS
> continuing to use a similar model for replication in future SQL Server
> versions? I suppose we can tell this for 2005 already although I don't
> know
> the answer to that yet. I wouldn't find a small amount of change a
> problem
> as long as the basic model remained the same.
> 2. I am parsing the SQL commands in msrepl_commands.command. All the
> commands I've seen so far are 'simple'; i.e. they are of only a handful of
> forms such as:
> 'update <table> set <column> = <value>'
> , or
> 'delete <table> where <primary key col 1> = <value 1> and <primary key
> col
> 2> = <value 2>'
> , etc.
> Then there are a few snapshot-related commands which I can recognize.
> Again, is this something I can rely on? This may not be worth the effort
> if
> it transpires a sophisticated parsing function is actually required.
> Note: I
> turn off SP replication and ignore schema changes for now.
> 3. Can I be reasonably sure that some routine or ad-hoc maintenance
> process
> will not - as a goodwill gesture - 'fix' the replication for me and
> reinstate
> the standard Replication Distribution job?
> Thanks.
|||"Hilary Cotter" wrote:

> this is not a good path to go down. You are making yourself vulnerable to
> the next sp, patch or version of SQL Server MS releases. Also if things go
> south you cannot rely on MS for support. Also you are locking your company
> into a solution which only you understand.
Thanks; that's the answer I was expecting.
I was wondering something that I didn't ask explicitly in the original
question. This is a common requirement (in general); namely to replicate a
part of a data source to another repository. It's not always the case that
the target is a compatible RDBMS. Some companies replicate parts of their
personnel data (via LDAP) to ADS (or other DS) trees, others replicate to
authentication or access systems, address lists and so on. What we're all
doing in the absence of a fairly generic solution like I'm suggesting is
probably a combination of *additional* triggers, *additional* extended SPs,
polling (more highly latent and possibly less efficient) for changes, or
taking full snapshots (again more highly latent and possibly less efficient).
Mostly, we are either taking a latency and / or efficiency hit, or we are
just re-inventing the existing replication metaphor. Usually we are also
unnecessarily modifying the data source (additional triggers, extended procs,
timestamps) to achieve this.
As MS already provides a documented solution for replication into MSSQL from
heterogeneous sources; is it not a little limiting that it does not provide
one for replication out to heterogeneous targets? Then again perhaps it
does, but I've not spotted it yet.
Then just as a sidebar, in this context it's perhaps interesting to note
that even when sticking only to fully vendor-supported solutions this is no
guarantee that additional work will not be required when upgrading. E.g.
several VC++ / VB upgrades have not been backwardly compatible and required a
fair amount of manual rework. I'm not saying it's not understandable; just
that it happens.

No comments:

Post a Comment