Wednesday, March 7, 2012

Help: Stored Procedure performance issue.

Hi,

I have a large SQL Server 2000 database with 3 core tables.

Table A : 10 million + records
Table B : 2 million + records
Table C : 6 million + records

One of the batch tasks that I have to perform firstly builds a list of
all keys for records from each of the three tables that I need to
process (along with a flag to tell me which table the key is from).
This list is populated into a table variable.

I then loop through the table variable to process all the records with
the particular key value.
The updates are run in order of the tables ... Table A first, B next
and finally C.

The table variable will typically hold 3000 keys.

My problem is this ..... the processing of the key records from Table
A runs well - it takes around 40 minutes which is acceptable for the
level of processing being carried out. Though when I start processing
the transactions for Table B the first couple of statements execute
successfully though then the subsequent statements take a long time (in
some cases hours) to complete. The format of the statements for all
tables is virtually the same and the tables have been indexed
appropriately.

The thing is that if I alter the stored proc to only process records
from Table B or Table C ... the procedure flies through and processes
the records in a flash ... 1-2 minutes.

Can anyone suggest what might be the issue here ?
I have read many posts though can't seem to find the solution.
Should I break up my processing so that it processes each table
individually ?
I've tried running the Profiler though it doesn't provide me with much
in the way of solutions.

Regards,
IanUnless you provide table definitions and the code it is unlikely that
anyone will be able to help very much.

Quote:

Originally Posted by

>I then loop through the table variable to process all the records with
>the particular key value.


It sounds as though this "process" is where things go bad, but all we
know about the process is that it loops. Looping sounds like cursors.
(Cursors are powerful and indispensable tools - but are often
misused.)

There are other possible hints in the description... building a table
of keys instead of (potentially) using a derived table or view, using
a table variable with 3000 values, the fact that the processing of the
different tables is similar but apparently different brings the
cohesion of the procedure into question... but without specifics it
isn't even worth guessing.

Roy Harvey
Beacon Falls, CT|||Hi Roy,

Thanks for your response ..... to supply all the tables and code is
tough due to the volume.

The looping that I'm using is via a table variable

I create the table variable :

declare @.t_ElligiblePromotions table
(ElligiblePromotionRowNum int IDENTITY (1, 1) Primary key NOT NULL ,
[prefix] int NULL,
[body] int NULL,
[entryrectype] varchar(3) NULL,
[DateMostRecentRevsion] smalldatetime NULL)

Insert the key records :

INSERT INTO @.t_ElligiblePromotions
([prefix],
[body],
[entryrectype],
[DateMostRecentRevsion])
select distinct
t_Orders.prefix,
t_Orders.body,
@.RunType_Order,
getdate()
from t_Orders

then loop through the variable

SELECT @.ElligiblePromotions_RowCnt = (select min
(ElligiblePromotionsRowNum) from
@.t_ElligiblePromotions)
SELECT @.MaxElligiblePromotions = (select max
(ElligiblePromotionsRowNum) from
@.t_ElligiblePromotions)

while @.ElligiblePromotions_RowCnt <= @.MaxElligiblePromotions
begin
select @.Prefix = Prefix,
@.Body = Body,
@.entryrectype = entryrectype,
@.DateMostRecentRevsion = DateMostRecentRevsion
from @.t_ElligiblePromotions
where ElligiblePromotionsRowNum = @.ElligiblePromotion_RowCnt

-- Match the record to the control table - then loop through each prize
draw to determine elligibility
--
BEGIN
EXEC @.ErrorNum = proc_QualificationDetail @.ControlID,
@.Prefix, @.Body,
@.EntryRecType, @.DateMostRecentRevsion, @.Identity,
'FM', @.Result
OUTPUT

SET @.ElligiblePromotions_RowCnt =
@.ElligiblePromotions_RowCnt
+ 1
END

I've been reading about parameter sniffing and might try some of the
suggestions that have been put forward for it as it is the statements
in the child procedure proc_QualificationDetail where the issues are
arising with the updates to the 2nd and 3rd table.

Regards,
Ian.

Roy Harvey wrote:

Quote:

Originally Posted by

Unless you provide table definitions and the code it is unlikely that
anyone will be able to help very much.
>

Quote:

Originally Posted by

I then loop through the table variable to process all the records with
the particular key value.


>
It sounds as though this "process" is where things go bad, but all we
know about the process is that it loops. Looping sounds like cursors.
(Cursors are powerful and indispensable tools - but are often
misused.)
>
There are other possible hints in the description... building a table
of keys instead of (potentially) using a derived table or view, using
a table variable with 3000 values, the fact that the processing of the
different tables is similar but apparently different brings the
cohesion of the procedure into question... but without specifics it
isn't even worth guessing.
>
Roy Harvey
Beacon Falls, CT

|||Hopefully you saw my response to your other message.

Roy Harvey
Beacon Falls, CT|||On 29.10.2006 04:45, Catch_22 wrote:

Quote:

Originally Posted by

Hi,
>
I have a large SQL Server 2000 database with 3 core tables.
>
Table A : 10 million + records
Table B : 2 million + records
Table C : 6 million + records
>
One of the batch tasks that I have to perform firstly builds a list of
all keys for records from each of the three tables that I need to
process (along with a flag to tell me which table the key is from).
This list is populated into a table variable.
>
I then loop through the table variable to process all the records with
the particular key value.
The updates are run in order of the tables ... Table A first, B next
and finally C.
>
The table variable will typically hold 3000 keys.
>
My problem is this ..... the processing of the key records from Table
A runs well - it takes around 40 minutes which is acceptable for the
level of processing being carried out. Though when I start processing
the transactions for Table B the first couple of statements execute
successfully though then the subsequent statements take a long time (in
some cases hours) to complete. The format of the statements for all
tables is virtually the same and the tables have been indexed
appropriately.
>
The thing is that if I alter the stored proc to only process records
from Table B or Table C ... the procedure flies through and processes
the records in a flash ... 1-2 minutes.
>
Can anyone suggest what might be the issue here ?
I have read many posts though can't seem to find the solution.
Should I break up my processing so that it processes each table
individually ?
I've tried running the Profiler though it doesn't provide me with much
in the way of solutions.


Maybe the DB engine just runs out of buffer cache. Since table A is the
large one it may be that once your process gets to table B it has too
many dirty buffers and needs to flush them to disk before it can
continue processing. IMHO you should be able to detect that situation
if you look at physical IO (either with profiler or perfmon).

Kind regards

robert|||Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

Next, you have described a sequential magentic file system and an
RDBMS. Rows are nothing like records. We do not use loops in SQL; it
is declarative language, not a procedural one.

Quote:

Originally Posted by

Quote:

Originally Posted by

>One of the batch tasks that I have to perform firstly builds a list of all keys for records [sic] from each of the three tables that I need to process (along with a flag [sic] to tell me which table the key is from). <<


Batch processing! We do not use flags in SQL. Flags are part of
process control, not data. We might have types. codes or other kinds
of attributes that are used declaratively. Next, a key is the key of
one and only one table; it can be referenced in several places. This
sounds like attribute splitting -- another characteristic of sequential
magentic file systems.

Quote:

Originally Posted by

Quote:

Originally Posted by

>This list is populated into a table variable. <<


Using proprietary features to mimic a scratch tape?

Quote:

Originally Posted by

Quote:

Originally Posted by

>I then loop through the table variable to process all the records[sic]with the particular key value. <<


Loops in a declarative language?

Quote:

Originally Posted by

Quote:

Originally Posted by

>The updates are run in order of the tables ... Table A first, B next and finally C.<<


I wish you had given the tables better names. If the tables have no
relationship among themselves, then you would simply use 3 UPDATE
statements. If the tables have no relationship among themselves,
either DRI actions or triggers would propagate related changes among
the tables. That is how a relational solution works.

Quote:

Originally Posted by

Quote:

Originally Posted by

> My problem is this ..... the processing of the key records [sic] from Table A runs well - it takes around 40 minutes which is acceptable for the level of processing being carried out. Though when I start processing the transactions for Table B the first couple of statements execute successfully though then the subsequent statements take a long time (in some cases hours) to complete. <<


Your looping has completely subverted any set-oriented processing which
SQL is designed to do.

Quote:

Originally Posted by

Quote:

Originally Posted by

>The format of the statements for all tables is virtually the same .. <<


This sounds like you have split a single entity across multiple tables.
Tables need to be very different becasue they each model a lpgically
different set of entities. For example, would you split Personnel into
a table of short people and a table of tall people? Of course not!
You would have a height attribute (column) in a single Personnel table.

Quote:

Originally Posted by

Quote:

Originally Posted by

>Can anyone suggest what might be the issue here? <<


Bad DDL, and non-relational programming.

Quote:

Originally Posted by

Quote:

Originally Posted by

>Should I break up my processing so that it processes each table individually ? <<


Ask yourself, if this is one LOGICAL operation. The quick test is can
it be given a name in the form <verb><object -- like
"UpdateInventory" or "InsertNewEmployee". The same rules you learned
about coupling and cohesion in your Software Engineering courses still
apply in SQL.

Please do not say that you cannot rewrite this stuff. Your DB will
only get bigger

I recently did a consulting job at a software company where they had
been trying to merge legs of a journey into longer trip -- say, you
have a journey from A to B, then from B to C, you want to show it as a
journey from A to C if certain temporal (consecutive legs of the trip
are no more than n days apart) and other constraints are met (the cargo
is of the same type for the same shipper, etc.). They were using
cursors and procedural code.

Thinking in sets, they reduced the existing procedure that was creating
a table to a single VIEW. The code was 90% smaller ~10 times faster
and the VIEW is *always* correct while the created table had to be
updated constantly. Years ago, I got a 2300% improvement and removed
~300 lines of code in an update procedure at an educational publisher
in California (I am good, but this was really because the original code
was that bad).

I know from a few decades of experience that it is worth it.|||1) The silly, redundant "" prefixes are in violation of both common
sense and ISO-11179 rules. Hey, we only have one data structure in
SQL!

2) Never put audit data in the table that is being audited. This is
the same principle as not putting the log file on the same drive as the
database. Revision dates never go into a table in the schema, but are
kept externally to it. Under SOX rules in the US, you can go to jail
or be fined.

3) Why did you fill your data dictionary with vague names that beg to
be defined? Prefix of what? Body of what?

4) Eligibility has only one "l", so your names are a bitch to read.
I really aware of that error; I have a tattoo that lets the world know
I am O-positive and have a Penicillin "Alergy" – Really ?!

5) I see you immediately jump to procedure calls with
"proc_qualificationdetail". Again the redundant prefix andpoor
name are in involution of basic software engineering. Tell me what the
procedures does, not what the parameters are. That means a
<verb><objectname, probably something like
"Determine_Eligibility()" instead.

Of course since this is a logical function, you probably should not use
a procedure or function at all. It ought to be in the WHERE clause of
the UPDATE statement.

Quote:

Originally Posted by

Quote:

Originally Posted by

>I create the table variable : <<


Because in the file system you are creating, all data must be
materialized to be persisted. In SQL, we have VIEWs for data that is
computed over and over as the schema changes. Also this is proprietary
and good programmers write to standards, not to dialect.

I gather that this routine is supposed to assign prizes to entries in a
promotion (much easier to understand that than just Tables A, B and C,
isn't it?). Without any more information, all anyone can do is guess
at the DDL, but here is my shot in a skeleton schema

CREATE TABLE Entries
(entry_id INTEGER NOT NULL PRIMARY KEY,
..);

CREATE TABLE Promotions
(promo_id INTEGER NOT NULL,
prize_id INTEGER NOT NULL,
PRIMARY KEY (promo_id, prize_id),
..);

CREATE VIEW Winners (entry_id, promo_id, prize_id)
SELECT entry_id, promo_id, prize_id
FROM Entries AS E, Promotions AS P
WHERE << assignment predicate >>

I am assuming that prizes are a part of promotion, not separate
entities that can exist alone. Having done something like this
before, the << assignment predicate >is a lot of hand-waving if the
rules are complex. But it can be done (cannot have won a prize in the
last 30 days, not an employee of Radio Station WXYZ, etc.) with careful
use of CASE expressions, you can do a decision table (Google "Logic
Gem") and be sure that you have a complete rules set.

No comments:

Post a Comment