Optimizing Performance
Innovative does not provide end-user support for creating SQL queries. Sierra Direct SQL Access users assume responsibility for the effects of custom SQL. Poorly constructed queries can compromise system performance.
The following tips help you ensure optimal system performance:
Optimizing Reports on Material Type and Bib Level
Because material type and bib level can be stored in different BCODE columns at different sites, Innovative stores these values explicitly (and redundantly) in dedicated tables, so that reports can be written the same way regardless of a site's BCODE configuration.

To retrieve bib record numbers, their material type codes, and their material type descriptions, you could use the following SQL (assuming your site defines material type with BCODE2):
SELECT
  b.record_num AS "bib_record_number",
  b.bcode2 AS "material_type_code",
  u.name AS "material_type_description"
FROM
  sierra_view.bib_view b
JOIN
  sierra_view.user_defined_bcode2_myuser u ON b.bcode2 = u.code
LIMIT 100
Instead, you should use the dedicated material_property_myuser view, joined through the bib_record_property view where the material_code is explicitly stored:
SELECT
  b.record_num AS "bib_record_number",
  brp.material_code AS "material_type_code",
  m.name AS "material_type_description"
FROM
  sierra_view.bib_view b
JOIN
  sierra_view.bib_record_property brp ON brp.bib_record_id = b.id
JOIN
  sierra_view.material_property_myuser m ON brp.material_code = m.code
LIMIT 100
Similarly, for bib level:
SELECT
  b.record_num AS "bib_record_number",
  brp.bib_level_code AS "bib_level_code",
  bl.name AS "bib_level_description"
FROM
  sierra_view.bib_view b
JOIN
  sierra_view.bib_record_property brp ON brp.bib_record_id = b.id
JOIN
  sierra_view.bib_level_property_myuser bl ON brp.bib_level_code = bl.code
LIMIT 100
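Because bib_record_property carries both codes, the two lookups can also be combined into a single statement. The following is a sketch along the same lines, using only the views shown above:

SELECT
  b.record_num AS "bib_record_number",
  m.name AS "material_type_description",
  bl.name AS "bib_level_description"
FROM
  sierra_view.bib_view b
JOIN
  sierra_view.bib_record_property brp ON brp.bib_record_id = b.id
JOIN
  sierra_view.material_property_myuser m ON brp.material_code = m.code
JOIN
  sierra_view.bib_level_property_myuser bl ON brp.bib_level_code = bl.code
LIMIT 100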
How to Ensure that Your Queries Are Well-Tuned
The default system timeout for a Sierra Direct SQL Access query is 10 minutes.
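While developing a query, you may want a shorter safety limit than the system default. As a sketch, PostgreSQL's statement_timeout setting can be lowered for your own session (assuming your read-only account is permitted to change session settings):

-- Abort any statement in this session that runs longer than two minutes.
-- This assumes the connecting account may change session settings;
-- it does not alter the server-wide default.
SET statement_timeout = '2min';

The setting lasts only for the current connection.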
SQL queries can be written in a variety of ways to achieve the same result. Some statements are dramatically more efficient than others. Inefficient statements might take hours (or even days!) to complete. Worse, they can seriously degrade the performance of the production Sierra application.

Like most relational databases, PostgreSQL supports the EXPLAIN command, which "explains" how the database will execute an SQL statement. PostgreSQL regularly analyzes both the structure of the database (tables, indexes, index types, and so forth) and its contents, such as how many rows each table holds. It determines an optimal plan for executing the SQL statement by comparing the estimated "cost" (measured in units based on 8K pages of I/O) of the various options. It chooses the lowest-cost plan and presents that plan in response to the EXPLAIN command.
The following SQL sample includes the EXPLAIN command before the query:
--
-- list bibs with more than x items
--
EXPLAIN
SELECT
  b.record_num AS "bib_record",
  date(b.cataloging_date_gmt) AS "bib_cataloged_dt",
  b.title AS "bib_title",
  count(l.id) AS "item_count"
FROM
  sierra_view.bib_view b
JOIN
  sierra_view.bib_record_item_record_link l ON l.bib_record_id = b.id
GROUP BY 1, 2, 3
HAVING count(l.id) > 100
ORDER BY 4 DESC
PostgreSQL returns the following plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=168530.75..168534.81 rows=1623 width=53)
   Sort Key: (count(bib_record_item_record_link.id))
   ->  HashAggregate  (cost=168419.86..168444.21 rows=1623 width=53)
         Filter: (count(bib_record_item_record_link.id) > 100)
         ->  Nested Loop  (cost=75459.66..168399.58 rows=1623 width=53)
               ->  Merge Join  (cost=75459.66..167643.59 rows=541 width=53)
                     Merge Cond: (md.id = br.record_id)
                     ->  Merge Join  (cost=60430.93..209623.84 rows=546 width=53)
                           Merge Cond: (md.id = brp.bib_record_id)
                           ->  Index Scan using pk_record_id on record_metadata md  (cost=0.00..604517.90 rows=5442 width=12)
                                 Filter: (((record_type = 'b'::bpchar) AND (char_length((campus_code)::text) = 0)) OR (campus_code IS NULL))
                           ->  Index Scan using idx_bib_record_property__bib_record_id on bib_record_property brp  (cost=0.00..38850.44 rows=588982 width=41)
                     ->  Index Scan using bib_record_record_key on bib_record br  (cost=0.00..63943.76 rows=584471 width=24)
               ->  Index Scan using uc_bib_record_item_record_link on bib_record_item_record_link  (cost=0.00..1.24 rows=12 width=16)
                     Index Cond: (bib_record_id = br.id)
This query is quite expensive at 168,534 "8K pages of I/O".
pgAdmin III's Explain tab provides a graphical representation of the same plan.
You can use EXPLAIN ANALYZE instead to see the actual row counts and run timing. (Unlike plain EXPLAIN, EXPLAIN ANALYZE actually executes the query.)
--
-- list bibs with more than x items
--
EXPLAIN ANALYZE
SELECT
  b.record_num AS "bib_record",
  date(b.cataloging_date_gmt) AS "bib_cataloged_dt",
  b.title AS "bib_title",
  count(l.id) AS "item_count"
FROM
  sierra_view.bib_view b
JOIN
  sierra_view.bib_record_item_record_link l ON l.bib_record_id = b.id
GROUP BY 1, 2, 3
HAVING count(l.id) > 100
ORDER BY 4 DESC
PostgreSQL returns the following plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=168530.75..168534.81 rows=1623 width=53) (actual time=23819.456..23819.497 rows=233 loops=1)
   Sort Key: (count(bib_record_item_record_link.id))
   Sort Method: quicksort  Memory: 45kB
   ->  HashAggregate  (cost=168419.86..168444.21 rows=1623 width=53) (actual time=23500.412..23819.018 rows=233 loops=1)
         Filter: (count(bib_record_item_record_link.id) > 100)
         ->  Nested Loop  (cost=75459.66..168399.58 rows=1623 width=53) (actual time=628.412..21255.439 rows=1752105 loops=1)
               ->  Merge Join  (cost=75459.66..167643.59 rows=541 width=53) (actual time=628.375..13420.045 rows=588467 loops=1)
                     Merge Cond: (md.id = br.record_id)
                     ->  Merge Join  (cost=60430.93..209623.84 rows=546 width=53) (actual time=628.357..11540.587 rows=588467 loops=1)
                           Merge Cond: (md.id = brp.bib_record_id)
                           ->  Index Scan using pk_record_id on record_metadata md  (cost=0.00..604517.90 rows=5442 width=12) (actual time=628.318..9858.445 rows=1074590 loops=1)
                                 Filter: (((record_type = 'b'::bpchar) AND (char_length((campus_code)::text) = 0)) OR (campus_code IS NULL))
                           ->  Index Scan using idx_bib_record_property__bib_record_id on bib_record_property brp  (cost=0.00..38850.44 rows=588982 width=41) (actual time=0.023..800.874 rows=588467 loops=1)
                     ->  Index Scan using bib_record_record_key on bib_record br  (cost=0.00..63943.76 rows=584471 width=24) (actual time=0.012..1070.222 rows=588467 loops=1)
               ->  Index Scan using uc_bib_record_item_record_link on bib_record_item_record_link  (cost=0.00..1.24 rows=12 width=16) (actual time=0.005..0.010 rows=3 loops=588467)
                     Index Cond: (bib_record_id = br.id)
 Total runtime: 23819.693 ms
An alternative SQL statement that produces the same result might be more efficient:
--
-- list bibs with more than x items
--
EXPLAIN ANALYZE
SELECT
  b.record_num AS "bib_record",
  date(b.cataloging_date_gmt) AS "bib_cataloged_dt",
  b.title AS "bib_title",
  a.item_count AS "item_count"
FROM
  (SELECT
     l.bib_record_id AS "bib_record_id",
     count(l.id) AS "item_count"
   FROM
     sierra_view.bib_record_item_record_link l
   GROUP BY l.bib_record_id
   HAVING count(l.id) > 100) AS a
JOIN
  sierra_view.bib_view b ON a.bib_record_id = b.id
PostgreSQL returns the following plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=207567.02..235100.81 rows=140 width=53) (actual time=3294.564..6361.711 rows=233 loops=1)
   Merge Cond: (md.id = br.record_id)
   ->  Merge Join  (cost=60430.93..209623.84 rows=546 width=53) (actual time=605.104..3639.233 rows=571278 loops=1)
         Merge Cond: (md.id = brp.bib_record_id)
         ->  Index Scan using pk_record_id on record_metadata md  (cost=0.00..604517.90 rows=5442 width=12) (actual time=605.065..2356.296 rows=1056815 loops=1)
               Filter: (((record_type = 'b'::bpchar) AND (char_length((campus_code)::text) = 0)) OR (campus_code IS NULL))
         ->  Index Scan using idx_bib_record_property__bib_record_id on bib_record_property brp  (cost=0.00..38850.44 rows=588982 width=41) (actual time=0.021..646.761 rows=571278 loops=1)
   ->  Sort  (cost=132107.36..132484.78 rows=150967 width=24) (actual time=2639.468..2639.595 rows=233 loops=1)
         Sort Key: br.record_id
         Sort Method: quicksort  Memory: 43kB
         ->  Hash Join  (cost=107238.06..119121.28 rows=150967 width=24) (actual time=2340.560..2639.222 rows=233 loops=1)
               Hash Cond: (a.bib_record_id = br.id)
               ->  Subquery Scan on a  (cost=79925.46..83322.21 rows=150967 width=16) (actual time=1800.379..2033.996 rows=233 loops=1)
                     ->  HashAggregate  (cost=79925.46..81812.54 rows=150967 width=16) (actual time=1800.377..2033.845 rows=233 loops=1)
                           Filter: (count(bib_record_item_record_link.id) > 100)
                           ->  Seq Scan on bib_record_item_record_link  (cost=0.00..66773.69 rows=1753569 width=16) (actual time=0.013..448.341 rows=1752106 loops=1)
               ->  Hash  (cost=16581.71..16581.71 rows=584471 width=24) (actual time=540.008..540.008 rows=588467 loops=1)
                     Buckets: 32768  Batches: 2  Memory Usage: 16086kB
                     ->  Seq Scan on bib_record br  (cost=0.00..16581.71 rows=584471 width=24) (actual time=0.018..312.221 rows=588467 loops=1)
 Total runtime: 6368.756 ms
The estimated cost has increased to 235,100 8K pages, yet the runtime has dropped from 24 seconds to 6.3 seconds, and the difference holds even when both queries are run repeatedly, so caching alone does not explain it. Clearly, there is much more to runtime performance than I/O alone (on which the cost estimate is solely based). The number and nesting of loops, the size of the sorts (and whether they happen in memory or on disk), and other such factors can all have a dramatic effect on runtime. Many people, even experienced developers, liken interpreting these plans to reading tea leaves.
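When estimates and reality diverge like this, it can help to look at actual page activity rather than the planner's estimates. On PostgreSQL 9.0 and later, EXPLAIN's BUFFERS option reports, for each plan node, how many 8K pages were found in cache versus read from disk. The following is a sketch applying it to the first query above (assuming your server version supports the option):

--
-- list bibs with more than x items, with per-node buffer counts
-- (the BUFFERS option requires PostgreSQL 9.0 or later)
--
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  b.record_num AS "bib_record",
  count(l.id) AS "item_count"
FROM
  sierra_view.bib_view b
JOIN
  sierra_view.bib_record_item_record_link l ON l.bib_record_id = b.id
GROUP BY 1
HAVING count(l.id) > 100
ORDER BY 2 DESC

Each node in the resulting plan gains a line of the form "Buffers: shared hit=... read=...", which distinguishes cached pages from disk reads and often explains why a nominally cheaper plan runs slower.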