Description:
1 - This sample db examines various alternatives for getting Top values (as per
date) so as to identify the most efficient method in the context of large data. Main
data table has test results recorded for various patients over various dates. The
objective is to find (Top result means latest result as per date):
(a) Top 3 Results per test per patient.
(b) Top 2nd and 3rd Results per test per patient.
Note - Task (b) involves extra processing load as compared to (a)
2 - As a query acting directly on the whole data at a time is found to be
impractical (it is extremely slow or hangs up altogether in case of large data size),
five alternative methods for handling each of the objectives mentioned at 1 (a) & (b)
are demonstrated.
3 - For facilitating experimentation with test data, form F_FillTables enables
programmatic population of data tables as per the size desired. For this task, two
alternative styles are demonstrated:
(a) Based purely upon recordsets (Very slow).
(b) Combination of recordsets and append query (Extremely fast).
Note - For building up test data consisting of 1200,000 records (1000 patients,
20 tests per patient, 60 readings per test), method (a) takes 4,605 seconds while
method (b) does it in just 210 seconds.
4 - Performance of methods using form interface (A, AA and B to E, F, FF and G
to K ) for getting the top results gets automatically logged in table T_Log.
5 - For academic interest, sample queries acting directly upon the whole data, are
also given (in series Q_Z_L to Q_Z_R). These can be tested on comparatively
small sized data on account of slow speed and tendency to hang up if data size is
large.
6 - Conclusions:
Results of certain trials are contained in table T_Conclusions. Conclusions
based upon these are summarized below:
6.1 - Piecemeal processing of data in convenient packets, through saved
queries with Top clause, using filter criteria based upon global variables fed via user
defined functions, and storage of output in an interim table (T_Result), is found to
be fastest (Form options A, B, F and G).
6.2 - Using reference to form controls (Form options AA and FF) instead of user
defined functions based upon global variables as in 4.6.1 above for performing the
same job, results in noticeable slow-down. For instance, using option A or B, a result
set having 40,000 Top 2nd and 3rd results based upon data having 1200,000
records gets displayed in 581 sec, whereas, using option AA, it takes 798 sec.
6.3 - Action similar to 6.1 above, but using SQL string built at run time (Form
options D and J), is found to be slower than that using saved queries with filtering
via global variables, though faster than saved queries with filtering via reference to form controls. For instance, using option D, a result set having 40,000 Top 2nd and 3rd results based upon data having 1200,000 records gets displayed in 630 sec as compared to 581 sec for options A or B, and 798 sec for option AA.
4.6.4 - Action similar to 6.1 above, but using saved query with a count subquery (Form options C and H), is found to be very slow. For instance, using option C, a
result set having 4,000 Top 2nd and 3rd results based upon data having 120,000
records takes 1,082 sec to get displayed, as compared to mere 17 sec for options A
or B.
6.5 - Using a supplementary table (T_Temp) to store the filtered piecemeal
records (Form options E and K), before processing the same and appending to
results table (T_Result), does not offer any advantage. Instead, it takes almost
double the time for options A and F.
6.6 - Queries acting directly upon the whole data at a time are all found very
slow and prone to hang up in case of larger data size. For academic information,
some of the comparative results, based upon data consisting of 24,000 records are
furnished below:
6.6.1 - Top 3 results - 1200 records in result set. (This task, if performed by
form option F, takes MERE 2 Seconds).
(a) Query: Q_Z_L_Top3_TotalsQry_TopInHaving - 150 Sec (First screenfull
appears in 5 Sec).
(b) Query: Q_Z_M_Top3_TotalsQry_TopInWhere - 140 Sec (All Results at
completion only).
6.6.2 - Top 2nd and 3rd results - 800 records in result set. (This task, if
performed by form option A, takes MERE 3 Seconds).
(a) Query: Q_Z_N_Top2nd3rd_PlainQryWithCountSubQuery -
212 Sec (All Results at completion only).
(b) Query: Q_Z_O_Top2nd3rd_TotalsQryCountInHaving -
217 Sec (First screenfull appears in 10 Sec)
(c) Query: Q_Z_P_Top2nd3rd_TotalsQryCountInWhere -
210 Sec (All Results at completion only).
(d) Query: Q_Z_R_Top2nd3rd_PlainQryWithUDF -
130 Sec (All Results at completion only).
Versions - Access 2000 File Format (Developed on Access 2003. Also tested on Access XP & 2K)
Download Access 2000 format: http://www.rogersaccesslibrary.com/forum/uploads/37/Query_TopsPerSubGrpLargeData.zip -
|