Print Page | Close Window

Query_TopsPerSubGrpLargeData.mdb

Printed From: Roger's Access Library
Category: Other Download Libraries
Forum Name: Tejpal, A.D.
Forum Description: A.D. Tejpal's Library
URL: www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=61
Printed Date: 17 Oct 2017 at 7:29am


Topic: Query_TopsPerSubGrpLargeData.mdb
Posted By: A.D. Tejpal
Subject: Query_TopsPerSubGrpLargeData.mdb
Date Posted: 07 Jul 2008 at 11:10pm
Description: 

    1 - This sample db examines various alternatives for getting Top values (as per 
date) so as to identify the most efficient method in the context of large data. Main 
data table has test results recorded for various patients over various dates. The 
objective is to find (Top result means latest result as per date):
     (a) Top 3 Results per test per patient.
     (b) Top 2nd and 3rd Results per test per patient. 
     Note - Task (b) involves extra processing load as compared to (a)

    2 - As a query acting directly on the whole data at a time is found to be 
impractical (it is extremely slow or hangs up altogether in case of large data size), 
five alternative methods for handling each of the objectives mentioned at 1 (a) & (b) 
are demonstrated.

    3 - For facilitating experimentation with test data, form F_FillTables enables 
programmatic population of data tables as per the size desired. For this task, two 
alternative styles are demonstrated:
     (a) Based purely upon recordsets  (Very slow).
     (b) Combination of recordsets and append query (Extremely fast).
     Note - For building up test data consisting of  1200,000 records (1000 patients, 
20 tests per patient, 60 readings per test), method (a) takes 4,605 seconds while 
method (b) does it in just 210 seconds.

    4 - Performance of methods using form interface (A, AA and B to E, F, FF and G 
to K ) for getting the top results gets automatically logged in table T_Log.

    5 - For academic interest, sample queries acting directly upon the whole data, are
 also given (in series Q_Z_L to Q_Z_R). These can be tested on comparatively 
small sized data on account of slow speed and tendency to hang up if data size is 
large.

    6 - Conclusions:
     Results of certain trials are contained in table T_Conclusions. Conclusions 
based upon these are summarized below:

     6.1 - Piecemeal processing of data in convenient packets, through saved 
queries with Top clause, using filter criteria based upon global variables fed via user 
defined functions, and storage of output in an interim table (T_Result), is found to
 be fastest (Form options A, B, F and G).

     6.2 - Using reference to form controls  (Form options AA and FF) instead of user 
defined functions based upon global variables as in 4.6.1 above for performing the 
same job, results in noticeable slow-down. For instance, using option A or B, a result 
set having 40,000 Top 2nd and 3rd results based upon data having 1200,000 
records gets displayed in 581 sec, whereas, using option AA, it takes 798 sec.

     6.3 - Action similar to 6.1 above, but using SQL string built at run time (Form 
options D and J), is found to be slower than that using saved queries with filtering 
via global variables, though faster than saved queries with filtering via reference 
to form controls. For instance, using option D, a result set having 40,000 Top 2nd
and 3rd results based upon data having 1200,000 records gets displayed in 630 sec as
compared to 581 sec for options A or B, and 798 sec for option AA. 4.6.4 - Action similar to 6.1 above, but using saved query with a count subquery
(Form options C and H), is found to be very slow. For instance, using option C, a result set having 4,000 Top 2nd and 3rd results based upon data having 120,000 records takes 1,082 sec to get displayed, as compared to mere 17 sec for options A or B. 6.5 - Using a supplementary table (T_Temp) to store the filtered piecemeal records (Form options E and K), before processing the same and appending to results table (T_Result), does not offer any advantage. Instead, it takes almost double the time for options A and F. 6.6 - Queries acting directly upon the whole data at a time are all found very slow and prone to hang up in case of larger data size. For academic information, some of the comparative results, based upon data consisting of 24,000 records are furnished below: 6.6.1 - Top 3 results - 1200 records in result set. (This task, if performed by form option F, takes MERE 2 Seconds). (a) Query: Q_Z_L_Top3_TotalsQry_TopInHaving - 150 Sec (First screenfull appears in 5 Sec). (b) Query: Q_Z_M_Top3_TotalsQry_TopInWhere - 140 Sec (All Results at completion only). 6.6.2 - Top 2nd and 3rd results - 800 records in result set. (This task, if performed by form option A, takes MERE 3 Seconds). (a) Query: Q_Z_N_Top2nd3rd_PlainQryWithCountSubQuery - 212 Sec (All Results at completion only). (b) Query: Q_Z_O_Top2nd3rd_TotalsQryCountInHaving - 217 Sec (First screenfull appears in 10 Sec) (c) Query: Q_Z_P_Top2nd3rd_TotalsQryCountInWhere - 210 Sec (All Results at completion only). (d) Query: Q_Z_R_Top2nd3rd_PlainQryWithUDF - 130 Sec (All Results at completion only). Versions - Access 2000 File Format (Developed on Access 2003. Also tested on Access XP & 2K)

 

Download Access 2000 format: http://www.rogersaccesslibrary.com/forum/uploads/37/Query_TopsPerSubGrpLargeData.zip -



Print Page | Close Window