Query_TopsPerSubGrpLargeData.mdb |
Post Reply |
Author | |
A.D. Tejpal
Microsoft MVP Joined: 30 Jun 2008 Status: Offline Points: 192 |
Post Options
Thanks(0)
Posted: 07 Jul 2008 at 11:10pm |
Description: 1 - This sample db examines various alternatives for getting Top values (as per date) so as to identify the most efficient method in the context of large data. Main data table has test results recorded for various patients over various dates. The objective is to find (Top result means latest result as per date): (a) Top 3 Results per test per patient. (b) Top 2nd and 3rd Results per test per patient. Note - Task (b) involves extra processing load as compared to (a) 2 - As a query acting directly on the whole data at a time is found to be impractical (it is extremely slow or hangs up altogether in case of large data size), five alternative methods for handling each of the objectives mentioned at 1 (a) & (b) are demonstrated. 3 - For facilitating experimentation with test data, form F_FillTables enables programmatic population of data tables as per the size desired. For this task, two alternative styles are demonstrated: (a) Based purely upon recordsets (Very slow). (b) Combination of recordsets and append query (Extremely fast). Note - For building up test data consisting of 1200,000 records (1000 patients, 20 tests per patient, 60 readings per test), method (a) takes 4,605 seconds while method (b) does it in just 210 seconds. 4 - Performance of methods using form interface (A, AA and B to E, F, FF and G to K ) for getting the top results gets automatically logged in table T_Log. 5 - For academic interest, sample queries acting directly upon the whole data, are also given (in series Q_Z_L to Q_Z_R). These can be tested on comparatively small sized data on account of slow speed and tendency to hang up if data size is large. 6 - Conclusions: Results of certain trials are contained in table T_Conclusions. Conclusions based upon these are summarized below: 6.1 - Piecemeal processing of data in convenient packets, through saved queries with Top clause, using filter criteria based upon global variables fed via user defined functions, and storage of output in an interim table (T_Result), is found to be fastest (Form options A, B, F and G). 6.2 - Using reference to form controls (Form options AA and FF) instead of user defined functions based upon global variables as in 4.6.1 above for performing the same job, results in noticeable slow-down. For instance, using option A or B, a result set having 40,000 Top 2nd and 3rd results based upon data having 1200,000 records gets displayed in 581 sec, whereas, using option AA, it takes 798 sec. 6.3 - Action similar to 6.1 above, but using SQL string built at run time (Form options D and J), is found to be slower than that using saved queries with filtering via global variables, though faster than saved queries with filtering via reference |
|
Sponsored Links | Want a good read? Try The Summer of His Life (available on Amazon!) |
Post Reply | |
Tweet
|
Forum Jump | Forum Permissions You cannot post new topics in this forum You cannot reply to topics in this forum You cannot delete your posts in this forum You cannot edit your posts in this forum You cannot create polls in this forum You cannot vote in polls in this forum |