Child pages
  • Automatically remove unused LEFT JOIN from list count database queries
Skip to end of metadata
Go to start of metadata

Imported form: http://groups.google.com/group/in-portal-dev/browse_thread/thread/0d1d8f386ed62b53#

That's not a rare case, when data from referenced table should be displayed along with data from table being viewed.

To do that "LEFT JOIN" clause is added a SELECT query being used for data retrieval from database.

That's not a bad habit actually. But, when table contains 1 million records and has 5 LEFT JOINs on it, then it really slows COUNT database query, used to calculate pagination for that grid.

I propose that we dynamically remove "LEFT JOIN" parts of database query if JOINed table isn't used anywhere in that database query. For example this query doesn't use JOINed table at all, so we can remove it automatically: 

SELECT table1.*
FROM table1
LEFT JOIN on table2 ON table1.field = table2.field
WHERE table1.field = 'test';

remove_unused_left_joins_from_list_count_sql.patch

Related Tasks

INP-788 - Getting issue details... STATUS

24 Comments

  1. Hi Alex,

    First of all, thanks for the patch!

    I suppose we are talking about Admin part at all?

    Also, this accounts for Catalog Permissions if we are counting Category
    Items. Please correct me if I am wrong.

    DA

  2. This will affect both admin and front-end. Only SQLs, that are used to get
    data to be displayed in grids are affected.

    Haven't tested, when subselect is used.

    On Fri, Jan 28, 2011 at 5:44 PM, Dmitry A. <dandre...@gmail.com> wrote:
    > Hi Alex,

    > First of all, thanks for the patch!

    > I suppose we are talking about Admin part at all?

    > Also, this accounts for Catalog Permissions if we are counting Category
    > Items. Please correct me if I am wrong.

    > DA

  3. Can you give me an example of SUB-SELECT in Grids.

    I don't think we are having any at the moment in default installation, do
    we?

    DA

  4. We don't, but you can create a calculated field with such sql:

    'GroupCount' => 'SELECT COUNT(*) FROM ' . TABLE_PREFIX . 'UserGroup ug LEFT
    JOIN ' . TABLE_PREFIX .'PortalGroup g ON g.GroupId = ug.GroupId WHERE
    ug.PortalUserId = %1$s.PortalUserId'

    It should show user membership group count for user.

    On Fri, Jan 28, 2011 at 10:54 PM, Dmitry A. <dandre...@gmail.com> wrote:
    > Can you give me an example of SUB-SELECT in Grids.

    > I don't think we are having any at the moment in default installation, do
    > we?

    > DA

  5. Very questionable in its usefulness patch. Has anyone measured how many
    milliseconds it allows you to win / save?

    Test on the table with 2.7 million entries. The result - 485.529 records.

    SELECT c . *
    FROM `inp_Cities` c
    WHERE c.Country
    IN (
    'RU', 'US', 'CN', 'IN', 'JP'
    )

    SELECT c . *
    FROM inp_Cities c
    LEFT JOIN inp_Regions r ON c.State = r.RegionId
    WHERE c.Country
    IN (
    'RU', 'US', 'CN', 'IN', 'JP'
    )

    Both requests are equally fast - for 0.004 seconds.

  6. As I've pointed out in this discussion title (maybe you haven't noticed),
    but what I've optimized is only COUNT SQL. When counting it does improve
    execution time. When doing "select *" of course it doesn't.

    You really think, that I will be positing patch to a group, that does
    nothing?

  7. You really think, that I will be positing patch to a group, that does
    nothing?

    Nothing personal - i got special invitation from Dmitry A. to comment this
    patch.

    My further experiments shows that count sqls with unused joins are equally
    fast too -

    SELECT COUNT(CityId)
    FROM `inp_Cities` c
    WHERE c.Country
    IN (
    'RU', 'US', 'CN', 'IN', 'JP'
    )

    SELECT COUNT(CityId)
    FROM inp_Cities c
    LEFT JOIN inp_Regions r
    ON
        c.Country = r.Country
        AND c.State = r.Code
    WHERE c.Country
    IN (
    'RU', 'US', 'CN', 'IN', 'JP'
    )

    Please give me example from real In-Portal system tables where SQL execution
    times with/without joins significantly differs.

  8. Project, where I tested it has large speed improvements. Then miracle
    happened, since according to your tests this shouldn't be happening at all
    :)

    I have nothing more to add, since you don't see my point anyway.

  9. Can I see table structures and SQLs which makes "miracle"? Why it is
    initially defined as "That's not a rare case", but now this is some secret
    very special project?

  10. Hi Erik,

    I guess it's time to look at the actual data if you don't mind.

    Please Export your 3-4 tables with data, archive and upload to the DEV
    server so we can take a closer look.

    Cheers!

    DA

  11. You can launch my SQLs as is at resumark dev database.

  12. I suppose that Alex must provide proof  of his patch's usefulness by
    uploading actual data to DEV and by posting SQL requests here.

  13. Hi Erik,

    As a matter of fact I have tested Alex's patch. I can tell you right away
    that I did NOT have a table of 1 million records with JOIN on 1 million
    which actually will make things pretty hard for any optimization. The patch
    does quite a good improvement when you have VERY large main table (ie. Users
    with 2-3+ mil. records, Links) and JOINs on other tables that can be removed
    since are NOT critical for the Counts.

    Make sense?

    As far as I have learned from different resources including Zend forums and
    webminars JOINing very big tables is not always a good idea and developer
    should consider adjusting his Table architecture in order to get desired
    results or running multiple queries.

    We'll be happy to look at your particular case to see what we can offer.

    DA

  14. In my patricular case there no problems - main table is 2.7 mln records, and
    is pretty good optimized with join and without join.

    If goal of this patch is fast count query, then good testing must show
    significant difference between "with patch" and "without patch" - in
    seconds.

  15. Hi Erik,

    I have looked at your particular SQLs and have nothing to add there is
    nothing to add there since Yes the patch is not applying ANY particular
    changes to them due to no need - at least I believe so!

    Now take a look at this query, it should familiar:

    SELECT %1$s.* %2$s FROM %1$s
     LEFT JOIN '.TABLE_PREFIX.'UserGroup ug ON %1$s.PortalUserId =
    ug.PortalUserId AND ug.PrimaryGroup = 1
     LEFT JOIN '.TABLE_PREFIX.'PortalGroup g ON ug.GroupId = g.GroupId
     LEFT JOIN '.TABLE_PREFIX.'%3$sPortalUserCustomData cust ON %1$s.ResourceId
    = cust.ResourceId
     LEFT JOIN '.TABLE_PREFIX.'%3$sImages img ON img.ResourceId =
    %1$s.ResourceId AND img.DefaultImg = 1',

    Try having 2 million Users there (which will have 2 mil. Custom Data
    records) and quote a few records in Images table (general table). Now comes
    the question do you really need that JOIN on CustomData and Images table for
    Counts? I don't think so - this is what Patch does - removes it from there
    in case if NONE of these tables are involved in current Filters. Simply try
    this scenario.

    Let me know if it makes sense to you?

    DA

  16. I believe that on this structure we'll got difference in query processing
    time by removing joins. Root of the problem is that some joins are wrong. If
    you don't want solve problem cardinally - then use this particular patch,
    and tons of  other patches. But right solution is in right data structure.

    I already recommended store PrimaryGroupId to the PortalUser table. By
    analogy there is required adding DefaultImageId field to the PortalUser
    table. And then you got right structure, that will be faster, with joins,
    without joins, with filters and without filters.

    In current u prefix there is not problem "leak unused joins", there is other
    problem - wrong joins. And thay are wrong always - when used, and when
    unused.

  17. Erik, we are happy to make changes to the system and improve currently used
    JOINs not only for Users but for all other Prefixes!

    Take a moment and carefully read your original post where you proposed to
    disable some things from the Grid to make things work differently for
    counts. From my point of view it can be considered as a Work-around as well
    as Alex's patch.

    I am sure you can notice that here in Groups and Issue Tracker that we are
    constantly working on improving the structure of Database and the Code.

    Said that - we'll be happy to have you on board and help with improving the
    ROOT problems that we all face in In-Portal.

    By all means, please start a new task in Issues Tracker and help In-Portal
    by improving those JOINs!

    Cheers!

    DA

  18. This Alex's patch is effective only in case when joins are incorrect.

    I already created task, which include suggestions to change data structure
    and to solve SQL performance related problems, where structural changes will
    not help - https://groups.google.com/forum/#!topic/in-portal-dev/J9pnGGWxhK0

  19. Hi Erik,

    Well let's say that incorrectly setup JOINS were inherited with the
    software.

    Now let's say we have fixed this SQL by doing you have suggested and will
    result in something like this:

    SELECT %1$s.* %2$s FROM %1$s
     LEFT JOIN '.TABLE_PREFIX.'%3$sPortalUserCustomData cust ON %1$s.ResourceId
    = cust.ResourceId

    Now you have the case with 1 million records in Users table and 1 million
    in PortalUserCustomData table. Now you can keep the JOIN as is or you can
    remove it when it's NOT needed (columns from Custom table not used in WHERE
    at all). Obviously, it will run faster, correct?

    Now, you say it's the only SQL like this, unfortunately NO - we have it for
    each CategoryItems and in fact we can have it in other places. The matter of
    fact is that we have quite a Complex Tables structures which definitely can
    be optimized and re-architectured, but we need make sure we don't optimize
    in a way when we loose functionality.

    Erik, I hope we are agree on this? If, you have a different opinion on what
    I have said above - you are welcome to explain why!

    Thanks for starting the new discussion (
    https://groups.google.com/forum/#!topic/in-portal-dev/J9pnGGWxhK0 ) - it's
    very big step and we always look forward for ideas. We'll definitely
    continue talking about what you have posted in that discussion.

    We are here to talk and not to fight - I am sure we all understand that.
    --

    Best regards,

    Dmitry A.

  20. PortalUserCustomData join also is wrong. To improve it is enough replace
    ResourceId index with unique one in PortalUserCustomData table. Then this
    particular patch will have no effect. No dependance of record count. In
    unique index case SQL server "understands" that relation is "one to one", so
    no matter how much records are in left joined table.

  21. I see, that your comment isn't related to this discussion, but rather to
    another discussion about database optimizations. Please post it there
    instead.

  22. No, this post is about this particular patch. - "Then this particular patch
    will have no effect." - Please read text carefully.

  23. Hi Erik,

    I think we have explained where and when this Patch kicks-in. If it doesn't
    work for your particular SQL it doesn't mean it won't work for mine. The
    example that were shown here are from current default In-Portal
    installation. I can easily write here a SQL which is custom and will apply
    to it, and it's going to be correct.

    We can continue argue about current SQLs in In-Portal, but again I strongly
    believe it has NO relevance to this patch since the patch does VERY simple
    and straight forward job:

    "*Remove UNUSED Joins from COUNT SQL when it's possible*".

    I have started a new discussion particularly to cover "Reviewing &
    Optimizing Database Queries<https://groups.google.com/d/topic/in-portal-dev/uaTOwx0vAMc/discussion>
    ".

    DA

  24. "*Remove UNUSED Joins from COUNT SQL when it's possible*"

    This patch is not needed as the work of this patch is done fine by
    MySQL-server - MySQL-server itself ignores joins that not alter the result
    of the SQL-query. The positive effect of this patch can be observed only in
    the case of incorrect (BUG) data structure. In such cases, the patch is also
    not needed - instead, should be corrected BUGs in the data structure.