PostgreSQL
 sql >> Database >  >> RDS >> PostgreSQL

La query PostgreSQL viene eseguita più velocemente con la scansione dell'indice, ma il motore sceglie l'hash join

La mia ipotesi è che stai usando il valore predefinito random_page_cost = 4 , che è troppo alto, rendendo la scansione dell'indice troppo costosa.

Provo a ricostruire le 2 tabelle con questo script:

CREATE TABLE replays_game (
    id integer NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE replays_playeringame (
    player_id integer NOT NULL,
    game_id integer NOT NULL,
    PRIMARY KEY (player_id, game_id),
    CONSTRAINT replays_playeringame_game_fkey
        FOREIGN KEY (game_id) REFERENCES replays_game (id)
);

CREATE INDEX ix_replays_playeringame_game_id
    ON replays_playeringame (game_id);

-- 150k games
INSERT INTO replays_game
SELECT generate_series(1, 150000);

-- ~150k players, ~2 games each
INSERT INTO replays_playeringame
select trunc(random() * 149999 + 1), generate_series(1, 150000);

INSERT INTO replays_playeringame
SELECT *
FROM
    (
        SELECT
            trunc(random() * 149999 + 1) as player_id,
            generate_series(1, 150000) as game_id
    ) AS t
WHERE
    NOT EXISTS (
        SELECT 1
        FROM replays_playeringame
        WHERE
            t.player_id = replays_playeringame.player_id
            AND t.game_id = replays_playeringame.game_id
    )
;

-- the heavy player with 3000 games
INSERT INTO replays_playeringame
select 999999, generate_series(1, 3000);

Con il valore predefinito di 4:

game=# set random_page_cost = 4;
SET
game=# explain analyse SELECT "replays_game".*
FROM "replays_game"
INNER JOIN "replays_playeringame" ON "replays_game"."id" = "replays_playeringame"."game_id"
WHERE "replays_playeringame"."player_id" = 999999;
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1483.54..4802.54 rows=3000 width=4) (actual time=3.640..110.212 rows=3000 loops=1)
   Hash Cond: (replays_game.id = replays_playeringame.game_id)
   ->  Seq Scan on replays_game  (cost=0.00..2164.00 rows=150000 width=4) (actual time=0.012..34.261 rows=150000 loops=1)
   ->  Hash  (cost=1446.04..1446.04 rows=3000 width=4) (actual time=3.598..3.598 rows=3000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 106kB
         ->  Bitmap Heap Scan on replays_playeringame  (cost=67.54..1446.04 rows=3000 width=4) (actual time=0.586..2.041 rows=3000 loops=1)
               Recheck Cond: (player_id = 999999)
               ->  Bitmap Index Scan on replays_playeringame_pkey  (cost=0.00..66.79 rows=3000 width=0) (actual time=0.560..0.560 rows=3000 loops=1)
                     Index Cond: (player_id = 999999)
 Total runtime: 110.621 ms

Dopo averlo abbassato a 2:

game=# set random_page_cost = 2;
SET
game=# explain analyse SELECT "replays_game".*
FROM "replays_game"
INNER JOIN "replays_playeringame" ON "replays_game"."id" = "replays_playeringame"."game_id"
WHERE "replays_playeringame"."player_id" = 999999;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=45.52..4444.86 rows=3000 width=4) (actual time=0.418..27.741 rows=3000 loops=1)
   ->  Bitmap Heap Scan on replays_playeringame  (cost=45.52..1424.02 rows=3000 width=4) (actual time=0.406..1.502 rows=3000 loops=1)
         Recheck Cond: (player_id = 999999)
         ->  Bitmap Index Scan on replays_playeringame_pkey  (cost=0.00..44.77 rows=3000 width=0) (actual time=0.388..0.388 rows=3000 loops=1)
               Index Cond: (player_id = 999999)
   ->  Index Scan using replays_game_pkey on replays_game  (cost=0.00..0.99 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=3000)
         Index Cond: (id = replays_playeringame.game_id)
 Total runtime: 28.542 ms
(8 rows)

Se si utilizza SSD, lo abbasserei ulteriormente a 1.1.

Per quanto riguarda la tua ultima domanda, penso davvero che dovresti rimanere con postgresql. Ho esperienza con postgresql e mssql e ho bisogno di triplicare lo sforzo in quello successivo affinché funzioni la metà rispetto al primo.