N+1 vs. Spring Data vs. Hibernate

Aneb Eager je past

Created by Petr "Glubo" Sykora

Čo vy si predstavujete pod takým N+1?


            SELECT * FROM article;
            SELECT * FROM author WHERE id = 1;
            SELECT * FROM author WHERE id = 2;
            -- ...
        

Možnosti řešení

Fetch join


            SELECT article.*, author.*
            FROM article
            LEFT JOIN author ON article.author_id = article.id;
        
  • Jeden dotaz ✅
  • Násobení řídkých dat N*M ❌

Možnosti řešení

Constant number queries


            SELECT * FROM article;
            SELECT * FROM author WHERE id IN (...);
        
  • Konstantní počet dotazů ✅
  • Žádné násobení řídkých dat N*M ✅
  • Ale víc jak jeden dotaz ❌

Spring Data a Hibernate

Fakt hodně vrstev

  • Spring Data
  • Spring Data JPA
  • JPA/Jakarta.persistance
  • Hibernate
  • JDBC, pgbouncer, postgresql

Co nezměříš, nevíš!

Možnosti meření

Hibernate log queries


        application.yml:
        ...

        spring:
            jpa:
                show-sql: true
                properties:
                    hibernate:
                    format_sql: true
        

Možnosti meření

Hibernate log queries


        Hibernate:
        select
            aem1_0.id,
            aem1_0.author_id,
            aem1_0.name,
            aem1_0.perex
        from
            article aem1_0
        

Možnosti meření

Hibernate statistics


            application.yml:
            ...

            spring:
                jpa:
                    properties:
                        hibernate:
                            generate_statistics: true
        

Možnosti meření

Hibernate statistics


            10:10:06.322 [pool-1-thread-1 @coroutine#7] INFO  o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
                5950 nanoseconds spent acquiring 1 JDBC connections;
                7000 nanoseconds spent releasing 1 JDBC connections;
                361002 nanoseconds spent preparing 23 JDBC statements;
                4345252 nanoseconds spent executing 23 JDBC statements;
                0 nanoseconds spent executing 0 JDBC batches;
                0 nanoseconds spent performing 0 L2C puts;
                0 nanoseconds spent performing 0 L2C hits;
                0 nanoseconds spent performing 0 L2C misses;
                1954341 nanoseconds spent executing 1 flushes (flushing a total of 224 entities and 0 collections);
                9386541973450 nanoseconds spent executing 1 pre-partial-flushes;
                1770 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
            }
        

Možnosti měření

TestContainers a pg_stat_statements

  • "Trochu" komplikovanější setup

        10:10:06.324 [pool-1-thread-1 @coroutine#7] INFO  PgStatsService - Query statistics:
        calls: 22 totalExec: 0s totalPlan: 0s query: 'select aem1_0.id,aem1_0.name from author aem1_0 where aem1_0.id=$1'
        calls: 1 totalExec: 0s totalPlan: 0s query: 'SELECT 1 FROM pg_stat_statements_reset()'
        calls: 1 totalExec: 0s totalPlan: 0s query: 'select aem1_0.id,aem1_0.author_id,aem1_0.name,aem1_0.perex from article aem1_0'
        calls: 2 totalExec: 0s totalPlan: 0s query: 'BEGIN'
        

Many To One

  • Tak tam hod Eager a bude...

Many To One - Nope ❌

FetchType.EAGER


            ...
            @ManyToOne(fetch = FetchType.EAGER)
            var author: AuthorBatchSizeMTO,
            ...
        

Many To One - Nope ❌

@BatchSize


            ...
            @Table(name = "article")
            @Entity
            @BatchSize(size = 100)
            class Article(
                @ManyToOne()
                @BatchSize(size = 100)
                var author: AuthorBatchSizeMTO,
            ...
        

Many To One - Nope ❌

FetchMode.JOIN


            ...
            @Fetch(FetchMode.JOIN)
            var author: Author,
            ...
        

Many To One - Win ✅

adhoc @EntityGraph


            ...
            @Repository
            interface Repo : CrudRepository<Article, Long> {

                @EntityGraph(attributePaths = [ "author" ])
                abstract override fun findAll(): MutableList<Article>
            ...
        

Many To One - Win ✅

Named @EntityGraph


            @Entity
            @NamedEntityGraph(
                name = "myDescriptiveName",
                attributeNodes = [
                    NamedAttributeNode("author"),
                ],
            )
            class Article(
            ...
            @Repository
            interface Repo : CrudRepository<Article, Long> {

                @EntityGraph(value = "myDescriptiveName")
                abstract override fun findAll(): MutableList<Article>
            ...
        

Many To One - Win ✅

HQL/JPQL join fetch


            ...
            @Repository
            interface Repo : CrudRepository<Article, Long> {
                @Query(
                    "select a" +
                        " from Article a" +
                        " join fetch a.author",
                )
                fun joinFetch(): MutableList<Article>

            ...
        

Many To One - Win ✅

Criteria explicit fetch


            ...
            fun findAllFetch(): List {
                val builder = em.criteriaBuilder
                val query = builder.createQuery(ArticleDefaultWinMTO::class.java)
                val entity = query.from(ArticleDefaultWinMTO::class.java)
                entity.fetch("author")

                return em.createQuery(query).resultList
            }
            ...
        

One to Many, Many to Many

  • Triky, které zabírají u Many to One fugnují také
  • Navíc funguje @BatchSize na atributu
  • U kolekcí Set vs. List

Spring Data -- poznamky

  • Magické metody jou criteria query na úrovni JPA a hloubš
  • Ale mají @EntityGraph

Workflow

  • Z monitoringu zjistím bottleneck
  • Bottleneck si reprodukuju v testu/kontrolovaném prostředí
  • Řeším N+1
  • Po nasazení zkontroluji v monitoringu, že jsem bottlenecku pomohl

Závěr

  • Hibernate preferuje fetch join
  • Eager je past
  • Většina článků/odpovědí/chatgpt nezahrnuje vliv všech vrstev
  • Moc vrstev, je potřeba kontrolovat výsledky
  • Pozor na data classes jako entity

Kam dál