Data Loaders (the N+1 problem) - GRAPHQL API IN .NET w/ HOT CHOCOLATE #6
Summary
TLDRThis video script discusses optimizing GraphQL application performance by integrating a DataLoader to address the 'n plus one problem'. It covers removing unnecessary joins, fetching data only when requested, and demonstrating how DataLoaders can batch multiple queries into a single database request, ultimately improving efficiency.
Takeaways
- π§ The video discusses performance considerations when integrating a GraphQL application with a database, specifically focusing on query optimization to prevent frequent or inefficient database hits.
- π€ The 'n plus one problem' in GraphQL is highlighted, where multiple queries are made for each item in a list, leading to performance degradation.
- π To address the n plus one problem, the video introduces the use of a DataLoader to batch multiple queries into a single database request, improving efficiency.
- π The script demonstrates removing unnecessary joins in the database queries to prevent over-fetching data when not required by the GraphQL client.
- π The importance of conditional data fetching is emphasized, where data is fetched only when requested by the client to avoid unnecessary database hits.
- π§ The process of creating a service and repository for instructors is shown, to handle database operations related to instructor data separately from courses.
- π The concept of using a DataLoader is further explained by creating an 'InstructorDataLoader' that batches queries for multiple instructor IDs into one database call.
- π The script covers the implementation of the DataLoader in the GraphQL resolver, showing how to pass the DataLoader into the resolver and use it to fetch data.
- π The video explains how to pass data from one resolver to a nested resolver using a property on a class, which can be included or excluded from the GraphQL schema.
- π The process of creating a method in the repository to handle batch fetching of instructors by multiple IDs is detailed, which is crucial for the DataLoader to function.
- π The script concludes with a demonstration of the DataLoader in action, showing how it reduces the number of database hits from multiple to a single query for all required data.
Q & A
What is the 'n plus one problem' in the context of GraphQL?
-The 'n plus one problem' in GraphQL refers to a performance issue where a single query returns a list of items, and for each item in the list, an additional query is made to fetch related data. This results in n+1 queries being executed, which can significantly degrade performance.
Why is over-fetching data a concern in GraphQL?
-Over-fetching data in GraphQL is a concern because it can lead to unnecessary network traffic and database load, which can impact the performance of the application. It happens when more data is requested than what is actually needed by the client.
How does a DataLoader help in improving GraphQL query performance?
-A DataLoader helps in improving GraphQL query performance by batching multiple requests for the same resource into a single query to the database. This reduces the number of database hits and solves the 'n plus one problem' by fetching all related data in one go.
What changes were made to the Courses repository in the script?
-In the script, the joins on the Instructor and Students tables were removed from the Courses repository to prevent unnecessary data fetching when the client does not request that data.
Why can't the properties for Instructor and Student be removed from the Course DTO?
-The properties for Instructor and Student cannot be removed from the Course DTO because they are used to describe the relationships between tables in the schema of the application, which is necessary for the GraphQL framework to understand the data structure.
How is the Instructor data fetched in the script's resolver?
-The Instructor data is fetched in the resolver by implementing a service that hits the database to retrieve the Instructor information only when the GraphQL client requests it, thus avoiding over-fetching.
What is the purpose of creating a separate Instructor repository in the script?
-The purpose of creating a separate Instructor repository is to encapsulate the data access logic for Instructor entities and to facilitate the use of DataLoader for batching queries to improve performance.
How does the script demonstrate passing data from one resolver to a nested resolver?
-The script demonstrates passing data from one resolver to a nested resolver by defining a property on the query type that can be used in the nested resolver, allowing the Instructor ID to be passed down to the Instructor resolver.
What is the role of the Instructor DataLoader in the script?
-The Instructor DataLoader in the script is used to batch multiple requests for Instructor data into a single database query, thus solving the 'n plus one problem' and improving the efficiency of data fetching.
How does the script address the potential inefficiency of making two separate database queries?
-The script addresses the potential inefficiency by using a DataLoader to batch all the Instructor queries into one database request, reducing the number of database hits and improving performance.
What is the trade-off mentioned in the script regarding the use of joins versus separate queries?
-The trade-off mentioned is that while using joins might be more efficient when querying for related data, making separate queries can be faster when the related data is not needed, as it avoids unnecessary data fetching.
Outlines
π Optimizing GraphQL Queries with DataLoader
In this segment, the focus is on optimizing a GraphQL application's performance when querying a database. The issue of over-fetching data and the 'n plus one' problem are discussed, where multiple queries are made for each item in a list. The solution proposed is the use of a DataLoader to batch these queries. The speaker demonstrates how to integrate a DataLoader into a GraphQL application, starting with removing unnecessary joins in the database queries to prevent over-fetching. The process involves modifying the repository to fetch data only when requested by the client, thus avoiding unnecessary database hits. The speaker also shows how to implement database logic in a service and use the Instructor repository to fetch data efficiently.
π Passing Data Between Resolvers in GraphQL
This paragraph delves into the mechanics of passing data between resolvers in a GraphQL application. The speaker explains how to pass the instructor ID from the root courses resolvers to nested resolvers using a property on the query type. Initially, the instructor ID is considered for GraphQL ignore, but the speaker reconsiders and decides to leave it in the schema for direct querying. The process involves defining a property on the query type, mapping the instructor DTO to an instructor type, and setting the instructor ID in the course type. The speaker also demonstrates the 'n plus one' problem by executing a query for all courses and observing the number of times the database is hit, highlighting the inefficiency of multiple queries for each item in a list.
π Implementing DataLoader to Solve n plus One Problem
The speaker introduces the DataLoader concept to address the 'n plus one' problem, which involves making multiple queries for each item in a list. The DataLoader is implemented by creating a new folder for data loaders and defining an Instructor DataLoader that inherits from BatchDataLoader. The DataLoader is designed to batch all queries for instructors into a single database request. The speaker demonstrates how to use the DataLoader in the course type resolver, showing how it batches queries for instructor IDs into one query. The implementation involves registering the DataLoader in the Startup.cs file and using it in the course type to fetch instructors efficiently. The speaker also discusses the potential for using DataLoaders when dealing with data from different databases or APIs, emphasizing their utility in such scenarios.
π Summary of Database Query Optimization Techniques
In the final paragraph, the speaker summarizes the steps taken to optimize database queries in a GraphQL application. The process included removing unnecessary joins in the courses repository to speed up queries when instructors are not needed, demonstrating the trade-off between speed and efficiency based on query requirements. The speaker also discusses the use of DataLoader to batch requests for instructor IDs into a single database query, effectively solving the 'n plus one' problem. The speaker emphasizes the importance of using DataLoaders in scenarios where data is spread across different databases or APIs, and invites viewers to apply these techniques to their own applications. The segment concludes with an invitation for feedback and a reminder to subscribe or become a member for more content.
Mindmap
Keywords
π‘GraphQL
π‘Database Integration
π‘N plus One Problem
π‘DataLoader
π‘Hot Chocolate
π‘Entity Framework
π‘DTO (Data Transfer Object)
π‘Resolver
π‘Batch Data Loading
π‘Green Donut
π‘Service Attribute
Highlights
Integration of a hot chocolate GraphQL application with a database leads to performance considerations.
GraphQL can suffer from the 'n plus one problem' where multiple queries are made for a single result.
Data loaders are introduced to solve the inefficiency of multiple queries in GraphQL.
Removing unnecessary joins in the courses repository to prevent overfetching data.
Adjusting the course DTO to return null for instructor and student properties when not queried.
Demonstration of fetching instructor data only when requested by the GraphQL client.
Introduction of a new instructor repository to handle database operations for instructors.
Using the instructor repository to fetch instructor data in the course type resolver.
Passing the instructor ID from the course DTO to the nested resolver for fetching data.
Discussion on the trade-off between removing joins and making separate database queries.
Implementation of a custom data loader to batch multiple instructor queries into one.
Creating a 'get many by ids' method in the instructor repository for batch querying.
Registering the instructor data loader in the startup configuration for use in the application.
Using the data loader in the course type to load instructor data efficiently.
Explanation of how to pass a list of keys to the data loader for batching multiple requests.
Demonstration of the n plus one problem and its solution using the data loader.
Summary of the process, including the removal of joins, resolver adjustments, and data loader implementation.
Highlighting the importance of data loaders in scenarios where database joins are not possible.
Encouragement for viewers to apply the demonstrated techniques to their own applications.
Transcripts
last time we integrated our hot
chocolate graphql application with a
database now whenever you integrate with
the database you have to start thinking
about your application's performance
when executing queries against the
database are we executing queries too
often are our queries not optimized are
we over fetching data and this is
definitely a big problem in graphql
because sometimes we suffer from
something called the n plus one problem
where we make one query and then we get
a bunch of results and say it's a list
of results well then for every item in
that list we have to make another query
so if you get n items back from the list
you're gonna have to make n more queries
and in graphql we solve that with a data
loader so in this video i'm going to
make some performance adjustments and we
are going to integrate a data loader to
make our queries more efficient so first
off i want to head into the courses
repository and what i did last time was
i included all of this data so what this
is telling entity framework to do is to
execute joins on the instructor table
and the students table but what if i
query for a course and i don't ask for
that data back well then i'm doing this
join for no reason so for this demo i'm
gonna get rid of these joints so let me
just clear those out and all we're gonna
get back is the courses now this is kind
of weird because we're returning this
course dto and it has properties for the
instructor and student and they're going
to be null so ideally i would just
remove these but i can't do that because
i'm using all these dtos to describe my
nad framework schema so these describe
the relationships between my tables so
can't remove those so ideally i would
map this to like some other object where
i didn't have those properties but
anyways we're not going to be including
that data it's going to be null and same
for get by id and that should be
everything we have to change here so now
let's head into our query and as you can
see whenever we get a course
we're grabbing data from our course dto
which has an instructor dto but this
instructor dto is going to be gnaw so we
can't really do this let's just let's
just get rid of that and same for get
course by id let's get rid of that too
so now our course type is going to
return null for the instructor which is
okay if we're not querying for the
instructor but if we are querying for
the instructor we're going to get an
exception because we can't allow this
value to be null plus we should return
the data because the client is asking
for it so what we're going to do is
fetch the data in our course type so
we're going to open this up as a
resolver and inside here we're going to
hit our database and get the instructor
back so essentially what we're doing is
only fetching the instructor if the
graphql client is asking for the
instructor so no overfetching but we are
gonna have to hit our database again
here
so let's actually implement that
database logic i'm gonna put this into a
service let's put a new folder in here
i'll call this instructors and similar
to how we had a courses repository we're
going to have an
instructor's
repository there we go and in fact i'm
just going to copy
some of this stuff from my courses
repository because we're going to need
this db context factory and even some of
these methods are going to be the same
so i'm just going to copy everything
from get by id up to the context factory
field
and paste that in my repository and
let's go through and rename everything
import everything we need so all good
here constructor that takes our context
factory and then we're not going to have
a get all method we're going to have
something similar in the future but we
are going to have a get by id and this
is going to return an instructor dto
we're going to pass in an instructor id
we'll create our db context and then
we'll take our instructor's db set on
our db context
and we'll get the first instructor where
their id matches the instructor
that we pass in that should be all good
for our repository this is all going to
change once we implement our data loader
to optimize our data fetching but let me
head into my startup.cs and add the
instructors repository and import that
and now we're ready to use this
repository
to fetch the instructor in our course
type so we're going to need our
instructor repository passed into our
resolver i think having it as a
parameter and using the service
attribute is going to be the easiest way
to do this so we don't have to have a
constructor and pass that in it'll just
get injected into this method by hot
chocolate so we'll take the instructor
repository
that's what we want as the parameter
here and this is now going to be an
async method because we're hitting our
database and all we're going to do is
take our repository
and get an instructor by id and pass in
the instructor id and return that from
this method but the instructor id we
don't even have that how are we going to
get that so essentially what we have to
do is pass the instructor id on this
course dto from our root courses
resolvers
down to our course type resolver so the
way to pass data down to our nested
resolvers is actually defining a field
on our query type that we can use in
this resolver so we want the instructor
id in this method we're just going to
define a property for it so a property
it's going to be a good because it's the
instructor id we'll call it that
instructor id now since this instructor
id
is just a property that we're using to
pass the id down to this resolver i
don't really want this as part of my
graphql schema so i am gonna have a
graphql ignore here so it's ignored from
the schema
because if they want the instructor id
then they can just query for the
instructor actually now i think about it
if they want the instructor id
why not just let them query that so that
they don't have to query the entire
instructor so on that note maybe we can
just leave it as part of our schema
don't have to ignore it but anyways now
that we have the instructor id as a
property on this class we can pass that
to our get by id method on our
repository inside of this resolver and
we get an error here uh because i forgot
we have to do our mapping so this
returns an instructor dto and we have to
map that to an instructor type so this
mapping is pretty straightforward all we
have to do is instantiate our instructor
type and pass in all of these properties
so the id comes from our dto and i think
we got three other properties on here
first name last name
and salary grab those out of our dto i
really wish i had automapper set up
we're going to do that eventually but
anyways now we're ready to demonstrate
the n plus 1 problem so i'm going to put
a breakpoint
right here in this instructor resolver
and i am going to hit
the query for all courses and let's see
how many times we hit this breakpoint
all right so first i'm going to create a
bunch more courses
so try and get some more data in my
database they all have the same name but
that's fine just want to demonstrate
this
and now we're going to get all courses
so let's execute that all right so we
hit a breakpoint in our repository we're
going to get the instructor
and oh the instruct oh i know what
happened so the instructor id
is just an empty guide because we never
set the instructor id
on our course type so we never set this
property the way we need to do this we
have to actually pass that instructor id
down that is gonna be
in our courses query so we're gonna have
to set the instructor id to
the instructor id that we get back
from our course dtos and same thing for
our other resolver down here let's try
this again whoops wrong property name
here fix that and let's get courses all
right so we hit the breakpoint the
instructor id was set that's better so
let's continue all right so this is the
second time third time fourth time fifth
time six time seven all right so seven
times i think we hit the database so
that explains that n plus one so hitting
the database end times
for the instructor of a course and then
hitting the database one more time just
to get all the courses so ideally i
don't want to do this i want to hit the
database once to get all the courses
and then hit the database just one more
time to resolve all the instructors that
i need
for the courses so the way to do this is
to use a data loader so that we can
batch all the instructor queries into
just one database query for all of our
courses so data loaders are actually
built into the hot chocolate package i
think it implements the data litters
using a sibling package called green
donut continuing the little name theme
we got going on here with hot chocolate
strawberry shake etc anyways let's
create a new folder in our project for
data loaders i assume you could put this
in your services folder too but i'm just
going to drop it in a new folder and
inside here
we are going to have our instructor data
loader so create that and we're going to
inherit from batch data loader so that
makes sense because we're batching all
of our queries for instructors into just
one
request to our database and this is a
generic type it takes a key type and a
value type so the key type is going to
be a guide because we're querying for
instructors by id and the id is a good
so that is our key and the value we're
getting back is
our instructor dto so import that and
let's implement this class and we also
are going to have to generate a
constructor to pass down to this base
class so let's do that and then this
class is also going to have to use
our instructor repository because
obviously we're gonna have to hit our
database to actually get our instructors
back so let's get a field for that going
and just resolve that in the constructor
so add that as a parameter and now we're
good to go so this is gonna be an async
method because we're going to hit our
database and we get these keys passed in
as a parameter and these keys represent
all the instructor ids
that we want to batch query for so we
are going to pass all of those keys to
our instructor's repository so instead
of get by id we'll have another method
that can take multiple keys and we'll
call this get
many by ids and just pass in our
instructor ids so we're going to get
back an i enumerable for all the
instructor dtos so now let's generate
this so this is going to be async i
guess i should have awaited that method
in our data later so it would have
generated all that but anyways we're
going to take our db context again
so let's just paste that in there we're
getting our instructor ids passed as a
parameter and we're going to use these
to get all the instructors for these ids
so obviously instead of first or default
we are going to use a where clause here
and we only want the instructor if the
instructor's id is inside of our
instructor ids
list here so we want to check if the
instructor ids list contains the
instructor id and then we'll execute
that query with two list async so this
will do the batch query for all the
instructors so we should only hit this
once to get all of our instructors now
back in our data later let's actually
use this so we're gonna have to await
this and now all we have to do is
convert these instructors to a
dictionary that maps the instructor id
to the instructor so that's super easy
with some link we can just take the
instructor's eye in the umbrable and do
a to dictionary and the key for the
dictionary is going to be the instructor
id all right that is epic so let's go
into our startup.cs we're all done with
our data loader
and are going to register this
as scoped so the instructor
data loader and now that is registered
let's head back to our course type and
we are going to get our instructor data
loader resolved in here and we're going
to use that data loader and that has a
method to load async so load async takes
the id of the instructor that we want to
load so when we hit this resolver like
seven times end times or whatever the
data letter is going to batch all of the
queries for the instructors into just
one query but this method also takes a
cancellation token so let's just pass in
an empty one so we can do cancellation
token
import that
and just specify none and then the other
thing i wanted to point out before we
try this out which is super exciting
is that you can also pass in
a list of keys right here obviously we
don't have to do that now because
there's only one instructor but we would
definitely want to do that if we wanted
to bachelor these students we would want
to pass in all of the student ids into
this load method although i don't even
think that would be efficient based on
our schema and how the students
relationship is set up let's finally run
this let's start the api let's execute
this let me put one more breakpoint in
the instructor resolver and let's
execute all right so we're hitting our
data later one two three four five six
now we are hitting our database to get
all of the instructors for all of our
courses so instructor ids there's only
one in here and i think that's because
all of our courses have the same
instructor so this actually isn't going
to be as fun as i thought but regardless
we're only hitting our database once to
load all of our course instructors so
let's continue uh we timed out okay but
that's okay let's run it again let's see
only hitting the database once all right
there we go continue
one hit on the database and we get all
of the instructors back so yeah we only
have one instructor maybe we can add
another one real quick for fun so
another instructor sean singleton
100 salary let me just copy this old id
paste that in here change it around a
little bit let me copy it as well
let's save this table i think we're good
let's put a breakpoint back in our
repository let's start this we're gonna
have to create a new course
that points to
our other instructor so we can do that
set the instructor id there we go
created and now let's get our courses so
hit that breakpoint and we got two
instructor ids now so we batch load all
of those in just one database query and
there we go got back all of our data so
we've solved the n plus one problem
instead of n plus one database queries
we now have one database query to get
all the courses and then another single
database query to get all of the
instructors for the courses so just to
summarize in our courses repository
we've removed those joins on the
instructors and students table we didn't
really have to do that i feel like the
joints would definitely be more
efficient than making two separate
database queries so one for all the
courses and one for all the instructors
but still if we're not querying for the
instructors then this is definitely
gonna be faster now i guess it's all
kind of like a trade-off so if you don't
want the instructor then now this is
faster but if you do want the instructor
then
it's gonna be slower because we have two
queries regardless i just want to show
off data loaders because sometimes you
just have to use a data later like what
if our instructors were in a completely
different database or on a completely
different api then joining the data on a
database query wouldn't even be possible
so we would have to use the data later
in that case anyways i'm just rambling
at this point so we remove the joins on
the course's query and now we query the
instructor in a resolver i also
demonstrated how you can pass data
from one resolver to a nested resolver
using a property which you could also
ignore from your schema if you wanted to
and then demonstrated the n plus one
problem which we eventually solved by
creating our custom data letter that
batches a request for all of our
instructor ids into just one database
request so hopefully you can apply this
to your own application so that you're
not hitting the database n times because
who knows n could be like a thousand
definitely want to use a data later
anyways if you have any questions
criticisms or concerns be sure to leave
them below in the comments section if
you enjoyed the video or enjoying the
channel consider becoming a member link
in the description other than that leave
a like or subscribe for more thank you
Browse More Related Video
What Could Go Wrong with a GraphQL Query and Can OpenTelemetry Help? - Budhaditya Bhattacharya, Tyk
What Is GraphQL? REST vs. GraphQL
Why GraphQL within CMS Projects can COST you thousands!
Entity Framework Setup - GRAPHQL API IN .NET w/ HOT CHOCOLATE #5.1
useEffect to the Rescue | Lecture 140 | React.JS π₯
Learn useMemo In 10 Minutes
5.0 / 5 (0 votes)