Extract Key Information from Documents using LayoutLM | LayoutLM Fine-tuning | Deep Learning
Summary
TLDR: This YouTube video tutorial introduces LayoutLM, a state-of-the-art model for understanding document layouts and extracting entities. It covers the limitations of traditional OCR and NER, explaining how LayoutLM incorporates text and positional information for more accurate document processing. The presenter demonstrates using the FUNSD dataset to train the model for key-value pair extraction, outlining steps from data preparation with tools like Label Studio to model training and inference using Hugging Face's Transformers library.
Takeaways
- 📄 The video introduces LayoutLM, a document understanding model that excels at extracting entities from structured documents.
- 🔍 Traditional OCR and NER methods struggle with changing document structures, whereas LayoutLM considers both text and layout for better accuracy.
- 💾 The script discusses the use of the FUNSD dataset to demonstrate how LayoutLM can extract key-value pairs from documents.
- 🖼️ LayoutLM processes images of documents, identifies text, and determines the position of each word within the image.
- 🔎 The model generates embeddings that incorporate both text and positional information to understand document structure.
- 🔧 A Faster R-CNN model is used in conjunction with LayoutLM to detect regions of interest within the document images.
- 📊 The video outlines the architecture of LayoutLM, explaining how it handles text, positional, and image embeddings.
- 🛠️ The tutorial covers the steps to train LayoutLM using the FUNSD dataset, emphasizing the importance of maintaining document structure.
- 📈 The presenter demonstrates how to preprocess data, train the model, and evaluate its performance, achieving an F1 score of 75% after five epochs.
- 🔗 The video provides a link to a GitHub repository containing the code for preprocessing and training the LayoutLM model.
- 🔎 The final part of the script shows how to use the trained LayoutLM model to infer and extract information from new, unstructured document images.
Q & A
What is the main topic of the video?
-The main topic of the video is LayoutLM, a document understanding model that helps in understanding documents and extracting relevant entities.
What does LayoutLM do differently compared to traditional OCR and NER?
-LayoutLM takes into account more information than just text from OCR and named entity recognition (NER). It considers the layout and structure of the document to better understand and extract entities.
What kind of data set is used to demonstrate LayoutLM in the video?
-The dataset used in the video is FUNSD, which is used to extract relevant information such as key-value pairs from documents.
How does LayoutLM handle documents where the structure keeps changing?
-LayoutLM helps maintain the structure of documents by keeping the layout information intact, which is crucial as the document structure can change and traditional OCR might fail.
What are the three key pieces of information that LayoutLM uses for training?
-LayoutLM uses text information, the position of the text in a particular image, and the image embedding itself as the three key pieces of information for training.
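LayoutLM expects each word's bounding box on a fixed 0-1000 coordinate grid, so pixel-space boxes from OCR are typically rescaled first. A minimal sketch of that step (the helper name is illustrative, not from the video):

```python
def normalize_bbox(bbox, width, height):
    """Scale a pixel-space box (x0, y0, x1, y1) to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# A word spanning pixels (100, 50) to (300, 80) on a 1000x800-pixel page:
print(normalize_bbox((100, 50, 300, 80), 1000, 800))  # [100, 62, 300, 100]
```

Normalizing to a fixed grid makes the positional embeddings independent of the original page resolution.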
What role does the Faster R-CNN model play in the LayoutLM architecture?
-The Faster R-CNN model helps detect the region of interest where the words are located within the document.
How does the video demonstrate the process of training the LayoutLM model?
-The video demonstrates training the LayoutLM model by using the FUNSD dataset, fine-tuning the model, and evaluating its performance with metrics like precision, recall, and F1 score.
What is the significance of the unique labels in the training process?
-The unique labels are significant as they represent the different classes or categories that the model needs to learn to identify and classify during training.
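The unique labels extracted from the annotations are mapped to integer ids before training, since the model consumes numbers rather than class names. A sketch with a FUNSD-style label set (the exact contents of the video's labels.txt may differ):

```python
# Illustrative label set in the FUNSD style: O for "other", plus
# B-/I- tags for the header, question, and answer classes.
labels = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION",
          "B-ANSWER", "I-ANSWER"]

# Map each unique label to an integer id, and build the reverse map
# for decoding predictions back to class names.
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(label2id["B-QUESTION"])  # 3
print(id2label[5])             # B-ANSWER
```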
How can one improve the accuracy of the LayoutLM model as shown in the video?
-One can improve the accuracy of the LayoutLM model by increasing the number of training epochs, which allows the model more opportunities to learn from the data.
What is the final output the video aims to achieve using LayoutLM?
-The final goal is the ability to extract and annotate information from structured documents, such as invoices, with high accuracy using the trained LayoutLM model.
Outlines
📄 Introduction to LayoutLM Model
The speaker introduces the LayoutLM model, a state-of-the-art document understanding model that excels at extracting entities from various document types. Traditional methods like OCR and NER are mentioned as less effective due to their inability to handle changing document structures. The speaker contrasts this with LayoutLM's capability to understand and extract information while preserving document layout. An example dataset called FUNSD is introduced to demonstrate the model's application in extracting key-value pairs from documents, which is particularly useful in the finance and retail sectors. The limitations of OCR for processing structured documents are discussed, emphasizing the need for a model like LayoutLM that can handle layout variations.
🖥️ LayoutLM Architecture and Data Processing
The speaker delves into the architecture of the LayoutLM model, explaining how it processes image documents. The model uses OCR to extract text and word positions, creating embeddings that include both text and positional information. These embeddings are then combined with image embeddings from a Faster R-CNN model that detects regions of interest. The process results in a comprehensive set of features that the LayoutLM model uses for training. The speaker also discusses the use of the FUNSD dataset for training, mentioning the need for GPU resources and the installation of necessary libraries. The process of data extraction and preparation is outlined, including the use of Hugging Face's resources and the structure of the dataset.
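The core idea of the architecture described above can be sketched with toy embedding tables: LayoutLM's input embedding is (roughly) the word embedding plus embeddings of each bounding-box coordinate, summed element-wise. This is a simplified illustration with made-up dimensions, not the real model (which also adds 1D position and segment embeddings and uses hidden size 768):

```python
import torch
import torch.nn as nn

# Toy dimensions; the real LayoutLM uses a large vocabulary and hidden size 768.
vocab_size, coord_size, hidden = 100, 1001, 16

word_emb = nn.Embedding(vocab_size, hidden)
x_emb = nn.Embedding(coord_size, hidden)  # shared table for x0 and x1
y_emb = nn.Embedding(coord_size, hidden)  # shared table for y0 and y1

token_ids = torch.tensor([[5, 9]])                 # two words in one document
boxes = torch.tensor([[[100, 62, 300, 100],
                       [310, 62, 420, 100]]])      # normalized (x0, y0, x1, y1)

# LayoutLM-style input embedding: word embedding plus the embeddings of
# the four bounding-box coordinates, all summed element-wise.
emb = (word_emb(token_ids)
       + x_emb(boxes[..., 0]) + y_emb(boxes[..., 1])
       + x_emb(boxes[..., 2]) + y_emb(boxes[..., 3]))
print(emb.shape)  # one document, two tokens, hidden size 16
```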
🏗️ Building and Training the LayoutLM Model
The speaker describes the process of building and training the LayoutLM model using annotated datasets. The dataset includes text, bounding box information, and labels for various elements within the document images. The speaker explains how to use tools like Label Studio to annotate documents and prepare datasets. The importance of understanding document structure for accurate information extraction is emphasized. The speaker also provides a link to Label Studio and discusses the steps for preparing the dataset, including pre-processing and mapping labels to ID codes.
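Each annotated word in a FUNSD-style dataset carries its text, bounding box, label, and (optionally) links to related entities. A sketch of reading one such record, with illustrative coordinates (the real annotation files may nest these records differently):

```python
import json

# One FUNSD-style annotation record, as described in the video: the text,
# its bounding box, its label, per-word boxes, and an (empty) linking list,
# since only token classification is used here, not relation extraction.
record = json.loads("""
{
  "text": "R&D",
  "box": [148, 97, 194, 117],
  "label": "other",
  "words": [{"text": "R&D", "box": [148, 97, 194, 117]}],
  "linking": []
}
""")

print(record["label"])              # other
print(record["words"][0]["text"])   # R&D
```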
🔍 Data Preparation and Model Training
The speaker outlines the steps for preparing the data for training the LayoutLM model, including the use of the LayoutLM tokenizer from the Transformers library and data loaders from PyTorch. The process involves mapping labels to ID codes and converting the dataset into a format suitable for training. The speaker also discusses the training process, including the use of the token classification class from the Transformers library and the model training itself. The training is demonstrated with five epochs, but the speaker notes that more epochs could improve accuracy.
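The training loop described above follows the standard PyTorch pattern. This skeleton uses a stand-in linear classifier so it is self-contained; in the video, the model is LayoutLMForTokenClassification and the batches come from the FUNSD data loaders, but the loop shape is the same:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for LayoutLMForTokenClassification: any module mapping per-token
# features to per-token class logits fits the same training loop.
num_labels, hidden = 7, 16
model = nn.Linear(hidden, num_labels)

# Toy batch: 8 "documents" of 4 tokens each, with random labels.
features = torch.randn(8, 4, hidden)
labels = torch.randint(0, num_labels, (8, 4))
loader = DataLoader(TensorDataset(features, labels), batch_size=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # the video trains for five epochs
    for x, y in loader:
        logits = model(x)                                    # (B, T, num_labels)
        loss = loss_fn(logits.view(-1, num_labels), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

print(loss.item() >= 0.0)  # cross-entropy loss is a non-negative scalar
```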
📊 Evaluating and Saving the Model
After training, the speaker discusses the evaluation of the model using a test dataset. The evaluation metrics, including loss, precision, recall, and F1 score, are presented, showing an F1 score of 75% with five epochs of training. The speaker suggests that training for more epochs could further improve accuracy. The process of saving the trained model using PyTorch's `torch.save` method is also covered.
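Saving the model via its state dictionary, as the video does, means reloading requires rebuilding the same architecture first. A sketch with a stand-in module (the file name is illustrative):

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in module; the same pattern applies to the fine-tuned LayoutLM model.
model = nn.Linear(16, 7)

# Save only the weights (the state_dict), as shown with torch.save in the video.
path = os.path.join(tempfile.gettempdir(), "layoutlm_demo.pt")
torch.save(model.state_dict(), path)

# Reloading: build the same architecture first, then load the weights into it.
restored = nn.Linear(16, 7)
restored.load_state_dict(torch.load(path))
print(torch.equal(model.weight, restored.weight))  # True
```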
🔎 Inferencing with the Trained Model
The speaker explains how to use the trained LayoutLM model for inferencing on new images. This involves cloning a GitHub repository for preprocessing steps, installing Python Tesseract for OCR processing, and using the trained model to make predictions on new documents. The speaker demonstrates the process of loading the model, processing an image, and visualizing the predictions. The model's ability to classify different elements of the document, such as questions and answers, is shown. The speaker concludes by encouraging viewers to train their own models and seek further clarification in the comments if needed.
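After OCR and tokenization, inference boils down to taking the highest-scoring class per token and mapping the ids back to label names. A sketch with hand-made logits standing in for the model's output (words and label subset are hypothetical):

```python
import torch

# Illustrative subset of the trained model's id-to-label map.
id2label = {0: "O", 1: "B-QUESTION", 2: "B-ANSWER"}

# Pretend per-token logits produced by the trained model for three words.
logits = torch.tensor([[[0.2, 3.1, 0.4],
                        [0.1, 0.2, 2.8],
                        [0.1, 0.3, 2.5]]])

# Decoding: argmax over the class dimension, then map ids to label names.
predictions = logits.argmax(-1).squeeze(0).tolist()
words = ["Date:", "April", "1989"]  # hypothetical OCR output
for word, pred in zip(words, predictions):
    print(word, "->", id2label[pred])
```

In the video, these decoded labels are then drawn on the document image to visualize which words are questions, answers, and headers.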
Mindmap
Keywords
💡LayoutLM
💡OCR (Optical Character Recognition)
💡NER (Named Entity Recognition)
💡Faster R-CNN
💡Text Embeddings
💡Dataset
💡Fine-tuning
💡Token Classification
💡Inference
💡Hugging Face
💡Label Studio
Highlights
Introduction to LayoutLM model for document understanding and entity extraction.
LayoutLM is a state-of-the-art model that processes documents more effectively than traditional OCR and NER methods.
LayoutLM considers both text and layout information for document understanding.
Demonstration of the FUNSD dataset used for extracting key-value pairs from documents.
LayoutLM's ability to handle documents with changing structures where OCR might fail.
Explanation of how LayoutLM preserves document structure during entity extraction.
Architecture of LayoutLM, including OCR and positional embeddings.
Role of Faster R-CNN in detecting regions of interest within documents.
Process flow of information within LayoutLM for extracting entities.
Importance of text positioning information in image documents for accurate extraction.
How LayoutLM training is facilitated using the Hugging Face library.
Demonstration of data extraction and preparation for training the LayoutLM model.
Use of Label Studio for annotating and preparing datasets for training.
Explanation of the data processing steps required for training and inferencing with LayoutLM.
Training process of the LayoutLM model using the FUNSD dataset.
Evaluation of the trained LayoutLM model's performance on test data.
Instructions on saving the trained LayoutLM model for future use.
Inferencing process using the trained LayoutLM model on a new document image.
Potential applications of LayoutLM in finance, retail, and invoice processing.
Final thoughts on the power and utility of the LayoutLM model for document understanding.
Transcripts
hello all and welcome to my youtube
channel so today in this particular
video we are going to see
a very good document understanding model
that is layout lm model which help us to
understand the documents and extract the
relevant entities from the documents
so this is a
kind of a state of art model which is
available
and we are able to process the document
uh in a very easy way
so earlier what happened was to extract
the relevant entities from tabular
data or from any document
data or any kind of unstructured data we used
to do ocr on those kinds of
documents and then
do ner to extract the relevant
entities and then do the necessary
processing to obtain the results in our
required format
but here this layout lm
takes up much more information than
just taking up
information from ocr and ner to do the
understanding of a document and
extract the entities
so we are going to see that so before we
go into the layout lm
architecture how it is helping us
i just want to demonstrate a
data set that we're going to use which
is called funsd
and in this uh data set or we want to
extract this relevant information that
is uh like this kind of information just
present in this kind of documents so
sometimes this is called as key and
value key and value pairs so likewise uh
we want to extract the information from
a particular document in a real world
scenario
so this might be helpful in the finance
world in a retail world or where you
want to extract the
some information from the invoices so
such kind of information if you want to
extract from tabular kind of data
or such kind of data then our ocr tends
to fail because ocr starts
reading the data in a single line format
and then applying ner becomes very
difficult right so
to
keep the structure of the
document intact
that's where the layout lm will help
us
so
the ocr will just not help
in the real scenario by just doing a
ocr on this particular kind of document
and just doing the ner because
what happens is the document
structure keeps on changing and ocr will
keep on reading a new and new
structure right so for the
ner job it becomes very
difficult to extract the relevant
entities
because it is not taking care of the
layout information since the layout
keeps on changing according to the
document so to keep the information of
the structure of the document or the
layout information of the
document the layout lm helps us to
keep the information intact and this is
how
this model will be able
to extract information like how it has
been shown here so this is what we want
in this scenario right
we cannot just do it with ocr and
ner alone
so to eradicate such kind of
problems of structure
information for each and every
document the layout lm will help us so
this is the data set we are going to use
for training our layout lm model
and
now let's let's just jump to the
uh layout lm uh architecture
so let me go through the architecture uh
and this is a brief introduction of it
so you can see this is the architecture
of layout lm model so here you can see a
image document image where we have this
information lying up here
and we want to get the relevant
information from this particular page
from this relevant document right so
what happens is it takes up this
document
and we pre-process this particular
document like we apply ocr above this
document and with ocr we also get to
know the position of the words in a
particular document so let's suppose if
i extract a word from this particular
document a
and then i want to also
extract the information of
uh the word a present in an image
document so that bounding box
information will also be stored
so i just
so this is the information that it
generally takes up uh this layout model
like it takes the word
it will extract the words from this
particular page
and then it also takes the position of
those words in a particular image
now so this information is being passed
into this ocr and this ocr information
will be text will be extracted in the
form of text embeddings
and positional embeddings so these positional
embeddings are nothing but the
position information of a particular word
and this is a text
which is present in a particular
document so you can see this is the word
e date which is embedding of a date so
this word date
is a text
and it's a particular position of the
word date present in a image so that's
information it takes up
and it takes the text embedding as well
now this particular uh model this layout
lm model will prepare uh embedding
considering these two informations the
text as well as the position and
embeddings
so positional embeddings are nothing but
the position of one particular word in
an image so this two information is
being passed into this layout lm model
and and and a layout lm embeddings is
generated
so this embedding consists of the text
information
and
the relevant position information of the
particular word presented in particular
image
all right that's how the embedding of
the layout lm model is prepared and
in parallel this image is also being
forwarded to a faster r-cnn model which helps
us to detect the region of interest
where the words are lying right so
you can see this word date
and its respective
image of this
date word in a particular document
is being captured by this faster r-cnn
model
so that's what it is doing so you can
see this is the layout embedding of date
world and this is the image of a date
word in a particular image that has been
cropped and the embedding of this
image or of the words are being prepared
and that's how the total embeddings
like the image embeddings and the layout
embeddings will be calculated and
added up and then prepared for the
downstream tasks
so this is how the information flows in
a layout lm model it takes up the
three entities or three informations
that is a text information second is the
position of a text in a particular image
and third thing is
image embedding itself so three
information uh flows into this
particular layout lm model and that's
how the model is getting trained
so this is the general
architecture of a layout lm model and
this this layout information this extra
information of
the text positioning image help us to
understand the uh image and structure of
a particular image and extract the
relevant entities from the image with a
particular accuracy or with a good
accuracy
so that's how the layout lm is
working
now
to demonstrate the working of this
faster r-cnn and
layout lm model we are going to use
this funsd data set
and we are going to fine-tune
the layout lm model on this funsd
data set or on our
own data set
so for that i'm going to use
the hugging face import that is
available over here on hugging face
so we can just directly use their
model since it is available on
hugging face i'm just going to
directly use it
so before that we have to just make sure
that gpus are available in our local
environment
and now after this we are going to
import some of the libraries
which are necessary and required for
installing the layout lm models so these
are the dependencies that we need to
install
and once this uh
information is being all libraries are
being installed we can
proceed with the data so data extraction
is also a process
but here i am just directly using the
data that is available on the uh hugging
face so you can just uh download this
data by using this link uh present here
so let me just run this
and once once it get downloaded we can
use this particular data set
for our fine-tuning of the layout lm model
so we will be able to extract the
entities from
from the image directly without using
without using uh any kind of
uh two steps model or three-step model
uh it will just take up the structure
information
and uh text information and the
immediate totally it will train a model
and we'll get the embeddings right so
this is what uh the flow will be
so now we're just trying to uh get the
data
and once that is done uh we can uh move
on to the uh preparation of this data
so you can see the data has been
downloaded and we can just take a look
at the uh image data that is being
downloaded so you can see the data is in
this format
it's a structure format and we want to
extract this name or this key and the
name the date and its date information
supervisor manager so likewise you want
to extract the information from this
particular uh page so you can you can
understand that this information or this
structure or this presentation
of a document might change
and accordingly we have to uh make sure
that model also learn the structure of a
document so that's how the layout model
is helping us to understand the
structure as well as the information
which is present in the structure
right
so let's take the data set
which is annotated
so you can see
in a data set i am getting a text
r&d this is a text
and its bounding box information that
is its location in a particular
image
right so you can see r&d which is present
here so its location
in the image is at this coordinates
and you can see the label it has given
as other so it has been labeled with
other tag right
like we are trying to label this
particular word as in uh as an other
class
and the words
uh information being stored as this and
there is no link right this we are not
providing any kind of relation between
the two words it's just a single token
classification model that we are going
to build so it will just take up the
word and it will classify into a
particular category that will be other
question or or any kind of other kind of
classes right so we are going to uh
produce such kind of things and uh
likewise we are going to produce uh the
whole scenario and we'll build a model
so now let us see a particular image
and draw some information over it this
this embeddings or this particular
labels that we have seen over here in
the in the form of text so we just draw
this particular
labels over this image so you can see
this is a
same image that i have demonstrated
above
but now this is with the annotation so
you can see this particular image has
been or this particular word has been
classified as other tag
and this header
these are the headers this is the
question
and this is the answer similarly this
date has it has been annotated as
question and this is answer so these are
nothing but the classes that are being
annotated by uh
by the tools so we have to use a tool to
annotate such kind of information and to
prepare the dataset accordingly
so you might be an uh
you might be uh in a position to
understand like what we are going to do
we are just going to pass this
particular image and get this kind of
labels over the particular words in a in
a particular document and that's how
we're going to extract
and to annotate this such kind of
documents you can use label studio so
here is the link which you can use it
this is a free tool you can
go to go through this particular
documentation and install the label
studio and prepare some script and
do this kind of prepare the data set
accordingly
so i will provide this link into the in
the description you can go through this
particular documentation and you can
prepare the data set likewise and update
it in the same manner like how it is
being shown here right
so this is how we generally do we
prepare a data set like this so we just
take up the document we drag and drop or
we
pick up the image or word information
and give a class to it
and that's how we annotate it
and once this annotations are done we
prepare the data set and we pro we will
pre-process it right
so to pre-process there is
code given by the layout lm
authors
directly so we are just directly using
it to pre-process the annotations
according to the required format that
model accepts so we're just going to run
this particular cell so that the
annotations and
everything get
into the required format
so once that is done uh we are going to
take up the annotations
and we were just going to identify that
what are the unique
labels available so let me just run this
and we'll understand what what is this
unique label means
so you can see it has been saved into
the labels.txt
so let's just go through it and
it will save into label.txt
so you can see these are the unique
labels available
uh so these are these are the classes
you can say answer is a class
header is a class question is a class
and others is a class
such why such such kind of things are uh
the labels which are available you can
see
others is a class letter is a class
question is a class an answer is a class
and these are the unique labels right so
that's what the information we need to
get so these are the unique labels or
unique classes you can say that's what
we have extracted from this
pre-processing uh that we have done
so once this setup is done now we can
start the training or rather we can
process the data set in the form of
pytorch tensors and then we can start the
training
so before we process that data uh we
have to make sure that this uh
this unique label uh should be prepared
right
and then
we have to just run this particular
cell so that we can prepare the data
set in the pytorch format so it will just
take up this particular
data set that that we prepared this
label dataset
unique label dataset and then it will
prepare a
map that it will prepare or it will map
a particular label into an id code
that means it will give a number
uh to a to a label so
we cannot just directly pass a class
name in the form of text we have
to convert it into some number right so
that's what it is doing here it's it is
loading up the data it is taking up the
uh this label and it is mapping into the
number right so that's what it is doing
and that's what the simple function is
also helping us to do it so we'll just
run up this code and we'll convert the
labels to the ids
and now we can check on to the labels
how it is there
so you can see these are the unique
labels available right and if you want
to see this label map uh we can check it
also
so you can see uh each each label has
been converted to the
ids is its respective number right so
this two is two represented by this
particular thing and likewise other
labels are uh been being represented
so once this is once this setup is done
then we have to prepare the pytorch data
set
so for this we have to import uh
tokenizer from layout lm
uh that is available in transformers and
then we have to import some
classes from the
layout
library or you can say from the from the
resource code so we are import using
that
and some data loaders from the
torch library to convert the
particular data to a
data loader format right so
these are the arguments that particular
model takes up
and
this is uh the uh this is the class that
will take up the uh this uh this
particular uh dictionary that we've
prepared to map the labels
and it will prepare it in an argument
format that's what this class is doing
so this arguments is being prepared
and then we are going to use this layout
lm pretrained model for the tokenizer
and then once that is done now we are
going to use the funsd data set class from
the layout lm code
and then we are going to pass this
arguments that we have given here in
tokenizer and the labels that we have
prepared at the top and then pad
tokens and we have to give the train
mode and like likewise we have to
prepare a data set for the training so
this this whole step is for the
preparing the data set in a form of
for loading up the data in pytorch
right so this is for training
and similar way the test
has been done
so likewise we'll prepare the data set
for uh
training and test by using the pytorch
loader and once that is done we can see
some data set here
uh the length of there
and then we'll print out some data set
that is being prepared uh from this data
loader so it's just gonna run so you can
see this particular
image information has been ocr'd
and
it has been
tokenized into this particular unknown
tags and the padding has been done so
you can see this is the uh input that we
are going to pass
so once that all the setup is done now
we are finally into the training of the
model
so for that uh we have to just import
this token classification class from the
transformer and then we are going to
load up that
model
from the
transformer library
so this is what we are going to do
and once that is done
we can just run up this
particular code
that has been provided
and we can just start training the model
on the prepared data set
so right now i'm giving the five epochs
to be trained
but if you want more accuracy
or more
accurate predictions you can get up to
more or you can train up to four more
epochs but for this tutorial i'm just
using five epochs for the training so
let's just run this particular result to
train the model
and let's wait for a few minutes to get
the model trained
okay so the model has got trained now we'll
just evaluate the model on the test data
set
so
this is the code that has been written
to
evaluate the data set uh on the trained
model so we'll just run up this
particular cell to get the predictions
or look at the metrics uh for the
evaluation
so
you can see the loss is point seven four
and precision is 71 percent and
recall is 78 and f1 score is 75
so you can see just with five epochs now
we are able to achieve 75 percent
accuracy but if i want more accuracy i
have to just increase the epochs and
continue training for more epochs right
so once that is done
uh we can save this particular model
and we can just do the torch.save and we
can save it in a dictionary format of
this model state
and then once that is done we can just
do go for the inferencing so we'll just
uh take up a particular image and
we'll pass this image to our trained
model
and
then we are going to see the predictions
of this particular model how it is doing
right
so first uh we have to uh import this or
you have to you can say you have to
clone up this particular uh
github and then we have to install this
particular
pytesseract why are we trying to
install this pytesseract because whatever
the processing we have done for the
training data set while training the
same processing has to be done for the
new image right
so in the processing we
haven't seen any of the
intermediate steps that are involved
but internally what is being
done while annotating is we
are just taking an ocr so you must
remember this particular
architecture we are taking this image
and
passing it to the ocr getting the uh
text from the ocr and his respective uh
words uh position or from the image so
these are the processing steps uh which
has been done and which has been done in
a particular data set that is funds data
set so we are not aware of it but it has
been done and it has been readily
available right so it becomes very easy
for to train our model but while
inferencing we have to do the same steps
whatever whatever we have done for while
training so the same steps are being
applied so that's why we are using the
pytesseract to process the
document page that is
the new image that will be coming up
we'll process it
we'll pass it to the ocr we'll get the
text of that particular text from those
particular images and the respective uh
bounding box information where the text
are present in a particular image so
once that is done we have to run this
environment
so let me restart this environment
and
we have to just
load up this model that we have saved
so
let me just view the image that we are
going to process
okay i think it is not available let me
okay so we have
not processed it let me
take up this particular github that we
have imported right so this is the uh
data that we've imported here so we'll
just take up this uh importer github and
this file which will help us to
pre-process all the processing and that
we have to do it for the new uh text or
new image so that's how the all the code
is being written if you go through this
particular code all the pre-processing
that has to be done is being given in
this particular uh pi file so we're just
writing going to use it so for that
reason i have just imported this
particular or cloned this particular
repository and got this particular file
to process the same thing whatever we
have done for the
training and we just import this
and
let it just let it get imported and once
that is then we are able to
see the image the new image you can see
this is the new image
that we are passing it to which which is
not uh trained in the model so we are
going to pass this particular new image
to the model and want to extract this
information right so we'll do up this uh
by by uploading up the model that we
have trained so it has the model has
been saved here you can see the current
directory
layout lm dot pt so i'm loading the same
model
and
and try to load up the model
with the labels that we have trained
right so i think we haven't
uh we we don't have the numbers of the
labels
so let me just run this particular and
then we'll check up the particular
things you can see yeah it's a it's
giving up the error so let me go
and run a particular cell so that we can
get the number of labels right
so
yeah here it is so we want to get this
number of labels so i will just run up
this code
to get the numbers now we'll get back to
the same step
to look up the
or to run inference on the
particular
image right so you can see this model we
have loaded now we will pass this uh
this particular image a new image to
pre-process it so you can see uh this
pre process is coming from this
particular uh
import that we have done this particular
uh preprocess file so this is from where
the print process is coming up this
function is coming up and we are passing
an image
so what is it is returning is you can
see image it is returning the image it
it returns the word it returns the
boxes
that means the
boxes that has been uh processed that
means uh us our scaled uh
which we are not using the actual
information of the image but we are
doing it or scaling it into some uh
information right of the image to make
it in a same information level so we can
say standardization we are doing it
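The scaling mentioned here is typically a normalization of each pixel bounding box into the fixed 0–1000 coordinate grid that LayoutLM expects, so boxes are comparable across pages of different sizes. A minimal sketch (the function name `normalize_box` is illustrative):

```python
def normalize_box(box, width, height):
    """Scale a pixel box [x0, y0, x1, y1] into LayoutLM's 0-1000 grid,
    making coordinates independent of the page's pixel dimensions."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# On a 1000x500 page, a box reaching the bottom edge maps to y = 1000.
print(normalize_box([100, 50, 200, 250], 1000, 500))  # [100, 100, 200, 500]
```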
right and these are the actual box
information that's the information i was
talking about right when we do the
pre-processing uh this is what happens
actually it gets the image of the word
the text information the bonding box
information and its actual bonding box
information right so that's what we
generally get from the processing and
then we pass this
pre-processing word
and convert this into the into the
features that means we are doing the
encoding uh using a
layout element tokenizer so whatever the
step we have done here uh
like encoding and all everything that we
have done here uh in this step right uh
here you can see
we use this tokenizer in order to encode
this particular text information that
has been coming so the same steps are
being given inside this particular
layout lm process dot pi file
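What that convert-to-features step does can be sketched without the real tokenizer: each word is split into sub-tokens, every sub-token inherits its parent word's bounding box, and the `[CLS]`/`[SEP]` special tokens get the conventional dummy boxes. This is a simplified stand-in, not the repository's actual code; the `tokenize` argument here replaces the real LayoutLM WordPiece tokenizer:

```python
def convert_to_features(words, boxes, tokenize, max_len=512):
    """Expand words into sub-tokens; each sub-token keeps the
    bounding box of the word it came from."""
    tokens, token_boxes = [], []
    for word, box in zip(words, boxes):
        pieces = tokenize(word)
        tokens.extend(pieces)
        token_boxes.extend([box] * len(pieces))
    # Truncate, then add [CLS]/[SEP] with their conventional dummy boxes.
    tokens = tokens[: max_len - 2]
    token_boxes = token_boxes[: max_len - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]
    return tokens, token_boxes
```

In the real pipeline the token ids produced by the tokenizer, together with these token boxes, form the tensors that are fed to the model.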
So we use this convert_to_features function. If you go into the layoutlm preprocess file you can see both the preprocess function and convert_to_features: it tokenizes the input and returns the tokenized information. In other words, I pass the preprocessed information to the tokenizer and it produces the encoding the model needs to make its predictions. Once that encoding is done, the predictions happen, and we get the model's output.
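Once the model has produced one label id per sub-token, the predictions have to be mapped back to whole words before they can be drawn on the image. A common convention, assumed here since the video does not walk through the repository's code for this step, is to keep the first sub-token's prediction for each word and drop the special tokens:

```python
def word_predictions(words, token_word_ids, token_label_ids, id2label):
    """Collapse sub-token predictions to one label per word.
    token_word_ids maps each sub-token to its word index
    (None for special tokens such as [CLS]/[SEP])."""
    seen = set()
    preds = []
    for wid, lid in zip(token_word_ids, token_label_ids):
        if wid is None or wid in seen:
            continue  # skip specials and non-first sub-tokens
        seen.add(wid)
        preds.append((words[wid], id2label[lid]))
    return preds

id2label = {0: "O", 1: "B-QUESTION", 2: "B-ANSWER"}
print(word_predictions(
    ["Name:", "John"],
    [None, 0, 1, 1, None],  # [CLS], Name:, John, ##sub, [SEP]
    [0, 1, 2, 2, 0],
    id2label,
))  # [('Name:', 'B-QUESTION'), ('John', 'B-ANSWER')]
```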
Now we just check, or visualize, the predictions on the image. These are the predicted labels drawn on the image: you can see the model is able to mark which box is a question, which is an answer, which is a header, and which belongs to the other class. The information comes out pretty nicely. There are some mispredictions as well, but that can be improved by training for longer; we only trained for five epochs, so a few predictions are off. With longer training we can get the right predictions and then save the extracted information in the required format, such as JSON.
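Saving the extracted key-value information to JSON, as mentioned, can be as simple as grouping the predicted words by entity class and serializing the result. This is a hypothetical sketch; a production version would also merge adjacent B-/I-/E- tokens of the same entity into complete spans:

```python
import json
from collections import defaultdict

def predictions_to_json(word_labels):
    """Group predicted words by entity class (the 'O'/other class
    is dropped) and serialize the grouping as JSON."""
    grouped = defaultdict(list)
    for word, label in word_labels:
        if label == "O":
            continue
        # Strip the BIOES prefix, keeping only the entity type.
        grouped[label.split("-", 1)[-1]].append(word)
    return json.dumps(grouped, indent=2)

print(predictions_to_json([
    ("Name:", "B-QUESTION"), ("John", "B-ANSWER"), ("Page", "O"),
]))
```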
So that's how we can train a LayoutLM model and extract information from a structured document, making use of the document's layout structure as well. That is how it helps in understanding documents and extracting information from any kind of structured document: tabular documents, invoices, or any such document you want to extract information from. That's how powerful this LayoutLM model is, and this is how we can use it and train it. The whole code will be linked in the description, so you can go through it and train your own model, and let me know if you have any doubts in the comments. Thank you, that's all for this particular video. If you like my channel, please subscribe. Thank you.