Invoice Extraction: Extract PDF Invoice to Excel with UiPath
Summary
TL;DR: In this instructional video, Thomas from Tombstack Academy demonstrates a straightforward method to extract invoice data using UiPath, without the need for regular expressions. He guides viewers through the process of reading a PDF invoice, extracting metadata such as invoice number and date, and processing invoice lines. The tutorial includes creating directories for input and processed invoices, utilizing UiPath's PDF activities, and exporting the extracted data into an Excel file for further analysis, ensuring a user-friendly approach suitable for beginners and efficient for processing multiple invoices.
Takeaways
- 😀 The video is a tutorial on extracting invoice data using UiPath without regular expressions.
- 📄 The example invoice has invoice number 100, a date, and invoice lines that include decorative clay potteries.
- 🔍 The presenter, Thomas from Tombstack Academy, demonstrates how to read and process a PDF invoice in UiPath.
- 📁 The video includes instructions to create directories for input and processed invoices to manage files effectively.
- 🛠️ The tutorial requires installing the 'UiPath.PDF.Activities' package for PDF text extraction functionalities.
- 📝 It explains how to use 'Read PDF Text' activity to extract text from the PDF and write it to a text file for analysis.
- 🔑 The method relies on the consistent use of double spaces between important data elements like quantity, product description, and price.
- 📑 The script details how to extract specific data like invoice number and date using 'Text to Left/Right' activities in UiPath.
- 📊 The presenter shows how to convert extracted text into a structured data table using 'Generate DataTable from Text' activity.
- 🔄 The process includes dynamically processing multiple PDF files by setting up a 'For Each File in Folder' activity.
- 📊 The final steps involve moving processed files and saving extracted data into an Excel file with appropriate sheet names.
Q & A
What is the main topic of the video?
-The video is about extracting invoice lines and metadata from an invoice using UiPath without regular expressions.
Who is the presenter of the video?
-The presenter is Thomas from Tombstack Academy.
What is the invoice number processed in the video?
-The invoice number processed in the video is 100.
What is the date on the invoice shown in the video?
-The invoice date is the first of January 2023.
What product was ordered in the invoice example?
-The customer ordered 100 decorative clay potteries.
How much does each decorative clay pottery cost according to the invoice?
-Each decorative clay pottery costs 13 units of currency per piece.
What is the method used in the video to extract the text from the PDF?
-The method used is the 'Read PDF Text' activity in UiPath, which reads the entire text of the PDF into a variable.
How does the video suggest to handle multiple spaces in the PDF text?
-The video's method requires a consistent number of spaces between fields; those double spaces are then replaced with a unique symbol such as a pipe (|) so the fields can be split reliably.
What activity in UiPath is used to split text based on a separator?
-The 'Text to Left/Right' activity is used to split text based on a custom separator.
How can the extracted data be saved into an Excel file in the video?
-The video demonstrates using the 'Write Cell' and 'Write DataTable to Excel' activities to save the extracted data into an Excel file.
What is the purpose of moving the processed PDF to a different folder?
-The purpose is to prevent the processed file from being touched again and to keep the workflow organized.
How does the video ensure the workflow processes multiple files?
-The video makes the file selection dynamic with the 'For Each File in Folder' activity, filtered to *.pdf, so the workflow processes every PDF file in the input folder.
What is the condition for using the simple extraction method shown in the video?
-The condition is that the spaces between different elements of the invoice lines, such as quantity and product description, must be consistent, typically two spaces.
How does the video demonstrate extracting invoice number and date?
-The video uses the 'Text to Left/Right' activity with specific separators to extract the invoice number and date from the PDF output.
What is the final output of the workflow shown in the video?
-The final output includes an Excel file with two sheets: 'invoice lines' containing the details of the invoice items and 'invoice info' containing the invoice number and other metadata.
Outlines
📑 Introduction to Invoice Processing with UiPath
In this video, Thomas from Tombstack Academy demonstrates how to extract invoice lines and metadata from a PDF invoice using UiPath without regular expressions. The video begins with an introduction to the sample invoice, which includes an invoice number, date, and various invoice lines. Thomas guides viewers on how to download the invoice, set up directories for input and processed files, and navigate to UiPath to begin the automation process. The focus is on using the 'Read PDF Text' activity from the UiPath PDF package to extract text, which is then written to a text file for analysis.
🔍 Extracting Text and Data from PDF Invoices
This paragraph details the process of extracting specific information from the PDF invoice text. Thomas explains the importance of consistent spacing between words to facilitate the extraction method used. He demonstrates how to use the 'Text to Left/Right' activity to extract the invoice number and date by specifying separators and saving the results to new variables. The method relies on identifying patterns in the text, such as two spaces between quantities and descriptions, to accurately split the text into usable data.
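The two-step extraction described here (keep the text to the right of a label, then keep the text to the left of the next line break) can be sketched outside UiPath in Python. This is a minimal sketch, not the presenter's workflow: the helper names text_to_right and text_to_left, the label "Invoice number: ", and the sample text are assumptions made for illustration. The real label is whatever text precedes the number in pdf.txt, and the behavior when a separator is missing is a choice made here, not documented UiPath behavior.

```python
def text_to_right(text: str, separator: str) -> str:
    """Return everything after the first occurrence of separator.

    Returns an empty string when the separator is absent (a choice made
    for this sketch).
    """
    _, found, right = text.partition(separator)
    return right if found else ""


def text_to_left(text: str, separator: str) -> str:
    """Return everything before the first occurrence of separator.

    Returns the whole text when the separator is absent (a choice made
    for this sketch).
    """
    left, _, _ = text.partition(separator)
    return left


# Hypothetical stand-in for the text that 'Read PDF Text' would return.
pdf_output = "Invoice number: 100\nInvoice date: 01/01/2023\n"

# Step 1: keep everything to the right of the label.
# Step 2: cut that result off at the first line break.
invoice_no = text_to_left(text_to_right(pdf_output, "Invoice number: "), "\n").strip()
```

The same pair of calls with a different label would extract the date.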
📊 Converting Extracted Text into a Data Table
The speaker proceeds to show how to convert the extracted invoice lines into a structured data table. He uses the 'Generate DataTable from Text' activity and sets the column separator to a unique symbol so the data is parsed correctly. After the double spaces are replaced with a pipe symbol (|), the invoice lines split correctly into quantity, product name, price, and total line price. The resulting data table is saved to a variable for use later in the workflow.
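The replace-then-split idea can be illustrated in Python. This is a hedged sketch under stated assumptions: the function name parse_invoice_lines and the exact formatting of the sample line are hypothetical, and only the quantity (100), the product, and the unit price (13) come from the video's example invoice.

```python
from typing import List


def parse_invoice_lines(text: str, separator: str = "|") -> List[List[str]]:
    """Split invoice lines into columns.

    Fields are assumed to be separated by exactly two spaces, while words
    inside a field are separated by one space. The double spaces are first
    replaced with a unique separator, then each line is split on it.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines around the table
        rows.append(line.replace("  ", separator).split(separator))
    return rows


# Hypothetical invoice line: quantity, description, unit price, line total.
invoice_lines = "100  Decorative clay pottery  13.00  1300.00\n"
rows = parse_invoice_lines(invoice_lines)
```

Splitting directly on single spaces would break the product description into separate columns, which is why a unique separator is introduced first. The separator must not occur anywhere in the invoice text itself.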
🔄 Automating File Processing and Excel Output
The final paragraph outlines how to process multiple PDF invoices and write the extracted data to an Excel file. Thomas uses a 'For Each File in Folder' activity to loop through the PDF files in the input directory, read their content, and write the invoice number to a sheet named 'Invoice Info' and the data table to a sheet named 'Invoice Lines'. The workflow also moves each processed file to a 'Processed' folder and builds the Excel file path from the invoice number, so any number of invoices can be processed.
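The loop-and-move pattern can be sketched in Python with the standard library. This is an illustration under assumptions, not the UiPath workflow itself: process_folder and the handle callback are hypothetical names, and the Excel step is left to the callback because writing .xlsx files would need a third-party library.

```python
from pathlib import Path
from typing import Callable, List


def process_folder(input_dir: Path, processed_dir: Path,
                   handle: Callable[[Path], None]) -> List[str]:
    """Run handle on every *.pdf in input_dir, then move it to processed_dir.

    Returns the names of the files that were moved. An existing file with
    the same name in processed_dir is overwritten, mirroring the overwrite
    option enabled in the video. Both folders are assumed to be on the
    same drive.
    """
    processed_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        handle(pdf_path)  # e.g. read the text and extract the data
        pdf_path.replace(processed_dir / pdf_path.name)  # move, overwriting
        moved.append(pdf_path.name)
    return moved
```

A call such as process_folder(Path("Invoices/Input"), Path("Invoices/Processed"), print) would print each PDF path and then move the file; the folder names here are placeholders. Moving each file only after it has been handled means a failed invoice stays in the input folder for another attempt.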
👋 Conclusion and Future Tutorials
Thomas concludes the tutorial by summarizing the steps taken to extract and process invoice data using UiPath. He confirms that the method demonstrated is the simplest but may not work for all invoice formats. He mentions an alternative tutorial on his channel that covers extracting invoices with regular expressions for viewers whose documents do not fit the criteria for the method shown in this video. Thomas expresses hope that the tutorial was helpful and looks forward to engaging with viewers in his next video.
Keywords
💡UiPath
💡PDF Invoice Extraction
💡Metadata
💡Regular Expressions
💡Text File
💡Variable
💡Data Table
💡Excel
💡Dynamic
💡For Each
💡Message Box
Highlights
The video demonstrates a method to extract invoice lines and metadata from a PDF using UiPath without regular expressions.
Introduction to the channel 'Tombstack Academy' by Thomas, the presenter.
Overview of the sample invoice with an example of 100 decorative clay potteries, their price, and tax.
Instructions on downloading the invoice and accessing the template for creating custom invoices.
Step-by-step guide to set up the workflow in UiPath, starting with adding the PDF package.
Explanation of creating directories for input and processed invoices within UiPath.
Demonstration of reading PDF text using UiPath's 'Read PDF Text' activity.
How to write the output text to a file for further analysis.
The importance of consistent spacing in the PDF for the extraction method to work effectively.
Technique to extract the invoice number using 'Text to Left/Right' activity in UiPath.
Using message boxes to display the extracted invoice number for verification.
Method to refine the extraction process to avoid capturing unwanted text.
Process of extracting invoice line items by identifying specific separators in the text.
Conversion of extracted text into a structured data table using 'Generate DataTable from Text'.
Adjusting the column separator to a unique symbol for accurate data table generation.
Automating the workflow to process all PDF files in a folder and move them post-processing.
Integrating Excel activities to save extracted data into an organized spreadsheet.
Final walkthrough of the complete workflow from PDF extraction to Excel output.
Alternative method mention for invoices that do not fit the criteria for the demonstrated extraction technique.
Conclusion and invitation to the next video in the series.
Transcripts
PDF invoice extraction with UiPath
doesn't have to be difficult in this
video I will show you how you can
extract the invoice lines and as well
the metadata from an invoice with
UiPath without using regular expressions
if you're new to the channel my name is
Thomas and you're watching tombstack
Academy let's start right away
the invoice that we're going to process
today is this one
so you see that we have an invoice
number 100 we have an invoice date
first of January '23 and then we have all
kinds of invoice lines so in this case
the customer ordered 100 decorative
clay potteries for 13 per piece
you see here as well the subtotal the
total price and the sales tax there is a
link in the description of this video
where you can download this invoice and
there is also a link to the page where
you can find the invoice template so you
can make the invoice yourself in word
and then extract it to a PDF so let's
download the invoice and I already have
it here so I'm going to copy it
and I'm going to navigate to UiPath
click on the Main.xaml open file location
and then I'm going to create a new
directory here new folder
let's call this one invoices
and let's make an input folder
and a folder for invoices which have
already been processed
like this
I'm going to move this invoice to the
input folder and as soon as your robot
has processed it it will move
it then to the processed folder so it
won't be touched again so let's open
UiPath and let's start with adding the PDF
package
manage packages make sure that you have
all packages here and then let's search
for PDF
the one that we need is this one
UiPath.PDF.Activities so click on
install and just install the latest
version
then navigate to activities search for
and the one that you're going to need is
read PDF text
so take it put it in the main sequence
and then let's also take the file that
we want to process for now I'm not going
to make it Dynamic I'm just going to
select this file and later I'm going to
show you how to make everything Dynamic
so we can process multiple files instead
of just one
um if you click on read PDF text and you
navigate to the properties panel so
that's this one here
then you see as well output text here
I'm going to press Ctrl K to create a
new variable
and let's call this one PDF
output
like this
and then I'm going to write to a text
file
my text file
I'm going to write the entire text of
this PDF to a text file
so use a variable PDF output
write to file name let's open the
advanced editor
and let's call this one PDF oh sorry
double quotation marks pdf.txt
and now run your robot
and if all went well when you navigate
to project you should now see a text
file here
pdf.txt open it and now you will see
what UiPath sees when it reads a PDF
file and our task for today is to see how
we can extract this text and how to make
sense of this data
what's really important is that the
method that we're using today is the
simplest method to extract PDFs but it
only works
so you see that this product consists of
multiple words and between every word
there is one space
but between for example the
quantity and the product there are two
spaces right
you see also that here between the price
and the product are also two spaces and
between the product price and the Total
Line price are also two spaces
and this is a precondition if you want
to use this method so let's say that you
have for example a decorative clay
pottery here but here you have one space
and then here you have also one space
then it's not possible to use this
method
if you have for example two or three or
four spaces here it doesn't matter how
many but it should be equal all the time
then you can use this very simple method
okay let's start with extracting the
invoice number and you can do that you
see that the invoice number is 100 so
I'm just going to extract the text in
front of it
this one is Ctrl C I'm going to go to
uipath
and the activity that you're going to
need is
Text to Left/Right so just search for
left
Text to Left/Right take it here
text to split
is variable that's the PDF output
and then the separator I'm going to say
custom and the separator in this case
is going to be this text that was in
front of the invoice number
and then I'm going to say that I want to
save the text to the right
and I want to save that in a new
variable let's call this one invoice no
okay let's try this out let's use a
message box
and let's put invoice number in a
message box
so you will see that we get the exact
invoice number that looks good but the
robot doesn't stop there it also takes
everything underneath the invoice number
and we're gonna change that
so let's use another text to left right
and now we're not going to start with
PDF output but now we're going to start
with invoice number
so text to split is now equal to invoice
number
and the separator
is a new line and we're going to take
everything on the left side of that new
line so use variable invoice number
and now you will see that it will work
and now we only have the invoice number
100 of course if you want to extract the
date you just take the date instead of
the invoice number
just take this part and you can use the
text left slash right to extract this
information okay I think you're most
interested in extracting the exact line
items of this invoice so I'm going to
show you how to do that so go back to
pdf.txt
and then take this line
and copy it
and then we're gonna do another text to
left right
text to left right
text to split that is PDF output and the
separator that's now going to be open
advanced editor the line that I just
copied so basically the headers of the
table underneath
click ok
and I'm going to say save the text to the
right
create variable
let's call this one um
invoice
lines
and I want to stop extracting those
invoice lines as soon as I
see subtotal
so let's take that one as well
let's add another text to left right
text to split
invoice lines
separator
subtotal
and I want to save the text on the left
side
and invoice lines
okay let's just make sure that what we
have done until now makes sense
let's add a message box
and use variable invoice lines
and of course this invoice only has one
invoice line but this method works just
as well for invoices with multiple
invoice lines
so you see it works for now okay so
let's now see how we can extract this
one to a data table okay let's just
search for table
and the activity that you need is generate
data table from text
so take it put it below
and now select inputs and input is of
course invoice lines and navigate to the
options
and there let's take the actual invoice
line or lines if you have a PDF with
multiple invoice lines
put them here in Sample inputs and now
you can see that we can provide a format
so we can say that we want to split the
columns based on a column separator
and that is for example one or two
spaces but if you use two spaces and you
provide preview you still see it's not
working because it's now splitting every
word instead of splitting the quantity
and then the entire description of the
products and then the price Etc so we're
going to do something different and just
make sure that you copy this one again
click on OK
add an assign activity
here
and then we're going to say that invoice
lines
is equal to invoice lines
but I'm going to replace
two spaces I'm going to replace them for
a pipeline like this
I'm using a pipeline but you can use any
symbol as long as it's Unique
so click here on OK
go back to options
and now I'm gonna change all the two
spaces for a pipeline
pipeline
pipeline I'm going to say that the
column separator is a pipeline like this
click preview and now you see that it
works because UiPath is able to split
the quantity it's able to split the
product's name
um the price and the total price of this
line
so now click on OK and now we can also
determine where we want to save the data
table so I'm going to save this data
table
and let's call this one DT
invoice lines
press enter
and now let's also build the rest of the
flow
so let's start with a for each
for each file in folder
so we're going to process all the files
in a specific folder
and this is my project
invoices that's going to be input and I
only want to process invoices that have
a PDF extension
so let's say star dot PDF
look
and what do we want to do with all of
those just click here on read PDF text
now press Ctrl and select all of them
control
and now make sure that you output them
in this do box
and now let's also change 100 so let's
change this one
to current file full name includes full
path and that way we make this Dynamic
and we can pick up any PDF that is
dropped in this folder okay so we're
reading from current file full name this
one we can remove because we don't need
it anymore
so we're done saving the invoice number
and of course we're also saving the data
table
let's as well add a move file activity
because we want to move the file
to the other folder as soon as it has
been processed move file
and just put it here at the end
and then I'm going to say that I want to
move the file current file full name
includes full path
and I want to move it
to invoices processed click select
folder
and make sure to also enable overwrite so
that the file is overwritten as soon as
a new file with the same file name
appears so click here
again now there is one thing that we
still need to do and let's save
everything in an Excel because of course
we want to process further what we have
now extracted so
search for use Excel file activity
and this one
and I'm just gonna add an Excel file
here into invoices processed
and let's call this one Excel
I'm going to add two sheets
I'm gonna let's call this one um invoice
lines
and let's call this one invoice
info
let's select the file
invoices processed Excel I'm later going
to change this to make it dynamic as
well but for now let's just keep it like
this
and then I'm going to say write cell
this one the first thing I want to do is
I want to write then the number of the
invoice to the invoice info tab so what
to write
let's use variable and that's invoice no
invoice number
and where to write it
let's say that we want to indicate that
in Excel
so let's write to invoice info and then
B1 confirm
and let's also write a description
so what to write
um advanced editor
invoice
number
and where to write
and again in Excel A1 confirm
okay so now we have written the invoice
number and also a description of the
invoice number
to this invoice info sheet
and now let's also write the data table
write data table to excel that's exactly
what we need
and the content of this invoice we want
to write it what to write
so let's use variable
DT invoice lines that's a data table
and destination let's say Excel and invoice
lines
you can exclude the headers
I'm going to close the Excel file and
I'm also going to make sure that this
file is dynamic
so I'm not going to use invoices
processed Excel
but I'm going to use the invoice number
here
so add two double quotation marks here two
plus signs
and then I'm going to call this one
invoice no
click on OK
and of course you can do the same for
the date that is on the invoice and
other metadata you can also write it to
the same sheet invoice info but for now
I'm just going to do the invoice number
to keep this video really lean okay
let's just go back
let's remove this one let's make sure
that the invoice is in the input folder
which is the case
and let's run the robot
then let's go to the input folder you
see that the input folder is now empty
and let's go one folder up let's go to
processed you see in processed that we have
the PDF which has been moved and we also
have the Excel file which has been
generated
you see that we have invoice info which
contains invoice number and it also
contains the number 100 and invoice
lines which contains all the lines so
you see the quantity see the name of the
product you see the price of products
and you also see the total price of the
line This is the easiest way to extract
information from an invoice it doesn't
work for every format so if it doesn't
work for your invoice format there is
another video on my channel that
explains how to extract invoices with
regular expressions
I hope this video was useful for you and
I hope to see you in my next video