Build your own Amazon price scraper on Google Sheets
Summary
TL;DR This video offers a step-by-step guide to extracting product prices from Amazon's website and loading them into a Google Sheet using Google Apps Script. The process begins by opening the script editor and creating a function that fetches the HTML content of an Amazon product page; the displayed price is then extracted with a regular expression. The `UrlFetchApp` class is used to fetch the page, and the script must be authorized before it can run. Because Amazon restricts automated access to its data, the video suggests a third-party scraper that handles fetching the HTML content, works around IP blocking, and returns the desired HTML. The video also shows how to sign up for a free API key and how to use it from the Google Sheet. Finally, it explains how to write a loop that extracts prices for multiple products and how to handle errors or variations in the structure of Amazon's pages. The video closes by inviting viewers to get in touch with questions and to subscribe to the channel.
Takeaways
- 🛍️ First, open the Google Apps Script editor and give the project a meaningful name such as 'Amazon Scraper'.
- 📄 The `UrlFetchApp` class is used to fetch the HTML content of the desired Amazon page.
- 🔍 Regular expressions are used to extract the product price shown on the page.
- 🆔 The key identifier for each product on Amazon is its ASIN (Amazon Standard Identification Number), which is unique.
- ✅ To avoid errors and access restrictions, the video recommends fetching the HTML content through a third-party API.
- 🔑 You need to register on the third-party API's website and obtain an API key to use its service.
- 📚 The script must be authorized the first time it runs.
- 📈 Postman can be used to test the API and inspect its response before implementing the call in the Google Sheet.
- 📝 The API key is read from the Settings sheet in Google Sheets and used in the script.
- 🔄 A `for` loop iterates over all the rows in the worksheet, updating the price of each product.
- 🤖 The script must handle failures, such as when no match is found for the price or when the page format changes.
- 📋 The Google Sheet is updated with the extracted price after each successful iteration.
Q & A
What does the video set out to do?
-The goal of the video is to show how to extract product prices from Amazon's website and add them to a Google Sheet using Google Apps Script.
What needs to be done before writing the code that extracts information from Amazon?
-Open the script editor (Tools, then Script editor) and create a new project named 'Amazon Scraper'.
How is the HTML content of a particular page obtained?
-Call the `fetch` method of the `UrlFetchApp` class with the desired URL, then call `getContentText` on the response to get the page's HTML as text.
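The fetch step described above can be sketched roughly as follows. This is a minimal illustration rather than the video's exact code: the function name `fetchHtml`, the optional `client` parameter (added here only so the function can be exercised outside Apps Script), and the decision to throw on non-200 responses are all assumptions.

```javascript
/**
 * Fetches a page and returns its body as text.
 * `client` is an assumption added for testability; inside Apps Script it can
 * be omitted and the built-in UrlFetchApp service is used.
 */
function fetchHtml(url, client) {
  const http = client || UrlFetchApp;
  // muteHttpExceptions makes fetch() return non-2xx responses instead of
  // throwing, so the status code can be checked explicitly.
  const response = http.fetch(url, { muteHttpExceptions: true });
  const status = response.getResponseCode();
  if (status !== 200) {
    throw new Error('Request failed with HTTP status ' + status);
  }
  return response.getContentText();
}
```

Inside Apps Script, `Logger.log(fetchHtml(someUrl))` would print the HTML to the execution log, which mirrors the "print it and see" step in the video.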
What is an ASIN and how does it relate to extracting data from Amazon?
-An ASIN (Amazon Standard Identification Number) is a unique identifier assigned to each product on amazon.com. The script uses it to build the product URL (amazon.com/dp/ followed by the ASIN) and so to identify the specific product being scraped.
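As a small sketch of that URL construction, assuming ASINs are 10 alphanumeric characters: the helper name `buildProductUrl` and the validation step are additions for illustration, and `B000000000` in the usage note is a placeholder, not a real product.

```javascript
/**
 * Builds the short product URL for an ASIN.
 * The validation is an added safeguard, not something shown in the video.
 */
function buildProductUrl(asin) {
  const cleaned = String(asin).trim().toUpperCase();
  // ASINs are 10 alphanumeric characters.
  if (!/^[A-Z0-9]{10}$/.test(cleaned)) {
    throw new Error('Not a valid ASIN: ' + asin);
  }
  return 'https://www.amazon.com/dp/' + cleaned;
}
```

For example, `buildProductUrl('B000000000')` yields `https://www.amazon.com/dp/B000000000`; everything else in a typical long Amazon URL can be ignored.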
Why is a third party needed to fetch the HTML content of an Amazon page?
-Amazon's restrictions and policies can block direct automated requests. The third-party service fetches the HTML content through proxies with different IP addresses and solves CAPTCHAs automatically, which avoids the error message returned to direct requests.
How do you register and obtain an API key for a third-party service that fetches HTML content?
-Register on the third-party service's website and obtain a free API key, which allows a limited number of API requests per month (1,000 in the video), then use that key in the script.
How is the API key used in the Google Apps Script code?
-The API key is read from a specific cell (B1) on the Settings sheet of the Google Sheet and then included in the request sent to the third-party service to obtain the HTML content.
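A rough sketch of reading the key and building the scraper request URL follows. The video does not name the service in this text, so the endpoint `https://api.scraper.example/` and the query parameter names `api_key` and `url` are hypothetical placeholders; substitute the values from your provider's documentation.

```javascript
// Hypothetical endpoint and parameter names (assumptions, not a real service).
const SCRAPER_ENDPOINT = 'https://api.scraper.example/';

/** Reads the API key from cell B1 (row 1, column 2) of the Settings sheet. */
function getApiKey() {
  const settings = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Settings');
  return String(settings.getRange(1, 2).getValue()).trim();
}

/** Wraps the target URL in a request to the third-party scraper. */
function buildScraperUrl(apiKey, targetUrl) {
  // Both values are URL-encoded so the target URL survives as one parameter.
  return SCRAPER_ENDPOINT +
    '?api_key=' + encodeURIComponent(apiKey) +
    '&url=' + encodeURIComponent(targetUrl);
}
```

Keeping the key in the sheet rather than in the code means it can be changed without editing the script.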
How is a product's price located in Amazon's HTML content?
-A regular expression searches for and extracts the price, which usually sits in a specific HTML block with a unique ID (in the video, a span with the ID `priceblock_ourprice`).
How is price extraction implemented in a Google Sheet?
-Google Apps Script iterates over a list of ASINs, fetches the HTML content for each one, applies the regular expression to extract the price, and writes the extracted price to a column of the sheet.
What happens if the regular expression finds no match in the HTML content?
-If the regular expression finds no match, `match` returns null and the script cannot extract the price (indexing the result raises an error). It is important to cover the different price block formats that can appear on an Amazon page.
How can the script be improved to handle different price formats on the Amazon page?
-Add multiple cases to the script's logic, using different regular expressions to find the price in different HTML blocks (for example `priceblock_ourprice` and `priceblock_saleprice`). This keeps the script working even when the price appears in a different block.
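One way to sketch the "try each block in turn" idea is shown below. The two IDs come from the video; the function name `extractPrice`, the use of a capture group, and returning `null` when nothing matches are illustrative choices, and the list would need extending for any other price blocks you identify.

```javascript
// Price block IDs mentioned in the video; extend as further layouts are found.
const PRICE_BLOCK_IDS = ['priceblock_ourprice', 'priceblock_saleprice'];

/**
 * Returns the text of the first price block found, or null if none matches.
 * Assumes line breaks were removed from the HTML first, since "." does not
 * match newline characters.
 */
function extractPrice(html) {
  for (const id of PRICE_BLOCK_IDS) {
    // Non-greedy capture of everything between the opening tag and </span>.
    const regex = new RegExp('<span id="' + id + '"[^>]*>(.*?)<\\/span>', 'i');
    const match = html.match(regex);
    if (match) {
      return match[1].trim();
    }
  }
  return null; // The caller decides what to write when no price is present.
}
```

Checking for `null` before indexing the match is what avoids the "cannot read property 0" failure described in the transcript.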
What additional steps are recommended before running the price extraction script?
-Clean the HTML content by removing unnecessary spaces and line breaks so that Google Apps Script can parse the content faster and more efficiently.
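A minimal sketch of that clean-up step is below. Note that JavaScript's `trim()` on its own only strips leading and trailing whitespace, so this version also collapses internal runs of whitespace; the function name `compactHtml` is an assumption.

```javascript
/**
 * Collapses every run of whitespace (spaces, tabs, line breaks) into a single
 * space and strips the ends, so the page becomes one long line.
 */
function compactHtml(html) {
  return html.replace(/\s+/g, ' ').trim();
}
```

Putting the whole page on one line also matters for the regular expression, because `.` does not match line breaks by default.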
Outlines
😀 Introduction to Amazon Price Scraping
This first section presents the goal of the video: showing how to extract product prices from amazon.com and insert them into a Google Sheet using Google Apps Script. It covers opening the script editor, naming the project, and creating a function to do the scraping. The strategy is to fetch the page's HTML content and use regular expressions to extract the price. It also covers the `UrlFetchApp` class and using the Amazon Standard Identification Number (ASIN) to build the URL to scrape.
🔍 Using a Third-Party Scraping API
This section describes using a third-party scraping API to avoid restrictions and error messages when fetching data from Amazon. It suggests registering on the API's website to obtain an API key and explains how to use the key in the request. Postman is used to test the API before the logic is implemented in the Google Sheet. The aim remains to obtain the page's HTML content and then find the price with regular expressions.
📄 Processing the HTML Content and Extracting the Price
This section explains how to use a regular expression to find and extract the product price in the fetched HTML content. Certain special characters in the regular expression must be escaped (the forward slash in the closing tag is escaped with a backslash). The `match()` method finds matches in the HTML content, and index 0 selects the first match. The `replace()` method then removes the unwanted characters so that only the product price remains.
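The match-then-replace sequence described above might look roughly like this. It is a sketch under assumptions: the exact pattern used in the video is not recoverable from the transcript, so the regular expression here (including the non-greedy `.*?`) and the function name `extractOurPrice` are illustrative.

```javascript
/**
 * Extracts the price text from the priceblock_ourprice span.
 * Returns null when the block is absent.
 */
function extractOurPrice(html) {
  // The "/" in the closing tag is escaped as "\/" inside a regex literal.
  // g = global, i = case-insensitive, as in the video.
  const regex = /<span id="priceblock_ourprice"[^>]*>.*?<\/span>/gi;
  const matches = html.match(regex);
  if (!matches) {
    return null;
  }
  // Take the first match, then strip the opening and closing tags.
  return matches[0]
    .replace(/<span[^>]*>/i, '')
    .replace(/<\/span>/i, '')
    .trim();
}
```

With the global flag, `match()` returns an array of all matched strings, which is why index 0 is used to pick the first one.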
🔁 Scraping Prices for Multiple ASINs
This section focuses on adapting the scraping process to fetch prices for multiple ASINs listed in a Google Sheet. A for loop walks the rows and reads the ASIN from each one. The `getSheetByName()` method retrieves the Settings sheet so the API key can be read from a specific cell and substituted into the request URL, and the `setValue()` method writes the extracted price back to the results sheet. The section also covers handling cases where no price is found and gives guidance on identifying and handling the different HTML blocks where the price can appear.
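The row loop can be sketched as follows. Column 1 for ASINs, column 2 for prices, and starting at row 2 follow the transcript; the function name `updatePrices`, the `lookupPrice` callback, and the 'Price not found' placeholder text are assumptions made so the loop can be shown on its own.

```javascript
/**
 * Walks the data rows of `sheet`, reading an ASIN from column 1 and writing
 * the looked-up price to column 2. `lookupPrice` is any function that takes
 * an ASIN and returns a price string, or null when no price is found.
 */
function updatePrices(sheet, lookupPrice) {
  const lastRow = sheet.getLastRow();
  // Row 1 is the header, so start from row 2.
  for (let row = 2; row <= lastRow; row++) {
    const asin = sheet.getRange(row, 1).getValue();
    if (!asin) {
      continue; // Skip blank rows.
    }
    let price = null;
    try {
      price = lookupPrice(asin);
    } catch (err) {
      price = null; // One failing ASIN should not stop the whole run.
    }
    sheet.getRange(row, 2).setValue(price == null ? 'Price not found' : price);
  }
}
```

In Apps Script this would be called with something like `SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Scraper')` as the sheet and a function that fetches and parses the page for one ASIN.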
Keywords
💡Scraping
💡Google Spreadsheet
💡Google Apps Script
💡Amazon Standard Identification Number (ASIN)
💡Regular Expression
💡Third-Party Scraper
💡API Key
💡Postman
💡HTML Content
💡Product Advertising API
💡For Loop
Highlights
The video demonstrates how to scrape product prices from Amazon and input them into a Google spreadsheet.
A script editor is used to create a project named 'Amazon Scraper' for the scraping process.
The HTML content of a specific Amazon product page is fetched using UrlFetchApp, and regular expressions are used to extract the price.
An Amazon Standard Identification Number (ASIN) is used to uniquely identify each product on Amazon.com.
The script requires authorization the first time it is run.
To avoid automated access restrictions by Amazon, a third-party scraper service is recommended.
The third-party scraper service requires an API key, which can be obtained for free with registration.
The API key is stored in the Settings tab of the Google Sheet for easy access within the script.
The fetched HTML content is cleaned to remove unwanted spaces and line breaks for easier parsing.
Regular expressions are used to locate and extract the product price from the HTML content.
The script includes error handling for cases where the price block cannot be found in the HTML.
A for loop is used to iterate through a list of ASINs and scrape prices for each product.
The scraped prices are stored in column B of the Google Sheet, next to the ASINs in column A.
The script accounts for different HTML structures that may represent the product price on Amazon.
The tutorial provides a complete solution for setting up an Amazon price scraper using Google Apps Script.
The source code and documentation for the third-party scraper are provided in the video description.
The video encourages viewers to subscribe to the channel for more informative content.
Transcripts
Hello everyone. In this video we are going to see how to scrape product prices from the amazon.com website and put them into a Google spreadsheet using Google Apps Script. Let's see how to do that. First you need to open the script editor: go to Tools and then Script editor. Give the project any name; let's name this one Amazon Scraper.
Let's rename the function to something meaningful, maybe scraper. We are going to scrape the product prices from Amazon using the UrlFetchApp class, so let's see how to do that. Let me create a variable, getContent. We are going to get the HTML content of this particular page, and with the help of a regular expression we are going to parse the price that is shown here. That's our approach. Let's see how to do that.
First we will go with UrlFetchApp. The UrlFetchApp class contains a lot of methods; we are going to use the first one, fetch, and you need to pass the URL that you want to scrape. Paste the URL here. You don't need to pass this whole long string of characters; basically you just need to give amazon.com/dp/ followed by the ASIN. ASIN is nothing but the Amazon Standard Identification Number for each product, and it is unique for each and every product on amazon.com. So just ignore everything else and make sure the URL contains amazon.com/dp/ followed by the ASIN. Press Enter and you can see the same page. Copy that URL, go here, and paste it.

Once you do this it will just fetch the page, but we need to retrieve the HTML content of that particular URL. To get the HTML content you need to use the method getContentText, which returns the HTML text of this particular web page. Let's print that and see how it works.

You need to authorize the script the very first time you run it. Initially you will get the "This app isn't verified" message. I am not sure why, but we get this message frequently. Click Advanced, click Go to Amazon Scraper, and click Allow.

If you look here, the request failed, with a truncated server response along the lines of: to discuss automated access to Amazon data please contact api-services-support, for information about migrating to, and so on. So we are getting this error message asking us to contact them to get Product Advertising API access. Even if you use Product Advertising API access, Amazon puts several restrictions on it: you are able to use the Product Advertising API only if you have made enough sales through your affiliate links; otherwise they will block the Product Advertising API.

So we are not going to pass the URL directly. We are going to use a third-party scraper, which helps us avoid this kind of message and gives us the HTML in return. The third party takes care of fetching the HTML content for the given URL by handling proxies with different IP addresses, and it solves CAPTCHAs automatically. That's why we rely on the third-party API.

You need to register with this third-party API and get your free API key. You can make a thousand API requests per month free of cost, and if you want to upgrade your plan you can go ahead and upgrade, but for this tutorial the free plan is more than sufficient. I'm going to use that third-party API; I have given a link in the YouTube description. You can go ahead and find your unique API key and put it here. This is how the URL will look, and it comes entirely from the third-party scraper API.
We need to pass mainly two parameters: one is your unique API key, which you can get by logging into the website, and the other is the URL that you want to scrape. That's it; you just need to pass these two parameters, and once you are done click Send. We are using Postman to see how the API works; once it works successfully in Postman we are going to implement it in the Google Sheet. You can see here that it scraped the HTML page for the given URL. This is the HTML page, the size of the HTML page is one point five three, sorry, four three, and the status is 200 OK, so the response is good. I'm going to copy the API key and put it in cell B1 of the Settings tab, and I'm going to read it through the script. Let's say var apiKey; I'm going to get it from cell B1 of the Settings tab.
Use getActiveSpreadsheet and getSheetByName, passing the name as Settings, and then use the getRange method, which accepts a row and a column. The key is in the first row and second column, and getValue takes care of retrieving the value from that particular cell. That's it. Now we can use this variable to replace our API key. Go ahead and copy the entire URL, create a new variable called url, paste the URL, and replace this part with the apiKey variable that we already created. That's it. We are going to use the same UrlFetchApp.fetch method, but instead of passing the URL directly we are going to pass the URL through the third-party scraper. Let's see how it works, whether it is able to fetch the HTML content or not. It executed successfully. Let's see the logs. Yes, it successfully retrieved the HTML content of the given URL.

Now go here and copy this so that the entire HTML content is copied to your clipboard, and go to the regex tester, where you can write your regular expression to retrieve the Amazon product price. This is the ASIN that we have used; let's inspect the page and see where exactly in the HTML content this price is located.
It is located in a span with the ID priceblock_ourprice. Just copy this one; you can right-click and click Edit as HTML. Copy this. Once again the regex tester is timing out, so what I'm going to do is remove the unwanted spaces and line breaks. To do that, go to TextFixer; it's a public tool. I select Remove Line Breaks, paste the content, select the option to remove line breaks and paragraph breaks, and click Remove Line Breaks. It makes the data shorter. Click Copy to Clipboard, then go to the regex tester, select everything, and paste the new data. Now we are able to paste the content successfully.

Copy this span id="priceblock_ourprice", go here, and paste it. Look at the match list: yes, we are able to find a match for the given text. Now we need to retrieve the price. To retrieve the price, put dot star, which retrieves everything, and we need to retrieve up to this closing span, so put that here as well. This is not accepted by the regex tester: you need to escape the forward slash, and to escape it you use a backslash. That's it. It is saying no result, but of course if it shows a match for these characters then it should show the rest of the text too; because the content is too long it is not able to display the result here. So let's go ahead and try it in Google Apps Script itself, using this regex.
Store this regex somewhere here; let's name it regex. To make it a regex you need to enclose the text between forward slashes, and you need to mention g, which is for global, and i, for case-insensitive. Now it becomes a proper regex. Next you need to match this regex against the HTML content. Let's say price = getContent.match(regex). When you do a match you will get n number of matches, but we are sure we are going to get only one match, so I am going to use the index to take the very first match: price = price[0].
Just before executing the script, you can use the trim method, which removes the unwanted extra lines and spaces and makes the content really short, so that Google Apps Script can parse the content quickly. Now execute and see how it works. It executed successfully. Go to View and then Logs, and here you go: we are able to scrape the price block. That's it.

Now, we just need this portion, so we need to remove this portion and this span block. To remove them you can use .replace and replace the unwanted characters with an empty string. Let's name this price, or rather I can reuse the same variable, price, and just replace the unwanted block. I'm going to use replace again, chained, this time. First we identify these characters and replace them with an empty string, then we identify these characters and replace them with an empty string, so that we get only the product price. Let's print that and see whether we are getting the exact price or not. The script ran successfully; in the log we got exactly the product price that we wanted to scrape. That's it.
We have scraped the product price for the given ASIN, that is, for the given URL. Now I have a list of ASINs here, so let's see how to scrape all the prices through the script. All we need to do is loop through the rows, from row 2 to the last row. That's it.
Let's see how to do that; it's very simple. You just need to include a for loop for the iteration. We don't need to hard-code the last row number; we can use the method getLastRow to find the last row of this particular sheet. Let's name this sheet variable scraperSheet; the name of the sheet is Scraper, so let's put that here. Instead of writing this out every time we can just use scraperSheet. Let's create a variable called lastRow, and we are going to find the last row of the scraper sheet: getLastRow returns an integer that indicates the last row of the sheet. Then we just need to create a for loop starting from i = 2, because the first row is a header and we don't need to include it, with i less than or equal to lastRow, and i++. That's it. This block of code should go inside the for loop.

Here we need to replace the hard-coded ASIN with the ASIN we get from the Google Sheet. During each iteration the ASIN will vary, and we need to pass the respective ASIN to this URL. So just put a plus followed by scraperSheet.getRange; the row should keep changing during each iteration, so we can replace the row with i, and the column is 1, then .getValue. That's it. This returns the value of the respective ASIN during each iteration.
Now we need to put the price in column B, right? So use the same object: after we get the price, call scraperSheet.getRange, use the iteration variable for the row, and write to column 2, because we are getting the ASIN from column 1. Instead of getValue we are going to set the value, so we use the method setValue, and the value we set is the price of the ASIN that we get during each iteration. Let's execute and see whether it is fetching the price for each ASIN or not. It will take a few seconds to fetch the price for each ASIN.
If you look here, for this ASIN it failed, stating that it cannot read property 0 from null, because it was not able to match our regex. Let's go and see why it failed. If you look at the HTML content, you can see that it's not priceblock_ourprice, it's priceblock_saleprice, and that's why the script failed. So you need to add this additional case for when there is no match. That's what you need to consider.

You also need to check whether the price is present at all. First, you have to identify the list of HTML blocks where the price gets displayed on Amazon, find the right regex for each of those blocks, and put them in your Google Apps Script, so that it first checks this block and, if it is not able to find a match, goes and checks the next block. If a match exists, it populates the price in the Google Sheet. Basically there will be no more than three to four blocks, so you just need to identify them and put them here. That's it: your Amazon price scraper is ready.
Let me know if you have any doubts. You can find the source code of this script and the third-party scraper documentation URL in the description. If you liked this video, give it a thumbs up, and don't forget to subscribe to my channel. Thank you.