Easily scrape 1001 pages and outrank the competition - find out how!
Summary
TLDR: In this video, the author reveals a secret technique for extracting and rewriting a competitor's blog articles with artificial intelligence. He shows how to analyze a website, retrieve the article links through its sitemap.xml, and then build a CSV file with the URLs, associated keywords and SEO-optimized titles. The goal is to produce spun content with GPT-4 that is meant to be undetectable, which can be used to improve organic search rankings or for international marketing campaigns.
Takeaways
- 🔍 Take as the example competitor a site called Open AI Master, which publishes blog articles on AI and other topics.
- 🗂️ Reach the sitemap of a WordPress site by appending 'sitemap.xml' to the URL to extract all content links.
- 📄 Download the sitemap's XML file to analyze and reuse the blog article links.
- 🔧 Use a GPT prompt to extract the URLs, create a CSV file, and generate SEO-optimized keywords and titles.
- 🚀 Automate link retrieval and content creation with scraping and text-generation tools.
- 📊 Analyze the structure of the XML file to understand the data and links it contains.
- 🔄 Split the process into steps to avoid errors and improve the accuracy of the results.
- 📈 Create unique, engaging title tags that respect the fundamentals of SEO.
- 🔄 Iterate to fix errors and improve the quality of the data and generated content.
- 🌐 Consider rewriting the content in another language to reach an international or less developed market.
- 🎓 Learn automation and ChatGPT to improve SEO and scraping techniques.
Q & A
What is the goal of the technique presented in this video?
-The goal is to retrieve and analyze a competitor's articles so they can be rewritten, or 'spun', using an automated approach.
Why does the author use a site called 'Open AI Master'?
-He uses it as an example of a site whose traffic is growing rapidly and which publishes many blog articles on artificial intelligence and other topics.
How does the author access the elements of the WordPress site?
-He appends 'sitemap.xml' to the site's URL; because the site runs on WordPress, this displays a sitemap listing all of the site's elements.
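As a rough illustration of the "append sitemap.xml to the URL" step, the sketch below only builds candidate sitemap addresses for a site root; it fetches nothing. The three paths are common conventions, not guarantees: /sitemap.xml is the classic default, /sitemap_index.xml is the index generated by the Yoast SEO plugin, and /wp-sitemap.xml is the built-in WordPress sitemap since version 5.5. The helper name is hypothetical.

```python
from urllib.parse import urljoin

# Common sitemap locations (conventions, not guarantees):
#   sitemap.xml        classic default
#   sitemap_index.xml  index generated by the Yoast SEO plugin
#   wp-sitemap.xml     WordPress core sitemap (WordPress 5.5+)
CANDIDATE_PATHS = ("sitemap.xml", "sitemap_index.xml", "wp-sitemap.xml")


def candidate_sitemap_urls(site_url: str) -> list[str]:
    """Return the sitemap URLs worth trying for a site root (hypothetical helper)."""
    # A trailing "/" makes urljoin append the path instead of replacing the last segment
    base = site_url if site_url.endswith("/") else site_url + "/"
    return [urljoin(base, path) for path in CANDIDATE_PATHS]


for url in candidate_sitemap_urls("https://example.com"):
    print(url)
```

A site can also declare its sitemap in robots.txt with a `Sitemap:` line, which is worth checking when none of these paths respond.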
What are the main steps of the technique?
-The main steps are: analyze the structure of the sitemap XML file, extract the URLs from the 'loc' tags, create a CSV file with the URLs and their metadata (keywords and titles), and finally use this data to rewrite, or 'spin', the content.
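The extraction step (keep the `loc` tags of pages, not those of images) can also be done deterministically rather than through a prompt. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming a standard sitemaps.org `<urlset>` whose image entries use the Google image extension (`<image:loc>`); the function name and sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Default namespace of the sitemaps.org protocol
SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_page_urls(sitemap_xml: str) -> list[str]:
    """Return the page URLs (<url><loc>) of a sitemap <urlset>.

    <image:loc> lives in the Google image namespace and is nested inside
    <image:image>, so the sitemap-namespace path below never matches it.
    """
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(f"{{{SM_NS}}}url/{{{SM_NS}}}loc")
        if loc.text
    ]


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/how-to-use-bing-ai-on-pc-mac/</loc>
    <image:image><image:loc>https://example.com/img/bing.png</image:loc></image:image>
  </url>
  <url><loc>https://example.com/what-is-gpt/</loc></url>
</urlset>"""

print(extract_page_urls(SAMPLE))
```

Reading the saved file instead of a string works the same way, starting from `ET.parse(path).getroot()`.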
How does the author use a data-processing tool to extract the URLs?
-He uses ChatGPT's Advanced Data Analysis mode with a prompt that targets the opening and closing 'loc' tags, and has it write the results to a CSV file.
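Writing the extracted URLs into column A of a CSV takes only a few lines with the standard `csv` module. The sketch below (hypothetical helper names) writes a header row plus one URL per row, and counts data rows without the header, which is why a spreadsheet shows one more row than there are URLs (the 1002 rows versus 1001 URLs mentioned in the transcript).

```python
import csv
import io


def urls_to_csv(urls: list[str]) -> str:
    """Render the URLs as CSV text: a header row, then one URL per row (column A)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url"])  # row 1 in a spreadsheet
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()


def count_data_rows(csv_text: str) -> int:
    """Number of rows in the CSV text, not counting the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return len(rows) - 1


urls = ["https://example.com/how-to-use-bing-ai-on-pc-mac/", "https://example.com/what-is-gpt/"]
text = urls_to_csv(urls)
print(text)
print(count_data_rows(text), "URLs")
```

When saving the text to disk, open the file with `newline=""`, as the `csv` module documentation recommends, so line endings are not doubled on Windows.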
What problems does the author run into when generating the metadata?
-The keywords do not always match the URLs, and the titles do not always respect SEO fundamentals: they can contain repetitions and, in one attempt, the competitor's brand name, which wastes characters.
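The two title problems (too long, and padded with the competitor's brand) are easy to flag mechanically before asking the model for another pass. A sketch assuming the 65-character limit from the video's prompt (search results generally truncate titles by display width rather than a fixed character count, so 65 is a rule of thumb) and a brand string supplied by the caller; both helpers and the sample title are hypothetical.

```python
import re

MAX_TITLE_LENGTH = 65  # limit used in the video's prompt (a rule of thumb)


def strip_brand(title: str, brand: str) -> str:
    """Remove the brand and an adjacent separator such as ' | Brand' or 'Brand - '."""
    pattern = rf"\s*[|:-]?\s*{re.escape(brand)}\s*[|:-]?\s*"
    return re.sub(pattern, " ", title, flags=re.IGNORECASE).strip()


def title_problems(title: str, brand: str) -> list[str]:
    """List the reasons a title tag breaks the prompt's rules."""
    problems = []
    if len(title) > MAX_TITLE_LENGTH:
        problems.append(f"too long ({len(title)} > {MAX_TITLE_LENGTH})")
    if brand.lower() in title.lower():
        problems.append("contains brand name")
    return problems


title = "How to Use Bing AI on PC and Mac: The Complete Guide | Open AI Master"
print(title_problems(title, "Open AI Master"))
cleaned = strip_brand(title, "Open AI Master")
print(cleaned, title_problems(cleaned, "Open AI Master"))
```

In this made-up example the " | Open AI Master" suffix costs 17 characters and pushes a 52-character title past the limit, which is the author's point about the brand name wasting characters.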
How does the author fix the mismatch between keywords and URLs?
-He asks ChatGPT to extract more relevant keywords directly from the URLs and then to create matching new titles.
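In the end the model essentially recovered the URL slugs, as the transcript notes. The same derivation can be sketched without a model: take the last path segment of the URL, decode it, and turn hyphens into spaces. The helper name and the "homepage" fallback for the root URL are assumptions for illustration.

```python
from urllib.parse import unquote, urlparse


def keyword_from_url(url: str) -> str:
    """Derive a keyword from a URL's slug (its last non-empty path segment)."""
    path = unquote(urlparse(url).path).rstrip("/")
    slug = path.rsplit("/", 1)[-1]
    if not slug:
        # The site root has no slug; label it explicitly instead of guessing
        return "homepage"
    # WordPress slugs separate words with hyphens; underscores are handled too
    return slug.replace("-", " ").replace("_", " ").strip()


for url in ("https://example.com/", "https://example.com/how-to-use-bing-ai-on-pc-mac/"):
    print(url, "->", keyword_from_url(url))
```

Slugs are often long-tail phrases already, which is why this works reasonably well; it cannot produce a better keyword than the one the site owner chose.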
What is the ultimate purpose of the technique?
-To rewrite, or 'spin', content in an automated and efficient way, using data extracted from a competitor's site to create original, SEO-optimized content.
What are the advantages of this technique over manual work?
-It saves a considerable amount of time, automates repetitive tasks, and makes it possible to process volumes of data that would take days of manual work.
What does the author recommend to avoid repetitions in the generated titles?
-Use precise prompts and iterate step by step to avoid repetitions and ensure the quality of the generated content.
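Iterating with the model is one way to reduce repetition; checking the generated column first tells you whether another iteration is needed. A sketch using `collections.Counter` that reports exact duplicate titles and words appearing in more than a given share of titles (such as a recurring "guide"). The share threshold and the small stop-word list are arbitrary assumptions.

```python
import re
from collections import Counter

# Illustrative stop words; a real list would be longer and language-specific
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "is", "of", "on", "the", "to", "what"}


def duplicate_titles(titles: list[str]) -> list[str]:
    """Titles that occur more than once, compared case-insensitively."""
    counts = Counter(t.strip().lower() for t in titles)
    return sorted(t for t, n in counts.items() if n > 1)


def overused_words(titles: list[str], share: float = 0.3) -> list[str]:
    """Non-stop words that appear in more than `share` of the titles."""
    doc_freq = Counter()
    for title in titles:
        words = set(re.findall(r"[a-z0-9]+", title.lower())) - STOP_WORDS
        doc_freq.update(words)  # each word counted at most once per title
    limit = share * len(titles)
    return sorted(w for w, n in doc_freq.items() if n > limit)


titles = ["How to Use Bing AI: Guide", "What Is GPT? Guide", "Midjourney Prompts Guide", "What Is GPT? Guide"]
print(duplicate_titles(titles))
print(overused_words(titles, share=0.5))
```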
What topic does the author plan to cover in upcoming videos?
-Scraping several sites at once to create content combinations that could be used for spinning more effectively.
Outlines
🔍 Competitor site analysis and link retrieval
In this section, the author presents a technique for analyzing a competitor's site and retrieving its blog links. He uses a site called 'Open AI Master' to show how the sitemap.xml file gives access to every page of the site. The goal is to scrape the blog articles for spinning. He explains that most sites are built on WordPress and that the sitemap can be reached by appending 'sitemap.xml' to the URL; the index then lists separate sitemaps for posts, pages, categories, authors and post tags, and he opens the posts one. He shows how to save this sitemap as an XML file for closer analysis.
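On a WordPress site the root sitemap is usually an index pointing to one sitemap per content type, and only the posts one is needed here. A sketch that reads a `<sitemapindex>` and keeps the post sitemaps, assuming Yoast-style file names (`post-sitemap.xml`, `post-sitemap2.xml`, ...); WordPress core uses a different scheme (`wp-sitemap-posts-post-1.xml`), so the pattern is an assumption to adapt. Note that `post_tag-sitemap.xml`, which the author skips, must not match.

```python
import re
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Yoast-style names: post-sitemap.xml, post-sitemap2.xml, ... (an assumption)
POST_SITEMAP = re.compile(r"^post-sitemap\d*\.xml$")


def post_sitemaps(index_xml: str) -> list[str]:
    """Return the child sitemap URLs of a <sitemapindex> that list blog posts."""
    root = ET.fromstring(index_xml)
    if root.tag != f"{{{SM_NS}}}sitemapindex":
        raise ValueError("expected a <sitemapindex> document")
    urls = [
        loc.text.strip()
        for loc in root.findall(f"{{{SM_NS}}}sitemap/{{{SM_NS}}}loc")
        if loc.text
    ]
    # Match on the file name only, so post_tag-sitemap.xml is excluded
    return [u for u in urls if POST_SITEMAP.match(u.rsplit("/", 1)[-1])]


INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/post-sitemap2.xml</loc></sitemap>
  <sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/category-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/author-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/post_tag-sitemap.xml</loc></sitemap>
</sitemapindex>"""

print(post_sitemaps(INDEX))
```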
📋 Creating the CSV and extracting the links
This section explains how to build a CSV file by extracting the URLs from the 'loc' tags of the XML file. The author opens and inspects the XML file in Visual Studio Code, then has ChatGPT produce a CSV for Google Sheets with the links in column A, the associated keywords in column B and the title tags in column C. He stresses that there must be no repetition and that each link needs a unique, engaging title.
🔄 Improving the data and creating titles
Here the author discusses the errors in the first version of the CSV file and its titles. He had to correct the data by removing keywords that did not match the URLs and by creating new titles that are relevant and follow SEO rules. He stresses the importance of iteration for improving the quality of the data and the results.
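Keywords that don't match their URLs can be caught before asking the model to redo column B. A sketch that measures how many of a keyword's words appear in the URL path and lists the rows that fall below a threshold; the 0.6 overlap threshold and the function names are arbitrary assumptions, and word overlap is only a crude proxy for relevance.

```python
import re
from urllib.parse import unquote, urlparse


def words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_matches_url(keyword: str, url: str, min_overlap: float = 0.6) -> bool:
    """True if at least `min_overlap` of the keyword's words occur in the URL path."""
    kw = words(keyword)
    if not kw:
        return False
    path_words = words(unquote(urlparse(url).path))
    return len(kw & path_words) / len(kw) >= min_overlap


def mismatched_rows(rows: list[tuple[str, str]]) -> list[str]:
    """URLs (column A) whose keyword (column B) fails the overlap check."""
    return [url for url, keyword in rows if not keyword_matches_url(keyword, url)]


rows = [
    ("https://example.com/how-to-use-bing-ai-on-pc-mac/", "how to use bing ai"),
    ("https://example.com/what-is-gpt/", "ios"),
]
print(mismatched_rows(rows))
```

The root URL has no words in its path, so a "homepage" keyword would always be flagged and needs a manual exception.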
🚀 Automation and using GPT to spin content
The author concludes by explaining how automation and GPT can help rewrite content using the extracted data. He suggests preparing a 'diabolical' scenario to spin the content, in a way meant to be undetectable, using the keywords and titles created. He also mentions the value of producing content in other languages to target countries he describes as less developed than France or the United States, which can be an effective SEO strategy.
Keywords
💡secret technique
💡scraping
💡SEO
💡sitemap.xml
💡WordPress
💡content spinning
💡Visual Studio Code
💡CSV
💡destructive prompt
💡GPT
💡habysal
💡iteration
Highlights
The video introduces a secret technique for scraping and spinning a competitor's blog articles.
The technique involves using the site's sitemap to access all blog posts efficiently.
The speaker uses a site called 'open ai master' as an example to demonstrate the process.
The process includes downloading the sitemap.xml file to extract all the blog post URLs.
The speaker uses Visual Studio Code to analyze and manipulate the XML file.
A prompt is created to structure the extraction and processing of URLs and creation of a CSV file.
The process involves creating keywords and title tags for SEO optimization based on the extracted URLs.
The technique emphasizes the importance of avoiding repetition and ensuring uniqueness in the generated content.
The video demonstrates the potential of automating content creation and SEO processes using AI.
The speaker discusses the legality and ethics of scraping and spinning content, leaving it to the viewer's judgment.
The method can be applied to multiple sites and languages, potentially targeting less developed markets.
The video is aimed at SEO professionals, writers, and those interested in content scraping and spinning.
The speaker plans to create more videos on scraping and combining content from multiple sources.
The process can save days of work compared with doing it manually.
The video concludes with a call to action for viewers to apply the learned techniques in their own projects.
Transcripts
welcome to this new video. Today I'm going to show you a secret technique; I really don't
think anyone has made a video on it yet. Is it legal or not? That's for you to judge,
but the goal is to take a competitor, someone whose site you want to scrape completely,
his blog articles, and do spinning. So I'll show you the strategy directly, a slightly
brutal technique. I'm going to put myself here,
we'll be better on the right. So I went to a site called Open AI Master, which writes blog
articles on AI and on many other topics. I had analyzed his site: his traffic is exploding, and the
guy writes a lot of blog articles. You can already see it: yesterday, October 8, 1 2 3 4 5 6, he
published six in a single day. So this is a guy who has automated things and set up systems that
write articles; he makes images, it seems to me, with either Habysal or other tools, through
automation. Once I'm on his site, I go to the sitemap: I put a slash after his link and type
sitemap, so sitemap.xml. As it's a WordPress site, I can see exactly all the elements. On most
sites, let's say the 80% of sites that are WordPress, when you type sitemap.xml or
sitemap_index.xml you get direct access to the links.
So it's simple, we have what we have: after the slash and the domain we have post, post again,
post. Those are the blog posts. Then the pages are the site's pages; maybe the guy made a landing
page or some other page. Categories are the categories he created, perhaps artificial
intelligence, ChatGPT, Midjourney. Then the rest: author, which is him, and post tag, which
doesn't interest us. We're going to go to the first one, the posts, to retrieve the blog
articles. So I'll show you my combination: there you see all the pages referenced for these blog
articles. I open a few random ones in new tabs, this one, this one and this one, and each of
them is a blog post from his site. What I'm going to do
is have some fun with his site. I right-click in the white area of the XML sitemap and choose
Save As, or Ctrl+S, and download an XML file. I called it "deletion designed"; this one I
delete, you've never seen it, and I call the new one "deletion designed". Now I save it; yes, it
tells me it's going to overwrite the old file, that's normal. So now I have the XML file of this
sitemap, and I'm going to open it in VS Code to show you what it looks like. Here I opened it in
Visual Studio Code, which lets you see what the file looks like: the file is huge, there's a mass
of data and a lot of links, all the links of his site, let's say,
and what interests me is the structure. If I simply tell ChatGPT to recover all the links, it
will take the pages, but it will also take the images: you see there is an image:loc, so that
is an image. If I take it and open it in Google, I'll show you directly what it looks like:
via the sitemap I can retrieve this image. But what interests me is only the page links, which
is why I prepared a prompt of quite destructive quality. Once I have my XML sitemap (I had
cleaned up the link, but we don't need that for today), we go directly to the magic prompt. I'll
show you the structure of the prompt; it's going to be
simple. I was going to copy my latest prompt, but no, I'll take this one instead. So, step one:
analyze the structure of this XML file, deletion designed (I even add .xml so that at least it
knows what it's dealing with). Step two: extract only the URLs with this tag. Here I ask for the
loc tag, quite simply because it's the only tag that interests me to recover the links, so the
loc opening tag and the loc closing tag. The image one is image:loc, so as I said, asking for loc
will only recover what interests me, the pages. That's perfect. In the third step: create a CSV
file by listing all the
URLs one by one in column A. Here I'm already preparing for a Google Sheet, so a CSV file that puts
the extracted links directly in column A. Then in step 4: create a keyword in column B linked to
the URL present in column A; no keyword may be duplicated. If we take this link as an example,
"how to use Bing AI on PC/Mac", he'll give me something like "how to use Bing on PC Mac". And in
the last step: write a TITLE tag of less than 65 characters in column C respecting the
fundamentals of SEO (there I might even add "natural referencing", that is, organic SEO). Each
title tag must correlate with columns A and B, avoid repetitions and be engaging; the title tags
must all be unique. And
then I told him: wait for my validation once you have completed a step. Quite simply because when
we make him do five steps at the same time he gets confused, so the best is to do one step, he
waits for the OK or a yes, and each time we take it one step at a time. So I take this nugget,
this prompt, and I start a new chat to show you how it happens. I go to Advanced Data Analysis,
paste my prompt with the steps as required, and attach my "deletion designed" file, as mentioned
in the prompt. All I have to do is send it, and you'll see what happens: he will follow the steps
one by one and wait for my OK after each step. So I can already prepare my OK; I'll simply put a
sentence: you can move on to the next step. If there's a bug, it's just because I have a plugin;
once again, I can write OK, or wait to see what it asks you, if it asks for a yes or something to
be able to continue. So here we really analyze what it does. He says: I received the
XML file; to begin with, I will analyze its structure. Good, at least he understands the content:
he analyzes it and reports that he managed to analyze the structure of the XML file, with an
overview of what he found. He found the right link, the sitemaps.org namespace, the URLs. Now that
we have an idea of the structure, shall we take the next step? Yes, I tell him he can move on.
Perfect, now I'm going to extract the URLs in the loc tags of the XML file, one moment please. You
see, this lets him pause each time and do one step at a time, and it lets you analyze how it
worked. If tomorrow a ChatGPT 5 comes out that is ultra powerful, you will have trained enough to
know that action 1 plus action 2 plus action 3 plus action 4 were well defined, because you had
worked on your prompts; but maybe with GPT-5 you can make it do 15 steps at once, and that's how
this prepares you for war, in a way.
So he tells me: great, I managed to extract 100 URLs from this XML file; does that suit you, shall
we go to the next step? That suits me: I just extracted 100 URLs in just 30 seconds, so I move on
to the next step. Great, I'm now going to create a CSV file and list all the URLs in column A, one
moment please. I know it could do all five steps at once, or maybe four, but it gave me some bugs
when I tested it before, which is exactly why I wanted to show you doing it one by one. You see,
it's only step 3 and it already offers me the CSV to download, so I can download it directly. It's
not finished, we agree, he asks us to move on to the next step, but as I'm curious, I'm going to
analyze what he did. I downloaded it, I come back and open it to see what it produced, the URL
list. I pull out my column A, and as we can see all the URLs are there: there are 1001 of them. I
tested it several times before showing you. The sheet even shows 1002 rows, but it's 1001 URLs,
because the header row at the top doesn't count. So very good, it suits me perfectly.
I'm happy, I can move on to the next step (that's the second time I've slipped up, so I send
"next step"). Logically we're now at the "create a keyword" step: in column B he must create a
keyword for each of the links, and we'll see if he respects what we asked. As you can see, it's
really impressive what iterations give you, as I say each time: if a request is too complicated,
break it down into several steps to simplify the work in ChatGPT. I'm not writing "you are an
expert in data analysis"; it's a simple job for him if you detail it step by step, 1 2 3 4 5. You
don't need to bother telling him "you are a world-renowned data analysis expert with 25 years of
experience", it's useless. Step one, analyze this structure, and so on, 1 2 3 4 5: everything is
understandable, there are no useless words, and that's it. So for column B, he says he's going to
create a keyword linked to each URL. He understood very well, and we can read that he gives us
some perfect examples; here is a little preview. It's always good
to have little glimpses. Besides, what you can do is add to your initial prompt "show me a little
preview each time"; that way you see the structure and you're ready, without having to download
your file, open it and check it. OK, he created the file, I'm happy, we move on to the next step
and continue the iterations. There we see keyword "homepage", keyword "ChatGPT", keyword "iOS",
and so on: it isn't great at all, and earlier he did a little better. So we'll definitely give him
an iteration at the end if I see that it's not great, which is what I did earlier, because there
he didn't really respect my instructions. That's the advantage: once steps 1, 2, 3 and 4 are done
well and only step 5 is a bit messed up, you can use a targeted prompt that only modifies one
column. And there you see how he speaks to me: "Done!"
"I created the title tags." OK, so he made the title tags, and he added "Open AI Master". Perfect,
I'll show you exactly what I didn't want him to do; he did it to me earlier too. I could have told
him not to include the words "Open AI Master", which is of course the name of this competitor's
site, because they directly waste characters: I asked for a maximum of 65 characters for the SEO
title tag, and that wasted a good ten of them. So it's working: he'll now save everything in a
CSV file, one moment. OK, now I download the CSV file and open it: the very last one, I browse to
it and import it. OK, we'll wait and
see what we have: did we do the job well? There's a bug; I think it's because I imported too
many files, and it makes me re-import the old one, so instead of opening I'm going to import it
(it has already done that to me several times). So I import the data and open it, and as we can
see, he didn't do something great. Let me show you exactly what I got earlier when I reworked it,
because it was really perfect. As you can see, there were a lot of iterations; I did a lot of
tests, and some were not bad at all. Here, I think, it was good: this is what came out, "how to
use Bing AI", as I said earlier, and he had removed "Open AI Master", so it wasn't bad at all. So
I'm not happy with this result, and I'm going to tell him that the keywords in column B don't
match the URLs and that column C is all over the place. I'm going to write it here like this because
of the plugin bug: "The keywords in column B don't match the URLs in column A, correct this.
Then the title tags are not at all correlated with columns A and B of the CSV file. Don't rewrite
everything, just improve the CSV, and delete the words Open AI Master." There's my little prompt,
I send it. I think it's good; in fact he did it perfectly earlier. I thought my prompt was perfect
and then he made a mistake, so much the better: at least it lets me see what I could have
improved, perhaps by being a little more precise. I think I know where it comes from: at one
point I talk to him about column A and column B without telling him what was in column A and
what was in column B, so he must have mixed them up. We'll see what he does. He answers: I'm
really sorry for the mistake; to correct this, I'm going to take a new approach and extract more
relevant keywords directly from the URLs, then I'm going to create... Here he's telling me a bit
about his life, which doesn't interest me. In a few seconds he
extracted the best keyword directly. For the home page it's "homepage", then a specific
keyword, "how to use"... ah, there you go, that's better: there he got me a keyword exactly as I
wanted, "what is GPT". Even if it's basically the URL, that's exactly it, it's perfect: it's
roughly a medium-to-long-tail keyword, so it's very good. Now let's go through the creation of
the title tags. I think that because we already told it to work one step at a time, he hasn't
mixed everything up or done both at once: he did the keywords first, then the titles. That's
really not bad. I'm not sure it's because I gave it iterations like that, but I find it clear.
So let's look at the title tags: it's the home page, which is logical, and he even understood
that this is the home page, he's smart. Best title tag: "How to use Bing on PC: the perfect
guide". It's good; we'll see if it suits us. I think it will often repeat the words "guide" or
"information" and the same structure, but it doesn't matter: afterwards we iterate, because there
are already 1000 lines, which is a lot for him, and he risks repeating himself often. "I will now
save these improvements."
OK, very good, I wait, and there it is, done: he saved the file. Very good; now "can you give it
to me, please?" I download it and import a new file, the last one, and normally it should be much
better than before, especially for the title tags: we won't have "Open AI Master" anymore, so
that's very good. I stretch the links column, not all the way because we don't care. There
they're very good: in fact he didn't bother too much, he just recovered the slugs of the URLs and
put them there, so it's very good. As for the title tags, it's not amazing at all, but it will do;
if there are still a lot of repetitions, maybe delete "guide", "information" and all that, or ask
him to iterate. But in itself, in 5 minutes you did what a person would have spent whole days
doing by hand; it would have taken three days, if this person is efficient. And after that, I'll let you
learn automation through my training and my videos, which will let you use this data; I think I'll
make a micro-training on it. Once you have this, you have to prepare a diabolical scenario that
lets you rewrite each of these contents via this keyword and this title tag, and spin all these
elements, a kind of content spinning in GPT-4 that is undetectable. Or why not write in another
language, for another country, on a .es or a .be, which lets you position yourself on countries
that are less developed, let's say, than France or the United States, which everyone targets
every time; there are other interesting countries too.
So I'll leave you with that. I hope you liked it; a little blue thumbs-up is a pleasure. Subscribe
and share with your friends who are in SEO, in writing or in scraping, or even people who are
starting to know a bit about ChatGPT and its ins and outs, because this video is very technical
and it can help you iterate and do this even on several sites. In fact, I'm preparing some quite
interesting videos where I'll try to scrape several sites at the same time to make combinations
that are, how to say, evil, which will let you spin from content that is already spun. These are
words I use a bit, a bit of SEO, a bit of black hat, although this isn't too black hat. But for
now, see you soon in an upcoming video. Ciao.