Find endpoints in the blink of an eye! GoSpider - Hacker Tools

Intigriti
23 Nov 2021 · 06:18

Summary

TL;DR: In this Hacker Tools video, the presenter introduces Go Spider, a powerful web crawling tool designed to discover endpoints, subdomains, and other resources on a website. It efficiently scans web pages, identifies links, and can even recursively crawl through found files. The tool offers customization options like setting user agents, cookies, headers, and managing request speed to comply with platform rules. Advanced features include utilizing third-party archives and filtering results based on file length or extensions, making it an essential tool for initial target enumeration in cybersecurity assessments.

Takeaways

  • 🕷️ Go Spider is a tool designed for web scraping and crawling web pages to discover various endpoints, subdomains, and other assets.
  • 🔍 It operates by requesting a web page and then searching for links, JavaScript files, directories, subdomains, and endpoints, presenting a comprehensive map of the target's web presence.
  • 🔄 The tool can recursively crawl through discovered files to uncover even more links and resources, creating a detailed web of the application's structure.
  • ⚙️ Basic usage involves running Go Spider with the '-s' option to specify a URL, '-o' for the output file, and '-c' to set the number of concurrent requests (a combined example follows this list).
  • 🔑 'Bug bounty parameters' like '-u' for the user agent and '-H' for custom headers help you comply with platform rules during scanning.
  • 🚀 The tool can be tuned for speed with '-t'/'--threads' to set the number of threads and '-c'/'--concurrent' for the concurrency level.
  • 🛑 To avoid overwhelming targets, use '-k' or '--delay' to set a delay between requests, ensuring you stay within acceptable request limits.
  • 🗂️ Advanced features include crawling JavaScript files with '--js', including subdomains with '--subs', and utilizing sitemaps and robots.txt with '--sitemap' and '--robots' respectively.
  • 🔎 Go Spider can integrate with third-party archives like Common Crawl and the Wayback Machine using '-a' or '--other-source' to find URLs from historical data.
  • ⛔️ Use '--blacklist' with a regex to exclude specific results or '--whitelist' to focus only on desired outcomes.
  • 📊 The '-l' or '--length' option shows response lengths, and '--filter-length' excludes responses of a given length, which helps drop noise such as custom 404 pages that return a 200.
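
Putting several of these flags together, a typical run might look like the following sketch (example.com, the output folder name, and all values are placeholders; flag defaults can differ between versions, so confirm with gospider --help):

    # crawl example.com, write results to ./output, 5 concurrent requests,
    # recursion depth 2, 1-second delay, include subdomains, JS files and 3rd-party sources
    gospider -s "https://example.com/" -o output -c 5 -d 2 -k 1 --subs --js -a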

Q & A

  • What is Go Spider and what does it do?

    -Go Spider is a tool that spiders web pages to crawl them and extract information such as endpoints, subdomains, and other links. It can also recursively crawl the discovered files to create a comprehensive web of the application's structure.

  • How does Go Spider perform a basic scan?

    -To perform a basic scan, Go Spider is run with the '-s' option to supply a URL, '-o' to specify an output file, and '-c' to set the number of concurrent requests.
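
    A minimal sketch of such a scan (the URL and values are placeholders):

        gospider -s "https://example.com/" -o output -c 10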

  • What are 'bug bounty parameters' in Go Spider and why are they important?

    -'Bug bounty parameters' refer to options like the user agent, cookies, and headers that can be set in Go Spider to adhere to the rules of a bug bounty platform and ensure ethical scanning practices.
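
    For example (the user agent string, cookie, and header values are purely illustrative, and '-u' is assumed to accept a custom string as the video suggests):

        gospider -s "https://example.com/" -o output \
          -u "researcher@example.com" \
          --cookie "session=abc123" \
          -H "X-Bug-Bounty: researcher@example.com"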

  • How can Go Spider be configured to respect the speed limits of a target platform?

    -Go Spider allows setting the number of threads with '-t', concurrency with '-c', and a delay between requests with '-k' to control the speed and avoid overwhelming the target platform.
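
    A deliberately slow configuration might look like this sketch (the values are placeholders chosen to stay under a program's request limit):

        # one site at a time, at most 2 concurrent requests, 2-second delay between requests
        gospider -s "https://example.com/" -o output -t 1 -c 2 -k 2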

  • What additional features does Go Spider offer beyond basic crawling?

    -Go Spider can find JavaScript files, include subdomains, crawl sitemaps, and utilize third-party archives like Common Crawl and Wayback Machine for more extensive data collection.
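
    For instance, a sketch that turns most of these sources on (flag names as shown in the tool's help output; the URL is a placeholder):

        gospider -s "https://example.com/" -o output --js --subs --sitemap --robots -a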

  • How can Go Spider be used to filter out unwanted results during a scan?

    -Go Spider provides options like '--blacklist' to exclude results matching a regex, '-l' (or '--length') to view response lengths, and '--filter-length' to filter out responses of specific lengths, which helps refine the scan results.
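
    A hedged sketch of length-based filtering (the length value is a placeholder, and the exact format '--filter-length' expects may vary by version):

        # first run with -l to see response lengths, then filter out the noisy one
        gospider -s "https://example.com/" -o output -l
        gospider -s "https://example.com/" -o output -l --filter-length "3405"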

  • What is the purpose of the '--blacklist' option in Go Spider?

    -The '--blacklist' option allows users to supply a regex pattern to exclude results that match it, helping to focus on relevant data during a scan.
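
    For example, a pattern like the one below (illustrative only) drops common static-asset extensions:

        gospider -s "https://example.com/" -o output --blacklist "\.(png|jpe?g|gif|css|woff2?)$"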

  • Can Go Spider handle multiple URLs at once?

    -Yes, Go Spider can handle multiple URLs by using the '-S' option (capital S) with a file that contains multiple links, allowing for batch processing of URLs.
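
    A minimal sketch (targets.txt is a placeholder file containing one URL per line):

        gospider -S targets.txt -o output -c 5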

  • How does Go Spider help in the initial enumeration of targets?

    -Go Spider assists in the initial enumeration by mapping out the target's web structure, identifying running services, and providing a comprehensive overview of what's present on the target's web pages.

  • What is the recommended next step after using Go Spider for initial scanning?

    -After the initial scan, the recommended next step is to analyze the results, identify important targets, and proceed with more focused and in-depth security testing.

Outlines

00:00

🕷️ Introduction to Go Spider

This paragraph introduces Go Spider, a tool designed for web crawling and data extraction. It explains how Go Spider can spider web pages to discover endpoints, subdomains, and other valuable information. The tool is capable of recursively crawling through files and directories to create a comprehensive map of a web application. The video demonstrates a basic scan using Go Spider with the '-s' option for specifying a URL, '-o' for output file, and '-c' for setting the number of concurrent requests. The tool's ability to quickly find a wealth of information is highlighted, emphasizing its power in web scanning.

05:01

🛠️ Advanced Features and Filtering with Go Spider

The second paragraph delves into the advanced features of Go Spider, including parameters that can be set to comply with platform rules, such as user-agent and cookie settings. It discusses the importance of controlling the speed of requests to avoid exceeding limits set by the target website. The paragraph also covers additional functionalities like crawling JavaScript files, sitemaps, and using third-party archives for enhanced data retrieval. The tool offers filtering options to refine results, such as excluding certain file extensions or response lengths, which is particularly useful for avoiding false positives like custom 404 pages. The video concludes by recommending the tool for initial target enumeration and inviting viewers to comment on which tools they would like to see covered in the future.

Keywords

💡Go Spider

Go Spider is a web crawling tool that is central to the video's theme. It is designed to spider web pages, meaning it can traverse and index content from the web. The tool is used to discover endpoints, subdomains, and other resources linked within a web page, which is crucial for initial reconnaissance in cybersecurity assessments. The script mentions that Go Spider can be used to 'crawl web pages and to get targets endpoints, subdomains everything out of it,' showcasing its capability to create a comprehensive map of a website's structure.

💡Endpoints

Endpoints in the context of the video refer to the specific URLs or routes within a web application that can be accessed. These are important for security testing as they might reveal potential vulnerabilities. The script explains that Go Spider can identify endpoints by 'searching through that page and look for links to... other endpoints,' which is a critical step in assessing the attack surface of a web application.

💡Subdomains

Subdomains are subsets of a primary domain, such as 'blog.example.com' where 'blog' is the subdomain of 'example.com'. They are significant in web crawling as they can host different services or applications that might have distinct security configurations. The video script highlights Go Spider's ability to find subdomains, which is part of the broader process of identifying all accessible points within a web property.

💡Concurrent Requests

Concurrent requests are multiple HTTP requests sent to a server at the same time. In the video, the script discusses setting the concurrency level with the '-c' option, which controls how many requests Go Spider can make simultaneously. This is important for efficiency but also must be managed to avoid overwhelming the server or violating the terms of engagement with the target website.

💡User Agent

A user agent is a string that allows network protocol peers to identify the application, operating system, and version of the requesting software. In the script, it is mentioned that Go Spider can be configured with a user agent using the '-u' or '--user-agent' parameter. This is often necessary to comply with the rules of the platform being scanned or to bypass certain security measures that might block requests without a recognizable user agent.

💡Cookies

Cookies are small pieces of data stored on a user's computer by the web browser while browsing a website. In the context of the video, setting specific cookies is discussed as a way to customize the requests made by Go Spider using the '--cookie' parameter. This can be important for accessing pages that require authentication or maintaining a session during the crawling process.

💡Headers

HTTP headers are key/value pairs sent in an HTTP request or response that provide additional information about the request or response. The video script refers to setting specific headers with the '-H' or '--header' parameter. This can be used to mimic legitimate traffic or to provide additional information that might be required to access certain resources on the web server.

💡Blacklist

A blacklist is a list of entries that are blocked or excluded from a set of resources. In the video, the script mentions using a 'blacklist' to exclude certain URLs or patterns that match a regular expression provided by the user. This feature is useful for focusing the crawl on relevant resources and avoiding unnecessary data collection.

💡Whitelist

A whitelist, the opposite of a blacklist, is a list of entities that are allowed access or are considered safe. The script briefly touches on the concept of whitelisting, where Go Spider can be configured to only include specific resources during the crawl. This is beneficial for narrowing down the scope of the crawl to the most relevant parts of the website.
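
A hedged sketch of whitelisting (the pattern is illustrative; '--whitelist' takes a URL regex, so only URLs matching it are kept):

    gospider -s "https://example.com/" -o output --whitelist "api\."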

💡Sitemap.xml

A sitemap.xml file is an XML file that lists URLs for a site and allows webmasters to include additional information about each URL for search engines. In the video script, it is mentioned that Go Spider can crawl a sitemap.xml file if found. This is particularly useful for discovering all the URLs of a website in a structured manner.

💡Robots.txt

Robots.txt is a file that tells web crawlers which areas of a website should not be processed or scanned. The script refers to the option to crawl the robots.txt file with '--robots'. This file can provide insights into the parts of the website that the owners do not want to be indexed, which might be relevant for ethical web crawling practices.

Highlights

Go Spider is a tool for spidering web pages to crawl and gather information such as endpoints and subdomains.

It can request a page and search for links, JavaScript files, directories, subdomains, and endpoints.

The tool can also crawl the findings to create a web of the application's files and links.

Go Spider can be run with simple commands to perform scans and gather results instantly.

The '-s' option allows input of a URL to scan, while '-S' (capital S) can take a file with multiple links.

The '-o' option is used for outputting results to a file.

The '-c' option sets the level of concurrency for the scan.

'Bug bounty parameters' such as the user agent, set with '-u' or '--user-agent', can be configured to adhere to platform rules.

The '--cookie' and '-H'/'--header' options allow setting specific cookies and headers for the scan.

The '-t' (threads) and '-c' (concurrency) options control how many requests the scan runs in parallel.

The '-k' or '--delay' option sets the delay between requests to the matching domains.

Go Spider can find JavaScript files, include subdomains, and crawl sitemaps and robots.txt files.

The '-a' or '--other-source' option uses third-party archives like Common Crawl and the Wayback Machine to find URLs.

The '-r' or '--include-other-source' option also crawls the URLs found through those archives, such as archive.org.
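
For example, a sketch combining both archive-related flags (assuming they behave as described above; the URL is a placeholder):

    gospider -s "https://example.com/" -o output -a -r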

The '--blacklist' option can be used to blacklist specific items using regex patterns.

The '--whitelist' option can be used to only allow specific items during the scan.

The '-l' or '--length' option can show the length of each file found, which can be useful for filtering.

The '--filter-length' option can be used to filter out responses of specific lengths, for example custom 404 pages that always return a 200.

The tool provides extensive filtering options to refine the scan results.

Go Spider is useful for initial enumeration of targets to understand what they are running.

The video concludes with a call to action for viewers to comment on tools they would like to see covered in the future.

Transcripts

[00:00] This is Go Spider, and that's what we're gonna talk about in today's Hacker Tools video.

[Music]

[00:09] Go Spider is a really cool tool that allows us to spider web pages - to crawl web pages and get a target's endpoints, subdomains, everything out of it. The idea here is that this tool is going to request a page and then it's going to search through that page and look for links to JavaScript files, other directories, other subdomains, other endpoints, anything, and it's going to show that all to us. Additionally, you can even set it so that it also crawls those findings: it finds a file and it also crawls that file for more files, and that way you really create a web of this application where you have all the files and all the links between them, and you can be sure that you have almost everything that can be found on that web page without having to go through it all manually. But let's take a look at how this works by running it and doing a very simple scan here.

[01:05] So what are we going to do? We're going to run Go Spider and then we're going to supply the -s option. The lowercase -s option allows us to input a URL to any page that we want to scan. You can also use capital -S and supply a file that holds multiple links. Following that argument we're gonna use -o for output and supply an output file, and then lastly we're gonna use -c, which stands for concurrency: how many concurrent requests are we gonna run. That is how we can perform a simple scan, and if I press enter here, we're gonna see that we get a lot of results instantly. It finds subdomains, URLs, JavaScript, forms, links, anything we can think of - it is going to find it, and that's obviously very, very powerful.

[01:58] With that simple scan out of the way, we can also look at some more features that this tool has, because this was just the most simple way to go, but there are plenty more features that we can use. First of all I want to talk a bit about some parameters that I call bug bounty parameters, because they can help us and the companies, and help us adhere to the rules of a platform. For example, your platform may have a rule that you have to set a user agent to be, for example, your Intigriti email address. You can do that with the -u or --user-agent parameter. We can also set specific cookies with --cookie and specific headers with the -H or --header parameters.

[02:45] Next up we also have to talk about speed, because this tool can make a lot of requests and can go very fast, but you have to make sure that you adhere to the rules of the program that you're hacking and that you don't go over that threshold of so many requests per second. You can set the number of threads you want to use with --threads, you can set the concurrency with -c or --concurrent as we saw in the example, and then you can set your delay between new requests to the matching domains with -k or --delay.

[03:21] Now with that out of the way, let's look at some more features that this tool has, because it can not only find files: it can also find JavaScript files, as we've shown, with --js; it can include subdomains with --subs; it can also crawl sitemaps, so if it finds a sitemap.xml file it can crawl that too if you supply --sitemap, and the robots.txt file as well with --robots.

[03:49] Following that, we can do some really cool stuff, and that is done with --other-source. This is gonna use third-party archives such as Common Crawl, the Wayback Machine, VirusTotal - all those already big databases of files from the past - and it's going to use them to also find URLs. You can also use -r, or --include-other-source rather, and what that's gonna do is also crawl those found web pages from archive.org or from Common Crawl and keep on crawling them, so you know that you have found everything.

[04:33] As you've seen in the example, this generates a ton of output, so we need a means of blacklisting, and with --blacklist we can blacklist specific things: we can supply a regex and it's then gonna blacklist everything that matches that regex. We can also just whitelist things if we only want specific things. We can also choose to view the length of every file that we get with -l or --length, and then we can filter out specific things with --filter-length.

[05:07] Now this could be extremely useful if, for example, the website has a custom 404 page that just returns a 200, so the crawler is going to give you that 200 for everything it crawls, but you don't want that obviously, so you could remove it from the results with this filter length. You can also do more filtering - you'll find all of these flags in the help page for this tool - and you can even filter out certain extensions that you don't want. For example, I can think of PNGs, stuff that's not really interesting that you don't want to see - well, you can also filter them out.

[05:43] Now, that was it for this tool, Go Spider. I think it's a really interesting tool to get some first enumeration of your targets, to know what they are running, what's going on there, and then from there you can obviously pick the targets that are important and start hacking. I hope you enjoyed this video, and if you liked it, of course comment down below what tools you would like to see us cover in the future. That was it for me, have a good day, take care.

[Music]

Related tags
Web Crawling, Hacking Tools, Go Spider, Security Scanning, Cybersecurity, Subdomains, Endpoints, JavaScript Files, Crawler Features, Hacker Tutorial