How to Install and Use **wget** on Mac to Download Entire Websites

July 2, 2023
wget is a command-line tool for automated and recursive downloads of files and websites. In this tutorial, we will show you how to install wget on your Mac and how to use it to download an entire website. We will also go through the most common and useful wget options.
Installing wget on Mac
To install wget on your Mac, we will use Homebrew, a very popular package manager among macOS users. Follow these steps:
- Install Homebrew: If you do not have Homebrew installed yet, open the Terminal and run the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install wget: Once Homebrew is installed, install wget by running this command in the Terminal:
brew install wget
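Before moving on, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch, not part of the official Homebrew or wget instructions; the helper name have_cmd is hypothetical and chosen only for this example.

```shell
# have_cmd NAME
# Succeeds if NAME can be found on the PATH (hypothetical helper for this check).
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report where each tool was found, or that it is missing.
for tool in brew wget; do
  if have_cmd "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found"
  fi
done
```

If wget is reported as not found right after installation, opening a new Terminal window so the shell reloads its PATH is usually the first thing to try.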
Using wget to Download an Entire Website
Once wget is installed, you can use it to download an entire website with the following command:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.com
This command will automatically download all necessary resources from http://example.com. Let's break down the options used:
- --mirror: Enables recursive downloading with unlimited depth and timestamp checking, and maintains a local copy of the site's directory structure.
- --convert-links: Converts the links in the downloaded pages so that they work locally.
- --adjust-extension: Adds the .html extension to HTML files that do not already have it.
- --page-requisites: Downloads all files needed for the HTML pages to display correctly, including images, CSS, etc.
- --no-parent: Prevents wget from ascending above the starting directory. This matters mostly when the starting URL points to a subdirectory of the site rather than its root.
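If you mirror sites regularly, the options above can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name mirror_site, the DRY_RUN variable, and the default folder name mirror are all hypothetical, and --directory-prefix (a standard wget option that sets the folder downloads are saved into) is an addition not used in the command above.

```shell
# mirror_site URL [DEST_DIR]
# Hypothetical helper: mirrors URL into DEST_DIR (default: ./mirror).
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_site() {
  local url="$1"
  local dest="${2:-mirror}"

  # Build the command as the function's argument list so quoting is preserved.
  set -- wget --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent --directory-prefix="$dest" "$url"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Preview the command without downloading anything.
DRY_RUN=1 mirror_site http://example.com site-copy
```

Running the same call without DRY_RUN=1 performs the actual download into the site-copy folder.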
Some Variations
To exclude pages that contain the word "hotels" in the URL, you can use the --reject-regex option. The command below also adds --continue, which resumes partially downloaded files if the run is interrupted. The complete command would be:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --continue --reject-regex "hotels" http://example.com
If you want to exclude pages that contain either "hotels" or "hoteles", you can use a regular expression that covers both words. Avoid adding a shorter alternative such as "hotel": it would match every URL that contains those letters and make the longer alternatives redundant. Here's how you can do it:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --continue --reject-regex "hoteles|hotels" http://example.com
To do the opposite and download only URLs containing "hoteles" or "hotels", you can use the --accept-regex option. Here's an example of how to do it:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --continue --accept-regex "hoteles|hotels" http://example.com
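Because a full mirror can take a long time, it is worth checking a pattern against a few sample URLs before handing it to wget. The sketch below uses grep -E as a stand-in; it assumes wget's default regex type behaves like POSIX extended regular expressions for simple patterns such as these. The sample URLs and the function name matching_urls are hypothetical.

```shell
# Hypothetical sample URLs; replace them with links from the site you plan to mirror.
urls='http://example.com/
http://example.com/hotels/madrid
http://example.com/es/hoteles/madrid
http://example.com/blog/post-1'

# matching_urls PATTERN
# Prints the sample URLs that match PATTERN. These are the URLs that
# --reject-regex would skip and --accept-regex would keep.
matching_urls() {
  printf '%s\n' "$urls" | grep -E "$1"
}

matching_urls 'hoteles|hotels'
```

With these sample URLs, only the two hotel pages are printed; the home page and the blog post do not match.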
Main and Most Used wget Commands
Here are some of the most common and useful wget commands:
- Download a file:
wget http://example.com/file.zip
- Recursive download:
wget -r http://example.com
-r or --recursive: Enables recursive downloading.
- Limit the download depth:
wget -r -l 5 http://example.com
-l or --level: Limits the depth of the recursive download to 5 levels.
- Download in the background:
wget -b http://example.com/file.zip
-b or --background: Runs wget in the background.
- Resume an interrupted download:
wget -c http://example.com/file.zip
-c or --continue: Resumes the download of an interrupted file.
- Limit download speed:
wget --limit-rate=100k http://example.com/file.zip
--limit-rate: Limits the download speed. In this example, it is limited to 100 KB/s.
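When choosing a value for --limit-rate, a quick estimate of the resulting download time can be useful. The arithmetic below is a rough sketch: it assumes the k suffix stands for 1024 bytes, ignores protocol overhead, and uses a hypothetical 50 MB file. The function name estimate_seconds is also made up for this example.

```shell
# estimate_seconds SIZE_MB RATE_K
# Rough transfer time in seconds for a file of SIZE_MB megabytes
# downloaded at RATE_K kilobytes per second (1 MB taken as 1024 KB).
estimate_seconds() {
  echo $(( $1 * 1024 / $2 ))
}

# A 50 MB file at --limit-rate=100k
estimate_seconds 50 100   # prints 512
```

So at 100 KB/s, a 50 MB file would take roughly eight and a half minutes.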
Conclusion
wget is a powerful and versatile tool that allows you to download entire websites and files with ease. With the right commands and options, you can automate numerous web management and download tasks. Now that you know how to install and use wget on your Mac, you can expand your tech toolkit and improve your workflow.