Download A Website Using wget
Use the wget command line utility to download an entire website.
Be careful with recursive retrieval - you might download the entire internet!
wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --domains testsite.com \
     --no-parent \
     http://testsite.com/
- --recursive: follows links and downloads the pages they point to (wget's default maximum depth is 5 levels)
- --no-clobber: doesn’t overwrite existing files, useful for interrupted downloads
- --page-requisites: downloads all the files required to display each page (CSS, images etc)
- --html-extension: saves HTML files with an .html extension (newer wget versions call this option --adjust-extension and treat the old name as deprecated)
- --convert-links: rewrites links in the downloaded files so they work off-line
- --domains: sets the domains to be followed
- --no-parent: doesn’t ascend to the parent directory when retrieving recursively, so only files below the starting directory are downloaded
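If you run this against more than one site, the command can be wrapped in a small shell function. The following is only a sketch: the function name mirror_site and the WGET override variable are made up for this example, and it assumes the start URL has no port number or credentials in it.

```shell
# Hypothetical helper (not part of wget): mirrors the site at the given URL
# using the same options as the command above.
# Set WGET=echo to print the options instead of downloading anything.
mirror_site() {
    mirror_url="$1"
    mirror_host="${mirror_url#*://}"    # drop the scheme, e.g. "http://"
    mirror_host="${mirror_host%%/*}"    # drop the path, leaving only the host name
    ${WGET:-wget} \
        --recursive \
        --no-clobber \
        --page-requisites \
        --html-extension \
        --convert-links \
        --domains "$mirror_host" \
        --no-parent \
        "$mirror_url"
}

# Example (commented out so that nothing is downloaded by accident):
# mirror_site "http://testsite.com/"
```

Running it once with WGET=echo is a cheap way to check that the host name was picked out of the URL correctly before starting a long download.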
Alternative Method
This can sometimes work better:
wget --wait=20 --limit-rate=20K -r -p -U Mozilla http://www.testsite.com
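This is friendlier to the target website and makes it less likely that you get blocked: --wait=20 pauses 20 seconds between requests, --limit-rate=20K caps the download speed at 20 KB/s, and -U Mozilla sends “Mozilla” as the user agent. -r and -p are the short forms of --recursive and --page-requisites.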
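The two approaches are not mutually exclusive. As a rough sketch, the throttling options from this command can be combined with some of the mirroring options from the first one. The variable names here are made up for the example, and the script only prints the resulting command so it can be reviewed first.

```shell
# Throttling values taken from the command above; adjust to taste.
WAIT_SECONDS=20
RATE_LIMIT=20K

# Long-form equivalents of --wait, --limit-rate and -U Mozilla.
polite_opts="--wait=$WAIT_SECONDS --limit-rate=$RATE_LIMIT --user-agent=Mozilla"

# Mirroring options borrowed from the first command.
mirror_opts="--recursive --page-requisites --html-extension --convert-links --no-parent"

# Print the full command for review instead of running it.
echo "wget $polite_opts $mirror_opts http://www.testsite.com/"
```

To actually download, run the printed command, or replace the echo line with wget $polite_opts $mirror_opts followed by the URL (leave the variables unquoted so the shell splits them into separate options).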
Note: We use this primarily for downloading our own or our clients’ CMS-based sites as “flat” HTML, so we’re only hitting our own site resources, or we’re downloading with permission.
If you’re using this method to download other people’s websites, be responsible.
See the wget man page for more details.