How to Download All Images from a Web Page in Python

Web scraping is a technique for fetching data from websites. Many websites don't let users save data for personal use, and manually copy-pasting it is both tedious and time-consuming. Web scraping automates the data-extraction process. In this article, we will discuss how to download all images from a web page using Python.
STEP 1: Preparation
Modules Needed
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python.
- requests: Requests lets you send HTTP/1.1 requests extremely easily. This module also does not come built in with Python.
- os: The OS module in Python provides functions for interacting with the operating system. OS comes under Python’s standard utility modules. This module provides a portable way of using operating system-dependent functionality.
pip install bs4
pip install requests
STEP 2: Import The Packages
At the top of the script, we import the packages we will use so that the rest of the project can run.
from bs4 import BeautifulSoup
import requests
import os
STEP 3: Get the Response Text
To get the response text, we need the URL of the page we are targeting for scraping. We pass that URL to requests with the get method, with a few additions:

- Create a separate folder for the downloaded images using the makedirs method from os.
- Build the URL from a key format: here we search yell.com by keyword (hotels), location (london), and, later, a page number.
- Send a user-agent header with the request so it is still read as a browser while scraping the images.
import os
import requests
from bs4 import BeautifulSoup

key = 'hotels'
location = 'london'
link = 'https://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=1288835170&keywords={}&location={}&'.format(key, location)

# Send a browser-like user agent so the request is not blocked
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

# Create a separate folder for the downloaded images
os.makedirs('images', exist_ok=True)
STEP 4: Get the Picture Elements and Loop Over Them with a “for” Statement
soup = BeautifulSoup(req.text, 'html.parser')
items = soup.findAll('div', 'row businessCapsule--mainRow')
for item in items:
    left_side = item.find('div', 'col-sm-4 col-md-4 col-lg-5 businessCapsule--leftSide')
    image = left_side.find('img')['data-original']
    alt_item = left_side.find('img')['alt']
    # Turn the alt text into a safe filename
    alt_item = str(alt_item).replace(' ', '-').replace('/', '').replace('*', '') + '.jpg'
    # data-original may be a relative path; make it absolute
    if 'http' not in image:
        image = 'https://www.yell.com{}'.format(image)
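The alt-to-filename cleanup above can be isolated as a small helper for clarity (the function name `alt_to_filename` is my own, not from the original code):

```python
def alt_to_filename(alt_text):
    """Turn an image's alt text into a safe .jpg filename.

    Replaces spaces with hyphens and strips characters that are
    awkward in filenames ('/' and '*').
    """
    return str(alt_text).replace(' ', '-').replace('/', '').replace('*', '') + '.jpg'

# Example:
print(alt_to_filename('The Grand Hotel / London'))  # The-Grand-Hotel--London.jpg
```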
Then we parse the data inside each product capsule by repeating the extraction with a ‘for’ loop. The data we take in this article are the image URL (from the data-original attribute) and the alt text, which we reuse as the filename. You can extend this yourself by adding more data variables and extracting them with the “find(‘<tag>’, {‘<attribute name>’: ‘<attribute value>’})” function. Use the string strip method if you need to remove characters at the front and back of a string.

On the page, every product sits inside a ‘div’ tag with the class ‘row businessCapsule--mainRow’. Because all products share the same tag and class, we use the “findAll” function in the BeautifulSoup module to retrieve all the information contained in the products.
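The find/findAll pattern described above can be tried offline on a small, self-contained snippet of HTML (the markup below is invented for illustration, not taken from yell.com):

```python
from bs4 import BeautifulSoup

html = """
<div class="row businessCapsule--mainRow">
  <img src="/img/a.jpg" alt="Hotel A">
</div>
<div class="row businessCapsule--mainRow">
  <img src="/img/b.jpg" alt="Hotel B">
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
# findAll returns a list of every element matching the tag and class
capsules = soup.findAll('div', 'row businessCapsule--mainRow')
alts = [c.find('img')['alt'] for c in capsules]
print(alts)  # ['Hotel A', 'Hotel B']
```

Note that when the class attribute contains spaces, the string you pass to findAll must match the full class value exactly.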
STEP 5: Pagination
To do pagination, we loop over pages in a certain range, for example pages 1 to 5, and run all the scraping code above inside that loop so the data from each page is read properly. On the yell.com site, there are 10 pages for this search.
for page in range(1, 6):  # pages 1 to 5
    req = requests.get(link + f"pageNum={page}", headers=headers)
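To see what URLs the loop actually requests, the URL construction can be sketched offline, without touching the network (a minimal sketch using the same link format as above):

```python
# Same URL template as in the scraper, filled in with the example search
link = ('https://www.yell.com/ucs/UcsSearchAction.do?'
        'scrambleSeed=1288835170&keywords={}&location={}&').format('hotels', 'london')

# One URL per page, pages 1 to 5
urls = [link + f"pageNum={page}" for page in range(1, 6)]
print(urls[0])
```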


STEP 6: Download the Images Using File Handling
After successfully scraping the data, we can save each image in the folder we created, using file handling as follows.
images = requests.get(image)
print(f'Take Pictures from Item {alt_item} Status Web: {images.status_code}')
with open('images/' + alt_item, 'wb') as f:
    f.write(images.content)  # the with block closes the file automatically
print('Done')
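The file-handling step can also be wrapped in a small reusable helper, which is easier to test without the network (the helper name `save_image` is my own, not from the original article):

```python
import os

def save_image(content, folder, filename):
    """Write raw image bytes to folder/filename and return the path."""
    os.makedirs(folder, exist_ok=True)       # create the folder if it is missing
    path = os.path.join(folder, filename)
    with open(path, 'wb') as f:              # 'wb' because image data is binary
        f.write(content)
    return path
```

In the scraper above, the call would look like `save_image(requests.get(image).content, 'images', alt_item)`.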

Check the scraped images. Finished, good luck :)

I hope you can benefit from this article and develop it according to your own needs and imagination. I apologize for any words or behavior that were not appropriate. Thank you for your kind attention, and stay tuned for my next articles! :)
You can get the GitHub link here.