Basic Web Scraper
Abstract
This is a basic web scraper that collects data from a website and stores it in a CSV file. It is a beginner-level project that uses the requests library to download the page and the BeautifulSoup library to parse it. The application scrapes https://webscraper.io/test-sites/e-commerce/allinone/phones/touch and saves each product's name, price, description, review count, rating, and image path to a CSV file.
Prerequisites
- Python 3.6 or above
- BeautifulSoup library
- requests library
- Text editor or IDE
Before we start
Before we start, we need to install the BeautifulSoup and requests libraries. To install BeautifulSoup, run the following command in the terminal.
C:\Users\username>pip install beautifulsoup4
To install requests, run the following command in the terminal.
C:\Users\username>pip install requests
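To confirm that both installs worked before writing the scraper, a short check like the one below may help. It is only a sketch: the helper name missing_packages and the REQUIRED_MODULES mapping are made up for this example, and it relies on the standard library's importlib rather than importing the packages themselves.

```python
from importlib.util import find_spec

# Import names can differ from pip package names:
# the "beautifulsoup4" package is imported as "bs4".
REQUIRED_MODULES = {"bs4": "beautifulsoup4", "requests": "requests"}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of required packages that cannot be imported."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

If the script reports a missing package, run the matching pip command again in the same Python environment you use to run the scraper.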
Getting Started
Creating a project
- Create a folder named basicwebscrapper and open it in the text editor or IDE.
- Create a file named basicwebscrapper.py in the basicwebscrapper folder.
- Open the basicwebscrapper.py file in the text editor or IDE.
- Copy the code below and paste it into the basicwebscrapper.py file.
Write the code
- Copy and paste the following code into the basicwebscrapper.py file.
# Basic Web Scrapper
# Importing Libraries
import requests
from bs4 import BeautifulSoup
# URL
url = "https://webscraper.io/test-sites/e-commerce/allinone/phones/touch"
# Requesting the URL
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# Finding the phones
phones = soup.find_all("div", class_="card-body")
# Creating a CSV file
open_file = open("phones.csv", "a")
headers = "Name, Price, Description, Reviews, Rating, Image\n"
open_file.write(headers)
# Looping through the phones
for phone in phones:
    name = phone.find("a", class_="title")
    price = phone.find("h4", class_="price")
    description = phone.find("p", class_="description")
    reviews = phone.find("p", class_="float-end review-count")
    rating = phone.find("p", attrs={"data-rating": True})
    image = phone.find("img", class_="img-responsive")["src"]
    # Writing to the CSV file
    open_file.write(f'{name.text}, {price.text}, {description.text}, {reviews.text}, {rating["data-rating"]}, {image}\n')
    print(f'Name: {name.text} \nPrice: {price.text} \nDescription: {description.text} \nReviews: {reviews.text} \nRating: {rating["data-rating"]} \nImage: {image} \n')
# Closing the CSV file
open_file.close()
- Save the file.
- Open the terminal in the basicwebscrapper folder.
- Run the following command in the terminal.
C:\Users\username\basicwebscrapper>python basicwebscrapper.py
Name: Nokia 123
Price: $24.99
Description: 7 day battery
Reviews: 11 reviews
Rating: 3
Image: /images/test-sites/e-commerce/items/cart2.png
Name: LG Optimus
Price: $57.99
Description: 3.2" screen
Reviews: 11 reviews
Rating: 3
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Samsung Galaxy
Price: $93.99
Description: 5 mpx. Android 5.0
Reviews: 3 reviews
Rating: 3
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Nokia X
Price: $109.99
Description: Andoid, Jolla dualboot
Reviews: 4 reviews
Rating: 4
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Sony Xperia
Price: $118.99
Description: GPS, waterproof
Reviews: 6 reviews
Rating: 1
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Ubuntu Edge
Price: $499.99
Description: Sapphire glass
Reviews: 2 reviews
Rating: 1
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Iphone
Price: $899.99
Description: White
Reviews: 10 reviews
Rating: 1
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Iphone
Price: $899.99
Description: Silver
Reviews: 8 reviews
Rating: 2
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Iphone
Price: $899.99
Description: Black
Reviews: 1 reviews
Rating: 1
Image: /images/test-sites/e-commerce/items/cart2.png
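Every value in the output above is a string, including the price and the review count. If we later want to sort or filter the products, it may help to convert them to numbers first. The sketch below shows one way to do that; the helper names parse_price and parse_review_count are invented for this example, and it assumes the formats seen in the output ("$24.99" and "11 reviews").

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as '$24.99' to a Decimal."""
    # Decimal avoids the rounding surprises of float for money values.
    return Decimal(text.strip().lstrip("$").replace(",", ""))


def parse_review_count(text):
    """Convert a string such as '11 reviews' to the integer 11."""
    return int(text.split()[0])


print(parse_price("$24.99"))
print(parse_review_count("11 reviews"))
```

Both helpers raise an exception if the text does not match the assumed format, which is usually preferable to silently storing a wrong number.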
Explanation
- First, we import the requests library and the BeautifulSoup library.
import requests
from bs4 import BeautifulSoup
- Then, we assign the URL to the variable url.
url = "https://webscraper.io/test-sites/e-commerce/allinone/phones/touch"
- Next, we request the URL and assign the result to the variable response. Then, we parse the HTML using html.parser and assign the result to the variable soup.
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
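The request above assumes that the site answers quickly and successfully. A slightly more defensive version might set a timeout and check the HTTP status before parsing. The following is a minimal sketch, not part of the original script: the function names fetch_html and fetch_or_none and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on network or HTTP errors."""
    # The timeout stops the script from waiting forever on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text


def fetch_or_none(url):
    """Return the page HTML, or None (with a printed message) if the request fails."""
    try:
        return fetch_html(url)
    except requests.RequestException as error:
        print(f"Could not fetch {url}: {error}")
        return None
```

With these helpers, the parsing step could become soup = BeautifulSoup(fetch_html(url), "html.parser"), or the script could call fetch_or_none(url) and stop early when it returns None.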
- After that, we find all the phones and assign them to the variable phones.
phones = soup.find_all("div", class_="card-body")
- Then, we open a CSV file named phones.csv in append mode, which creates the file if it does not exist, and write the headers to it. Because the file is opened in append mode, each run of the script adds another header row and another copy of the data; delete phones.csv before re-running if you want a clean file.
open_file = open("phones.csv", "a")
headers = "Name, Price, Description, Reviews, Rating, Image\n"
open_file.write(headers)
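Since append mode writes the header row again on every run, one possible refinement is to write it only when the file is new or empty. The sketch below illustrates the idea with a hypothetical helper named open_for_append; it uses a temporary folder so that trying it out does not touch a real phones.csv.

```python
import tempfile
from pathlib import Path

HEADER = "Name, Price, Description, Reviews, Rating, Image\n"


def open_for_append(path):
    """Open a file in append mode, writing the header only if it is new or empty."""
    path = Path(path)
    needs_header = not path.exists() or path.stat().st_size == 0
    csv_file = path.open("a", encoding="utf-8")
    if needs_header:
        csv_file.write(HEADER)
    return csv_file


# Simulate running the scraper twice against the same file.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "phones.csv"
    for _ in range(2):
        with open_for_append(target) as csv_file:
            csv_file.write("Nokia 123, $24.99, 7 day battery, 11 reviews, 3, "
                           "/images/test-sites/e-commerce/items/cart2.png\n")
    header_count = target.read_text(encoding="utf-8").count("Name, Price")
    print(header_count)
```

The with statement also closes the file automatically, even if an error occurs while scraping. If a fresh file on every run is preferred instead, opening the file in write mode ("w") is the simpler choice.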
- Next, we loop through the phones and find the name, price, description, reviews, rating, and image of the phone. Then, we write the data to the CSV file.
name = phone.find("a", class_="title")
price = phone.find("h4", class_="price")
and so on for the description, reviews, rating, and image.
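Note that find() returns None when no matching element exists, so calling .text on the result fails with an AttributeError if a product card lacks one of the fields. The sketch below shows one way to guard against that. The HTML fragment is hand-written for this example and only imitates the structure of the site, and the helper text_of is a hypothetical name.

```python
from bs4 import BeautifulSoup

# A hand-written fragment that imitates two product cards.
# The second card deliberately has no description.
SAMPLE_HTML = """
<div class="card-body">
  <h4 class="price">$24.99</h4>
  <a class="title">Nokia 123</a>
  <p class="description">7 day battery</p>
</div>
<div class="card-body">
  <h4 class="price">$57.99</h4>
  <a class="title">LG Optimus</a>
</div>
"""


def text_of(tag, default=""):
    """Return the stripped text of a tag, or a default when find() returned None."""
    return tag.get_text(strip=True) if tag is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for card in soup.find_all("div", class_="card-body"):
    rows.append({
        "name": text_of(card.find("a", class_="title")),
        "price": text_of(card.find("h4", class_="price")),
        "description": text_of(card.find("p", class_="description"), default="n/a"),
    })

for row in rows:
    print(row)
```

Testing against a small inline fragment like this also makes it possible to check the selectors without sending a request to the site each time.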
After that, we write the data to the CSV file. The values are joined with a comma and a space, so a description that itself contains a comma, such as "Andoid, Jolla dualboot", spills into an extra column.
open_file.write(f'{name.text}, {price.text}, {description.text}, {reviews.text}, {rating["data-rating"]}, {image}\n')
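Joining values by hand breaks down when a value contains a comma. Python's built-in csv module quotes such values automatically, so it may be a safer way to write the rows. The sketch below is an alternative to the f-string approach, not the method used in this tutorial; it writes to an in-memory buffer so that nothing is left on disk.

```python
import csv
import io

FIELDS = ["Name", "Price", "Description", "Reviews", "Rating", "Image"]
row = ["Nokia X", "$109.99", "Andoid, Jolla dualboot", "4 reviews", "4",
       "/images/test-sites/e-commerce/items/cart2.png"]

# io.StringIO stands in for a real file here. With a real file, open it
# with newline="" as the csv module documentation recommends.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)
writer.writerow(row)
print(buffer.getvalue())

# Reading the text back yields the original six fields, comma included.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed[1])
```

The description is written as "Andoid, Jolla dualboot" in double quotes, which spreadsheet programs and csv.reader treat as a single field.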
- Finally, we close the CSV file.
open_file.close()
Usage
- Open the terminal in the basicwebscrapper folder.
- Run the following command in the terminal.
C:\Users\username\basicwebscrapper>python basicwebscrapper.py
- The data will be scraped from the website and stored in the phones.csv file.
- The data will be printed in the terminal.
- The data will be stored in the phones.csv file.
Name, Price, Description, Reviews, Rating, Image
Nokia 123, $24.99, 7 day battery, 11 reviews, 3, /images/test-sites/e-commerce/items/cart2.png
LG Optimus, $57.99, 3.2" screen, 11 reviews, 3, /images/test-sites/e-commerce/items/cart2.png
Samsung Galaxy, $93.99, 5 mpx. Android 5.0, 3 reviews, 3, /images/test-sites/e-commerce/items/cart2.png
Nokia X, $109.99, Andoid, Jolla dualboot, 4 reviews, 4, /images/test-sites/e-commerce/items/cart2.png
Sony Xperia, $118.99, GPS, waterproof, 6 reviews, 1, /images/test-sites/e-commerce/items/cart2.png
Ubuntu Edge, $499.99, Sapphire glass, 2 reviews, 1, /images/test-sites/e-commerce/items/cart2.png
Iphone, $899.99, White, 10 reviews, 1, /images/test-sites/e-commerce/items/cart2.png
Iphone, $899.99, Silver, 8 reviews, 2, /images/test-sites/e-commerce/items/cart2.png
Iphone, $899.99, Black, 1 reviews, 1, /images/test-sites/e-commerce/items/cart2.png
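In the listing above, the Nokia X and Sony Xperia rows contain a comma inside the description, so a CSV reader sees seven fields in those rows instead of six. A quick check like the following can reveal such rows. It is a sketch that uses three lines copied from the listing; the function name misaligned_rows is made up for this example.

```python
import csv
import io

# Three lines copied from the phones.csv listing.
SAMPLE = '''Name, Price, Description, Reviews, Rating, Image
Nokia 123, $24.99, 7 day battery, 11 reviews, 3, /images/test-sites/e-commerce/items/cart2.png
Nokia X, $109.99, Andoid, Jolla dualboot, 4 reviews, 4, /images/test-sites/e-commerce/items/cart2.png
'''


def misaligned_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    # skipinitialspace=True ignores the space that follows each comma.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    return [(reader.line_num, len(row)) for row in reader if len(row) != len(header)]


print(misaligned_rows(SAMPLE))  # prints [(3, 7)]
```

Line 3 is the Nokia X row, which splits into seven fields because "Andoid, Jolla dualboot" is not quoted.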
- The data will be printed in the terminal.
C:\Users\username\basicwebscrapper>python basicwebscrapper.py
Name: Nokia 123
Price: $24.99
Description: 7 day battery
Reviews: 11 reviews
Rating: 3
Image: /images/test-sites/e-commerce/items/cart2.png
Name: LG Optimus
Price: $57.99
Description: 3.2" screen
Reviews: 11 reviews
Rating: 3
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Samsung Galaxy
Price: $93.99
Description: 5 mpx. Android 5.0
Reviews: 3 reviews
Rating: 3
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Nokia X
Price: $109.99
Description: Andoid, Jolla dualboot
Reviews: 4 reviews
Rating: 4
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Sony Xperia
Price: $118.99
Description: GPS, waterproof
Reviews: 6 reviews
Rating: 1
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Ubuntu Edge
Price: $499.99
Description: Sapphire glass
Reviews: 2 reviews
Rating: 1
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Iphone
Price: $899.99
Description: White
Reviews: 10 reviews
Rating: 1
Image: /images/test-sites/e-commerce/items/cart2.png
Name: Iphone
Price: $899.99
Description: Silver
Reviews: 8 reviews
Rating: 2
Image: /images/test-sites/e-commerce/items/cart2.png
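The Image value in the output is a path relative to the site root, not a complete address. To download or display the image, it can be combined with the page URL using the standard library's urljoin. This is a small sketch; the variable names are chosen for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = "https://webscraper.io/test-sites/e-commerce/allinone/phones/touch"
image_path = "/images/test-sites/e-commerce/items/cart2.png"

# A path that starts with "/" replaces the whole path of the page URL.
image_url = urljoin(PAGE_URL, image_path)
print(image_url)  # prints https://webscraper.io/images/test-sites/e-commerce/items/cart2.png
```

In the scraper, the same call could wrap the src value before it is written to the CSV file.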
Next Steps
Congratulations! You have successfully created a basic web scraper that collects data from a website and stores it in a CSV file.
Here are some ideas to get you started:
- Scrape the data from https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops and store it in a CSV file.
- Scrape the data from https://webscraper.io/test-sites/e-commerce/allinone/computers/tablets and store it in a CSV file.
- Add a feature that stores the data scraped from https://webscraper.io/test-sites/e-commerce/allinone/phones/touch in a JSON file.
- Add a feature that stores the data scraped from https://webscraper.io/test-sites/e-commerce/allinone/phones/touch in a database.
- Create a GUI for the application.
- Build a web application around the scraper.
- Add a cron job that scrapes https://webscraper.io/test-sites/e-commerce/allinone/phones/touch and stores the data in a CSV file every 24 hours.
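As a starting point for the JSON and database ideas in the list above, the sketch below stores two records with the standard library's json and sqlite3 modules. The record layout, the function names to_json and save_to_database, and the table name phones are all assumptions made for this example; SQLite is used only because it ships with Python.

```python
import json
import sqlite3

# Two records copied from the output earlier in this tutorial.
phones = [
    {"name": "Nokia 123", "price": "$24.99", "description": "7 day battery"},
    {"name": "LG Optimus", "price": "$57.99", "description": '3.2" screen'},
]


def to_json(records):
    """Serialize the records to a JSON string, ready to be written to a file."""
    return json.dumps(records, indent=2)


def save_to_database(records, connection):
    """Create the phones table if needed and insert every record."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS phones (name TEXT, price TEXT, description TEXT)"
    )
    # Named placeholders let sqlite3 handle the quoting of values.
    connection.executemany(
        "INSERT INTO phones VALUES (:name, :price, :description)", records
    )
    connection.commit()


print(to_json(phones))

# ":memory:" keeps the example off disk; pass a filename for a real database.
connection = sqlite3.connect(":memory:")
save_to_database(phones, connection)
stored = connection.execute(
    "SELECT name, price FROM phones ORDER BY rowid"
).fetchall()
print(stored)
connection.close()
```

To produce a phones.json file, the string returned by to_json can be written with open("phones.json", "w"). For a persistent database, replace ":memory:" with a file name such as phones.db.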
Resources
Conclusion
In this tutorial, we learned how to create a basic web scraper that collects data from a website and stores it in a CSV file. This beginner-level project used the requests library to download the page and the BeautifulSoup library to parse it. We scraped https://webscraper.io/test-sites/e-commerce/allinone/phones/touch and stored the data in a CSV file. For more information, visit the resources listed above. For more projects like this, visit Python Central Hub.