Web scraping, also called web harvesting, is a technique for extracting large amounts of data from websites and saving it to your computer or server.
Data displayed by a website can be viewed in a web browser, but the site may not offer a way to download it for personal use. Copying and pasting the data by hand is time-consuming and tedious. Web scraping automates this process: instead of you copying the data manually, scraping software performs the same task in a fraction of the time.
In this tutorial, we are going to use PHP to download images from a web page and store them on our local server. Web scraping can be done in many languages, such as Python, PHP, JavaScript, or Ruby.
So, let's start the tutorial by creating a file named ImageDownload.php
and creating a class in it.
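<?php

class ImageDownload
{
    /**
     * Directory where downloaded images are stored.
     *
     * @var string
     */
    public $folder = 'images';

    /**
     * URL of the page to scrape.
     *
     * @var string
     */
    public $websitelink;

    /**
     * Create a new class instance and make sure the target folder exists.
     *
     * @param string $websitelink
     * @return void
     */
    public function __construct($websitelink)
    {
        if (!file_exists($this->folder)) {
            mkdir($this->folder, 0777, true);
        }

        $this->websitelink = $websitelink;
    }

    /**
     * Fetch the page and return every <img> tag match.
     *
     * Each match is an array; index 2 holds the src value, quotes included.
     *
     * @return array
     */
    public function getLinks()
    {
        $html = file_get_contents($this->websitelink);

        preg_match_all("{<img\\s*(.*?)src=('.*?'|\".*?\"|[^\\s]+)(.*?)\\s*/?>}ims", $html, $image_urls, PREG_SET_ORDER);

        return $image_urls;
    }

    /**
     * Download each matched image into the target folder.
     *
     * @param array $images matches returned by getLinks()
     * @return void
     */
    public function saveImage($images)
    {
        foreach ($images as $val) {
            // $val[2] still has its quotes, so a root-relative path such as
            // "/img/a.png" has its first slash at position 1.
            $pos = strpos($val[2], "/");
            // Strip the surrounding quotes.
            $link = substr($val[2], 1, -1);

            if ($pos === 1) {
                // Root-relative path: prepend the scheme and host of the page.
                $site = parse_url($this->websitelink);
                $image_url = $site['scheme'].'://'.$site['host'].$link;
            } else {
                $image_url = $link;
            }

            $image_name = pathinfo($image_url)['basename'];
            copy($image_url, $this->folder.'/'.$image_name);
        }
    }
}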
In the above class, the getLinks() method returns all image tags found on the page, and the saveImage() method downloads the images one by one with the copy() function.
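The saveImage() method only distinguishes root-relative paths from everything else, so protocol-relative sources (//cdn...), path-relative sources (img/a.png), and unquoted src values may not resolve correctly. The following is a minimal sketch of how that step could be made more tolerant. The function names resolveImageUrl() and imageFileName() are hypothetical helpers, not part of the class above, and the sketch does not attempt to handle ../ segments or data: URIs.

```php
<?php
/**
 * Hypothetical helper (not part of the original class): turn the src value
 * of an <img> tag into an absolute URL, using the page URL as the base.
 */
function resolveImageUrl(string $src, string $pageUrl): string
{
    // Strip surrounding quotes and whitespace left over from the regex match.
    $src = trim($src, " \t\n\r\"'");

    $base = parse_url($pageUrl);
    $scheme = $base['scheme'] ?? 'http';
    $host = $base['host'] ?? '';
    $port = isset($base['port']) ? ':' . $base['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    // Already absolute, e.g. https://example.com/a.png
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $src)) {
        return $src;
    }
    // Protocol-relative, e.g. //cdn.example.com/a.png
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }
    // Root-relative, e.g. /images/a.png
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }
    // Path-relative, e.g. img/a.png: resolve against the page's directory.
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $src;
}

/**
 * Hypothetical helper: derive a file name from an image URL, ignoring any
 * query string such as "?v=2".
 */
function imageFileName(string $imageUrl): string
{
    $path = parse_url($imageUrl, PHP_URL_PATH);
    $name = basename(is_string($path) ? $path : '');

    // Fall back to a generic name when the URL has no file part.
    return $name !== '' ? $name : 'image';
}
```

Inside the foreach loop of saveImage(), $val[2] could be passed to resolveImageUrl() together with $this->websitelink, and the result to imageFileName(), in place of the strpos() and pathinfo() logic.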
Now we need a second file that creates an object of this class and calls its methods. Create a file named index.php
and include the above class file.
<?php
include "ImageDownload.php";
$website_link = 'https://hackthestuff.com/article/';
$downloader = new ImageDownload($website_link);
$images = $downloader->getLinks();
$downloader->saveImage($images);
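Now start PHP's built-in server from the project folder with the command php -S 0.0.0.0:8000
and open http://localhost:8000
in your browser. The script prints nothing on success, so check the images folder for the downloaded files.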
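Since index.php does not print anything, it can help to show what ended up in the download folder. Here is a minimal sketch, assuming a hypothetical helper named listSavedImages() (it is not part of the ImageDownload class) that could be appended to index.php after the saveImage() call.

```php
<?php
/**
 * Hypothetical helper: list the files currently stored in a folder,
 * sorted by name, so index.php can print what was downloaded.
 */
function listSavedImages(string $folder): array
{
    if (!is_dir($folder)) {
        return [];
    }

    $files = [];
    foreach (scandir($folder) as $entry) {
        // Keep regular files only; this skips ".", ".." and sub-directories.
        if (is_file($folder . '/' . $entry)) {
            $files[] = $entry;
        }
    }
    sort($files);

    return $files;
}

// Usage sketch: 'images' is the default $folder of the ImageDownload class.
foreach (listSavedImages('images') as $file) {
    echo htmlspecialchars($file), "<br>\n";
}
```

The file names are passed through htmlspecialchars() because they originate from a remote page and are echoed into HTML.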
I hope you liked this article and that it helps you.
Hi, my name is Harsukh Makwana. I have been working with many programming languages and tools such as PHP, Python, JavaScript, Node, React, and Angular for the last 5 years. If you have any issue or want to hire me, contact me at [email protected]