PHP for Web scraping and bot development

Web scraping is a technique for extracting information and data from websites. The scraping and analysis of such information is a topic in data mining research. In practice, web scraping is necessary when you want to develop a web application that shows customised information gathered from various websites. To do this, you first scrape the data from the sites and then apply some logic to filter the information.
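The "scrape first, then filter" idea can be sketched in a few lines of PHP. This is only a minimal illustration that assumes the cURL and DOM extensions are available. The function names (fetchPage, extractLinks, filterByKeyword), the user agent string and the sample HTML are all made up for this example and do not come from the book's library.

```php
<?php
// Hypothetical helper: download a page with PHP's built-in cURL extension.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,                 // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,                 // follow redirects
        CURLOPT_TIMEOUT        => 15,                   // give up after 15 seconds
        CURLOPT_USERAGENT      => 'ExampleNewsBot/0.1', // placeholder bot name (assumption)
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: collect the text and href of every link in an HTML string.
// Note: loadHTML() assumes ISO-8859-1 unless the markup declares another charset.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $title = trim($a->textContent);
        $url   = $a->getAttribute('href');
        if ($title !== '' && $url !== '') {
            $links[] = ['title' => $title, 'url' => $url];
        }
    }
    return $links;
}

// Hypothetical filter: keep only links whose title contains the keyword (case-insensitive).
function filterByKeyword(array $links, string $keyword): array
{
    return array_values(array_filter($links, function (array $link) use ($keyword): bool {
        return stripos($link['title'], $keyword) !== false;
    }));
}

// Sample markup standing in for a downloaded page, so the example runs offline.
$sample = '<ul><li><a href="/a">Election results announced</a></li>'
        . '<li><a href="/b">Cup final tonight</a></li></ul>';

$links         = extractLinks($sample);
$electionLinks = filterByKeyword($links, 'election');
```

With a real site you would pass the result of fetchPage() to extractLinks() instead of the sample string, after checking that the site's terms allow automated access.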

You can write the program that automatically searches for and collects the information in many different languages. But if you are a PHP expert and want to use PHP for this kind of work, I recommend a book that comes with a PHP library. I found the book very helpful for learning the topic, and its library is easy to use.

Check out the following book:

 Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL


Check out the library:

Download Code

Here is a simple example of a web scraping idea. Suppose you want to develop a web application that shows current news from different newspapers, classified by topic, so that people who are interested in a particular kind of news can find links to all the newspapers' stories under the matching category.

[Figure: Web scraping for news classification]

As the diagram shows, the application retrieves the news from different websites, then classifies and categorises it by interest, such as politics, sports and entertainment. The book and library mentioned above will help you build this kind of application.
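The classification step could be as simple as counting keywords per category. The sketch below is one possible approach, not the method used by the book or its library. The keyword lists, the 'other' fallback category and the function names are assumptions chosen for illustration.

```php
<?php
// Hypothetical keyword lists; a real application would tune or replace these.
const CATEGORY_KEYWORDS = [
    'politics'      => ['election', 'parliament', 'minister'],
    'sports'        => ['cricket', 'football', 'match'],
    'entertainment' => ['film', 'music', 'celebrity'],
];

// Return the category whose keywords occur most often in the headline,
// or 'other' when no keyword occurs at all. Matching is by substring,
// so 'match' also counts inside 'matches'.
function classifyHeadline(string $headline, array $categories = CATEGORY_KEYWORDS): string
{
    $text      = strtolower($headline);
    $best      = 'other';
    $bestScore = 0;
    foreach ($categories as $category => $keywords) {
        $score = 0;
        foreach ($keywords as $keyword) {
            $score += substr_count($text, $keyword);
        }
        if ($score > $bestScore) { // ties keep the category listed first
            $best      = $category;
            $bestScore = $score;
        }
    }
    return $best;
}

// Group scraped items (arrays with a 'title' key) under their category.
function groupByCategory(array $items): array
{
    $grouped = [];
    foreach ($items as $item) {
        $grouped[classifyHeadline($item['title'])][] = $item;
    }
    return $grouped;
}
```

Keyword counting is crude, but it is easy to understand and to extend. You could later swap classifyHeadline() for something smarter without touching the rest of the application.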

You can also extend the idea to develop an iPhone or Android news application. In that case the scraping script should be installed on a server, where it automatically collects and classifies the information, and the smartphone application simply retrieves the data from your server.
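On the server side, the part that the phone application talks to might look roughly like this. It is a sketch under assumptions: the function name buildNewsFeed, the JSON layout and the sample data are invented for the example, and in a real setup the items would come from wherever your scraping script stores them.

```php
<?php
// Hypothetical function: turn already-classified news items into a JSON feed.
// When $category is given, only that category is included in the output.
function buildNewsFeed(array $groupedItems, ?string $category = null): string
{
    if ($category !== null) {
        $groupedItems = [$category => $groupedItems[$category] ?? []];
    }
    return json_encode(
        ['generated_at' => gmdate('c'), 'categories' => $groupedItems],
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
    );
}

// Sample data standing in for items saved by the scraping script.
$stored = [
    'sports'   => [['title' => 'Cup final tonight', 'url' => 'http://example.com/b']],
    'politics' => [['title' => 'Election results announced', 'url' => 'http://example.com/a']],
];

$json = buildNewsFeed($stored, 'sports');
```

In an actual endpoint you would send a Content-Type: application/json header and echo $json, and the mobile application would request it over HTTP and parse the result.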

mahmud ahsan

Computer programmer and hobbyist photographer from Bangladesh, lives in Malaysia.
