Web scraping is a technique for extracting information and data from websites. Data mining research covers both the scraping and the analysis of such information. In practice, web scraping is necessary when you want to build a web application that shows customised information from various websites. To do this, you first have to scrape data from the sites and then apply some logic to filter the information.
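The two steps described above (scrape first, then filter) can be sketched in PHP roughly as follows. This is only a minimal illustration: the sample HTML, the `item` and `price` class names, and the `scrape_items()` helper are all hypothetical, and a real scraper would download the page first instead of using a hard-coded string.

```php
<?php
// Hypothetical sample page. A real scraper would first download the HTML,
// for example with file_get_contents() or cURL.
$html = '<html><body>
<div class="item"><h2>PHP basics</h2><span class="price">10</span></div>
<div class="item"><h2>MySQL tips</h2><span class="price">25</span></div>
<div class="item"><h2>Cron jobs</h2><span class="price">40</span></div>
</body></html>';

// Hypothetical helper: extract title/price pairs from the sample layout.
function scrape_items($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query('//div[@class="item"]') as $node) {
        $title = $xpath->query('.//h2', $node)->item(0);
        $price = $xpath->query('.//span[@class="price"]', $node)->item(0);
        if ($title === null || $price === null) {
            continue; // skip blocks that do not match the expected layout
        }
        $items[] = [
            'title' => trim($title->textContent),
            'price' => (int) trim($price->textContent),
        ];
    }
    return $items;
}

// Step 1: scrape the data.
$items = scrape_items($html);

// Step 2: apply some logic to filter the information (here: price <= 25).
$cheap = array_values(array_filter($items, function ($item) {
    return $item['price'] <= 25;
}));

foreach ($cheap as $item) {
    echo $item['title'], ' => ', $item['price'], PHP_EOL;
}
```

Using DOMDocument with DOMXPath instead of regular expressions keeps the extraction logic readable and tolerant of small markup changes.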
A few months ago I worked on a web project where we had to develop a web and SMS application. The main project was a full web application into which we needed to integrate SMS and some voice features. We wanted to integrate these features easily and cost-effectively. After some research on Google we found several web services for this purpose. Of all the services, we liked Twilio best.
There are several reasons to choose Twilio. One of the most important is that Twilio’s REST API is very easy to integrate. The price is also low compared to other services. In this post I won’t go into detail about Twilio; you can easily learn more about it from their website. I’ll mention some of the features that we integrated into our application.
For the last 8 months I have been working on a web application, which we are developing on the CodeIgniter framework. The project has a normal web version and a mobile version. Some days ago we noticed that some people couldn’t log in to their accounts via the mobile version even though their username and password were correct. After debugging and digging into the problem, we found that it happens in the Safari browser on the iPod Touch 2G/3GS.
For the last few weeks I have been working on a web application where I have to run some automated tasks. This was the first time I learned about cron, wrote a PHP script for a cron job, and wrote a basic shell script that runs the PHP script. So here I’m describing the things I learned.
Summary of the tasks:
- Write a PHP script
- Test the script
- Make sure only one process of the PHP script is running at a time
- Set up cron so that the PHP script runs every 2 minutes
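One common way to make sure only one copy of the script runs at a time is an exclusive file lock. The sketch below assumes that approach; the lock file name and the `run_exclusively()` helper are hypothetical, and the original post may have solved this differently (for example in the shell script).

```php
<?php
// Hypothetical lock file location; any path writable by the cron user works.
$lockPath = sys_get_temp_dir() . '/my_cron_task.lock';

// Hypothetical helper: run $task only if no other process holds the lock.
// Returns true if the task ran, false if it was skipped.
function run_exclusively($lockPath, $task)
{
    // Mode 'c' creates the file if it is missing and never truncates it.
    $handle = fopen($lockPath, 'c');
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of waiting
    // when another process already holds the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    try {
        call_user_func($task);
    } finally {
        // Always release the lock, even if the task throws.
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}

$ran = run_exclusively($lockPath, function () {
    // ... the actual automated task goes here ...
    echo 'Task finished.', PHP_EOL;
});

if (!$ran) {
    echo 'Another instance is already running, skipping this run.', PHP_EOL;
}

// Example crontab entry for a run every 2 minutes (paths are placeholders):
// */2 * * * * /usr/bin/php /path/to/task.php
```

Because the operating system drops the lock when the process exits, a crashed run does not block later runs the way a leftover PID file can.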
Before PHP 5.2.0, when we had to validate or filter user data, we normally used regular expressions and some PHP functions. Some of those regular expressions are difficult to understand, so most coders search Google for the correct regex to validate data and combine it with some PHP functions to filter data.
PHP 5.2.0 provides a new extension named filter that makes these filtering tasks much easier. If it is not already enabled on your Linux distro, you can install it by typing in the shell: pecl install filter
Before proceeding, first check which filters are available on your system:
echo '<pre>'; print_r(filter_list()); echo '</pre>';
Output in my system:
Array ( [0] => int [1] => boolean [2] => float [3] => validate_regexp [4] => validate_url [5] => validate_email [6] => validate_ip [7] => string [8] => stripped [9] => encoded [10] => special_chars [11] => unsafe_raw [12] => email [13] => url [14] => number_int [15] => number_float [16] => magic_quotes [17] => callback )
filter_list() is a function that returns a list of all supported filters.
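To give an idea of how these filters replace hand-written regular expressions, here is a short sketch using filter_var() with a few of the filters listed above. The input values are made up for illustration.

```php
<?php
// Validation filters return the (converted) value on success, false on failure.
$goodEmail = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$badEmail  = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);

// FILTER_VALIDATE_INT also converts the string to an integer
// and can enforce a range through the "options" array.
$age = filter_var('25', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1, 'max_range' => 120],
]);
$tooOld = filter_var('500', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1, 'max_range' => 120],
]);

$ip  = filter_var('192.168.1.1', FILTER_VALIDATE_IP);
$url = filter_var('http://example.com/page', FILTER_VALIDATE_URL);

// Sanitizing filters do not reject input; they strip unwanted characters.
// number_int keeps only digits, plus and minus signs.
$digits = filter_var('abc123def', FILTER_SANITIZE_NUMBER_INT);

var_dump($goodEmail, $badEmail, $age, $tooOld, $ip, $url, $digits);
```

Note that a validation filter returns false on failure, so compare with `=== false` rather than relying on truthiness: a valid integer `0` would otherwise look like a failure.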
In the Facebook stream you’ll see a time period at the bottom of each item, for example: 4 minutes ago, 2 days ago, 3 weeks ago. In a recent project we had to show times in a similar fashion in our application’s activity stream, so I wrote a function to produce the time duration.
In our MySQL database, the table has a column named ‘created’ of type DATETIME. I retrieve that created field as a Unix timestamp, so the SQL query looks like “SELECT UNIX_TIMESTAMP(created) AS created FROM tableName”
After getting the data I just pass the created value to my function. Here is the function:
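The original function is not reproduced in this excerpt. A minimal sketch of such a function might look like the following; the name `time_ago()`, the optional `$now` parameter, and the approximations (a month as 30 days, a year as 365 days) are assumptions of this sketch, not the author’s code.

```php
<?php
// Hypothetical helper: turn a Unix timestamp into "4 minutes ago" style text.
// $now is optional so the function can be tested with a fixed reference time.
function time_ago($created, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    // Treat timestamps in the future as "just now" instead of negative time.
    $diff = max(0, (int) $now - (int) $created);

    // Unit lengths in seconds, largest first (month and year are approximate).
    $units = [
        31536000 => 'year',   // 365 days
        2592000  => 'month',  // 30 days
        604800   => 'week',
        86400    => 'day',
        3600     => 'hour',
        60       => 'minute',
        1        => 'second',
    ];

    foreach ($units as $seconds => $name) {
        if ($diff >= $seconds) {
            $count = (int) floor($diff / $seconds);
            return $count . ' ' . $name . ($count === 1 ? '' : 's') . ' ago';
        }
    }
    return 'just now';
}

// Usage with a value as returned by UNIX_TIMESTAMP(created):
echo time_ago(time() - 240), PHP_EOL;
```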
So what is defensive programming? In short, it means that in a problematic situation your code doesn’t break but instead handles the situation by taking proper steps. If you want to know the details, just visit Wikipedia.
I am writing this article because I found that many programmers don’t follow this approach. As a result, if you provide unexpected data, the application will sometimes crash or show an unwanted error message, and sometimes it will even expose important data (in the case of a web application).
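As a small illustration of the idea, the sketch below reads a page number from request data without trusting it. The `get_page_number()` helper, the `page` key, and the limits are hypothetical; the point is only that missing, malformed, or out-of-range input falls back to a safe value instead of causing an error.

```php
<?php
// Hypothetical helper: read a page number defensively from request data.
function get_page_number(array $request, $default = 1, $max = 1000)
{
    // The key may be missing entirely.
    if (!isset($request['page'])) {
        return $default;
    }
    // filter_var() returns false for non-numeric strings and for arrays,
    // so unexpected input types cannot cause a notice or a crash later on.
    $page = filter_var($request['page'], FILTER_VALIDATE_INT);
    if ($page === false || $page < 1) {
        return $default;
    }
    // Clamp absurdly large values instead of passing them to the database.
    return min($page, $max);
}

// In a real application the input would typically be $_GET.
echo get_page_number(['page' => '3']), PHP_EOL;
echo get_page_number(['page' => 'abc']), PHP_EOL;
```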
Some days ago I was working on a vocabulary game and dictionary. The dictionary contains 110,000 words and meanings. I developed a vocabulary game in which I had to randomly choose 10 words out of the 110,000-word dataset. Here I’m describing the possible solutions to this problem and which solution I used.
The table is named dictionary and has id, word and meaning fields. The id field is auto-incremented and forms an unbroken sequence 1, 2, 3, …, n.
| id | word | meaning |
| 1 | aback | Having the wind against the forward side of the sails |
| 2 | abandon | Forsake, leave behind |
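Because the ids form an unbroken sequence, one possible approach is to pick the random ids in PHP and fetch only those rows, which avoids sorting the whole table randomly. The sketch below shows that idea only; it is not necessarily the solution the author chose, and both helper functions are hypothetical.

```php
<?php
// Hypothetical helper: pick $count distinct random ids between 1 and $maxId.
function pick_random_ids($maxId, $count = 10)
{
    // Never ask for more ids than exist, otherwise the loop could not end.
    $count = min($count, $maxId);
    $ids = [];
    while (count($ids) < $count) {
        // Using the id as array key silently discards duplicates.
        $ids[mt_rand(1, $maxId)] = true;
    }
    return array_keys($ids);
}

// Hypothetical helper: build the query for the table described above.
function build_random_words_query(array $ids)
{
    // Cast to int so only numbers can end up in the SQL string.
    $ids = array_map('intval', $ids);
    return 'SELECT id, word, meaning FROM dictionary WHERE id IN ('
        . implode(', ', $ids) . ')';
}

// 110000 is the dictionary size mentioned above; in practice it could be
// read with SELECT MAX(id) FROM dictionary.
$ids = pick_random_ids(110000, 10);
echo build_random_words_query($ids), PHP_EOL;
```

This relies on the ids having no gaps; if rows are ever deleted, some picked ids would match nothing and fewer than 10 words would come back.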
Some days ago I was working on a quiz project in which users play a quiz and earn points for each correct answer. One of the tasks in this quiz application was to get the rank of a particular user. Here I am showing how I solved this problem using a MySQL query.
Here is the table structure:
CREATE TABLE IF NOT EXISTS `quiz_user` (
  `uid` BIGINT UNSIGNED NOT NULL,
  `participated` SMALLINT UNSIGNED NULL DEFAULT 0,
  `correct` SMALLINT UNSIGNED NULL DEFAULT 0,
  `wrong` SMALLINT UNSIGNED NULL DEFAULT 0,
  `created` DATETIME NULL,
  `updated` DATETIME NULL,
  PRIMARY KEY (`uid`)
) ENGINE = InnoDB
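The author’s query is not included in this excerpt. One common way to express a rank is “one plus the number of users with a higher score”. The sketch below assumes the rank is based on the `correct` column and that tied users share the same rank; the `$rankSql` string and the `user_rank()` helper are hypothetical. The PHP function mirrors the query’s logic on a plain array so the idea can be checked without a database.

```php
<?php
// Hypothetical query: rank = 1 + number of users with more correct answers.
// :uid is a placeholder to be bound through a prepared statement.
// If the uid does not exist, the subquery yields NULL and this query
// reports rank 1, so the caller should check that the user exists first.
$rankSql = 'SELECT COUNT(*) + 1 AS user_rank
            FROM quiz_user
            WHERE correct > (SELECT correct FROM quiz_user WHERE uid = :uid)';

// Hypothetical helper mirroring the same logic in PHP.
// $scores maps uid => number of correct answers.
function user_rank(array $scores, $uid)
{
    if (!array_key_exists($uid, $scores)) {
        return null; // unknown user: no rank
    }
    $rank = 1;
    foreach ($scores as $correct) {
        if ($correct > $scores[$uid]) {
            $rank++;
        }
    }
    return $rank;
}

// Made-up sample data for illustration.
$scores = [101 => 8, 102 => 5, 103 => 8, 104 => 2];
echo user_rank($scores, 102), PHP_EOL;
```

With this definition the two users tied on 8 correct answers both get rank 1 and the next user gets rank 3.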