MySQL random data selection

mysqlSome days ago I was working in a vocabulary game and dictionary. The dictionary contains 1,10,000 words and meanings. I developed a vocabulary game where I had to randomly choose 10 words out of 1,10,000 dataset. Here I’m describing the possible solutions for this problem and which solution I used.

Data table

Table name is dictionary and it has id, word and meaning fields. id contains auto incremented id and it is unbroken sequence 1,2,3…..n.

id word meaning
1 aback Having the wind against the forward side of the sails
2 abandon Forsake, leave behind
….. ….

Problem: Get 10 words out of 1,10,000 words.

Solution 1

I asked a intermediate level php programmer to solve this problem and I think he doesn’t know much about mysql. So he sent me the solution:

   //he retrieves all words from mysql to php
   $sql     =  "SELECT word FROM dictionary";
   $result =  mysql_query($sql);
   $rows  =  '';

   $data = array();
   if (!empty($result))
        $rows      =  mysql_num_rows($result);
   else
        $rows      =  '';

    $row = '';
    if (!empty($rows)){
        while ($row = mysql_fetch_assoc($result)){
                $data[]   = $row;
        }
    }

   //he shuffles the full array in php
   srand((float)microtime() * 1000000);
   shuffle($data);

   //then he collects 10 words
   $word = array();
   for ($i = 0; $i < 10; ++$i){
     $word[] = $data[$i];
   }

LOL :D, a brute force solution by that intermediate level programmer. This solution might be good for 1000 or not more than 2000 dataset. But never use the solution for small dataset also, if you solve a problem in mysql then solve that in there. Because it would be faster.

Solution 2

SELECT word FROM dictionary ORDER BY RAND() LIMIT 10;

A nice and simple mysql query. This simple query will give you 10 random words from the dictionary.  So you just have to run this query from your php code. Its one of the best solution for small size data like 25000 or not more than 40000. But remember your dataset is 1,10,000 words. So why not looking for another faster technique.

Solution 3

If you’re sure your dataset has unbroken sequence like 1,2,3…..n. You can solve this problem in another faster way.

   $randids = array();

   for($i = 1; $i <= 10; ++$i){
       $prev  = 1;
       $range = 10000 * $i;
       //mt_rand give random number from a given range
       $rndid  =  mt_rand($prev, $range);
       $randids[] = $rndid;
       $prev  = $range;
   }

   //we passed the ids we need from the table
    $sql     =  "SELECT word FROM dictionary WHERE id in (" . implode(",", $randids) . ")";
    $result =  mysql_query($sql);

Here you first randomly generated 10 random ids using php code. Then you pass those ids as string in mysql query. So mysql only returns those words from the table.

I used this solution (3rd solution) for that vocabulary project I was working and it performs super. If you know better solution then why not you share that with us? Share your thoughts as comments.

Lets post-mortem about my 3rd solution

2nd query:

SELECT word FROM dictionary ORDER BY RAND() LIMIT 10;

it takes: 10 rows in set (0.09 sec)

3rd query:

SELECT word FROM dictionary WHERE id IN ( 1, 1000, 10000, 33, 444, 555, 533, 55, 6666, 66666 );

it takes: 10 rows in set (0.0006 sec)

So 3rd query is faster than 2nd query.

Now check the time php code takes:

    $time_start = microtime(true);

    $randids = array();

    for($i = 1; $i <= 10; ++$i){
        $prev  = 1;
        $range = 10000 * $i;
        //mt_rand give random number from a given range
        $rndid  =  mt_rand($prev, $range);
        $randids[] = $rndid;
        $prev  = $range;
    }

$data = implode(',', $randids);

$time_end = microtime(true);
$time = $time_end - $time_start;

echo "This code taks $time seconds or ";
printf("%.5f", $time);

It prints This code taks 4.6968460083008E-5 seconds or 0.00005

So 3rd solution php with mysql takes: 0.0006 + 0.00005 = 0.00065 which is better than the mysql query that takes (0.09 sec)

About mahmud ahsan

Founder And Lead Programmer at iThinkdiff.net

, , ,

15 Responses to MySQL random data selection

  1. maSnun April 26, 2010 at 7:47 pm #

    Solution 2 should be the fastest among the 3. Shouldn’t it ?

  2. mahmud ahsan April 26, 2010 at 9:17 pm #

    @maSnun, I updated the post with post-mortem report. Take a look.

  3. Matthew Montgomery April 26, 2010 at 9:31 pm #

    See: http://jan.kneschke.de/projects/mysql/order-by-rand/

  4. TRIVUz April 26, 2010 at 9:56 pm #

    Solution 1 is weird…. we used to call it BANGLA SOLUTION ;)

    anyway… i like the solution 3.. great writing.. keep it up.. :)

  5. maSnun April 26, 2010 at 11:51 pm #

    WoW! Blindingly fast! Only drawback is in case of a broken sequence it might fail.

    Great write up!

  6. mahmud ahsan April 27, 2010 at 12:08 am #

    @maSnun, definitely the third solution isn’t for broken sequence and I mentioned it in the third solution. :)

  7. Himel Nag Rana April 27, 2010 at 10:09 am #

    Like the third solution! initially like others i was thinking about the second one … after reading the post mortem report I realize that. Thanks!

  8. Abdullah Al Jahid April 27, 2010 at 2:16 pm #

    What the solution for broken sequence? Isn’t there any faster solution for broken sequence like solution 3?

  9. maSnun April 27, 2010 at 2:22 pm #

    I think it’d be best to convert the broken sequence into an unbroken sequence.

  10. Mahbub May 3, 2010 at 10:24 pm #

    Just to let everyone know that Random Numbers, selects are always expensive to the processors. Either you do it on PHP or in mysql it will have scalability issues. If we run an apache benchmark on the script for 30 concurrent users with 100 requests, we’ll see a lot of fails. So if an application is going to have a huge number of concurrent users, some more efficient ways will be needed.

    Lot of people have suggested a pseudo random method. Some even use special hardware which generate random numbers from radio noise frquency to save the processor from doing an expensive sum.

    But of course solution 3 is smarter and can be user for fairly large sized app i guess.

  11. LiquidFyre Games May 12, 2010 at 8:44 am #

    There is a solution for broken sequences that is only slightly slower than solution 3. The solution is to first do a row count, select a set of 10 random numbers between 0 and the row count with php, and then do 10 select statements using a LIMIT statement.

  12. Paul July 24, 2010 at 11:39 am #

    What number is “1,10,000”?

  13. Matt August 14, 2010 at 9:47 pm #

    For broken sequences, if the number of missing values is very small in comparison to the number of present values, couldn’t you just try and fetch, say, 20 random entries, then pick the first ten returned in the script.

  14. krnathi kumar November 1, 2011 at 6:01 pm #

    Dear mahmud ahsan
    You did good job
    really great.

  15. Vasif February 28, 2013 at 2:15 pm #

    Hi. Thanks for useful post :)