Some days ago I was working in a vocabulary game and dictionary. The dictionary contains 1,10,000 words and meanings. I developed a vocabulary game where I had to randomly choose 10 words out of 1,10,000 dataset. Here I’m describing the possible solutions for this problem and which solution I used.
Data table
Table name is dictionary and it has id, word and meaning fields. id contains auto incremented id and it is unbroken sequence 1,2,3…..n.
| id | word | meaning |
| 1 | aback | Having the wind against the forward side of the sails |
| 2 | abandon | Forsake, leave behind |
| … | ….. | …. |
Problem: Get 10 words out of 1,10,000 words.
Solution 1
I asked a intermediate level php programmer to solve this problem and I think he doesn’t know much about mysql. So he sent me the solution:
//he retrieves all words from mysql to php
$sql = "SELECT word FROM dictionary";
$result = mysql_query($sql);
$rows = '';
$data = array();
if (!empty($result))
$rows = mysql_num_rows($result);
else
$rows = '';
$row = '';
if (!empty($rows)){
while ($row = mysql_fetch_assoc($result)){
$data[] = $row;
}
}
//he shuffles the full array in php
srand((float)microtime() * 1000000);
shuffle($data);
//then he collects 10 words
$word = array();
for ($i = 0; $i < 10; ++$i){
$word[] = $data[$i];
}
LOL
, a brute force solution by that intermediate level programmer. This solution might be good for 1000 or not more than 2000 dataset. But never use the solution for small dataset also, if you solve a problem in mysql then solve that in there. Because it would be faster.
Solution 2
SELECT word FROM dictionary ORDER BY RAND() LIMIT 10;
A nice and simple mysql query. This simple query will give you 10 random words from the dictionary. So you just have to run this query from your php code. Its one of the best solution for small size data like 25000 or not more than 40000. But remember your dataset is 1,10,000 words. So why not looking for another faster technique.
Solution 3
If you’re sure your dataset has unbroken sequence like 1,2,3…..n. You can solve this problem in another faster way.
$randids = array();
for($i = 1; $i <= 10; ++$i){
$prev = 1;
$range = 10000 * $i;
//mt_rand give random number from a given range
$rndid = mt_rand($prev, $range);
$randids[] = $rndid;
$prev = $range;
}
//we passed the ids we need from the table
$sql = "SELECT word FROM dictionary WHERE id in (" . implode(",", $randids) . ")";
$result = mysql_query($sql);
Here you first randomly generated 10 random ids using php code. Then you pass those ids as string in mysql query. So mysql only returns those words from the table.
I used this solution (3rd solution) for that vocabulary project I was working and it performs super. If you know better solution then why not you share that with us? Share your thoughts as comments.
Lets post-mortem about my 3rd solution
2nd query:
SELECT word FROM dictionary ORDER BY RAND() LIMIT 10;
it takes: 10 rows in set (0.09 sec)
3rd query:
SELECT word FROM dictionary WHERE id IN ( 1, 1000, 10000, 33, 444, 555, 533, 55, 6666, 66666 );
it takes: 10 rows in set (0.0006 sec)
So 3rd query is faster than 2nd query.
Now check the time php code takes:
$time_start = microtime(true);
$randids = array();
for($i = 1; $i <= 10; ++$i){
$prev = 1;
$range = 10000 * $i;
//mt_rand give random number from a given range
$rndid = mt_rand($prev, $range);
$randids[] = $rndid;
$prev = $range;
}
$data = implode(',', $randids);
$time_end = microtime(true);
$time = $time_end - $time_start;
echo "This code taks $time seconds or ";
printf("%.5f", $time);
It prints This code taks 4.6968460083008E-5 seconds or 0.00005
So 3rd solution php with mysql takes: 0.0006 + 0.00005 = 0.00065 which is better than the mysql query that takes (0.09 sec)







Solution 2 should be the fastest among the 3. Shouldn’t it ?
@maSnun, I updated the post with post-mortem report. Take a look.
See: http://jan.kneschke.de/projects/mysql/order-by-rand/
Solution 1 is weird…. we used to call it BANGLA SOLUTION
anyway… i like the solution 3.. great writing.. keep it up..
WoW! Blindingly fast! Only drawback is in case of a broken sequence it might fail.
Great write up!
@maSnun, definitely the third solution isn’t for broken sequence and I mentioned it in the third solution.
Like the third solution! initially like others i was thinking about the second one … after reading the post mortem report I realize that. Thanks!
What the solution for broken sequence? Isn’t there any faster solution for broken sequence like solution 3?
I think it’d be best to convert the broken sequence into an unbroken sequence.
Just to let everyone know that Random Numbers, selects are always expensive to the processors. Either you do it on PHP or in mysql it will have scalability issues. If we run an apache benchmark on the script for 30 concurrent users with 100 requests, we’ll see a lot of fails. So if an application is going to have a huge number of concurrent users, some more efficient ways will be needed.
Lot of people have suggested a pseudo random method. Some even use special hardware which generate random numbers from radio noise frquency to save the processor from doing an expensive sum.
But of course solution 3 is smarter and can be user for fairly large sized app i guess.
There is a solution for broken sequences that is only slightly slower than solution 3. The solution is to first do a row count, select a set of 10 random numbers between 0 and the row count with php, and then do 10 select statements using a LIMIT statement.
What number is “1,10,000″?
For broken sequences, if the number of missing values is very small in comparison to the number of present values, couldn’t you just try and fetch, say, 20 random entries, then pick the first ten returned in the script.
Dear mahmud ahsan
You did good job
really great.
Hi. Thanks for useful post