Adrian World Design

Doing curl_multi_exec the right way


Created: Dec 1, 2012, 11:25:08 AM CDT
Last updated: Aug 6, 2013, 7:50:46 AM CDT

If you want to download content with PHP, such as HTML web pages or XML feeds, the best tool for the job is cURL, or more precisely a combination of several cURL functions.

To work with several handles at once, i.e. to download multiple pages in parallel rather than one after another in a foreach loop, the curl_multi_exec() function seems to be the right tool. Unfortunately, the documentation is a little hazy on exactly how to use this function.

Although it mentions that the function processes each of the handles in the stack and can be called whether or not a handle needs to read or write data, the examples leave a few mysteries unsolved.

Why all the loops

If you look at the example in the curl_multi_exec manual you'll notice that it suggests two do-while loops. And yes, the second do-while loop with curl_multi_exec sits inside yet another while loop. So there are actually three loops. You wonder why? Let's look at the example real quick.

Note that we only worry about curl_multi_exec here and that we don't look at setting up each handle, only the multi handle sequence.

//execute the handles
do {
 $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
 
while ($active && $mrc == CURLM_OK) {
 if (curl_multi_select($mh) != -1) {
  do {
   $mrc = curl_multi_exec($mh, $active);
  } while ($mrc == CURLM_CALL_MULTI_PERFORM);
 }
}
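
The excerpt above assumes that $mh already exists. As a minimal sketch of what that setup might look like, the following registers one easy handle per URL with a new multi handle. The helper name buildMultiHandle and the choice of CURLOPT_RETURNTRANSFER are my assumptions for illustration, not something the manual's example prescribes.

```php
<?php
// Hypothetical helper, not part of the manual's example: creates one easy
// handle per URL and registers each of them with a new multi handle.
function buildMultiHandle(array $urls)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        // Return the body as a string instead of printing it directly.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }
    // Nothing has been transferred yet; that only starts with curl_multi_exec().
    return array($mh, $handles);
}
```

Calling list($mh, $handles) = buildMultiHandle($urls) would then provide the $mh used in the loops above.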

The loop that confused me the most is the first one, especially because curl_multi_exec is called again in the second loop. Why would you issue this call in two places, and why loop over it twice? Interesting question, right?

First loop is for nothing

It turns out the first loop is actually for nothing. If you look at the curl_multi_exec function long enough and wonder what $mrc, $mh and $active do, it suddenly dawns on you.

What is $mrc

First the $mrc variable. From the manual we learn that the return value is a cURL code defined in the cURL Predefined Constants. In essence it is a regular return value: like any other PHP function, curl_multi_exec returns once the call itself is finished. That means there should be only ONE response per call. In a perfect world this single response is 0 (zero), which is equal to the predefined constant CURLM_OK.

Feel free to try this:

$c = 0;
do {
 $mrc = curl_multi_exec($mh, $active);
 $c++;
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
echo $c; // surprise, this is 1

When I did this I was almost 100% sure that $c would be 1 and not something in the thousands or even millions. In fact, libcurl 7.20.0 and later never return CURLM_CALL_MULTI_PERFORM at all, so with those versions the loop always runs exactly once. What is really troubling is that the loop has no exit other than a different return code. If the function kept answering CURLM_CALL_MULTI_PERFORM we would just go ahead and try again, and that may or may not solve the problem. If it does not, we simply end up in an infinite loop. Awesome!
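
For anyone who wants to repeat the experiment as a complete script, here is a minimal sketch. The helper name countFirstLoopCalls and its return format are assumptions of mine, and the URLs you pass in are up to you. It only counts how often the manual's first loop calls curl_multi_exec and then cleans up without waiting for the downloads.

```php
<?php
// Hypothetical helper: builds a multi handle for the given URLs and counts
// how often the manual's first do-while loop calls curl_multi_exec().
function countFirstLoopCalls(array $urls)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    $calls = 0;
    $active = null;
    do {
        $mrc = curl_multi_exec($mh, $active);
        $calls++;
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    // Clean up without waiting for the transfers to finish.
    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return array('calls' => $calls, 'mrc' => $mrc, 'active' => $active);
}
```

Printing the result with print_r(countFirstLoopCalls($urls)) shows the call count, the return code and the value of $active after the first loop.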

If you echo $active you will see that it is probably equal to the number of handles. Our curl_multi_exec has started all the handles, and the transfers are still in progress. The function call itself has returned, like any other PHP function call, but the handles are not done yet. So let's look at this $active variable.

What is $active

If you look at the second loop and other examples on the Internet you will notice that the $active variable must change somehow. This somehow is based on what is happening inside curl_multi_exec.

As we have just learned above, it tells us how many handles are still running. How does this work? The only way this can work is if the variable is passed by reference and set inside the function. Indeed, the documentation says this is "a reference to a flag to tell whether the operations are still running."
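
To make the by-reference mechanism concrete, here is a small sketch in plain PHP that does not touch cURL at all. The function fakeMultiExec is a made-up stand-in: it returns a status code and reports the number of unfinished jobs through its second parameter, which is roughly the pattern curl_multi_exec follows with $active.

```php
<?php
// Hypothetical stand-in for curl_multi_exec(): it returns a status code and
// reports the number of unfinished jobs through the by-reference parameter.
function fakeMultiExec(array &$jobs, &$stillRunning)
{
    // Pretend that exactly one job finishes per call.
    array_shift($jobs);
    $stillRunning = count($jobs);
    return 0; // plays the role of CURLM_OK
}

$jobs = array('a', 'b', 'c');
$stillRunning = null;
$log = array();
do {
    $status = fakeMultiExec($jobs, $stillRunning);
    $log[] = $stillRunning; // the caller sees the value set inside the function
} while ($stillRunning > 0 && $status === 0);

echo implode(',', $log), "\n"; // prints "2,1,0"
```

The caller never assigns $stillRunning itself, yet the loop condition sees it change after every call.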

Once you realize this you know we don't need the first loop with the curl_multi_exec function inside it. What we really need is a loop over the $active variable.
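
As a rough sketch of that idea, a single loop over $active could look like this. The helper name runUntilDone is hypothetical, and the function expects a multi handle that already has its easy handles attached. Note that this version polls as fast as it can, so it illustrates the loop structure only.

```php
<?php
// Hypothetical helper: drives an already configured multi handle with one
// loop over $active. It polls without waiting, so it will keep the CPU busy.
function runUntilDone($mh)
{
    $active = null;
    $calls = 0;
    do {
        $mrc = curl_multi_exec($mh, $active);
        $calls++;
    } while ($active > 0 && $mrc == CURLM_OK);

    // $mrc is CURLM_OK when the loop ended because all handles finished.
    return array('mrc' => $mrc, 'calls' => $calls);
}
```

After the loop, the content of each handle can be read with curl_multi_getcontent, provided CURLOPT_RETURNTRANSFER was set on it.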

Mind your CPU

By the way, in case you are not aware of this: when you loop like this, with nothing in the loop body that waits, your system's CPU will spike to 100% for the whole process.

It is worse in this case! While the loop keeps calling curl_multi_exec and asking "are you done?", it does so at lightning speed. Because of this it also takes CPU power away from other processes, like our network connections or the whole web server. Our web server and especially our curl handles are competing against this loop. It's like your boss or client yelling at you every second "are you done yet?" until you're done with your project. It doesn't really help you finish quicker, right?

To see how to solve this problem, let's look at another function called curl_multi_select in the next document (coming soon).
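
Until then, here is a minimal sketch of how curl_multi_select might fit into a single loop over $active. The helper name fetchAll, the one second default timeout and the 100 ms fallback sleep are assumptions of mine for illustration, not a definitive implementation.

```php
<?php
// Hypothetical helper: downloads all URLs in parallel and returns the bodies
// keyed like the input array.
function fetchAll(array $urls, $timeout = 1.0)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $active = null;
    do {
        $mrc = curl_multi_exec($mh, $active);
        if ($active > 0 && $mrc == CURLM_OK) {
            // Wait for network activity or the timeout instead of spinning.
            if (curl_multi_select($mh, $timeout) === -1) {
                usleep(100000); // select failed; pause briefly before retrying
            }
        }
    } while ($active > 0 && $mrc == CURLM_OK);

    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

A call such as $pages = fetchAll(array('home' => 'http://example.com/')) would return the body under the key 'home'; the URL here is only a placeholder.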
