Why Broadcom 802.11 Linux STA driver sucks, and how to fix it

TL;DR – the Broadcom STA Linux driver always fails the first scan request after the interface is brought up, which causes a long delay when connecting to a wireless network. There’s an open source driver that does not have this problem, but its power management is worse. In this post I describe the steps I took to pinpoint the problem in the proprietary driver and fix it.

The story begins when I updated Ubuntu from 11.10 to 12.04 on my MacBook Air. Everything worked fine after the upgrade except one thing that bothered me a lot: after resuming the laptop from suspend, it took around 30 seconds to connect to my wireless network. It wouldn’t have bothered me if it had been the same in 11.10, but there the time to connect was barely 5 or 6 seconds, so having to wait 30 seconds was totally unacceptable.

Initially I thought it was a bug in NetworkManager, so I increased the debug level in its config file, only to conclude that I was using a different driver in 12.04 than in 11.10.

There are two drivers available for the Broadcom BCM4353 802.11 Wireless Controller:

  • wl, the Broadcom proprietary driver
  • brcmsmac, the open source driver

Both were installed on my Ubuntu 11.10 system, but the open source driver was used by default, and it connected to the wifi network in 5 seconds.

In Ubuntu 12.04, the wl proprietary driver provided by the package bcmwl-kernel-source has been updated from version 5.100.82.38+bdcom-0ubuntu4 to version 5.100.82.38+bdcom-0ubuntu6.1 which includes the following fix:

---------------
bcmwl (5.100.82.38+bdcom-0ubuntu6.1) precise-proposed; urgency=low
	
  * debian/bcmwl-kernel-source.postinst:
    - Blacklist brcmfmac, brcmsmac and bcma so that they don't
      conflict with the closed driver (LP: #873117)
 -- Alberto Milone  Mon, 23 Apr 2012 16:11:56 +0200

This change blacklists the open source brcmsmac module, forcing the wl proprietary driver to be used. Before brcmsmac was blacklisted, the wl driver failed silently even when it was loaded, and brcmsmac was used instead.

So the easy way to solve my problem would have been to blacklist the wl module, add brcmsmac to /etc/modules, and live happily with the 5 seconds needed to associate when resuming. *BUT* I compared both drivers, and the proprietary driver has better signal and way better power management, which makes my battery last longer, so I decided to go the long route. My goal was to achieve the lowest possible delay when connecting to a wireless network after resuming from suspend with the proprietary driver.

I went “down” one level and started looking at wpa_supplicant, as NetworkManager communicates with it over the D-Bus control interface (dbus-monitor showed the problem was not in the D-Bus communication). I increased the debug level in wpa_supplicant by adding the ‘-dd’ switch in /usr/share/dbus-1/system-services/fi.w1.wpa_supplicant1.service and looked through the logs, which quickly revealed the following:

May 20 11:49:46 maco wpa_supplicant[12610]: Scan requested (ret=0) - scan timeout 5 seconds
May 20 11:49:52 maco wpa_supplicant[12610]: Scan timeout - try to get results
May 20 11:49:52 maco wpa_supplicant[12610]: Failed to get scan results
May 20 11:49:52 maco wpa_supplicant[12610]: Failed to get scan results - try scanning again
May 20 11:50:07 maco wpa_supplicant[12610]: Scan requested (ret=0) - scan timeout 5 seconds

The first scan request (SIOCSIWSCAN) after the wireless interface was brought up always failed (for reasons not yet clear). Once its scan timeout expired, wpa_supplicant tried to get the scan results (SIOCGIWSCAN) anyway, because some drivers do not deliver SIOCGIWSCAN events to notify when a scan is complete, but this failed too. So wpa_supplicant requested a second scan after a further delay, which properly delivered the results this time. This first failing scan was adding 21 seconds of delay to the network association process.

I googled the error and found I was not the only soul affected by this problem. Kalle Valo submitted 4 different patches to the hostap mailing list between October 2010 and March 2011, but they were never accepted upstream or included in the Ubuntu package. The wpa_supplicant code has changed a bit since Kalle submitted his patches, so I adapted them to the current wpa_supplicant version in Ubuntu. If you are curious, you can dig through Ubuntu bug #994739.

In short, the version 4 patch from Kalle changes the WEXT driver in wpa_supplicant to check the return value when trying to get scan results (SIOCGIWSCAN) from the wl driver. If the error number (errno) is EINVAL on the first scan, it immediately requests another scan. That one goes through (as only the first one fails), so the scan results are returned the next time wpa_supplicant tries to get them. This is far from perfect, but it works (and doesn’t seem to break anything), reducing the time needed to associate to the wireless network after the interface has been brought up from 30 seconds to 12 seconds.

But I was still unhappy with this result, so I patched the wpa_supplicant code to request a scan right after the driver init function. This would be the first “failing” scan, and the real scan requested a bit later would return results. It was a very ugly patch, because it made wpa_supplicant request a scan in the INACTIVE state (when it should be SCANNING), but it worked and reduced the time from 30 seconds to 10 seconds.

So, still unhappy with the results, I decided to go down one more level and have a look at the GPL’d source of Broadcom’s proprietary Linux STA driver, and BINGO! This is how the wl_iw_set_scan() function ends:

        (void) dev_wlc_ioctl(dev, WLC_SCAN, &ssid, sizeof(ssid));
	
        return 0;

They always return 0, even when the dev_wlc_ioctl() function fails!! And WTH is it cast to void?? It would have been easier to just return the result of this function! Patching this shows that the first scan after the interface is up fails with errno EBUSY (device or resource busy). So I added a workaround here that keeps requesting the scan from the underlying hardware until it returns something other than EBUSY, which wpa_supplicant can handle correctly. Et voilà, time reduced to 10 seconds.

But hey, now that I had looked at their source, it turned out that there’s a newer version available on Broadcom’s website: 5.100.82.112. This version supports the new Linux cfg80211 wireless configuration API in addition to the older Wireless Extensions (WEXT), and you can choose between CFG80211 and WEXT at compile time. The Ubuntu package broadcom-sta-dkms in the development release for 12.10 ‘Quantal Quetzal’ has been updated to this version but still uses the old WEXT code, which is still broken (always returns 0, remember?). But guess what: they have done it correctly this time in the new CFG80211 code. See the end of the function __wl_cfg80211_scan():

        err = wl_dev_ioctl(ndev, WLC_SCAN, &sr->ssid, sizeof(sr->ssid));
        if (err) {
                if (err == -EBUSY) {
                        WL_INF(("system busy : scan for \"%s\" "
                                "canceled\n", sr->ssid.SSID));
                } else {
                        WL_ERR(("WLC_SCAN error (%d)\n", err));
                }
                goto scan_out;
        }
	
        return 0;
	
scan_out:
        clear_bit(WL_STATUS_SCANNING, &wl->status);
        wl->scan_request = NULL;
        return err;

As you can see, they now return -EBUSY when the driver cannot perform the scan, and wpa_supplicant can handle this situation correctly. So I quickly backported the broadcom-sta-dkms package from Ubuntu 12.10 to Ubuntu 12.04 and added a patch to compile it with CFG80211 enabled, and finally I CAN HAS ONLY 8 SECONDS DELAY!!!!1 to associate to the wifi network after my laptop resumes from suspend using the wl driver, and I’m a happy camper! :D


16 Responses to Why Broadcom 802.11 Linux STA driver sucks, and how to fix it

  1. Iolanda says:

    I can assure you it’s been a long road to get the wifi connected in less than 10 sec. Proud of this achievement :)

  2. pof says:

    Ubuntu packages are available in poliva/pof ppa, sources on github:

    sudo add-apt-repository ppa:poliva/pof
    sudo apt-get update
    sudo apt-get install broadcom-sta-dkms

    You might want to remove the old wl module first, if you have it installed:

    sudo apt-get purge bcmwl-kernel-source

  3. Laurent says:

    This is brilliant work! thanks for sharing. Looks like my macbook is finally as snappy as under OSX :)

  4. Chris says:

    Ah so much better! Thanks for working on this.

  5. You shouldn’t have to patch broadcom-sta to build with the cfg80211 API; it’s supposed to do it automatically if you’re running a kernel newer than 2.6.32, which should be the case anyway on Precise.

    Have you found out that this in fact wasn’t working properly?

  6. Brian Kloppenborg says:

    Greetings. Thanks for putting this into a PPA. It makes maintenance so much easier.

    I’m running 3.2.0-29-generic on Ubuntu 12.04 64-bit with a BCM4313. I’ve noticed very inconsistent performance with this hardware setup and the broadcom-sta drivers. Ping times range from 0.1 ms to 9 seconds (although what I show below is more typical):

    $ ping 192.168.0.1
    PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
    64 bytes from 192.168.0.1: icmp_req=1 ttl=61 time=124 ms
    64 bytes from 192.168.0.1: icmp_req=2 ttl=61 time=16.9 ms
    64 bytes from 192.168.0.1: icmp_req=3 ttl=61 time=115 ms
    64 bytes from 192.168.0.1: icmp_req=4 ttl=61 time=30.8 ms
    64 bytes from 192.168.0.1: icmp_req=5 ttl=61 time=56.9 ms
    64 bytes from 192.168.0.1: icmp_req=6 ttl=61 time=167 ms
    64 bytes from 192.168.0.1: icmp_req=7 ttl=61 time=54.7 ms
    64 bytes from 192.168.0.1: icmp_req=8 ttl=61 time=179 ms
    64 bytes from 192.168.0.1: icmp_req=9 ttl=61 time=42.1 ms
    64 bytes from 192.168.0.1: icmp_req=10 ttl=61 time=5.76 ms
    --- 192.168.0.1 ping statistics ---
    10 packets transmitted, 10 received, 0% packet loss, time 9012ms
    rtt min/avg/max/mdev = 5.766/79.453/179.395/59.439 ms

    On Windows the same hardware gets 1-2 ms ping times. The problem isn’t limited to the 5.100.82.112 driver; similar performance exists with the 5.100.82.38 version as well. Any thoughts on how I could troubleshoot this?

    • Brian Kloppenborg says:

      I got this sorted out. I just had to lower the level of power management (shutting it off fixed the issue entirely) using
      iwconfig wlan0 power off
      Some level of power management is, of course, preferable. Experiment with the different levels to find a good mix of power and usability.

  7. Aniello Del Sorbo says:

    Thanks for this!

    It helped here on my HP!

  8. Carl says:

    Great work! The long connect times irked me no end. A lot better now, thanks!

  9. fascht says:

    <3, after 20 hours useless things. 3 simple steps solves it.
    lenovo edge 335.
    12.04 =)

  10. Αναστάσης says:

    I have the same problem on Debian Wheezy/Testing 7. I’ll probably try it and see the results. If somebody has already tried it on Debian, please share your experience.

  11. Andrew says:

    At last a post that works!! Now my Inspiron R gets online faster than an 8 year old laptop!
    Many thanks for the repository all so easy :) .

  12. Nathan Caldwell says:

    I just wanted to point out that the upcoming Raring driver has switched over to the cfg80211 API. I managed to get this version backported to Quantal. So now all that is required is to enable quantal-backports and install broadcom-sta-dkms/quantal-backports.

  13. Paul says:

    Thanks a lot ^^
