Expose transport and phy configuration for connection (Android 10 connection issue) #646

Open
mtomczynski opened this issue Nov 12, 2019 · 13 comments


@mtomczynski

mtomczynski commented Nov 12, 2019

Hey,

Android 10 introduced problems with connecting to multiple BLE devices for me. Linked topics may be related:
https://stackoverflow.com/questions/58299507/android-10-ble-connection-issue
https://issuetracker.google.com/issues/141188862

I managed to find a workaround by forcing this config on connection:

bluetoothDevice.connectGatt(
    context,
    false, // autoConnect set to false
    connectCallback,
    BluetoothDevice.TRANSPORT_AUTO,
    BluetoothDevice.PHY_LE_CODED
)

But RxAndroidBle doesn't expose a way to set transport and phy. Have you considered exposing those settings?

@dariuszseweryn
Owner

But RxAndroidBle doesn't expose a way to set transport and phy. Have you considered exposing those settings?

At some point, yes. Unfortunately I have limited time and it is not a top priority right now.

Could you shed a bit more light on your exact case? Have you tried to investigate this bug? Have you collected HCI logs from your device?

@mtomczynski
Author

My case is a bit peculiar. I've got two devices on BT 4.2. They have two modes: the first is public undirected advertising for connecting with an unknown central; the second is directed advertising with a resolvable private address for reconnecting with a bonded central.

After bonding to two devices and doing a few successful connections/disconnections, all future connections are bricked. From then on the phone can't connect and times out after 30 s with GATT error 133. It happens only when there are multiple bonded devices and only on Android 10; the only thing that helps is clearing the list of bonded devices.

The solution from the first comment works only until I try to connect with a different config, say the RxAndroidBle default. After that, the connection is bricked even for the LE_CODED and TRANSPORT_AUTO config.

About the solution itself (setting phy to LE_CODED and transport to auto): my devices do not support BT 5, so the phone can't use 2M or LE Coded for the actual connection. In the HCI logs all connections are done with phy set to 1M no matter what was set in the API call (::connectGatt). But for some reason this is the only way I can connect to my devices.
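
A possible explanation for the logs showing 1M, offered as an assumption rather than something confirmed in this thread: the Android documentation describes the phy argument of connectGatt as a bitwise OR of the PHY_LE_*_MASK constants, whereas PHY_LE_CODED is an enumeration value. The sketch below copies the documented constant values into plain Java (so it runs without the Android SDK; the class name is made up) and shows which PHYs each value selects when read as a mask.

```java
// Constant values copied from the android.bluetooth.BluetoothDevice documentation.
// PhyMaskSketch and describeMask are hypothetical names used only for illustration.
final class PhyMaskSketch {
    static final int PHY_LE_1M = 1;
    static final int PHY_LE_2M = 2;
    static final int PHY_LE_CODED = 3;

    static final int PHY_LE_1M_MASK = 1;
    static final int PHY_LE_2M_MASK = 2;
    static final int PHY_LE_CODED_MASK = 4;

    /** Lists the PHYs selected when the given value is interpreted as a bit mask. */
    static String describeMask(int phy) {
        StringBuilder sb = new StringBuilder();
        if ((phy & PHY_LE_1M_MASK) != 0) sb.append("1M ");
        if ((phy & PHY_LE_2M_MASK) != 0) sb.append("2M ");
        if ((phy & PHY_LE_CODED_MASK) != 0) sb.append("CODED ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describeMask(PHY_LE_CODED));      // prints "1M 2M"
        System.out.println(describeMask(PHY_LE_CODED_MASK)); // prints "CODED"
    }
}
```

Read this way, PHY_LE_CODED asks for 1M and 2M rather than Coded. It does not explain why this call avoids the connection failure.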

From the BT device's point of view, when the connection is bricked on a particular phone, the device doesn't see any incoming connections when the phone is trying to create one.

@dariuszseweryn a bit of a long description, but the case is complex. Anyway, it looks like a bug deep in the BT layer that was introduced in Android 10.

@mtomczynski
Author

More info on the issue: the actual act of scanning is corrupting future connections to bonded devices.

After bonding I can connect freely to multiple devices, provided that I turned off scanning after the bonding. Then, after disconnecting from the devices, turning on scanning for a moment and turning it off again, I can no longer connect to my devices. It seems that the scanning event corrupts some internal data for bonded devices.

@dariuszseweryn
Owner

😲
Have you checked what is sent through HCI before and after scanning?

@mtomczynski
Author

I'm not an expert in low-level BLE communication, but I haven't seen anything out of the ordinary in the HCI logs. If you're interested in the topic I'd be more than happy to share the logs.

I also found out that it's not only about the scanning. What actually corrupts the connection is successfully scanning devices from the bonded list. I can't reproduce the error if I turn off the devices during scanning.

@dariuszseweryn
Owner

Please do. Logs covering a successful connection after bonding, a re-scan and an unsuccessful connection could help find out what may be happening there.

@mtomczynski
Author

Here are logs and thank you!
The HCI logs are a bit crowded with events, so please use the logcat highlights to get timestamps for easier navigation. Here are the steps from the issue:

  1. Started scan and discovered two devices
  2. Successfully bonded to both devices
  3. Stopped scan
  4. Successfully connected and disconnected a few times
  5. Performed scan for a few moments and turned it off
  6. Tried to connect to one of the devices, which resulted in failure, GATT error 133 (30 sec timeout)

Just in case, I also attached full logcat logs filtered on the bluetooth stack.

logcat_full.txt
BTSNOOP.log
logcat_highlights.txt

@mtomczynski
Author

mtomczynski commented Nov 15, 2019

I've prepared a similar log package from Android 9, where this problem doesn't occur. Differences I found after comparison:

  1. For some reason the OnePlus with Android 9 doesn't seem to be using the White List at all, whereas the Android 10 Pixel always adds the device to the White List before the connection.

  2. Scanning gives slightly different results for both devices. The OnePlus reports the device with an address type of Public Identity Address (Corresponds to Resolved Private Address) (0x02), whereas on the Pixel the reported address type is Random Device Address (0x01). The address type is the only difference; the address stays the same.

Additionally, before the scan that corrupts the connection, the Pixel adds the device to the White List with type Public Device Address (0x00) and then connects to the same address with the same type, Public Device Address (0x00), which results in a successful connection.
After the scan, the type in the add to white list event changes to Random Device Address (0x01), but the address type used for the connection stays Public Device Address (0x00), and the connection fails. The address itself doesn't change, only the type.
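
To make the mismatch concrete, here is a minimal model, assuming the controller treats a white list entry as an (address type, address) pair and requires both to match. This is a simplification and not the Android stack's code; the class and method names are made up for illustration, and the address is a placeholder, not one taken from the logs.

```java
// Hypothetical model of white list matching: an entry matches a peer only if
// both the address type and the 48-bit address are equal.
final class WhiteListSketch {
    static final int PUBLIC = 0x00; // Public Device Address
    static final int RANDOM = 0x01; // Random Device Address

    /** An address together with its type, as shown in the HCI logs. */
    static final class Entry {
        final int type;
        final String address;

        Entry(int type, String address) {
            this.type = type;
            this.address = address;
        }
    }

    /** Returns true only when type and address are both identical. */
    static boolean matches(Entry whiteListed, Entry peer) {
        return whiteListed.type == peer.type
                && whiteListed.address.equalsIgnoreCase(peer.address);
    }

    public static void main(String[] args) {
        String addr = "AA:BB:CC:DD:EE:FF"; // placeholder address
        Entry peer = new Entry(PUBLIC, addr);
        System.out.println(matches(new Entry(PUBLIC, addr), peer)); // prints "true"
        System.out.println(matches(new Entry(RANDOM, addr), peer)); // prints "false"
    }
}
```

Under that assumption, an entry that differs from the peer only in its type never matches, which would fit a connection attempt that simply times out.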

@dariuszseweryn Do you think the actual problem might be that Android 10 doesn't correctly recognize the address type when scanning and then caches that address type in the bond information? Or do you think it's just a difference between manufacturers or system versions in how the information is logged?

bt_snoop_android_9.log
logcat_full_android_9.txt
logcat_highlights_android_9.txt

@dariuszseweryn
Owner

dariuszseweryn commented Nov 15, 2019

I assume your peripherals use public address types?
I have updated Wireshark and now I see that they do. But I do not see any reported scans of the device between the last successful connection and the corrupted request. I do not yet see what could mess up the address type.

@mtomczynski
Author

mtomczynski commented Nov 18, 2019

You're right, the device won't always pop up in the scans; maybe it's scanned with a true random address and is interpreted by something up in the stack. But still, after the scanning it's added to the white list with the random address type.

Scanning itself is not the full story: if I successfully scan the devices, then turn off BT for a while and try to connect to them without doing a second scan (BluetoothDevice from bonded list), the connection is successful.

@dariuszseweryn
Owner

Scanning itself is not the full story: if I successfully scan the devices, then turn off BT for a while and try to connect to them without doing a second scan (BluetoothDevice from bonded list), the connection is successful.

This is a well-known Android bug. I have briefly mentioned it on the Wiki.

You're right, the device won't always pop up in the scans; maybe it's scanned with a true random address and is interpreted by something up in the stack. But still, after the scanning it's added to the white list with the random address type.

I have been looking at the frames between 4531 and 6017. The first is the last moment the peripheral is added to the white list with the public address type, and the second is the first moment it is added with the random address type. I searched for MAC address c5:c1:c4:74:61:88 but did not find it in between. That would suggest that the Android stack has some bug not directly related to scanning this peripheral.
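
For anyone wanting to repeat this kind of search outside Wireshark, here is a rough sketch of decoding the command in question from raw bytes. It assumes the standard HCI command layout (2-byte little-endian opcode, 1-byte parameter length, then parameters), opcode 0x2011 for LE Add Device To White List, and that the H4 packet-type byte has already been stripped. The class name is made up, and the sample bytes are built from the MAC address quoted above rather than copied from the logs.

```java
// Hypothetical helper: decodes an HCI "LE Add Device To White List" command.
final class AddToWhiteListDecoder {
    // OGF 0x08 (LE controller commands) << 10 | OCF 0x0011
    static final int OPCODE_LE_ADD_DEVICE_TO_WHITE_LIST = 0x2011;

    /** Returns e.g. "RANDOM c5:c1:c4:74:61:88", or null if this is another command. */
    static String decode(byte[] cmd) {
        if (cmd.length < 10) return null;
        int opcode = (cmd[0] & 0xFF) | ((cmd[1] & 0xFF) << 8);
        int paramLength = cmd[2] & 0xFF;
        if (opcode != OPCODE_LE_ADD_DEVICE_TO_WHITE_LIST || paramLength != 7) return null;

        int type = cmd[3] & 0xFF;
        String typeName = type == 0x00 ? "PUBLIC"
                : type == 0x01 ? "RANDOM"
                : String.format("0x%02x", type);

        // The address is transmitted least significant byte first, so print it reversed.
        StringBuilder addr = new StringBuilder();
        for (int i = 9; i >= 4; i--) {
            addr.append(String.format("%02x", cmd[i] & 0xFF));
            if (i > 4) addr.append(':');
        }
        return typeName + " " + addr;
    }

    public static void main(String[] args) {
        byte[] sample = {0x11, 0x20, 0x07, 0x01,
                (byte) 0x88, 0x61, 0x74, (byte) 0xc4, (byte) 0xc1, (byte) 0xc5};
        System.out.println(decode(sample)); // prints "RANDOM c5:c1:c4:74:61:88"
    }
}
```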

@mtomczynski
Author

I was doing some tests with a single bonded device and managed to recreate the issue with only one device, which is new for my case.

I checked the logs after connecting without scanning, and the address type added to the white list is actually public, as expected; the connection is successful in that case. When only one device is bonded, scanning again doesn't affect future connections, because the device with the correct address type is already on the white list and doesn't have to be re-added. It makes sense that it's really easy to reproduce the issue with two devices, because during connections they're added to and removed from the white list.

Here are logs of a successful connection without scanning. You can find the add to white list event in frame 186
btsnoop_1811_4.log
logcat_highlights.txt

What's interesting is that in the logs with the corrupted single-device connection, scanning returns the correct address type (public, resolved from private), but the device is still added to the white list as random, which results in an unsuccessful connection.
White list in frame 1198
btsnoop_1811_6.log
logcat_highlights.txt

@mtomczynski
Author

mtomczynski commented Nov 19, 2019

@dariuszseweryn the problem has been fixed from the Bluetooth device's perspective. I think it's a really interesting case. Straight after bonding, the device didn't immediately update its address type to the correct type (random static) but kept the incorrect (public) one. The address itself was correct.

These are the white list operations with the bug:

  1. Add to white list: type PUBLIC, address true private // First connection after bonding
  2. Add to white list: type RANDOM, address true private // All subsequent connections

These are the white list operations with the fixed address type:

  1. Add to white list: type RANDOM, address true private // First connection after bonding
  2. Add to white list: type PUBLIC, address current public random address // All subsequent connections
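
The random static type mentioned above can be checked from the address alone, assuming the address really is a random one: per the Bluetooth Core Specification, the two most significant bits of a random address encode its sub-type (11 static, 01 resolvable private, 00 non-resolvable private). A minimal sketch, with a made-up class name, using the MAC address quoted earlier in this thread:

```java
// Hypothetical helper: classifies a random BLE device address by its two top bits.
final class RandomAddressKind {
    /** Expects the usual "xx:xx:xx:xx:xx:xx" notation, most significant byte first. */
    static String classify(String address) {
        int mostSignificantByte = Integer.parseInt(address.substring(0, 2), 16);
        switch (mostSignificantByte >> 6) {
            case 0b11: return "static";
            case 0b01: return "resolvable private";
            case 0b00: return "non-resolvable private";
            default:   return "reserved";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("c5:c1:c4:74:61:88")); // prints "static"
    }
}
```

For a public address these bits carry no such meaning, so the result is only valid when the address type is random.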

Here are logs with the solution if you're interested:
btsnoop_working.log
logcat_highlights.txt

What is most interesting is that this situation is tolerated by all iOS versions and by Android versions before 10. It appears that they dramatically changed how addresses are handled under the hood.
