[LoRaWAN] Join Sequence Breaks if Non-Join Accept Message Received

Hi,

I believe the join sequence will be broken and the end-device will never join if it receives a data message when expecting a join accept message.

An end-device transmits a join request on a given channel and then opens a receive window, expecting to receive either nothing or a join accept. However, another end-device could transmit on the same channel, in which case the first end-device will receive a data uplink message. A gateway could also transmit a downlink intended for another end-device.
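To make the scenario concrete: the frame type is carried in the MType field of the MHDR, which is the top three bits of the first PHYPayload byte in the LoRaWAN specification and is presumably what mac_hdr.bits.mtype holds in LoRaMac. The standalone sketch below shows the check a device waiting for a join accept effectively has to make. frame_mtype and is_expected_join_reply are hypothetical helpers for illustration, not Mbed OS functions.

```cpp
#include <cstddef>
#include <cstdint>

// MType values from the LoRaWAN MHDR (bits 7..5 of the first PHYPayload byte).
enum class MType : std::uint8_t {
    JoinRequest = 0,
    JoinAccept = 1,
    UnconfirmedDataUp = 2,
    UnconfirmedDataDown = 3,
    ConfirmedDataUp = 4,
    ConfirmedDataDown = 5,
    Rfu = 6,          // RFU in LoRaWAN 1.0.x
    Proprietary = 7,
};

// Hypothetical helper: extract MType from a raw received frame.
// Returns false if there is no first byte to inspect.
bool frame_mtype(const std::uint8_t *payload, std::size_t size, MType &out)
{
    if (payload == nullptr || size == 0) {
        return false;
    }
    out = static_cast<MType>((payload[0] >> 5) & 0x07);
    return true;
}

// Hypothetical helper: while waiting for a join accept, only MType == JoinAccept
// is useful. Another device's uplink or someone else's downlink should be
// treated like "nothing received" so that the join sequence can carry on.
bool is_expected_join_reply(const std::uint8_t *payload, std::size_t size)
{
    MType m;
    return frame_mtype(payload, size, m) && m == MType::JoinAccept;
}
```

For example, a first byte of 0x20 decodes as Join Accept, while 0x40 (an unconfirmed data uplink from a neighbouring device) does not.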

I believe this could be a real problem in a highly congested LoRaWAN installation.

If the transceiver receives nothing during an RX window, LoRaWANStack::rx_timeout_interrupt_handler is called, and it continues the join sequence.

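If the transceiver receives something in the RX window, LoRaWANStack::rx_interrupt_handler is called. rx_interrupt_handler calls LoRaWANStack::process_reception, which calls LoRaMac::on_radio_rx_done, and between them the join state gets into a mess if mac_hdr.bits.mtype != FRAME_TYPE_JOIN_ACCEPT. As far as I can tell, process_reception simply returns while the device is not joined, so nothing continues the join sequence.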
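The flow above can be illustrated with a toy model. This is a sketch under stated assumptions, not the real stack: ToyJoinModel and its methods are hypothetical names, timing and duty cycle are not modelled, and it assumes that once a frame is received no timeout follows for that window and, if it was RX1, that the RX2 timer is stopped (as the default case in LoRaMac::on_radio_rx_done does). Under those assumptions a single unexpected frame leaves the device waiting for a join accept with no window open and nothing scheduled to retry.

```cpp
// Toy model of the join flow (hypothetical names, NOT the Mbed OS classes).
enum class JoinState { Idle, AwaitingJoinAccept, Joined };
enum class Slot { Rx1, Rx2 };

struct ToyJoinModel {
    JoinState state = JoinState::Idle;
    Slot slot = Slot::Rx1;
    bool window_open = false;  // is an RX window (or the RX2 timer) still pending?
    int join_requests_sent = 0;

    void send_join_request()
    {
        ++join_requests_sent;
        state = JoinState::AwaitingJoinAccept;
        slot = Slot::Rx1;
        window_open = true;
    }

    // Stands in for rx_timeout_interrupt_handler: nothing heard in the window.
    void on_rx_timeout()
    {
        if (!window_open || state != JoinState::AwaitingJoinAccept) {
            return;
        }
        if (slot == Slot::Rx1) {
            slot = Slot::Rx2;        // RX1 empty: wait for RX2
        } else {
            send_join_request();     // RX2 empty too: try again
        }
    }

    // Stands in for rx_interrupt_handler -> process_reception when the frame's
    // MType is not Join Accept. Assumption: the reception ends this window and,
    // in RX1, the RX2 timer is stopped; process_reception just returns because
    // the device is not joined, so nothing schedules a retry.
    void on_unexpected_frame()
    {
        window_open = false;
    }

    void on_join_accept()
    {
        state = JoinState::Joined;
        window_open = false;
    }

    // Waiting for a join accept with no window pending: the reported hang.
    bool stuck() const
    {
        return state == JoinState::AwaitingJoinAccept && !window_open;
    }
};
```

With two empty windows the model retries; with one stray frame in RX1 it stays stuck indefinitely.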

Interestingly, LoRaMac::on_radio_rx_done contains this code:

    default:
        //  This can happen e.g. if we happen to receive uplink of another device
        //  during the receive window. Block RX2 window since it can overlap with
        //  QOS TX and cause a mess.
        tr_debug("RX unexpected mtype %u", mac_hdr.bits.mtype);
        if (get_current_slot() == RX_SLOT_WIN_1) {
            _lora_time.stop(_params.timers.rx_window2_timer);
        }
        _mcps_indication.status = LORAMAC_EVENT_INFO_STATUS_ADDRESS_FAIL;
        _mcps_indication.pending = false;
        break;

but I don’t think this helps during the join sequence. This code was added by PR #11241, “LoRaWAN: Terminate RX when receiving uplink messages”.

I’ve come up with a suggested fix, but it is clearly very difficult to reason about and test:

    diff --git a/features/lorawan/LoRaWANStack.cpp b/features/lorawan/LoRaWANStack.cpp
    index 79f899ef2a..a4a9833305 100644
    --- a/features/lorawan/LoRaWANStack.cpp
    +++ b/features/lorawan/LoRaWANStack.cpp
    @@ -715,6 +715,10 @@ void LoRaWANStack::process_reception(const uint8_t *const payload, uint16_t size
         }

         if (!_loramac.nwk_joined()) {
    +        _device_current_state = DEVICE_STATE_AWAITING_JOIN_ACCEPT;
    +        if (_loramac.get_current_slot() != RX_SLOT_WIN_1) {
    +            state_controller(DEVICE_STATE_JOINING);
    +        }
             _ready_for_rx = true;
             return;
         }

Any thoughts greatly appreciated.

Regards,
Matt

After some testing, I’ve come up with a different fix:

    diff --git a/features/lorawan/LoRaWANStack.cpp b/features/lorawan/LoRaWANStack.cpp
    index 3738b4c540..a4776e8b5c 100644
    --- a/features/lorawan/LoRaWANStack.cpp
    +++ b/features/lorawan/LoRaWANStack.cpp
    @@ -742,6 +742,8 @@ void LoRaWANStack::process_reception(const uint8_t *const payload, uint16_t size
         }

         if (!_loramac.nwk_joined()) {
    +        _device_current_state = DEVICE_STATE_CONNECTING;
    +        state_controller(DEVICE_STATE_JOINING);
             _ready_for_rx = true;
             return;
         }
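The intent of this second fix can be sketched with a similarly simplified model: a frame that is not a join accept, received while waiting to join, immediately triggers another join request instead of leaving the state machine idle. PatchedJoinModel is hypothetical; in the real stack state_controller(DEVICE_STATE_JOINING) goes through the normal joining path, so duty-cycle restrictions and timing are not reflected here.

```cpp
// Toy model of the second fix (hypothetical names, NOT the Mbed OS classes).
enum class JoinState { Idle, AwaitingJoinAccept, Joined };

struct PatchedJoinModel {
    JoinState state = JoinState::Idle;
    int join_requests_sent = 0;

    void send_join_request()
    {
        ++join_requests_sent;
        state = JoinState::AwaitingJoinAccept;
    }

    // Stands in for the patched process_reception: a non-join-accept frame
    // received while not joined goes back through the joining path
    // (state_controller(DEVICE_STATE_JOINING) in the diff) instead of stalling.
    // Assumption: the retry is immediate; duty cycle and backoff are ignored.
    void on_unexpected_frame()
    {
        if (state == JoinState::AwaitingJoinAccept) {
            send_join_request();
        }
    }

    void on_join_accept() { state = JoinState::Joined; }
};
```

Unlike the first fix, this version does not wait for RX2 after an unexpected frame in RX1, which matters if the RX2 timer has already been stopped by the default case in on_radio_rx_done.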

Hi @janjongboom,

I’m seeing difficult-to-reproduce problems with our LoRaWAN end-device in a commercial application.

During development and in the lab the device looks good, but we’ve now made and deployed several hundred devices and subtle problems are starting to appear.

This is the second topic I’ve opened on the forum in the last few days and I’ve had no replies.

Is there anyone in the ARM team who wants to discuss LoRaWAN? I’ve seen you post about LoRaWAN in the past, so I’m hoping that even if you’re not working on LoRaWAN you might know someone who is.

Obviously there are bigger problems all over the world at the moment; I’m wondering if that is why I haven’t had any replies.

TIA,
Matt

Hi @mattbrown015, I left Arm about a year ago, so I’m no longer working on this. @hasvir01, who was doing a lot of work on this, also left around the same time, so I’m not sure where support can be given at the moment. My best suggestion would be the Mbed OS GitHub repo. In the past, the core team monitored only the GitHub issues, not the forums.

Hi @janjongboom,

Thanks for the update. I hope you’re enjoying your new role and keeping safe.

I’ll try the repo. Back in 2018 I always got good support there, but I was trying to use the forum because I believe that’s what the Mbed team want now.

Thanks,
Matt

Another report that sounds similar to mine…

LoRaWAN: Join procedure hangs #10590

It doesn’t look like there’s any help coming from the Mbed team. :frowning_face: