No successful CAN bus sending with NucleoG474RE

Hi all,

I have quite some issues with sending messages via CAN bus using the NucleoG474RE with mbed OS 6.15 (current version in github).

In the meantime I believe I could nail down the issue to the following:

I try to send a message (CANExtended Format, CANData Frame) with NucleoG474RE board over a short cable connection.
The opposite side (LPC4088 with an older mbed OS, of course) interrupts the transmission after about 110 Bits and sends a CAN error frame (6 consecutive dominant bits).
The connection speed does not make a difference (10kHz, 100kHz, 125kHz) - all checked with same, unsucessful result. For the NucleoG474RE I use a MCP2562 as transceiver, for the LPC4088 it is the MCP2515.

In my setup, the LPC4088 also has a second and working CAN bus connection to a different CAN system which uses an ATMega644 with an MCP2515 as CAN controller (no mbed system, of course).
Working means that sending and receiving between these CAN nodes works without any issues.

Even when switching the CAN bus connections on LPC4088 side, the issues remains. So, I think it is not the LPC4088 - it is somehow related to the NucleoG474RE (HW or SW configuration).

Whenever the LPC4088 sends its error frame, the NucleoG474RE CAN controller is configured by mbed to automatically repeat its sending attempts several times until the Nucleo CAN controller gives up and disconnects itself from the CAN bus.

Of course, the automatic retransmission can be disabled in the mbed sources, but I believe that this certainly is not the issue.

With a scope I checked the transmission and I could see that the transmission from NucleoG474RE to LPC4088 as such works correctly, i.e. TX signal on Nucleo side and RX side signal on LPC side show the same result - until the LPC4088 starts with its error frame transmission.

Additionally, receiving on the NucleoG474RE side also works, i.e. the same CAN frame format sent by the LPC4088 is correctly received by the NucleoG474RE. The scope shows a working transmission, i.e. the CAN frame shipment from the LPC4088 is acknowledged by the NucleoG474RE CAN controller.

Apparently, the CAN frame created on Nucleo G474RE side cannot be understood by the LPC4088 CAN controller, but vice versa it can.
I am wondering how this is “achievable” by configuring or using the CAN controller on Nucleo side - or whether there is any other reason I am just too blind to see.
Neither the STM32G474RE nor the LPC4088 errata sheet shows something related.

Is there anybody who ever experienced such an issue?

Hello,

Just for sure. Do you know what is difference between CAN controller and CAN transceiver, right?

BR, Jan

Hi Jan,
I think so. The MCP2515 is a CAN controller (which is needed for the ATMega) whereas the MCP2551 as well as the MCP2562 are CAN transceivers. But my description for LPC4088 was wrong, my bad.

I think I need to restate the setup to avoid misunderstandings:

  • LPC4088 with two CAN buses, both connected via MCP2562 transceivers.
  • NucleoG474RE connected to first CAN bus via MCP2562 transceiver.
  • ATMega644p connected to the second CAN bus via MCP2515 CAN controller + MCP2551 CAN transceiver.

Greetings

Ok, sound good.
I do not have this board so I can not test it, but I provide some tips what you can check.

BR, Jan

Hi Jan,
many thanks for your ideas - highly appreciated!

  • terminators (each 120 Ohm) on both ends are applied (fun fact: the working CAN bus connection has no termination, but omitting the termination of course did not help for the non-working CAN bus connection…
  • setup uses common power supply and therefore also shares the same GND.
  • I exchanged the transceivers (different MCP2562’s, I have quite a large stock…) already.

I now also tried sending a CANStandard frame and changed the CAN frame id to some fixed small value.
But on LPC4088 no CAN frame is received. So far, I did not verify with the scope what happened on the bus. This takes some more time.

What happens in the firmware: the LPC4088 ships a “ping” message in a specific format (CANExtended) to the NucleoG474RE and waits for the response. The NucleoG474RE responds (in the test with a CANStandard frame) but, but on LPC4088 side, no frame is received.

When reading the referenced issue, I have the impression that the transceiver was operated in low power mode (read only), therefore no transmission happened.
In my setup, the transceiver works in high speed mode and the transmission at least starts- until it is interrupted with the error frame.

Greetings and have a nice evening!

Hi again,
I need to correct my observation from yesterday: the transmission works from NucleoG474RE → LPC4088 when using the CANStandard frame. I checked today several times and it is reproducible that the format change is the key here…
Many thanks, Jan, for your hint! :smiling_face_with_tear:

Still, the issue is not really solved as I need to use the CANExtended format.
But if the CANStandard format works, I think an electrical issue can be excluded safely.

New tests using the CANExtended frame format resulted in somewhat confusing results:
The transmission works sometimes, apparently depending on the frame id used (!).

  • The NucleoG474RE firmware ships the frame with the id $1007018f. That value always led immediately to an error frame on LPC4088 side.
  • When switching to the id $0, I could perform 15 transmissions before also here the error frame came.
  • Applying further values where single bits in the id were set, also led to several successful transmissions.

When checking the transmitted bit sequence with the id $1007018f, it seems that the LPC4088 sends its error frame after receiving the 10th bit of the CRC checksum (quite demanding to transfer the scope result to a list and isolating the stuffing bits). The CRC checksum as such is correct (checked with a Python CRC-15 implementation).

There are the Tx levels for the non working CAN transmission with frame id $1007018f:
CAN1

Here, the transmission with the error frame on LPC4088 side is zoomed in:
CAN2.

I did some tests using a partially masked frame id.
A mask $7ffff (leading to an id of $7018f) leads to zero or one successful transmission.
A mask of $3ffff (leading to an id of $3018f) leads to several thousand successful transmissions (no error until test termination).

Even though, as far I could check, the CRC calculation on Nucleo side looks fine, the transmitted bit sequence determines somehow the success of the transmission.
That is more than strange.

That is cool. I remembered something. Your board is not contain standard CAN controller but FDCAN controller, which was first time implemented into can_api with STM32H7. So maybe there can be an issue for extended id.

However, from my point of view, if you are trying to solve a problem, it would be good to try that with the Mbed CAN api example. It could be used on one board with CAN-bus loop back between both CAN controllers. If that will work, then you can use it again on two different boards and so on.

BR, Jan

Hi Jan,
you are right. I will re-test it with the CAN API example and a loopback. Let’s see…

So… being back from testing:

When using the CAN API example (after reworking it to be usable with Mbed OS6), there were no issues with sending CAN Extended frames from one CAN port to the other CAN port (tested for 2 combinations).
This test used the default CAN frequency of 100kHz and did not use the CAN API method frequency().

Then I started to re-use my setup NucleoG474RE and LPC4088 for basic CAN API tests.
Finally, it ended up with a simple code:

The Nucleo board just sends extended frames, while the LPC4088 just listens to the bus.
And as long as the Nucleo board can send messages successfully, everything is ok - as this means, that the LPC4088 sends ACK replies. Here comes the code snippet for the Nucleo board with some includes omitted:

#define CAN3_RD PA_8
#define CAN3_TD PB_4

int main()
{
    uint32_t counter = 0;
    CAN can(CAN3_RD, CAN3_TD);  
    //can.frequency(100000);
    //can.reset();
    DigitalOut led1(LED1);    


    CANMessage msg;

    TRACE_PRINTF("CAN Test NucleoG474RE\n");

    while (1) 
    {
        unsigned int id = 0x1b0f320f;
        int result = can.write(CANMessage(id, reinterpret_cast<const char*>(&counter), 4, CANData, CANExtended));
        if (result)
        {
            TRACE_PRINTF("Message sent: %d\n", counter);
            counter++;
            led1 = !led1;
        }
        else
        {
            TRACE_PRINTF("Write failed\n"); 
            NVIC_SystemReset();
        }

        wait_us(20000);

    }
}

You may notice the commented out lines:

// can.frequency(100000);
//  can.reset();

When I kept the lines commented out, the code worked and CAN messages were sent successfully.
As soon as I used the line:

can.frequency(100000);

no write operation succeeded, no matter whether the following reset operation was added or kept commented out.

My expectation is that this should not happen, as 100kHz is the default frequency of the CAN bus in Mbed, i.e. setting the frequency to its default value should not make a difference.

Am I wrong?

Sounds logic and I have no Ideas for now.

However, be careful to use the reset method correctly. The frequency method or anything else must be called after call of the reset method not before. The purpose of the reset method is to return the CAN interface back to its default settings. So when you use the frequency method for set the CAN-Bus to 500khz and then call reset, the CAN-bus will be again back on 100khz.

BR, Jan

Hi Jan,
are you sure about this effect of the reset() function? I did not see anything related in the code:

/** Reset CAN interface.
 *
 * To use after error overflow.
 */
void can_reset(can_t *obj)
{
    can_mode(obj, MODE_RESET);
    HAL_FDCAN_ResetTimeoutCounter(&obj->CanHandle);
    HAL_FDCAN_ResetTimestampCounter(&obj->CanHandle);
}

There is at least no obvious call to some frequency related setting. This is just a first impression from my side. Of course, this might be wrong.

In the meantime, I had a first look how the frequency settings are processed in the the function can_frequency() and in the CAN object initialization with default frequency of 100kHz.

For the NucleoG474RE the initialization is done within the function _can_init_freq_direct(can_t *obj, const can_pinmap_t *pinmap, int hz) which - apart from some pin mapping operations - performs some timing related calculations and applies general CAN settings.

In the function int can_frequency(can_t *obj, int f) also calculations are done but apparently with a focus to timing related settings.

So far so good.

There is a small difference in the timing calculations in both routines when determining the so called nominal prescaler:

_can_init_freq_direct(…):

    // !When the sample point should be lower than 50%, this must be changed to
    // !IS_FDCAN_NOMINAL_TSEG2(ntq/nominalPrescaler), since
    // NTSEG2 and SJW max values are lower. For now the sample point is fix @75%
    while (!IS_FDCAN_NOMINAL_TSEG1(ntq / nominalPrescaler)) {
        nominalPrescaler ++;
        if (!IS_FDCAN_NOMINAL_PRESCALER(nominalPrescaler)) {
            error("Could not determine good nominalPrescaler. Bad clock value\n");
        }
    }

can_frequency(…):

    // !When the sample point should be lower than 50%, this must be changed to
    // !IS_FDCAN_DATA_TSEG2(ntq/nominalPrescaler), since
    // NTSEG2 and SJW max values are lower. For now the sample point is fix @75%
    while (!IS_FDCAN_DATA_TSEG1(ntq / nominalPrescaler)) {
        nominalPrescaler ++;
        if (!IS_FDCAN_NOMINAL_PRESCALER(nominalPrescaler)) {
            error("Could not determine good nominalPrescaler. Bad clock value\n");
        }
    }

When using this for a frequency of 100kHz, this leads to different settings for the nominalPrecaler and to all timing related settings.

  • nominalPrescaler = 7 (_can_init_freq_direct)
  • nominalPrescaler = 49 (can_frequency)

I am not familiar with CAN bus timings so far, but this does not look good.
Therefore, I changed the non-working code in can_frequency() to the working one in _can_init_freq_direct(),
i.e:

//while (!IS_FDCAN_DATA_TSEG1(ntq / nominalPrescaler)) {
while (!IS_FDCAN_NOMINAL_TSEG1(ntq / nominalPrescaler)) {

Of course, applying a frequency of 100kHz now worked and did not “break” the transmissions any longer.
But also a change of the transmission frequency to 125kHz (on both sides: NucleoG474RE + LPC4088) led to working transmissions in my simple setup.

I used my workaround (IS_FDCAN_NOMINAL_TSEG1) now also in my application setup.
Now, the original issue is gone and I can perform the CAN bus transmissions from NucleoG474RE side with no errors (tested with 100kHz and 125kHz).

Many thanks, Jan, for your support! Your answers and suggestions made my day :smile:.

Still, I try to understand why it works now and why the original code was written as it was - and how this could lead to the “symptoms” I first observed…

Some further investigation resulted in:

The macro’s IS_FDCAN_DATA_TSEG1 and IS_FDCAN_DATA_TSEG2 refer to settings of the FDCAN data bit timing and prescaler register (FDCAN_DBTP).
This register supports FDCAN - which is not the case for Mbed.

Therefore, FDCAN_DBTP settings are all set to 1 in _can_init_freq_direct():

    obj->CanHandle.Init.DataPrescaler = 0x1;       // Not used - only in FDCAN
    obj->CanHandle.Init.DataSyncJumpWidth = 0x1;   // Not used - only in FDCAN
    obj->CanHandle.Init.DataTimeSeg1 = 0x1;        // Not used - only in FDCAN
    obj->CanHandle.Init.DataTimeSeg2 = 0x1;        // Not used - only in FDCAN

These registers remain untouched after this initialization.

So, what is the impact when using one of the macros:

  • IS_FDCAN_DATA_TSEG2
  • IS_FDCAN_DATA_TSEG1
  • IS_FDCAN_NOMINAL_TSEG1

as YOUR_MACRO in:

    while (!YOUR_MACRO(ntq / nominalPrescaler)) {
        nominalPrescaler ++;
        if (!IS_FDCAN_NOMINAL_PRESCALER(nominalPrescaler)) {
            error("Could not determine good nominalPrescaler. Bad clock value\n");
        }
    }

The loop is left as soon as a macro specific condition is not met:
The quotient ntq/nominalPrescaler must be < 256 for IS_FDCAN_NOMINAL_TSEG1, but only 16 in case of IS_FDCAN_DATA_TSEG2.
Therefore, when using IS_FDCAN_DATA_TSEG2 or IS_FDCAN_DATA_TSEG1 large values of nominalPrescaler are enforced.
But: The larger the nominalPrescaler, the coarser the timings of CAN setup become:

Example:
When using IS_FDCAN_NOMINAL_TSEG1 the nominalPrescaler is 7, when using IS_FDCAN_DATA_SEG2 the nominalPreScaler is 95 and for IS_FDCAN_DATA_SEG2 the nominalPreScaler is 49 respectively.
The reference clock is 160MHz, so for a nominalPrescaler value 7 each bit is sliced in to quantums of 7/160MHz = 43.75ns.
According to RM0440 (Ref Manual of the STM32G4 series), page 1943 in section 44.31 “bit timing”, this ends up to a bit duration of:

  • 228 slices* 43,75ns = 9,975µs for IS_FDCAN_NOMINAL_TSEG1 corresponding to 100250Hz (250Hz more than100kHz, 0.2% deviation) CAN bus frequency.
  • 32 slices * 306ns = 9.8µs for IS_FDCAN_DATA_TSEG1 corresponding to 102040Hz (2040Hz more than 100kHz, 2% deviation) CAN bus frequency.
  • 16 slices * 594ns = 9.5µs for IS_FDCAN_DATA_TSEG2 corresponding to 105218Hz (5263Hz more than 100kHz, > 5% deviation) CAN bus frequency.

The calculated timings are fed into the FDCAN nominal bit timing and prescaler register (FDCAN_NBTP) which is used for “classic” CAN (opposed to FDCAN).

My impression is that the large frequency deviation created with IS_FDCAN_DATA_TSEG1/2 lead to improper settings and sampling errors. As the NucleoG474RE reads back its own messages it adjusts to a wrong frequency which leads to adjustment issues or sampling errors on LPC4088 side.
And finally: My scope is not good enough to resolve clock deviations of 2% or 5%.

These are my conclusions. Please correct if this is wrong or if it was misleading.