Ahoy !..

I was rather dismayed when I saw my Mega attach to an Arduino service and then got an endless stream of errors:

NOT CLEAR TO SEND! resetting parser!
Arduino->MRL error - bad magic number 66 - 1 rx errors
Arduino->MRL error - bad magic number 1 - 2 rx errors
Arduino->MRL error - bad magic number 0 - 3 rx errors
Arduino->MRL error - bad magic number 12 - 4 rx errors
Arduino->MRL error - bad magic number 24 - 5 rx errors
Arduino->MRL error - bad magic number 235 - 6 rx errors
Arduino->MRL error - bad magic number 0 - 7 rx errors
Arduino->MRL error - bad magic number 0 - 8 rx errors
NOT CLEAR TO SEND! resetting parser!
Arduino->MRL error - bad magic number 66 - 9 rx errors

I could tell that between the errors there were valid PUBLISH_BOARD_INFO messages - so why the endless resets?

As far as I can tell, my Mega (which probably has some battle scars) has gotten into a state where setup() is NOT run when a new serial connection is made.

There is a new message called PUBLISH_MRL_COMM_BEGIN, which is currently the only msg that can unlock the Java message parser.  My simple solution is to allow either PUBLISH_MRL_COMM_BEGIN or PUBLISH_BOARD_INFO to unlock the parser.
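To make the difference concrete, here is a minimal sketch of the two unlock rules. It is written in Python rather than the real Java parser, and `ParserGate`, `Method` and `PUBLISH_ACK` are hypothetical names used only for illustration; only the two message names come from the post.

```python
from enum import Enum


class Method(Enum):
    PUBLISH_MRL_COMM_BEGIN = "begin"
    PUBLISH_BOARD_INFO = "board_info"
    PUBLISH_ACK = "ack"  # hypothetical stand-in for any other message


class ParserGate:
    """Stays locked until one of the allowed 'unlock' messages is seen."""

    def __init__(self, unlock_methods):
        self.unlock_methods = frozenset(unlock_methods)
        self.clear_to_send = False

    def on_message(self, method):
        if method in self.unlock_methods:
            self.clear_to_send = True
        return self.clear_to_send


# Current behaviour: only the begin message unlocks the parser.
strict = ParserGate({Method.PUBLISH_MRL_COMM_BEGIN})
# Proposed behaviour: board info unlocks it as well.
relaxed = ParserGate({Method.PUBLISH_MRL_COMM_BEGIN, Method.PUBLISH_BOARD_INFO})

# A board that did not run setup() only ever sends its periodic board info.
for gate in (strict, relaxed):
    for _ in range(3):
        gate.on_message(Method.PUBLISH_BOARD_INFO)
```

With a board that never re-runs setup(), the strict gate stays locked forever while the relaxed one opens on the first board info.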

I think this is more desirable, since there are quite a few ways setup() can end up not being run on a new serial connection, as this link shows.

https://playground.arduino.cc/Main/DisablingAutoResetOnSerialConnection/

The Fix :

https://github.com/MyRobotLab/myrobotlab/pull/766

calamity

4 years 4 months ago

Here is what I think is happening

  • Some data is put in the Java-side serial buffer to be sent to the Arduino.
  • Java opens the serial port and sends that data.
  • The Arduino grabs the data and thinks it's a new code upload, and takes time trying to make sense of it before giving up and starting (or resuming) the setup() method and sending the MRLComm_Begin and then the board_info msg.
  • In the time it takes to send those messages, the Java side has already given up and is trying to reopen the port.
  • The board_info msgs you are seeing were probably put in the pipe before the Arduino reset again.

 

What you are suggesting is a workaround. It won't fix the problem, just go around it.

I don't know how Java manages opening the port and sending the data. I think you should look at improving the data control on the Java side, so the data is sent after the port is open and not at the same time the port opens.

On the good side, if you can find out how to initiate the code upload in the Arduino, you may have found a way to upload Arduino code without using arduino.exe :)

 

Thanks for the input calamity.
You have 2 things which want to talk to each other.
To me the simplest is to make one always wait for the other to begin.
Java waits and listens, Arduino begins by sending boardInfo - it does this continuously every second, providing heartbeat and diagnostics.  This is how it's been since I last refactored it.

I think it's a good strategy.  Java can open the port, read all the garbage out - and wait for a good next message (it should come <= 1 sec) after all the garbage is gone.
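A rough sketch of "read all the garbage out and wait for a good message", in Python rather than the real Java code. The frame layout (magic byte, size byte, then the body), the magic value 0xAA and the 64-byte limit are assumptions made for illustration, not taken from the MrlComm source; the garbage bytes are the "bad magic number" values from the log above.

```python
MAGIC = 0xAA  # assumed magic byte, for illustration only


def extract_frames(data: bytes, max_size: int = 64):
    """Skip anything that does not look like a frame and return frame bodies."""
    frames = []
    i = 0
    while i < len(data):
        if data[i] != MAGIC:
            i += 1  # garbage byte: drop it and keep scanning
            continue
        if i + 1 >= len(data):
            break  # magic seen but no size byte yet: wait for more data
        size = data[i + 1]
        if size == 0 or size > max_size:
            i += 1  # implausible size: treat this magic byte as garbage
            continue
        end = i + 2 + size
        if end > len(data):
            break  # incomplete frame: wait for more data
        frames.append(data[i + 2:end])
        i = end
    return frames


garbage = bytes([66, 1, 0, 12, 24, 235, 0, 0])
frame = bytes([MAGIC, 3, 2, 9, 9])  # hypothetical 3-byte message body
frames = extract_frames(garbage + frame)
```

Without a checksum, a garbage byte that happens to equal the magic value can still produce a false frame, so this only shows the recovery idea, not a robust parser.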

Kwatters made it so only 1 message on startup could start the conversation.  
This is not satisfactory, as the

           Arduino -- PUBLISH_BOARD_INFO ---> Java

is the continuous 1s frequency message that keeps the two happy and in sync with one another.

I don't see this as a work-around, I see it as a fix.  And Java never initiates the conversation; Arduino does.
At some point, Arduino will get through with a clean PUBLISH_BOARD_INFO - if it takes more than 3 seconds to receive a msg sent at a one-second interval, your Arduino is broken or you have other problems.

I see it as a work-around because it did not fix the problem, it goes another way so you don't see the problem.

Instead of the conversation being initiated by the Arduino, it's being picked up midstream with an 'ok, you look worky'. Not exactly the same strategy.

 

It's fine if it's now working, but what I fear is that this tiny bug grows into a monster later on, when intensive use of the Arduino will be needed (like using 15 servos with inverse kinematics for position control, where you'd better have really good communication :)

GroG

4 years 4 months ago

kwatters: The setup method will be called as soon as the port is open.
kwatters: The latest mrlcomm prints out garbage intentionally to ensure the parser syncs
kwatters: That happens in the setup method
kwatters: If you didn't see the mrlcomm begin.. then the serial port didn't open as expected
kwatters: The only way to avoid the setup method being called is if you de-solder pins on the Arduino megas serial port
kwatters: Which I assume you have not done
 
I did not - no intentional physical mods
 
kwatters: I tend to agree with calamity on this one...
 
ok, I disagree with you both
 
kwatters: I disagree with using the board info message for purposes to begin the communication
kwatters: In your scenario, bytes are being lost... The parser is confused..
 
Bytes will get lost - to say they won't is silly.  You cannot control all environmental factors. More importantly is how it recovers.

kwatters: Your fix added additional complexity and is likely covering up some other bug

Adding another conditional to an if statement I wouldn't consider "additional complexity" considering the complexity that is being managed currently.  What is the other bug?  How do you test it?  How should it be investigated?  I don't believe in ghosts in the machine, so it needs to be more concrete than "some other bug"

kwatters: If you want these changes.. a unit test proving that they fix the problem should be added to verify the fix.
 
Reasonable request.
 
kwatters: O/w I think the bug is actually elsewhere
kwatters: The mrlcomm is initiated by sending dtr high on the serial port
kwatters: At that point java land needs to see mrl comm begin
kwatters: Not a board info update
kwatters: Board info is initiated by mrlcomm.. that's wrong
 
I don't think so.
 
kwatters: The setup method is triggered when the serial port is opened by java(or other program)
kwatters: The next thing that comes over the port is garbage, followed by 64 spaces.. then the mrlcomm begin is sent
kwatters: The 64 spaces is chosen because that's the max mrlcomm message length
kwatters: And the parser on the java side will have to reset
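A small simulation of why padding with a full maximum-length run of spaces lets a confused parser recover before the begin message arrives. This is a Python sketch, not the real parser: the frame layout, the magic value 0xAA and the `BEGIN` id are assumptions for illustration; only the figure of 64 comes from the chat above.

```python
MAGIC = 0xAA   # assumed magic byte
MAX_MSG = 64   # max message length mentioned in the chat
BEGIN = 0x01   # hypothetical id for the begin message


class StreamParser:
    """Byte-at-a-time parser: magic byte, size byte, then `size` body bytes."""

    def __init__(self):
        self.state = "magic"
        self.remaining = 0
        self.body = bytearray()
        self.frames = []
        self.rx_errors = 0

    def feed(self, data):
        for b in data:
            if self.state == "magic":
                if b == MAGIC:
                    self.state = "size"
                else:
                    self.rx_errors += 1  # "bad magic number"
            elif self.state == "size":
                if 0 < b <= MAX_MSG:
                    self.remaining = b
                    self.body = bytearray()
                    self.state = "body"
                else:
                    self.rx_errors += 1
                    self.state = "magic"
            else:  # reading the body
                self.body.append(b)
                self.remaining -= 1
                if self.remaining == 0:
                    self.frames.append(bytes(self.body))
                    self.state = "magic"


def frames_after_reopen(padding_len):
    parser = StreamParser()
    # Stale, truncated frame left over from before: claims 64 bytes, has 3.
    parser.feed(bytes([MAGIC, MAX_MSG, 1, 2, 3]))
    parser.feed(b" " * padding_len)          # padding written by setup()
    parser.feed(bytes([MAGIC, 1, BEGIN]))    # the begin message
    return parser.frames
```

With 64 spaces the parser is guaranteed to be back in the "waiting for magic" state when the begin frame arrives; with only 10 the begin frame is swallowed as part of the stale body. Note that the stale frame still comes out as one bogus frame, so something else has to reject it.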
 
kwatters: I have seen the errors that you see after you added peer stuff and the metadata refactor
kwatters: That was related to multiple arduinos attaching to a serial port that was already open
kwatters: If you can prove that incorrect in a unit test then I'm on board..
kwatters: O/w I think this is the wrong fix
kwatters: I'd recommend setting debug=true in the msg.java class to verify what's actually going on
kwatters: When I saw this error about 2 weeks ago, I went back to my original commit of better acks2 and the problem went away..
kwatters: Which means it was introduced since then..
 
Can we stop bringing that up ?  It was fixed and verified.
As for a "test"  I've already tested it with a serial service, and what you and calamity think should be happening with my board is not.  I can tell setup is not being run when the port is re-opened. For an additional test I verified it with a small arduino sketch.

Its the same way MrlComm starts up.  

This is what shows on the first time the port is opened with the monitor...

And this is what you'll see the 2nd time you open the port with the monitor ... yup ... no setup()
And this is the same thing you'll see when you open the port (N) times ... no setup

This will make the dependency of PUBLISH_MRL_COMM_BEGIN endlessly fail as I mentioned above.  There is no recovery, short of manually resetting the board.

To double verify, I tried with my other board, and it called setup on every restart of the monitor. So this is a physical difference - not a software one.

But it doesn't seem that uncommon .. sometimes its even done on purpose
https://playground.arduino.cc/Main/DisablingAutoResetOnSerialConnection 

Currently, only having START in setup is more fragile than being able to start communication with a BOARD_INFO.

 

I did some tests this morning. I did not have any Uno or Mega available, but I did use a bunch of Pro Minis (6 of them). I used the same script that you posted in the previous message.

 

I tested all of them and none ever skipped setup().

So it seems to be a hardware failure more than a design problem.

But you have a valid point that the Arduino may be set to not auto reset on serial connection, and the current way does not work in that case.

There is a major difference between using MRL_COMM_BEGIN and a BOARD_INFO. The first one states that the Arduino has been freshly hardware reset, while the second reports the state of the Arduino (which may already have some devices set up over the serial connection).

 

I have missed a lot recently, so I can't help much here, but here are some differences I can see.

If the BOARD_INFO deviceSummary does not match what Java expects, what happens? (I know you had code to auto reconnect, but did that reset the Arduino data, or reset it by reopening the serial connection?)

Is the heartbeat currently working in the Arduino? If it's working, the Arduino may be in the "onDisconnect" state and may need more code to put it back in its normal state if no hardware reset happens.

So there is more to it than just using one msg or the other to initiate the conversation between the Arduino and mrl.

The fix you made makes it work in the situation you describe (a clean-slate Arduino, even if it did not reset on connect) but may not work if the Arduino already has some data set up.

So...  the first screen shot is not what i would expect.  Once the arduino is powered on, you should actually have a clean first open of the serial port.

The second screen shot that you shared of just the "loop" methods, is what i would also see, however, that was because there was a ton of garbage on the line and, I'm curious why there was no reset..  possibly a hardware problem.. but I suspect you're seeing something else.  again still skeptical.

the last one..  why / how did you end up with 2 MRLComm reset messages in a row?  

I think a better test would have been to print out the loop counter..  This would truly show that the board didn't reset.  If that's the case, then the loop counter should never reset and only increase.  

something like a global counter of the number of times the loop method was called. you should be able to see it reset to zero on every re-open..    if you can show that on your hardware, i'll start thinking about coming around to the dark side on this one...  however,  it does point to hardware problems.  Is there any damage on the ftdi pins? or other pins on the board? 

Any ideas of the manufacturer of the board?  Is it a knock off arduino or an official one?

I do have an arduino mega with a faulty reset switch, so there are hardware problems with some of the boards, I acknowledge that.

As for losing bytes..  I still quite firmly believe that any dropped byte to/from the arduino is grounds for a full reset.  That should be disconnecting the arduino.. closing the port.. opening it back up and re-syncing any devices that had previously been attached.  

 

 

kwatters

4 years 4 months ago

In reply to by kwatters

I also want to make note that it would be and is very common that many full board info messages might be on the serial port when you open it back up.  These are bogus messages and need to be discarded because they were from the last time the port was opened not the current time.  

The only way to know that the mrlcomm messages are from this current session is to wait until after you see something from a setup method.  In my testing locally, I noticed many times that you can see a LOT of data when you first open the serial port.

out of curiosity, what happens when you hit the reset button on the arduino, did it sync back up?  or did it stay out of sync?

 

As for the "additional complexity" comment.  The additional complexity wasn't regarding the "if" statements.  The additional complexity is that now the board_info message has double duty.  And that when you see a board info message, you can't actually trust it unless you've already seen a proper mrlcomm begin message , as the board info message may have come from a previous reset of the arduino..  that is a lot of complexity in my opinion,   (i'm not picking on the if statements in particular, but rather the overloaded usage of the board info message when you can't trust that the board info message came from the current session.)

 

GroG

4 years 4 months ago

In reply to by kwatters

So...  the first screen shot is not what i would expect.  Once the arduino is powered on, you should actually have a clean first open of the serial port.

The second screen shot that you shared of just the "loop" methods, is what i would also see, however, that was because there was a ton of garbage on the line and, I'm curious why there was no reset..  possibly a hardware problem.. but I suspect you're seeing something else.  again still skeptical.

It's not resetting on serial open.

the last one..  why / how did you end up with 2 MRLComm reset messages in a row?  

The working one ?  Not sure, I was curious about that too - it looks like it double resets.

I think a better test would have been to print out the loop counter..  This would truly show that the board didn't reset.  If that's the case, then the loop counter should never reset and only increase.  

something like a global counter of the number of times the loop method was called. you should be able to see it reset to zero on every re-open..    if you can show that on your hardware, i'll start thinking about coming around to the dark side on this one...  however,  it does point to hardware problems.  Is there any damage on the ftdi pins? or other pins on the board? 

I'll print counter

Any ideas of the manufacturer of the board?  Is it a knock off arduino or an official one?

Both Genuine Arduino Megas.
No damage to any pins that I'm aware of - it works fine, it even uploads fine.

I do have an arduino mega with a faulty reset switch, so there are hardware problems with some of the boards, I acknowledge that.

As for losing bytes..  I still quite firmly believe that any dropped byte to/from the arduino is grounds for a full reset.  That should be disconnecting the arduino.. closing the port.. opening it back up and re-syncing any devices that had previously been attached.  

 

 

New code - printing out counter

"Bad" Ardruino that has "feature" of not running setup on serial connect.
First time connect after upload

Yup .. it consistently does a double-tap when connecting first time.  Mebbe its afraid of zombies....

ITS A ZOMBIE !!!!  yup .. doesn't matter how many times I open the serial port - it doesn't run setup

The other "good" board consistently does a double-tap on every serial open

It always surprises me when I really dig into how things work ... now I'm curious about other boards ... is double tap a consistent "feature" ? 

GroG

4 years 4 months ago

if you can show that on your hardware, i'll start thinking about coming around to the dark side on this one... 

Welcome to the Dark Side...

GroG

4 years 4 months ago

I'd be fine considering ideas for sync'ing.  For instance perhaps board info could send a sequence number, if it wasn't below a certain value Java could tell it to reset.
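As a sketch of the sequence-number idea (Python, not MrlComm code): `SequenceTracker`, the `fresh_limit` threshold and the returned labels are all hypothetical, and counter wrap-around is ignored to keep it short.

```python
class SequenceTracker:
    """Classify each board info by the sequence number it carries."""

    def __init__(self, fresh_limit=3):
        # A first sequence number below this is treated as a freshly
        # reset board. The threshold is an arbitrary illustrative value.
        self.fresh_limit = fresh_limit
        self.last_seq = None

    def on_board_info(self, seq):
        previous, self.last_seq = self.last_seq, seq
        if previous is None:
            # First message after the port was opened.
            return "fresh" if seq < self.fresh_limit else "request_reset"
        if seq < previous:
            # The counter went backwards: the board restarted mid-session.
            return "board_restarted"
        return "in_sync"


tracker = SequenceTracker()
events = [tracker.on_board_info(seq) for seq in (0, 1, 2, 0)]
stale_board = SequenceTracker().on_board_info(250)
```

Here a board that was already running when the port opened shows up as "request_reset", which is the case where Java would tell it to reset.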

For the record, it's not that uncommon for a protocol to send a heartbeat on the line, e.g. the syn ack of TCP/IP is how you know the pipe is "connected"

Minimally, I think relying on setup() to make or break connectivity is not the way to go.  My board currently does not run setup and I see this as unusual because I think it used to (recent hardware change). 

But my spidey sense tells me this is not that uncommon.   I have some more boards to test.

Ahahahaha :D

3rd Board Tested

Genuine Arduino Uno

No Reset on Serial Open !

I remember I was shocked when kwatters told me setup was run every time a serial connection was made.  I thought "that is silly". What if I want to preserve state information and read that info from the serial port ? 
I remember reading Arduino docs saying that was the case ... apparently "not always", and now I'm wondering how often does this really occur.

So far 2 Genuine Megas tested & 1 Genuine Uno

Stats :
No reset on serial connect 2
Reset on serial connection 1

I've got a dozen plus arduinos.. and every one of them resets on a serial port open...  

consistently..  I've actually never seen an arduino do what you are demonstrating here..

It looks like you're doing this from linux, perhaps there's some details of the uart driver on linux and it doesn't actually toggle the DTR signal pin on the usb serial port?  Does it also behave like this on windows?  what if you use some 3rd party terminal emulator to open and close the port?  

so odd that you would have so many non-compliant boards.  the arduino spec says reset on DTR toggle (port open)  and that is what i have always observed... every single time..  I have never seen it not reset as expected .... 

how do you trigger a reset of your arduinos then?  do you actually have to unplug them and plug them back in?   

It pains me to think that we would be writing the MrlComm code for non-compliant defective hardware...

Heh .. "wonderful" toys ?  Well I guess they add testing diversity ;)

Yes, I wondered the same thing - I have a windows laptop that I'll test with, and ya .. the same thing crossed my mind.  Although JSSC is orders of magnitude better than RxTx I'm sure there are some escaped use cases considering the wonderful and wild frontier of Linux device drivers :)

Hmmm...

So I tested all the boards on windows ...
All do the double-tap, but all will reset on serial open :P

So we can attribute this one to weird state problems with Linux driver (perhaps) ....

If y'all have a simple "fix" to make MrlComm robust across this I'd like to hear it.

I believe in the aesthetic that less code is best code ... but I'm still having a hard time with the "necessary" dependency of setup() starting successful communication.

BTW - we haven't tested ESP8266 via TCP/HTTP - do these reset and run setup on every connection ?  
I see future problems down the road....

Maybe something like that?

While the Arduino has not had any changes made to it, have it spit out MRL_COM_BEGIN instead of BOARD_INFO, so that mrl knows it is dealing with a newly reset Arduino.

After opening the port, if mrl receives a BOARD_INFO, that means the Arduino has not been reset, so force a reset (a soft reset, or trigger an error that will make the Arduino reset), then close and reopen the port. Or you could compare deviceSummary against what is expected to see whether it needs to be reset.
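The proposal above can be sketched as a tiny state model. This is Python for illustration only: `Board`, `on_first_message` and the result labels are hypothetical, and the "device summary" is reduced to a plain list of names.

```python
class Board:
    """Toy board: reports BEGIN until something has been attached to it."""

    def __init__(self):
        self.devices = []

    def attach(self, name):
        self.devices.append(name)

    def reset(self):
        self.devices.clear()

    def heartbeat(self):
        if not self.devices:
            return ("MRL_COMM_BEGIN", [])
        return ("BOARD_INFO", list(self.devices))


def on_first_message(board, expected_devices):
    """Host-side decision on the first message seen after opening the port."""
    kind, summary = board.heartbeat()
    if kind == "MRL_COMM_BEGIN":
        return "clean"            # untouched board, safe to start
    if summary == expected_devices:
        return "resume"           # board state matches what the host expects
    board.reset()                 # stand-in for a soft reset request
    return "reset_forced"


board = Board()
first = on_first_message(board, [])
board.attach("servo1")
second = on_first_message(board, ["servo1"])
third = on_first_message(board, [])
```

The point of the design is that the host never has to trust a BOARD_INFO blindly: it either matches the expected state or triggers a reset.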

Well,  I'm happy to hear that the issue is that your linux driver isn't setting DTR to high as expected to signal to the usb device that the port is open.  That's a whole lot better than the hardware error hypothesis.  

Rather than "fixing" it in MrlComm..  It'd be good to understand the true nature of the problem.

What version of linux? x86? arm?  Were you using the standard Arduino IDE?  Some other terminal emulator/program?  I thought we were using JSSC everywhere now...  I thought RxTx lib was gone.. but perhaps not completely?

As far as ESP boards... yeah,  I think we'll want to have a better story for those in the future, but for now, I think it's important to understand why your serial ports are not opening and closing as expected.  The program that is used to communicate with the serial device really needs to do it properly, and if the serial ports are properly being opened and closed, then the arduinos will behave as expected.

Grr ... now it consistently resets .. I can't reproduce the issue 

Ubuntu 18.04 Arduino IDE 1.8.10
No, nothing uses RxTxLib - it was (thankfully) cleanly scrapped from mrl a long time ago.

Since I can't reproduce the issue, I'm going to chalk it up to perhaps driver/jssc getting confused about the state when I have been unplugging and re-plugging boards.

From my experience Linux appears to not always keep the same dev device with the same board .. the mapping and re-mapping of devices to boards seems a bit wacky in comparison to windows which is pretty consistent on assigning a COM value to the same board (which is done I assume with a unique identifier)

You can lock down the dev device to a board, but it takes manual configuration ...

Ok, let's just chalk that up to a transitory blip in Linux (but I have evidence it occasionally occurs - and repercussions also occur which I think could be avoided)

And move on to the ESP boards.  I have several and would like to begin using them with mrl.  How shall we proceed on a design level ?   

Different issue, same problem, but even more severe...

You can't depend on setup to initialize communication.

How do you want to proceed?

dang...  i really wish there was a way to know what the root cause of the issue was...  I'm really trying to rationalize (in my head) how / what could have caused your serial port to not be opening / closing / opening as expected.

Normally.. the serial port (rs-232 style) sees DTR is low.. this doesn't actually concern the arduino.. it starts up.. runs its setup method one time.. and then presumably begins its loop method.   (I think we have a check to see if the serial port is available before we return from the setup method..) so,  I think, the loop method won't actually start... this is why there is no garbage on the serial port line on the first connection.. 

so , on the first connection of the serial port.. DTR goes high.. and mrlcomm sees serial port is available and it proceeds into the loop method.  now.. mrlcomm begins sending the mrlcomm board info message every second....    so long as the serial port is open those bytes are read away in java land...  or what ever terminal has the port open... 

now.. you close the port... DTR goes low.. but MrlComm doesn't care.  It's still in the loop method and can write to the serial port.  The serial port will continue to get data written to it in its buffer..   I'm pretty sure that if DTR is low, and you write to the serial port, that it will start dropping bytes...

so, then the next time the serial port opens up, DTR goes high, the arduino sees that event and it triggers a reset of the sketch that is currently running.  This terminates the current loop method call, resets the local sketch memory, and goes through the setup method again.   At this point, the java/ terminal side, will see the partial data in the serial port buffer  (including a bunch of partial board info methods..) then it will read the newly written data from the setup method.  This is the MrlCommBegin message.  so, reconnecting to an arduino running mrlcomm, you will almost always read some partial bogus board info messages.  
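The open / close / re-open sequence described above can be played through with a toy model. This is a Python simulation, not real serial code: `SimBoard`, its buffer size and the message strings are made up for illustration, and it deliberately ignores details such as setup waiting for the port.

```python
from collections import deque


class SimBoard:
    """Toy board whose sketch may or may not restart when the port opens."""

    def __init__(self, resets_on_open=True, buffer_size=4):
        self.resets_on_open = resets_on_open
        self.tx = deque(maxlen=buffer_size)  # oldest entries are dropped when full
        self.loop_count = 0
        self.started = False

    def _setup(self):
        self.loop_count = 0
        self.tx.append("MRL_COMM_BEGIN")
        self.started = True

    def tick(self):
        # One pass of loop(): publish a board info carrying the loop counter.
        self.loop_count += 1
        self.tx.append(f"BOARD_INFO#{self.loop_count}")

    def open_port(self):
        # DTR toggles on open; a compliant board restarts its sketch.
        if self.resets_on_open or not self.started:
            self._setup()

    def read_all(self):
        out = list(self.tx)
        self.tx.clear()
        return out


def second_session(board):
    board.open_port()            # first open
    board.tick()
    board.tick()
    board.read_all()             # host drains the port, then closes it
    for _ in range(3):
        board.tick()             # board keeps writing while the port is closed
    board.open_port()            # second open
    return board.read_all()


good = SimBoard(resets_on_open=True)
zombie = SimBoard(resets_on_open=False)
good_seen = second_session(good)
zombie_seen = second_session(zombie)
```

On the resetting board the host reads stale board infos followed by a begin message and the loop counter is back at zero; on the non-resetting board there is no begin at all and the counter keeps climbing, which is the signature the loop-counter test was looking for.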

so, that's how it works.. that's how it is designed to work.. the real question is.. how did you actually  open / close / re-open the port...  from what i can tell.. the only hypothesis that i could come up with is.. 

1. you didn't actually close the port when you thought you did.

2. somehow there are multiple processes that are keeping the serial port open for reading , and that closing one application didn't drop the DTR signal because another process was still holding the port open.  (this is highly unlikely, because you shouldn't (at an operating system level) be able to open a port that is already in use by another application.)

3. there is some crazy bug in linux relating to very simple serial port operations that have been around for like 50+ years.  

4.  some bizarre cosmic rays are bombarding your arduinos and they are making some strange quantum jump between alternative universes...

...

ok.. so ESP boards... cool!   let's bring this whole topic up in another top level post.  It deserves its own blog post.  As part of the initial design, we should understand under what circumstances the ESP board calls its "setup" method.. how different is it from an actual Arduino Mega ?