Mysterious Corruption Of Data Causes RMIR 'Upload Failure'

WagonMaster
Posts: 366
Joined: Thu Apr 16, 2009 2:25 pm

Post by WagonMaster »

wnewell, in another thread, wrote: Download from remote no problem. But on upload, verify fail with data doesn't match or something like that, but everything appears to be ok when I download from remote and look at it.
I've been getting this message under certain conditions for a while. I mentioned it offhandedly in one of my other posts, but nobody else commented, so I wasn't sure if I was the only one seeing it or not.
wnewell, in another thread, wrote: I'm wondering if the failed upload message is because of my cable.
Probably not, since I'm getting the same error message as you, but with a self-built cable.
wnewell, in another thread, wrote: but I'm not sure if ir.exe verifies uploads either..
I think it will if you enable the "Verify Writes" option under the "Interface" menu item.
gfb107, in another thread, wrote: I don't know about the upload verify fail message. It may be real or it may be a bug.
I'm becoming pretty convinced that it's a bug, as explained below.
gfb107, in another thread, wrote: You could save before uploading, then save to a different file after downloading and compare the [Buffer] sections of the before/after files and see what if anything has changed
That's wise advice, but in this case, I don't think he'll find any difference. The mismatch, in my case, has been internal to RMIR -- it appears to be corruption of an internal data array (more below).

For the record, here's the exact error message that appears in the pop-up dialog:

Code:

Upload verify failed: data read back doesn't match data written.
Other than my offhanded comment referred to above, I didn't mention more about it earlier because I wasn't sure if it was happening only to me or not and I wasn't sure if it was something I was doing wrong or not.

In fact, I was preparing a post about this very issue yesterday (most of which is now included below) when I encountered some odd, unpredictable behavior that made me hold off because I started thinking it might be a procedural error on my part.

Like 'wnewell', I'd found that the error doesn't seem to affect the operation of the remote. It's as if the error message itself was an error! :) In other words, I've noticed that the failure message never seemed to really be meaningful because the remote control would always work as expected after such a failure message, as if the upload had succeeded perfectly well.

But, I'd already (sporadically) spent 3 or 4 hours over the course of 2 or 3 days chasing this down. I had developed a reliable way to duplicate the problem, so I added some test code to dump some information to the log. In the part of the code where that failure message occurs, I added this code:

Code:

// 
// Dump the mismatched parts of flash RAM...
// 
for ( int i = 0; i < data.length; ++i ) {

   if (data[i] != readBack[i]) {
      System.err.println( "Failure: " + 
                          Integer.toHexString(i) + " " + 
                          Integer.toHexString(data[i]) + " " + 
                          Integer.toHexString(readBack[i]));
   }
}
Aside: I was floored to find that Java does not have an 'unsigned byte' data type! That seems almost incomprehensible to me. And the way Java deals with it -- using 16-bit integers to store what should be a simple 8-bit 'unsigned byte' -- seems like a huge hack to me.

Here's what my test showed for the mismatched data:

Code:

Failure: 400 148f 8f
Failure: 40e ffffffa8 a8
Failure: 40f ffffffa8 a8
Failure: 410 ffffffa8 a8
Failure: 411 ffffffa8 a8
Failure: 425 1450 50
The first value is, of course, the hex address (relative to the start of flash memory), followed by the "byte" of what RMIR thinks it just uploaded, followed by the "byte" of what it just read back on the 'verify' step.

As you can see, these mismatched bytes all seem to be caused by some sort of corruption of the high-order byte of the 16-bit value of the data that RMIR thinks it uploaded. This corruption seems to occur when I change one of the entries for "Setup Code" on the "General" tab before performing the upload, but there may be other cases which trigger the problem.
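
To make the dump above easier to read, here is a minimal sketch of why a 16-bit value prints as 'ffffffa8' and what the low byte looks like once the upper bits are masked off. It assumes the data array holds Java shorts; the class and method names are made up for illustration and are not taken from the RMIR source.

```java
// Illustrative only: the class and method names are hypothetical, not RMIR code.
final class HexDumpDemo {

    // Format a short the way the test code above does. The short is widened
    // to int first, so a negative short is sign-extended to 32 bits.
    static String rawHex(short value) {
        return Integer.toHexString(value);
    }

    // Keep only the low 8 bits, i.e. the part that fits in one byte of flash.
    static String lowByteHex(short value) {
        return Integer.toHexString(value & 0xFF);
    }

    public static void main(String[] args) {
        // The three distinct "written" values from the failure log.
        short[] samples = { (short) 0x148F, (short) 0xFFA8, (short) 0x1450 };
        for (short s : samples) {
            System.out.println(rawHex(s) + " -> " + lowByteHex(s));
        }
    }
}
```

In each case the masked low byte equals the value that was read back, which fits the idea that only the bits above the low 8 are stray.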

Armed with that knowledge, I tried to figure out exactly where the corruption was actually occurring. I even tried learning the Java debugger well enough to "trap" it. But I failed. For some reason, I cannot seem to get the Java debugger to stop where/when I ask it to!

So, having pinned the problem down as much as I realistically could without a day-long Java tutorial, I was getting ready to officially ask for help from the "RM master". :) Then I saw some odd behavior that made me think I might be doing something wrong. Then the post by 'wnewell' in the other thread essentially forced the issue.

Greg, any ideas what's causing this corruption? Or any tips on running the Java debugger?

Here's the sequence of events with which I can predictably cause the problem:
  1. Run RMIR. (All my testing was with v1.96 on Linux.)
  2. Open this RMIR file: 'RS-15-135-with-Magnavox-device-upgrade.rmir'. It's got a device upgrade (taken from the forum) for a Magnavox ATSC tuner box (TB100MW9) and changed to use the RS 15-135 remote.
  3. Change a "Setup Code" on the "General" page. I was changing the "DVD" device from code "0636" (the default in a virgin RS 15-135 load) to "1010".

    As near as I can tell, this (or something triggered by this action) is the point where the internal data array is corrupted.
  4. Upload to the remote. You should then see the "Upload verify failed: data read back doesn't match data written." error message.
I later realized (by running 'IR.exe', which nicely colorizes "illegal" codes red) that the RS 15-135 remote doesn't have '1010' as a built-in device "setup code" for devices of type "DVD" (type 2, per the RDF). That's what made me hesitate about posting. I was planning on doing more tests to make sure that my procedure wasn't causing the issue.

Even though my test case may be invalid, I think it may be useful for tracking down the source of the data corruption, which may be happening even in "valid" test case scenarios. I just haven't had time to pin that down.

I was really hoping to debug this a bit more before posting about it because I didn't want to essentially dump this in Greg's lap if I had a chance of figuring it out. But my inexperience with Java may have meant that posting for help was unavoidable anyway. So, I'm sorry to dump this one on you, Greg. I actually was making a "best effort" to figure this out on my own. If there's anything more I need to do to debug this (above what I was already planning), please let me know.

Bill
gfb107
Expert
Posts: 3411
Joined: Sun Aug 03, 2003 7:18 pm
Location: Cary, NC
Contact:

Post by gfb107 »

Looks like you've done some good leg work here.
As you noted, Java doesn't have an unsigned byte data type, so I'm using 16-bit shorts to hold 8-bit values.
Some of the operations performed in RMIR can overflow beyond 8 bits, and that is probably what's causing this problem.

I have lots of code in places to mask off the overflows, but it looks like I've missed some.

From the RDF for the 15-135, I see that $0400 is the checksum for $0402..$0BFF, which is the upgrade area. This provides a clue as to where to look.
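
As a rough illustration of how a checksum held in a short can pick up bits beyond 8, here is a small sketch. It ASSUMES a simple additive checksum purely for demonstration; the thread does not say which checksum type the 15-135 uses, and none of these names come from the RMIR source.

```java
// Hypothetical sketch: an additive checksum is ASSUMED for illustration only.
// The real checksum type and RMIR's actual code are not shown in this thread.
final class ChecksumMaskDemo {

    // Sums data[start..end] (inclusive) into a short with no masking,
    // so the result can carry bits above the low 8.
    static short unmaskedSum(short[] data, int start, int end) {
        short sum = 0;
        for (int i = start; i <= end; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Same sum, masked so that only the low 8 bits are kept.
    static short maskedSum(short[] data, int start, int end) {
        return (short) (unmaskedSum(data, start, end) & 0xFF);
    }

    public static void main(String[] args) {
        // Made-up sample data; indexes 2..4 stand in for the checksummed area.
        short[] data = { 0x00, 0x00, 0xF0, 0x9F, 0x80 };
        System.out.println(Integer.toHexString(unmaskedSum(data, 2, 4)));
        System.out.println(Integer.toHexString(maskedSum(data, 2, 4)));
    }
}
```

If the unmasked value were stored in the array, comparing it against a byte read back from the remote would fail even though the low 8 bits agree.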
gfb107
Expert
Posts: 3411
Joined: Sun Aug 03, 2003 7:18 pm
Location: Cary, NC
Contact:

Post by gfb107 »

3FG
Expert
Posts: 3442
Joined: Mon May 18, 2009 11:48 pm

Post by 3FG »

WagonMaster wrote: Aside: I was floored to find that Java does not have an 'unsigned byte' data type! That seems almost incomprehensible to me. And the way Java deals with it -- using 16-bit integers to store what should be a simple 8-bit 'unsigned byte' -- seems like a huge hack to me.
Conceptually it is a hack, but there is apparently no size penalty. I've been told, and a Google search seems to confirm, that the Java Virtual Machine has a minimum word size of the native pointer size. That is, a byte, short, char, or int is stored as 32 bits (or 64 in 64-bit computers).

Also, as I understand it, the only manipulations that would require masking are shifts and reading in byte-size data. In both cases, bytes are automagically promoted to type int, so a construct like int val = (b & 0xFF) is necessary when reading in byte data. Once the data is stored as an array of int or short, only shifts would need masking.
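
For what it's worth, the (b & 0xFF) construct can be sketched like this. It is only an illustration with made-up names, not code from RMIR.

```java
// Illustrative only: hypothetical helper, not taken from the RMIR source.
final class UnsignedByteDemo {

    // Copies raw bytes into shorts, masking each one so the result is
    // always in the range 0..255.
    static short[] toUnsigned(byte[] raw) {
        short[] out = new short[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = (short) (raw[i] & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xA8, 0x50 };
        short[] values = toUnsigned(raw);

        // Without the mask, the byte 0xA8 is sign-extended and becomes -88.
        short unmasked = raw[0];

        System.out.println(values[0] + " vs " + unmasked);
    }
}
```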
WagonMaster
Posts: 366
Joined: Thu Apr 16, 2009 2:25 pm

Post by WagonMaster »

3FG wrote: Conceptually [it] is a hack, but there is apparently no size penalty.
I didn't delve into the details too deeply, but I too saw hints of that in my initial search.

Regardless of storage issues, that decision by the Java designers really makes for some ugliness in the code required to accommodate it. I consider 'unsigned byte' a fundamental type and I can't believe that it was completely ignored. I mean, why bother with a 'byte' type at all if not supporting 'unsigned byte'? I'm no language designer, but that seems like a monumentally stupid decision to me. I've never felt a burning need to learn Java and after seeing stuff like that, I really haven't changed my opinion.

Bill
gfb107
Expert
Posts: 3411
Joined: Sun Aug 03, 2003 7:18 pm
Location: Cary, NC
Contact:

Post by gfb107 »

I've also made the v1.98beta1 available at
https://sourceforge.net/projects/contro ... p/download
WagonMaster
Posts: 366
Joined: Thu Apr 16, 2009 2:25 pm

Post by WagonMaster »

Thanks for the alternate download site, Greg. That worked out much better than pulling that large file from this forum.

I'll be testing 1.98beta1 and reporting back here soon.

Bill
WagonMaster
Posts: 366
Joined: Thu Apr 16, 2009 2:25 pm

Post by WagonMaster »

gfb107 wrote: Try RM v1.98beta1.
I just tested this build. Unfortunately, I'm still getting the same "Upload verify failed: data read back doesn't match data written." error message when I follow the 4-step procedure outlined in my first post.

And I don't see anything out-of-the-ordinary in the 'rmaster.err' file.

Bill
gfb107
Expert
Posts: 3411
Joined: Sun Aug 03, 2003 7:18 pm
Location: Cary, NC
Contact:

Post by gfb107 »

I'll make a new build that will dump the bytes that don't match to rmaster.err
gfb107
Expert
Posts: 3411
Joined: Sun Aug 03, 2003 7:18 pm
Location: Cary, NC
Contact:

Post by gfb107 »

Here it is: https://sourceforge.net/projects/contro ... p/download

Also includes one more masking change.
WagonMaster
Posts: 366
Joined: Thu Apr 16, 2009 2:25 pm

Post by WagonMaster »

gfb107 wrote: I'll make a new build that will dump the bytes that don't match to rmaster.err
Thanks, Greg. I'd have added that code myself and re-tested, but I'm slightly crippled without the latest source code. :) Or has it been in SourceForge all along and I was just too dumb to check? :eek:

I tested 1.98beta2. I'm still getting the error, but it seems that fewer bytes have been corrupted now. Here's the end of 'rmaster.err':

Code:

Checking protocol "Emerson Combo (4-device)" (00 65 : 2)
Imported fixedData is A8 A8 A8 A8
Imported device parms are: 42 42 42 42
Calculated fixedData is A8 A8 A8 A8
It's a match!
And it's longer, or the protocol code matches!
Using "Emerson Combo (4-device)" (00 65 : 2)
Checking for Special Procotol Pause w/ PID=01 FB
in getDeviceUpgrade
Checking TV/1104(null)
No match found!
Decoding advCode at $26, keyCode=162:Shift-Audio
length=5
0411: FFA8 != A8
As you can see, only 1 byte fails to match now, so we're clearly getting close!

You may have considered this already, but I'd humbly suggest that you leave that "byte mismatch" debug code in place permanently so that if anyone ever encounters this in the future, they'd already have the list of mismatched addresses/bytes in the log. It might even be wise to append a short "(nnn bytes different)" message to the text displayed to the user in the warning dialog, just so someone knows just how drastic the mismatch was. I don't know how badly a set of weak/failing batteries might corrupt an upload, but the 'byte mismatch' count might be helpful in distinguishing between an actual bug and a simple case of dying batteries. Just a thought, for whatever it's worth.
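
Something along these lines is what I have in mind. This is just a sketch with hypothetical names (I haven't seen the current source), using the same "address: written != read" format as the log above and the "(nnn bytes different)" suffix I suggested.

```java
// Sketch only: class and method names are hypothetical, not from RMIR.
final class VerifyReportDemo {

    // Compares the two arrays over their common length, appends one line
    // per mismatch to the log, and returns the number of mismatches.
    static int countMismatches(short[] written, short[] readBack, StringBuilder log) {
        int n = Math.min(written.length, readBack.length);
        int mismatches = 0;
        for (int i = 0; i < n; i++) {
            if (written[i] != readBack[i]) {
                mismatches++;
                // Mask to 16 bits so negative shorts print as FFA8, not FFFFFFA8.
                log.append(String.format("%04X: %X != %X%n",
                        i, written[i] & 0xFFFF, readBack[i] & 0xFFFF));
            }
        }
        return mismatches;
    }

    // Builds the dialog text with the mismatch count appended.
    static String dialogText(int mismatches) {
        return "Upload verify failed: data read back doesn't match data written. ("
                + mismatches + " bytes different)";
    }

    public static void main(String[] args) {
        short[] written = new short[0x412];
        short[] readBack = new short[0x412];
        written[0x411] = (short) 0xFFA8;
        readBack[0x411] = 0xA8;

        StringBuilder log = new StringBuilder();
        int count = countMismatches(written, readBack, log);
        System.err.print(log);
        System.err.println(dialogText(count));
    }
}
```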

I had assumed (wrongly) that these 1.98beta builds were just to test for the problem mentioned in this thread. When I saw from the 'ChangeLog.txt' file that you'd made some of the other changes suggested by my problem reports, I tested those things too. I'm happy to report that the changes associated with problems reported in both these threads appear to be fixed: Nice work!

Due to reloading the remote that was triggering this, I can no longer get the pop-up dialog that was associated with this error, so I cannot confirm this fix:

The only outstanding issue I'm aware of with RM/RMIR (besides this thread itself) is this recent addition of mine:

Sorry to pester you with all these things, but I think the end result is that RMIR is getting more usable every day.

Bill
WagonMaster
Posts: 366
Joined: Thu Apr 16, 2009 2:25 pm

Post by WagonMaster »

I just tested 1.98beta3 and that seems to have nicely solved the problem -- many thanks!

Bill