Intermittent crash (Segmentation fault). Core dump attached. Fuse version 1.5.7 (https://sourceforge.net/projects/fuse-emulator/files/fuse/1.5.7/fuse-1.5.7.tar.gz/download).
Core was generated by `fuse --textfile test_po_mosaic_half.0.log --speed 100000 --machine 128 --no-sou'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000aaaae97dcc34 in readbyte (address=23611) at memory_pages.c:384
384 if( mapping->contended ) tstates += ula_contention[ tstates ];
[Current thread is 1 (Thread 0xffffa42fa020 (LWP 18956))]
(gdb) bt
#0 0x0000aaaae97dcc34 in readbyte (address=23611) at memory_pages.c:384
#1 0x0000aaaae983ed5c in z80_do_opcodes () at ./z80/z80_cb.c:419
#2 0x0000aaaae97d684c in main (argc=<optimized out>, argv=<optimized out>) at fuse.c:199
Thank you for reporting. It looks like an array access out of bounds that happens on a long running machine.
Many thanks Sergio. I get a slightly different value for tstates from my gdb session, but it is still out of range.
Looking through the code, it looks like tstates is meant to represent the number of tstates since the start of the frame, so should always be between 0 and tstates_per_frame-1. However, there are a few places where tstates is increased (e.g. when there is contention) where there aren't bounds checks. In my case above, tstates is wildly out of bounds, so I wonder if perhaps this is just due to it reading an invalid entry beyond the array, which set tstates to a wild value. For example there are lines like:
Assuming tstates is just out of range, after this line executes, and it reads beyond the ula_contention_no_mreq array, tstates could be wildly out of range, like in the core dump.
So my current hypothesis is that at some point, close to then end of the frame, there is some contention, that bumps tstates just out of range (i.e. above tstates_per_frame - 1). It looks like tstates is "reset" (reduced by tstates_per_frame) inside
If this is the case, there are a few alternative approaches to fixing the problem that seem worth considering:
1) replace references to ula_contention[tstates] to ula_contention[tstates % tstates_per_frame], and the same for ula_contention_no_mreq, or
2) have a function to add a value to tstates that does the bounds checking, and replace current code that bumps the value to use this function, or
3) make ula_contention and ula_contention_no_mreq slightly larger, to cover cases when in the final tstates of the frame, there is contention that pushes the tstates into the next frame. Of course then, the array values would just wrap around the initial entries of the array.
I don't have a deep understanding of the fuse codebase, so just wanted to check in with you if you agree with the analysis, and if you would be interested in me proposing a patch with one of the above solutions.
Last edit: Pete Moore 2024-01-15
I've created https://sourceforge.net/p/fuse-emulator/fuse/merge-requests/28/ based on option 1) above - let me know what you think! :-)
(1) will as commented elsewhere do bad things to RZX playback; RZX frames can run to more than
tstates_per_frameif they were created on an inaccurate emulator.(2) I'd be against as it is in the hot path.
(3) has to a very real extent already been done in that the array is 80000 long as opposed to the 70908 tstates of the longest official model (+ a bit of bodge room for RZX). Making it a million long seems very wrong.
I feel the correct fix here is to work out why
tstatesis so far out of bounds and deal with the root cause of that, rather than bodging around it being out of bounds.Many thanks Philip! I've added some debug, I'll let you know how I get on...
Unfortunately I can't get rr to work under docker on my M1 mac, so resorting to rather primitive printf tactics!
https://github.com/spectrum4/spectrum4/commit/31430d6e8fc0881e12676b78b36a4a23131ac7f0
Turned out to be a simple overflow. Fixed in https://sourceforge.net/p/fuse-emulator/fuse/merge-requests/29/
Any questions, let me know!
Thanks.
Note, also happy to change to this form, if you find it more readable…
current version in PR:
return a->tstates != b->tstates ? (a->tstates > b->tstates) - (a->tstates < b->tstates)
: a->type - b->type;
but could be rewriten as:
if (a->tstates > b->tstates) {
return 1;
} else if (a->tstates < b->tstates) {
return -1;
} else {
return a->type - b->type;
}
this is slightly more optimised (e.g. 7 fewer instructions under gcc / aarch64).
I don’t mind either way.
Another question might be whether it makes sense to allow events to be scheduled 2^31 tstates in the future (that’s about 10 mins at x1 speed, I believe), which is when this bug happens. It happens when --speed is very high, e.g. 1000000. Perhaps it would be better to just have an option like "--speed MAX" to signify you want to run at the maximum speed, and doing so would avoid any timer waiting code? Running at full speed is useful for e.g. unit tests you want to run for spectrum software you have written.
Related
Bugs:
#504For what it's worth, I know where the very large
tstatesvalue is coming from - intimer_frame, we set an event 10 (real world) milliseconds into the future. That's fine when running at "reasonable" speeds but when we get to silly speeds that becomes a rather large number.Given this is a "housekeeping" event used for speed control/calculations and the like, the simple method here would just be to cap the max time for that event to something reasonable. Agreed in the longer term it might be nice to have a "fully unrestricted" mode but it's honestly a bit of a niche use case.
You're welcome. :-)
... and thanks for reviewing, merging, and adding the defensive code!
On 24 Jan 2024, 21:57, at 21:57, Philip Kendall pak21@users.sourceforge.net wrote:
Related
Bugs:
#504Defensive code added to
timer.c, going to close this one now, many thanks for the investigation and the fix here.