Menu

#504 Segmentation fault in readbyte (memory_pages.c)

NextRelease
closed-fixed
nobody
None
5
2024-10-19
2024-01-12
Pete Moore
No

Intermittent crash (Segmentation fault). Core dump attached. Fuse version 1.5.7 (https://sourceforge.net/projects/fuse-emulator/files/fuse/1.5.7/fuse-1.5.7.tar.gz/download).

Core was generated by `fuse --textfile test_po_mosaic_half.0.log --speed 100000 --machine 128 --no-sou'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000aaaae97dcc34 in readbyte (address=23611) at memory_pages.c:384
384   if( mapping->contended ) tstates += ula_contention[ tstates ];
[Current thread is 1 (Thread 0xffffa42fa020 (LWP 18956))]
(gdb) bt
#0  0x0000aaaae97dcc34 in readbyte (address=23611) at memory_pages.c:384
#1  0x0000aaaae983ed5c in z80_do_opcodes () at ./z80/z80_cb.c:419
#2  0x0000aaaae97d684c in main (argc=<optimized out>, argv=<optimized out>) at fuse.c:199
1 Attachments

Related

Bugs: #504

Discussion

  • Sergio Baldoví

    Sergio Baldoví - 2024-01-14

    Thank you for reporting. It looks like an array access out of bounds that happens on a long running machine.

    #define ULA_CONTENTION_SIZE 80000
    libspectrum_byte ula_contention[ ULA_CONTENTION_SIZE ];
    libspectrum_byte ula_contention_no_mreq[ ULA_CONTENTION_SIZE ];
    
    (gdb) print tstates
    $1 = 17062059
    
     
  • Pete Moore

    Pete Moore - 2024-01-15

    Many thanks Sergio. I get a slightly different value for tstates from my gdb session, but it is still out of range.

    (gdb) print tstates
    $3 = 1293694
    (gdb) print machine_current->timings.tstates_per_frame
    $5 = 70908
    

    Looking through the code, it looks like tstates is meant to represent the number of tstates since the start of the frame, so should always be between 0 and tstates_per_frame-1. However, there are a few places where tstates is increased (e.g. when there is contention) where there aren't bounds checks. In my case above, tstates is wildly out of bounds, so I wonder if perhaps this is just due to it reading an invalid entry beyond the array, which set tstates to a wild value. For example there are lines like:

    tstates += ula_contention_no_mreq[ tstates ];
    

    Assuming tstates is just out of range, after this line executes, and it reads beyond the ula_contention_no_mreq array, tstates could be wildly out of range, like in the core dump.

    So my current hypothesis is that at some point, close to then end of the frame, there is some contention, that bumps tstates just out of range (i.e. above tstates_per_frame - 1). It looks like tstates is "reset" (reduced by tstates_per_frame) inside

    int spectrum_frame ( void )
    

    If this is the case, there are a few alternative approaches to fixing the problem that seem worth considering:

    1) replace references to ula_contention[tstates] to ula_contention[tstates % tstates_per_frame], and the same for ula_contention_no_mreq, or
    2) have a function to add a value to tstates that does the bounds checking, and replace current code that bumps the value to use this function, or
    3) make ula_contention and ula_contention_no_mreq slightly larger, to cover cases when in the final tstates of the frame, there is contention that pushes the tstates into the next frame. Of course then, the array values would just wrap around the initial entries of the array.

    I don't have a deep understanding of the fuse codebase, so just wanted to check in with you if you agree with the analysis, and if you would be interested in me proposing a patch with one of the above solutions.

     

    Last edit: Pete Moore 2024-01-15
  • Philip Kendall

    Philip Kendall - 2024-01-15

    (1) will as commented elsewhere do bad things to RZX playback; RZX frames can run to more than tstates_per_frame if they were created on an inaccurate emulator.
    (2) I'd be against as it is in the hot path.
    (3) has to a very real extent already been done in that the array is 80000 long as opposed to the 70908 tstates of the longest official model (+ a bit of bodge room for RZX). Making it a million long seems very wrong.

    I feel the correct fix here is to work out why tstates is so far out of bounds and deal with the root cause of that, rather than bodging around it being out of bounds.

     
  • Pete Moore

    Pete Moore - 2024-01-19

    Many thanks Philip! I've added some debug, I'll let you know how I get on...
    Unfortunately I can't get rr to work under docker on my M1 mac, so resorting to rather primitive printf tactics!

    https://github.com/spectrum4/spectrum4/commit/31430d6e8fc0881e12676b78b36a4a23131ac7f0

     
  • Pete Moore

    Pete Moore - 2024-01-20

    Turned out to be a simple overflow. Fixed in https://sourceforge.net/p/fuse-emulator/fuse/merge-requests/29/

    Any questions, let me know!
    Thanks.

     
    • Pete Moore

      Pete Moore - 2024-01-21

      Note, also happy to change to this form, if you find it more readable…

      current version in PR:

      return a->tstates != b->tstates ? (a->tstates > b->tstates) - (a->tstates < b->tstates)
      : a->type - b->type;

      but could be rewriten as:

      if (a->tstates > b->tstates) {
      return 1;
      } else if (a->tstates < b->tstates) {
      return -1;
      } else {
      return a->type - b->type;
      }

      this is slightly more optimised (e.g. 7 fewer instructions under gcc / aarch64).

      I don’t mind either way.

      Another question might be whether it makes sense to allow events to be scheduled 2^31 tstates in the future (that’s about 10 mins at x1 speed, I believe), which is when this bug happens. It happens when --speed is very high, e.g. 1000000. Perhaps it would be better to just have an option like "--speed MAX" to signify you want to run at the maximum speed, and doing so would avoid any timer waiting code? Running at full speed is useful for e.g. unit tests you want to run for spectrum software you have written.

      On 21. Jan 2024, at 00:17, Pete Moore worc0260@users.sourceforge.net wrote:

      Turned out to be a simple overflow. Fixed in https://sourceforge.net/p/fuse-emulator/fuse/merge-requests/29/ https://sourceforge.net/p/fuse-emulator/fuse/merge-requests/29/
      Any questions, let me know!
      Thanks.

      [bugs:#504] https://sourceforge.net/p/fuse-emulator/bugs/504/ Segmentation fault in readbyte (memory_pages.c)

      Status: open
      Group: future
      Created: Fri Jan 12, 2024 12:50 PM UTC by Pete Moore
      Last Updated: Fri Jan 19, 2024 02:03 PM UTC
      Owner: nobody
      Attachments:

      test_po_mosaic_half.0.core https://sourceforge.net/p/fuse-emulator/bugs/504/attachment/test_po_mosaic_half.0.core (13.4 MB; application/octet-stream)
      Intermittent crash (Segmentation fault). Core dump attached. Fuse version 1.5.7 (https://sourceforge.net/projects/fuse-emulator/files/fuse/1.5.7/fuse-1.5.7.tar.gz/download).

      Core was generated by `fuse --textfile test_po_mosaic_half.0.log --speed 100000 --machine 128 --no-sou'.
      Program terminated with signal SIGSEGV, Segmentation fault.

      0 0x0000aaaae97dcc34 in readbyte (address=23611) at memory_pages.c:384

      384 if( mapping->contended ) tstates += ula_contention[ tstates ];
      [Current thread is 1 (Thread 0xffffa42fa020 (LWP 18956))]
      (gdb) bt

      0 0x0000aaaae97dcc34 in readbyte (address=23611) at memory_pages.c:384

      1 0x0000aaaae983ed5c in z80_do_opcodes () at ./z80/z80_cb.c:419

      2 0x0000aaaae97d684c in main (argc=<optimized out="">, argv=<optimized out="">) at fuse.c:199</optimized></optimized>

      Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/fuse-emulator/bugs/504/ https://sourceforge.net/p/fuse-emulator/bugs/504/
      To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/ https://sourceforge.net/auth/subscriptions/

       

      Related

      Bugs: #504

  • Philip Kendall

    Philip Kendall - 2024-01-21

    For what it's worth, I know where the very large tstates value is coming from - in timer_frame, we set an event 10 (real world) milliseconds into the future. That's fine when running at "reasonable" speeds but when we get to silly speeds that becomes a rather large number.

    Given this is a "housekeeping" event used for speed control/calculations and the like, the simple method here would just be to cap the max time for that event to something reasonable. Agreed in the longer term it might be nice to have a "fully unrestricted" mode but it's honestly a bit of a niche use case.

     
  • Philip Kendall

    Philip Kendall - 2024-01-24
    • status: open --> closed-fixed
     
    • Pete Moore

      Pete Moore - 2024-01-25

      You're welcome. :-)

      ... and thanks for reviewing, merging, and adding the defensive code!

      On 24 Jan 2024, 21:57, at 21:57, Philip Kendall pak21@users.sourceforge.net wrote:

      • status: open --> closed-fixed
      • Comment:

      Defensive code added to timer.c, going to close this one now, many
      thanks for the investigation and the fix here.


      [bugs:#504] Segmentation fault in readbyte (memory_pages.c)

      Status: closed-fixed
      Group: future
      Created: Fri Jan 12, 2024 12:50 PM UTC by Pete Moore
      Last Updated: Sun Jan 21, 2024 09:45 PM UTC
      Owner: nobody
      Attachments:

      -
      test_po_mosaic_half.0.core
      (13.4 MB; application/octet-stream)

      Intermittent crash (Segmentation fault). Core dump attached. Fuse
      version 1.5.7
      (https://sourceforge.net/projects/fuse-emulator/files/fuse/1.5.7/fuse-1.5.7.tar.gz/download).

      ~~~
      Core was generated by `fuse --textfile test_po_mosaic_half.0.log
      --speed 100000 --machine 128 --no-sou'.
      Program terminated with signal SIGSEGV, Segmentation fault.

      0 0x0000aaaae97dcc34 in readbyte (address=23611) at

      memory_pages.c:384
      384 if( mapping->contended ) tstates += ula_contention[ tstates ];
      [Current thread is 1 (Thread 0xffffa42fa020 (LWP 18956))]
      (gdb) bt

      0 0x0000aaaae97dcc34 in readbyte (address=23611) at

      memory_pages.c:384

      1 0x0000aaaae983ed5c in z80_do_opcodes () at ./z80/z80_cb.c:419

      2 0x0000aaaae97d684c in main (argc=<optimized out="">, argv=<optimized</optimized>

      out>) at fuse.c:199

      ~~~


      Sent from sourceforge.net because you indicated interest in
      https://sourceforge.net/p/fuse-emulator/bugs/504/

      To unsubscribe from further messages, please visit
      https://sourceforge.net/auth/subscriptions/

       

      Related

      Bugs: #504

  • Philip Kendall

    Philip Kendall - 2024-01-24

    Defensive code added to timer.c, going to close this one now, many thanks for the investigation and the fix here.

     
  • Sergio Baldoví

    Sergio Baldoví - 2024-10-19
    • Group: future --> NextRelease
     

Log in to post a comment.

MongoDB Logo MongoDB