os9 copy double-spaces some lines

Anonymous
2023-07-12
2023-07-16
  • Anonymous

    Anonymous - 2023-07-12

    Today I noticed that one of my text files has the last 229 lines double-spaced when using "os9 copy" to copy a Windows text file to an OS-9 disk image. Those lines have CR,LF converted to CR,CR instead of only CR.
    I have never noticed this happening before, and can't see what might have triggered it.
    The Windows file has 2480 lines and 72089 bytes in it, but only the last 229 lines are affected.
    The command line doing the copy:
    os9 copy -l -r KRNBOOT\dskboot.asm NOS9DEV.DSK,KRNBOOT/dskboot.asm
    I am using ToolShed v2.2 on a Windows 7 system.
    Has anyone else seen this behaviour? Any ideas as to what might be causing it?
    Dave W

     
  • Tormod Volden

    Tormod Volden - 2023-07-13

    Can you share the file? Or a part of it which reproduces the bug?

     
  • Tormod Volden

    Tormod Volden - 2023-07-13

    After a quick look at the code I can imagine this happens because the file is processed in buffer-size chunks, and each chunk is checked with DetermineEOLType(). If the file is split so that one chunk ends with CR and the next starts with LF, the latter chunk will be detected as an EOL_UNIX type and a simple LF->CR conversion will be done on it, turning every CR,LF in that chunk into CR,CR.

    Maybe the simplest fix would be for DetermineEOLType() to go through the whole chunk in search of CR,LF and not just be happy with the first LF it finds.

    Obviously the better fix is to determine the line encoding for the file once and for all (hopefully the first chunk should be enough to determine it) and then stay with this encoding for all remaining chunks.
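
    To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is not the ToolShed code: determine_eol_type(), to_coco() and the two copy functions are hypothetical stand-ins written from the description in this post, and the DOS case simply drops every LF, which is an assumption about how a fixed converter might behave.

```python
# Hypothetical model of the chunked EOL conversion; not the actual ToolShed source.
EOL_COCO, EOL_UNIX, EOL_DOS = "coco", "unix", "dos"


def determine_eol_type(chunk: bytes) -> str:
    """Classify a chunk by the first line-ending byte it contains."""
    for i, byte in enumerate(chunk):
        if byte == 0x0A:  # LF seen first: looks like a Unix file
            return EOL_UNIX
        if byte == 0x0D:  # CR seen first: DOS if an LF follows, else CR-only
            if i + 1 < len(chunk) and chunk[i + 1] == 0x0A:
                return EOL_DOS
            return EOL_COCO
    return EOL_COCO  # no line endings at all: nothing to convert


def to_coco(chunk: bytes, eol: str) -> bytes:
    """Convert one chunk to CR-only line endings for the given source type."""
    if eol == EOL_DOS:
        return chunk.replace(b"\n", b"")  # assumption: keep CR, drop every LF
    if eol == EOL_UNIX:
        return chunk.replace(b"\n", b"\r")
    return chunk


def split_chunks(data: bytes, size: int) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def copy_sniff_per_chunk(data: bytes, size: int) -> bytes:
    """Suspected buggy behaviour: the EOL type is re-detected for every chunk."""
    return b"".join(to_coco(c, determine_eol_type(c)) for c in split_chunks(data, size))


def copy_sniff_once(data: bytes, size: int) -> bytes:
    """Proposed behaviour: detect on the first chunk, reuse for the rest."""
    parts = split_chunks(data, size)
    if not parts:
        return b""
    eol = determine_eol_type(parts[0])
    return b"".join(to_coco(c, eol) for c in parts)
```

    With an 11-byte buffer and the input b"1234\r\n7890\r\n3456\r\n", the second chunk starts with LF, so the per-chunk version classifies it as Unix and emits CR,CR, while the detect-once version keeps single CRs throughout. Detecting once still relies on the first chunk containing a complete CR,LF pair.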

     
  • Tormod Volden

    Tormod Volden - 2023-07-13

    Maybe you can confirm that the issue disappears (or you get other results) if you specify a buffer size with the -b option different than the default 32768 bytes.

    I guess this bug hasn't popped up much because people rarely deal with files larger than 32K on their NitrOS-9 systems.

     
  • Tormod Volden

    Tormod Volden - 2023-07-13

    It is also interesting to note that NativeToCoco() uses DetermineEOLType() to "sniff" the encoding of the file, while CoCoToNative() selects the target file encoding based on the platform it is compiled for. This is not mentioned in the ToolShed documentation.

     
  • Tormod Volden

    Tormod Volden - 2023-07-13

    You can test this patch. I will also make new snapshot builds for Windows once I have committed this.

     
  • Anonymous

    Anonymous - 2023-07-13

    I made some minor changes to the file, and the problem went away. Unfortunately I didn't save a copy that had the problem. But the issue did seem to start around the 64KB mark, so your suspicion as to the cause is probably correct.
    What is the largest buffer size that may be specified for "os9 copy"?
    My source code files have lots of comments in them, because internal documentation can't be misplaced like external documentation, so that makes them bigger than many other people's files.
    Dave W

     
  • Tormod Volden

    Tormod Volden - 2023-07-13

    Steps to reproduce and verify:

    printf "1234\r\n7890\r\n3456\r\n" > /tmp/test.crnl
    os9 format /tmp/os9disk.img
    os9 copy -b11 -l /tmp/test.crnl /tmp/os9disk.img,test
    os9 copy /tmp/os9disk.img,test /tmp/test.back
    sed -n l /tmp/test.back
    
     

    Last edit: Tormod Volden 2023-07-13
  • Tormod Volden

    Tormod Volden - 2023-07-13

    I don't think there is any practical limit for the buffer size to worry about. The code handles it as an integer, so if you are running this on a 32-bit computer a 2 GB size could work. It will allocate at least one buffer of that size though, so enough RAM must be available for the process.

    -b2111000K seems to work fine here :)

     

    Last edit: Tormod Volden 2023-07-13
  • Anonymous

    Anonymous - 2023-07-13

    I created a test file that had CR,LF split at the 32KB boundary, and used the default 32KB buffer size, and the lines after that point became double-spaced with the v2.2 program, confirming your analysis of the program.
    I'm not currently set up to patch the ToolShed source code and run make, so for now I will just specify a buffer size larger than the largest text file I expect to ever process with "os9 copy".
    Thanks for your prompt help!!
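
    For anyone wanting to repeat this check, a test file with CR,LF straddling the buffer boundary could be generated along these lines. This is only a sketch: make_split_crlf() is a hypothetical helper, and the line length and filler character are arbitrary choices.

```python
def make_split_crlf(boundary: int = 32768, line_len: int = 78, extra_lines: int = 5) -> bytes:
    """Build CR,LF text where a CR is the last byte before `boundary`
    and its LF is the first byte after it."""
    line = b"x" * line_len + b"\r\n"
    full = len(line)
    # Whole lines that fit before the byte that must hold the CR.
    n_full = (boundary - 1) // full
    # A shorter line whose CR lands exactly at offset boundary - 1.
    remainder = (boundary - 1) - n_full * full
    data = line * n_full + b"x" * remainder + b"\r\n"
    # A few more lines so the effect after the boundary is visible.
    return data + line * extra_lines
```

    Writing the result with open("split.txt", "wb").write(make_split_crlf()) and copying it with "os9 copy -l" at the default 32768-byte buffer size should exercise the boundary case described above.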

     
  • Tormod Volden

    Tormod Volden - 2023-07-16
     
  • Tormod Volden

    Tormod Volden - 2023-07-16

    BTW, I uploaded a new Windows snapshot at https://toolshed.sourceforge.net/snapshots/

     
