
[The text in this file will only look correct if you use a fixed-width font]

This software is a compiler for 'Minimal BASIC' as specified by the ECMA-55
standard.  The target is AMD64/EM64T/x86-64 machines running a modern Linux
distribution.  This compiler will create assembly language output files.
These must be assembled into object files and linked to create an executable.
The assembly dialect used is that of GNU gas, since that will be available on
any modern general purpose x86-64 Linux distribution.  No libc or libm is used
by the generated code, which allows creating very small executables.  To keep
the generated code small and simple, the runtime code for SIN, COS, TAN, ATAN,
EXP, POW, LOG, RND, and RANDOMIZE is emitted only if the input BASIC program
uses those features.
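A minimal sketch of the "emit only what is used" idea, assuming the parser sets one bit per optional runtime routine while it reads the program and the code generator consults that mask afterwards. The enum names and the feature_for_name() and emit_runtime() helpers are hypothetical, not code from codegen.c, and the comment lines it prints stand in for the real assembly routines. ATN is the ECMA-55 spelling of the arctangent function, and "^" is the exponentiation operator that needs POW.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flags, one bit per optional runtime routine. */
enum runtime_feature {
    RT_SIN = 1 << 0, RT_COS = 1 << 1, RT_TAN = 1 << 2,
    RT_ATN = 1 << 3, RT_EXP = 1 << 4, RT_POW = 1 << 5,
    RT_LOG = 1 << 6, RT_RND = 1 << 7, RT_RANDOMIZE = 1 << 8
};

/* Map a builtin seen by the parser to its flag (0 if it needs no routine).
 * "^" is the exponentiation operator, which needs the POW routine. */
static unsigned feature_for_name(const char *name)
{
    static const struct { const char *name; unsigned flag; } table[] = {
        {"SIN", RT_SIN}, {"COS", RT_COS}, {"TAN", RT_TAN}, {"ATN", RT_ATN},
        {"EXP", RT_EXP}, {"^", RT_POW}, {"LOG", RT_LOG}, {"RND", RT_RND},
        {"RANDOMIZE", RT_RANDOMIZE},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].flag;
    return 0;
}

/* Emit a placeholder for each routine whose bit is set; return the count.
 * names[] is listed in the same order as the bit positions above. */
static int emit_runtime(FILE *out, unsigned used)
{
    static const char *const names[] = {
        "SIN", "COS", "TAN", "ATN", "EXP", "POW", "LOG", "RND", "RANDOMIZE"
    };
    int emitted = 0;
    for (unsigned i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (used & (1u << i)) {
            fprintf(out, "# runtime routine for %s would go here\n", names[i]);
            emitted++;
        }
    }
    return emitted;
}
```

A program that only uses SIN and "^" would then pull in two routines and nothing else.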

After completing this project, I did find one other FOSS compiler that claims
to be able to handle much of ANSI Full BASIC, the BASIC Accelerator at
http://hp.vector.co.jp/authors/VA008683/english/BASICAcc.htm, but the output is
Object Pascal for the FreePascal compiler at http://www.freepascal.org/, and
not assembly.  Also, they implement only what ECMA-116 calls OPTION ARITHMETIC
NATIVE mode, which is essentially the same mode implemented in this compiler.
The same developers have created an interpreter called Decimal BASIC at
http://hp.vector.co.jp/authors/VA008683/english/ that does attempt to support
the required decimal arithmetic.  Strangely, these projects did not turn up in
normal web searches, but only when I searched for "BASIC-1 OPTION ARITHMETIC
DECIMAL".

In 2015 I learned of Jorge Giner Cordero's excellent bas55 interpreter for
ECMA-55 Minimal BASIC which you can download at this URL:

  https://jorgicor.niobe.org/bas55

Note that bas55, like most vintage BASIC interpreters, initializes numeric
variable values to zero and by default does not detect uses of uninitialized
variables in batch mode.  However, the --debug switch enables detection, which
results in warnings for uninitialized variable values.  When bas55 is run in
interactive mode, the --debug switch is enabled by default.  The ECMA-55
standard states that programs intended to be portable _should_ explicitly
initialize all variables before use; the ecma55 compiler _requires_ this and
treats any access to an uninitialized variable as a fatal error.
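ECMA-55 simple numeric variables are a letter optionally followed by one digit, so there are only 26 x 11 = 286 of them. A minimal sketch, assuming a flat table of "has been assigned" flags indexed by variable name, of how an implementation can refuse to read an unassigned variable instead of silently returning zero. The helper names are hypothetical; this is not how ecma55 itself implements the check.

```c
#include <stdbool.h>

/* A..Z plus A0..Z9: 26 letters times (no digit + 10 digits). */
#define NUM_SIMPLE_VARS (26 * 11)

/* Map "A" or "A7" to 0..285; return -1 for anything else (ASCII assumed). */
static int var_index(const char *name)
{
    if (name[0] < 'A' || name[0] > 'Z')
        return -1;
    int base = (name[0] - 'A') * 11;
    if (name[1] == '\0')
        return base;
    if (name[1] >= '0' && name[1] <= '9' && name[2] == '\0')
        return base + 1 + (name[1] - '0');
    return -1;
}

/* One "has been assigned" flag per variable; all start out false. */
static bool initialized[NUM_SIMPLE_VARS];

static void note_assignment(int idx)
{
    initialized[idx] = true;
}

/* Returns false where a strict implementation reports a fatal error
 * instead of quietly reading 0 the way many vintage interpreters do. */
static bool read_is_allowed(int idx)
{
    return initialized[idx];
}
```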

The license for the groff format manual pages and the included book
"An Introduction to Programming with ECMA-55 Minimal BASIC" is the GNU Free
Documentation License version 1.3 only.  See the included GNU_FDL for details.

The author of the book, the groff format manual pages, and the actual compiler
software is John Gatewood Ham.  The source code for the compiler itself is
available under the GNU General Public License version 2 only.  See the
included file COPYING for details.

The following NBS tests were kindly supplied by Emmanuel Roche:
56, 57, 65, 66, 67, 68, 69, 109, 117, 118, 119, 120, 121, 122, 123, and 124.
The rest came from the Google Books PDF files available on the Internet.

Fixes for the following NBS tests were kindly supplied by Jorge Giner Cordero:
12, 14, 25, 39, 43, 74, 108, 115, 128, 185, 191, and 206.

The included runtime library assembly routines for SIN, COS, TAN, ATAN, LOG,
EXP, and POW are tweaked versions of code from Naoki Shibata's SLEEF-3.4.1
(https://github.com/shibatch/sleef) and are covered by the Boost Software
License Version 1.0.  This FOSS license is available for download at
http://www.boost.org/LICENSE_1_0.txt and is included with this software as
BOOST_LICENSE-1.0.TXT.

The included runtime library assembly routines for RND and RANDOMIZE are
modified versions of Bob Jenkins' public domain ISAAC-64 code:
http://burtleburtle.net/bob/rand/isaacafa.html
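ECMA-55 specifies that RND returns values uniformly distributed in 0 <= RND < 1. One common way to turn a 64-bit generator output, such as an ISAAC-64 result, into such a value is to keep the top 53 bits (the precision of an IEEE-754 double) and scale by 2^-53. This is an assumed technique shown for illustration, not necessarily what robert1.c does, and u64_to_unit_double() is a hypothetical name.

```c
#include <stdint.h>

/* Map a 64-bit pseudo-random integer to a double in [0, 1).
 * The top 53 bits fill a double's significand exactly, so each result is
 * exactly representable and the largest possible value is 1 - 2^-53. */
static double u64_to_unit_double(uint64_t x)
{
    return (double)(x >> 11) * 0x1.0p-53;
}
```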

The included runtime library assembly routines for floating point input and
output are derived from David M. Gay's dtoa.c and g_fmt.c, which are free
to use but not public domain.  See the comments in those source files for
details.
http://netlib.sandia.gov/fp/index.html
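Gay's dtoa() in mode 0 produces the shortest digit string that reads back as exactly the same double. The brute-force sketch below illustrates that round-trip property using only snprintf() and strtod(); it is a simplified stand-in that is much slower than Gay's algorithm, and shortest_repr() is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write into buf the shortest "%.Ng" rendering of the finite double x that
 * strtod() reads back as exactly x, and return the precision N used.
 * 17 significant digits always round-trip an IEEE-754 double, so the loop
 * succeeds by N == 17 at the latest. */
static int shortest_repr(double x, char *buf, size_t len)
{
    for (int prec = 1; prec <= 17; prec++) {
        snprintf(buf, len, "%.*g", prec, x);
        if (strtod(buf, NULL) == x)
            return prec;
    }
    return 17;                /* only reached for NaN or a too-small buffer */
}
```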

The included runtime library assembly routines for accessing the Linux
vDSO come from the Linux kernel and were written by Andrew Lutomirski.
That code uses the Creative Commons Zero license for the reference vDSO
parser and the GNU GPL v2.0 only for the stack walking and pointer setup
code run at program startup that uses that reference parser.
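The kernel tells every process where it mapped the vDSO through the AT_SYSINFO_EHDR entry of the auxiliary vector, which sits on the initial stack just past argv and envp. A program that does not use libc has to walk the stack to find it at startup; the sketch below takes the shortcut of glibc's getauxval() (glibc 2.16 or later) purely to show what the runtime is looking for: the start of a small ELF image. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>        /* getauxval(), AT_SYSINFO_EHDR (glibc 2.16+) */

/* Address at which the kernel mapped the vDSO, or NULL if there is none. */
static const unsigned char *vdso_base(void)
{
    return (const unsigned char *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* The vDSO is a complete ELF shared object, so it starts with the ELF magic. */
static int vdso_looks_like_elf(const unsigned char *base)
{
    return base != NULL && memcmp(base, "\x7f" "ELF", 4) == 0;
}
```

Starting from that ELF header, a parser such as the kernel's reference vDSO parser locates the dynamic symbol table and looks up the vDSO's gettimeofday entry point.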

The included runtime library assembly routines for accessing the timezone
database to generate correct local date and time values come from Arthur
David Olson's tzcode2020f (now maintained by Paul Eggert).  The code used
(generated from localtime.c and some definitions from headers) is in the
public domain.

The puff.[ch] inflate (deflate decompression) code was written by Mark Adler
and is from the contrib directory of zlib-1.2.11.  I altered it by adding some
type casts to silence some warnings.  It uses a custom license that requires
attribution, but the code is free to use for any purpose, and is copyrighted by
Mark Adler.

I wrote a special file dumpregs.s for dumping CPU registers while debugging,
and unlike the main compiler, this one file is public domain.  The compiler
does not use it, but I used it when debugging programs and include it for
other people who might work at the assembly-language level.

The ECMA-55 standard was chosen over the "ANSI X3.60-1978 minimal BASIC"
standard since it is free.  ANSI, despite canceling the standard, still
keeps the ancient standard locked down and available only if you pay
for it, which is a quite mean-spirited attempt to prevent any compliant
free and open source implementations from being written.  The same attitude
exists with ISO for the "ISO 6373:1984 Data processing -- Programming
languages -- Minimal BASIC" standard.  This standard has many other names,
such as "AS 2797-1985 Programming language - Minimal BASIC", and the only
free one is ECMA-55, since all the other standards bodies are trying to
kill BASIC forever.

http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/

FILES IN THIS DISTRIBUTION

BOOK/LICENSE

   This is the complete text of the GNU Free Documentation License Version
   1.3, which is used for the book.  It is identical to the GNU_FDL file,
   but is bundled with the book for the case when people use the book
   independently of the rest of the compiler distribution.

BOOK/Makefile

   This is the project build file for creating the book.

BOOK/duplex

   This file is used to enable duplex printing of the book.

BOOK/Learn_BASIC.tex

   This contains the LaTeX source code of the included book "An Introduction
   to Programming with ECMA-55 Minimal BASIC".  This file is documentation and
   is licensed under the GNU Free Documentation License Version 1.3 only.

BOOK/Learn_BASIC.pdf

   This contains the included book "An Introduction to Programming with
   ECMA-55 Minimal BASIC".  This file is documentation and is licensed under
   the GNU Free Documentation License Version 1.3 only.

GNU_FDL

   This is the complete text of the GNU Free Documentation License Version
   1.3, which is used for the groff format manual pages and the included
   book.

COPYING

   This contains a copy of the GNU GPL version 2 license for the compiler
   itself.

ChangeLog

   This contains a high-level overview of changes in ascending date
   order, with the newest changes at the end of the file.

globals.[ch]

   This contains global variables that must be shared across all modules.

scanner3.[ch]

   This is the new scanner that converts the input byte stream into
   tokens for the parser.  It is a hand-coded, switch-based scanner.

parser2.[ch]

   This contains the parser that uses the token stream created by the
   scanner and generates an AST used by the semantic_checks and asmgen
   modules.

symbol_table.[ch]

   This contains the symbol table module.

asmgen.[ch]

   This contains the code that walks the AST the parser creates and
   generates the assembly using the low-level routines available in
   the codegen module.  It also uses the raw_registers, optimizer, and
   symbol_table modules.

semantic_checks.[ch]

   This contains the code that walks the AST the parser creates and
   performs semantic checks, symbol table population, and jump target
   checking with help from the symbol_table module.

codegen.[ch]

   This contains low-level routines that emit the GAS assembler output.  It
   also contains some of the runtime functions and macros.  The runtime
   library code in this file is GPLv2.

main.c

   This contains the main routine that calls everything else.  It does
   the command-line argument processing, loads the input file into a
   buffer, calls the scanner to convert that into a token stream, then
   calls the parser to process the token stream.

g_fmt_BASIC.s

   This contains the assembly code for my tweaked version of
   David M. Gay's g_fmt.c file.  The process to generate this is in
   the magic.txt file in the dgay sub-directory.  This is used as part
   of a compiled BASIC program's runtime.  A tweaked copy of this is
   included in the codegen.c file.  The runtime library code in this file is
   Copyright (C) by Lucent Technologies, but is free to use since it includes
   the copyright notice.

dtoa5_normal.[ch]

   This contains the C code for my tweaked version of David M. Gay's
   dtoa.c file.  This is used by the compiler to ensure it formats
   numbers in exactly the same format as the runtime.  The runtime library
   code in this file is Copyright (C) by Lucent Technologies, but is free to
   use since it includes the copyright notice.  clang versions earlier than
   12.0 do not build this correctly when PIE=1 is used with the large code
   model.

g_fmt_BASIC_normal.[ch]

   This contains the C code for my tweaked version of David M. Gay's
   g_fmt.c file.  This is used by the compiler to ensure it formats
   numbers in exactly the same format as the runtime.  The runtime library
   code in this file is Copyright (C) by Lucent Technologies, but is free to
   use since it includes the copyright notice.  clang versions earlier than
   12.0 do not build this correctly when PIE=1 is used with the large code
   model.

textdata.s
AMD64/*.s

   These files contain assembly language for various runtime features
   and get included in the generated assembly code when needed.  The files
   in AMD64 get included by the textdata.s file.  These files are linked
   directly into the ecma55 executable, so they do not need to be present
   for ecma55 to work; they only need to be present when you build ecma55.

tree.[ch]

   This contains the base n-ary tree code.  These nodes are used
   to create the AST, which is the intermediate representation created
   by the parser.

raw_registers.[ch]

   This contains the register management code.

dag.[ch]

   This contains the code to convert an AST into a DAG for arithmetic
   expression evaluation.

optimizer.[ch]

   This contains the AST optimizer code for optimizing arithmetic
   expressions.  Currently, the only supported optimization is constant
   folding.

ast.[ch]

   This contains the code for pretty-printing which traverses the AST that
   is generated during a parse and regenerates a semantically equivalent
   program.  It is used to support the ecma55 compiler's -P and -R options.

Makefile.gcc

   This is the project build file for use by the make program for gcc.

Makefile.clang

   This is the project build file for use by the make program for clang.

Makefile.tcc

   This is the project build file for use by the make program for tcc.

Makefile.runtests

   This is the parallel test running harness for the NBS Minimal BASIC
   test suite.

Makefile.runtests2

   This is the parallel test running harness for the HAM Minimal BASIC
   test suite.

grammar.txt

   This contains a copy of the Minimal BASIC grammar.

ecma55.1

   This is the man page for the compiler.  This file is documentation and
   is licensed under the GNU Free Documentation License Version 1.3 only.

BASICC.1

   This is the man page for the BASICC script.  This file is documentation and
   is licensed under the GNU Free Documentation License Version 1.3 only.

BASICCS.1

   This is the man page for the BASICCS script.  This file is documentation and
   is licensed under the GNU Free Documentation License Version 1.3 only.

BASICCW.1

   This is the man page for the BASICCW script.  This file is documentation and
   is licensed under the GNU Free Documentation License Version 1.3 only.

ECMA-55.TXT

   This file contains the text of the ECMA-55 standard for
   "Minimal BASIC".  This was retyped by me from the PDF version
   both to get a smaller file and to allow easy searching.

BASICC

   This file is a script that will compile, assemble, and link
   an input program.  Note that the input program must have the
   extension '.BAS' for this script to work.

BASICCS

   This file is a script that will compile, assemble, and link
   an input program.  Note that the input program must have the
   extension '.BAS' for this script to work.  This version tells
   the compiler to generate 32-bit math for the arithmetic
   expressions, producing output that more closely matches the NBS
   Minimal BASIC test suite expectations.

BASICCW

   This file is a script that will compile, assemble, and link
   an input program.  Note that the input program must have the
   extension '.BAS' for this script to work.  This version tells
   the compiler to use 132 column output instead of 80 column
   output.  The floating point numbers will be displayed with
   up to 15 significant digits and exponents of up to 3 digits.  The
   output in this case does not match the NBS standard's examples.
   However, the output does show the floating point values with the
   greater precision that 64-bit floating point math supports.

dumpregs.s

   This is an assembler source file you can build and link in to an
   executable.  It contains 'dumpregs', a procedure that takes no
   arguments and returns no values but does dump the registers used
   for normal programming for this project, including the xmm registers,
   eflags, and mxcsr flags.  It does not dump the x87 FPU registers or
   state since this project uses SIMD exclusively for floating point math.
   Unlike the main compiler, this file I wrote is in the public domain.
   I hope anybody who needs to code in 64-bit assembler on AMD64/EM64T
   in Linux will find it useful.

datum.dot

   This is the graphviz dot source file for the diagram of the finite
   state machine used by the INPUT runtime subsystem.

parseinput.c

   This is the C source code for the INPUT runtime subsystem. Compile
   with -DTROUBLE to get a trace of the states as the transitions occur.

zonermore.c

   This is the C source code for the PRINT runtime subsystem.

robert1.c

   This is the C source code for the RND function and RANDOMIZE
   statements.  Unlike the compiler itself, this file is in the public
   domain and is derived from Bob Jenkins' ISAAC-64.
   http://burtleburtle.net/bob/rand/isaacafa.html

peephole.c

   This contains the very simple peephole optimizer code.  It reads
   the assembly language file generated by the compiler and generates
   a new assembly language file.  It removes any superfluous
   'pushsaddr'/'popsaddr %rdi' sequences.

ECMA55-slideshow.odp

   This is a slideshow generated with LibreOffice.  It gives a good
   overview of this compiler project, including the motivation, the
   overall structure, and suggestions for future work.

ECMA55-slideshow.pdf

   PDF/A-1 version of ECMA55-slideshow.odp.  This includes the fonts
   and should display identically on all machines with a graphical
   PDF file viewer.

mathnotes.txt

   Notes about possible future work regarding the math code in the
   compiler.

parseinput.txt

   This file contains the instructions used to create the original
   parseinput.s file from the parseinput.c file.  The parseinput.s
   file is then updated as explained to create the final version of
   the routines which were included in the codegen.c file.

zonermore.txt

   This file contains the magic incantation used to create the original
   zonermore.s file from the zonermore.c file.  The zonermore.s was
   then edited to produce the final version of the routines which were
   included in the codegen.c file.

dgay/magic.txt

   This file contains the instructions for generating the assembly language
   versions of David M. Gay's code used in the codegen.c file.
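The scanner3.[ch] entry above describes a hand-coded, switch-based scanner. A minimal sketch of that style, assuming a tiny subset of Minimal BASIC tokens: the first character of each token selects a case, and multi-character tokens are consumed in a loop. The token names and next_token() are hypothetical; keywords and variable names both come back as TOK_WORD, and numeric exponents are omitted.

```c
#include <ctype.h>

/* Hypothetical token kinds for a small subset of Minimal BASIC. */
enum tok {
    TOK_EOF, TOK_NUMBER, TOK_WORD, TOK_STRING, TOK_PLUS, TOK_MINUS,
    TOK_STAR, TOK_SLASH, TOK_CARET, TOK_EQ, TOK_LPAREN, TOK_RPAREN,
    TOK_COMMA, TOK_ERROR
};

/* Scan one token starting at *p and advance *p past it. */
static enum tok next_token(const char **p)
{
    const char *s = *p;
    enum tok t;

    while (*s == ' ')
        s++;
    switch (*s) {
    case '\0': t = TOK_EOF; break;            /* do not advance past the end */
    case '+':  t = TOK_PLUS;   s++; break;
    case '-':  t = TOK_MINUS;  s++; break;
    case '*':  t = TOK_STAR;   s++; break;
    case '/':  t = TOK_SLASH;  s++; break;
    case '^':  t = TOK_CARET;  s++; break;
    case '=':  t = TOK_EQ;     s++; break;
    case '(':  t = TOK_LPAREN; s++; break;
    case ')':  t = TOK_RPAREN; s++; break;
    case ',':  t = TOK_COMMA;  s++; break;
    case '"':
        s++;
        while (*s != '"' && *s != '\0')
            s++;
        if (*s == '"') {
            s++;
            t = TOK_STRING;
        } else {
            t = TOK_ERROR;                    /* unterminated string */
        }
        break;
    default:
        if (isdigit((unsigned char)*s) || *s == '.') {
            /* Exponents such as 1E5 are omitted to keep the sketch short. */
            while (isdigit((unsigned char)*s) || *s == '.')
                s++;
            t = TOK_NUMBER;
        } else if (isupper((unsigned char)*s)) {
            while (isupper((unsigned char)*s) || isdigit((unsigned char)*s)
                   || *s == '$')
                s++;
            t = TOK_WORD;
        } else {
            s++;
            t = TOK_ERROR;
        }
        break;
    }
    *p = s;
    return t;
}
```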
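The tree.[ch] entry above describes a base n-ary tree used to build the AST. One common representation, assumed here rather than taken from tree.c, gives each node a first-child pointer and a next-sibling pointer so any number of children fits in a fixed-size node; a post-order walk then visits operands before the operator that uses them. All names are hypothetical.

```c
#include <stdlib.h>

/* First-child/next-sibling node: any number of children, fixed node size. */
struct node {
    int kind;                 /* hypothetical node type tag */
    struct node *child;       /* first child, or NULL */
    struct node *sibling;     /* next sibling, or NULL */
};

static struct node *node_new(int kind)
{
    struct node *n = calloc(1, sizeof *n);
    if (n)
        n->kind = kind;
    return n;
}

/* Append c as the last child of parent. */
static void node_add_child(struct node *parent, struct node *c)
{
    struct node **link = &parent->child;
    while (*link)
        link = &(*link)->sibling;
    *link = c;
}

/* Post-order walk: every child subtree is visited before its parent. */
static void postorder(struct node *n, void (*visit)(struct node *, void *),
                      void *ctx)
{
    for (struct node *c = n->child; c; c = c->sibling)
        postorder(c, visit, ctx);
    visit(n, ctx);
}

/* Free n, its descendants, and its later siblings. */
static void node_free(struct node *n)
{
    while (n) {
        struct node *next = n->sibling;
        node_free(n->child);
        free(n);
        n = next;
    }
}

/* Example visitor: record node kinds in the order they are visited. */
struct kind_log {
    int kinds[16];
    int len;
};

static void log_kind(struct node *n, void *ctx)
{
    struct kind_log *lg = ctx;
    if (lg->len < 16)
        lg->kinds[lg->len++] = n->kind;
}
```

For an expression like A + B, a '+' node with children A and B is visited in the order A, B, '+', which is the order a stack-oriented code generator emits them.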
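The optimizer.[ch] entry above says the only supported optimization is constant folding. A minimal sketch of the technique on a hypothetical binary expression tree: subtrees whose operands are all constants are evaluated at compile time and replaced by a single constant node, while anything involving a variable is left alone. The node layout and helper names are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical expression node: op 0 is a numeric constant, 'v' is a
 * variable reference, and '+', '-', '*', '/' are binary operators. */
struct expr {
    char op;
    double value;             /* meaningful only when op == 0 */
    struct expr *lhs, *rhs;
};

static struct expr *mk(char op, double value, struct expr *lhs, struct expr *rhs)
{
    struct expr *e = malloc(sizeof *e);
    if (!e)
        abort();              /* keep the sketch simple */
    e->op = op;
    e->value = value;
    e->lhs = lhs;
    e->rhs = rhs;
    return e;
}

static struct expr *num(double v) { return mk(0, v, NULL, NULL); }
static struct expr *var(void) { return mk('v', 0.0, NULL, NULL); }
static struct expr *bin(char op, struct expr *l, struct expr *r)
{
    return mk(op, 0.0, l, r);
}

/* Fold constant subtrees bottom-up; returns 1 if e is now a constant. */
static int fold(struct expr *e)
{
    if (e->op == 0)
        return 1;
    if (e->op == 'v')
        return 0;
    int l = fold(e->lhs);     /* fold both sides even if one is not constant */
    int r = fold(e->rhs);
    if (!l || !r)
        return 0;
    double a = e->lhs->value, b = e->rhs->value;
    switch (e->op) {
    case '+': e->value = a + b; break;
    case '-': e->value = a - b; break;
    case '*': e->value = a * b; break;
    case '/':
        if (b == 0.0)
            return 0;         /* leave division by zero for the runtime to report */
        e->value = a / b;
        break;
    default:
        return 0;
    }
    free(e->lhs);             /* both children are leaves at this point */
    free(e->rhs);
    e->lhs = e->rhs = NULL;
    e->op = 0;
    return 1;
}
```

A real folder has to evaluate in the same precision as the generated code (BASICCS, for example, asks for 32-bit math) so that folding never changes a program's output.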
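The datum.dot and parseinput.c entries above describe a finite state machine for INPUT and a -DTROUBLE switch that traces its transitions. The sketch below shows the same pattern on a deliberately simplified problem, counting the comma-separated datums in a reply where a datum is either quoted or unquoted. The states, the TRACE macro, and count_datums() are hypothetical and do not reproduce the machine in datum.dot.

```c
#include <stdio.h>

/* Hypothetical states for splitting an INPUT reply into datums. */
enum state { S_START, S_UNQUOTED, S_QUOTED, S_AFTER_QUOTE, S_ERROR };

#ifdef TROUBLE
#define TRACE(from, to, ch) fprintf(stderr, "%d -> %d on '%c'\n", (from), (to), (ch))
#else
#define TRACE(from, to, ch) ((void)0)
#endif

/* Returns the number of datums in line, or -1 for a malformed reply. */
static int count_datums(const char *line)
{
    enum state s = S_START;
    int count = 0;

    for (const char *p = line; ; p++) {
        char c = *p;
        enum state next = s;
        switch (s) {
        case S_START:
            if (c == ' ')
                next = S_START;
            else if (c == '"')
                next = S_QUOTED;
            else if (c == ',' || c == '\0')
                next = S_ERROR;               /* empty datum */
            else
                next = S_UNQUOTED;
            break;
        case S_UNQUOTED:
            if (c == ',' || c == '\0') {
                count++;
                next = S_START;
            } else if (c == '"') {
                next = S_ERROR;
            }
            break;
        case S_QUOTED:
            if (c == '"')
                next = S_AFTER_QUOTE;
            else if (c == '\0')
                next = S_ERROR;               /* unterminated string */
            break;
        case S_AFTER_QUOTE:
            if (c == ',' || c == '\0') {
                count++;
                next = S_START;
            } else if (c != ' ') {
                next = S_ERROR;
            }
            break;
        case S_ERROR:
            break;
        }
        TRACE(s, next, c);
        s = next;
        if (s == S_ERROR)
            return -1;
        if (c == '\0')
            return count;
    }
}
```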
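The peephole.c entry above describes removing superfluous 'pushsaddr'/'popsaddr %rdi' sequences. Those are macros whose expansion is not shown here, so the sketch below demonstrates the same window-matching technique on plain 'pushq X' / 'popq X' pairs, which cancel out. Comparing each pop against the last line kept, rather than the previous input line, also removes nested pairs. drop_push_pop_pairs() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Remove "pushq X" / "popq X" pairs (same operand) from an array of
 * assembly lines, in place, and return the new number of lines. */
static size_t drop_push_pop_pairs(const char **lines, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        char pushed[32], popped[32];
        if (out > 0
            && sscanf(lines[out - 1], " pushq %31s", pushed) == 1
            && sscanf(lines[i], " popq %31s", popped) == 1
            && strcmp(pushed, popped) == 0) {
            out--;            /* cancel the pair; an outer pair may now meet */
            continue;
        }
        lines[out++] = lines[i];
    }
    return out;
}
```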

GETTING THE CODE

The source code was created and is maintained on a Linux system, and uses
an ASCII encoding and UNIX line endings (0x0A).

If you are reading this you should have a copy of the code from a snapshot or
release tar.xz file.  Between snapshots some changes may exist only in the
upstream git repository.  If you want to start with the absolutely latest
version from the upstream git repository, you need to do a clone
operation like this:

   git clone https://git.code.sf.net/p/buraphakit/MB_git MinimalBASIC

This creates the 'MinimalBASIC' subdirectory which has the code and a local
copy of the upstream repository.  After the initial clone, assuming you didn't
modify anything, you can easily use git to stay up-to-date: run
'make -fMakefile.gcc distclean', then 'git pull', then 'make canrelease'.
The 'make canrelease' for LLVM/clang 12 takes about 15 minutes with -j24 on
an AMD(R) Ryzen(TM) 9 3900X @ 3.8GHz with 64GB RAM and a Samsung 870 EVO 1TB
SATA SSD.  You cannot push changes upstream directly with git.  If you want
to contribute a fix or improvement, please generate a patch with
'git format-patch' and submit it on the SourceForge site (Support->Patches).

The SourceForge site for this project has this URL:

   http://sourceforge.net/projects/buraphakit/

Information on obtaining and using the git version control software
is available from the git web site which has this URL:

   https://www.git-scm.org/

REPORTING BUGS

If you found a bug but do not know how to fix it, please submit a bug report on
the SourceForge site (Support->Bugs).  If you know what assembly should be
generated but do not know how to modify the compiler to make that happen,
please include the .BAS program file and the assembly that should have been
generated in the bug report.  If you just found a problem but have no idea how
to fix it, please include the .BAS program file in the bug report, and explain
what you think should have happened, and what actually happened instead.

EXAMPLE SESSION:

1.  Create source

    vi WHATEVER.BAS

2.  Compile it

    ./BASICC WHATEVER.BAS

3.  Run it

    ./WHATEVER

You can optionally strip the WHATEVER executable with the 'strip' command and
it will still work, and it will probably be (slightly) smaller.

NOTES ON BUILDING THE COMPILER:

To perform an overall check of the compiler using gcc to build it, you should
do something like this:

make -Otarget -fMakefile.gcc distclean
make -Otarget -j -l12 -fMakefile.gcc canrelease PIE=1 LTO=1 2>&1 | tee log.gcc

That example is tuned for a 12-core Zen 2 machine (an x86_64 AMD Ryzen 9 3900X)
with hyper-threading disabled in the BIOS.  The -l argument should be the
number of cores, and -j should be given without a number.  On an Intel i7-4790
4-core processor with hyper-threading disabled in the BIOS, for instance, I use
-j -l4.  You can also use Makefile.clang or Makefile.tcc, which use the clang
and tcc compilers respectively.

Should you want to do a leak check after modifying the code, here is
what you would do:

   make -fMakefile.gcc distclean all COMPILE_MODE=DEBUG2 PIE=0 LTO=0
   valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all \
            --redzone-size=128 --read-var-info=yes --leak-resolution=high \
            --track-origins=yes --malloc-fill=FF --free-fill=AA \
            --num-callers=40 ./ecma55 -v BAD.BAS

Some older clang compiler versions generate code that makes valgrind print a
horrible tombstone at startup complaining about a DWARF DIE (debugging
information entry) it cannot parse, so use gcc if that happens to you.

At this time (December 2023), current tcc does not produce code that
valgrind 3.22.0 can understand completely, and valgrind always says there
are leaks.  I do not know whether this is a problem with tcc or valgrind yet.

If you want to check for memory problems, you should use the address
sanitizer, which is supported by clang and gcc.  You need to rebuild like this
for gcc:

   make -fMakefile.gcc distclean
   make -fMakefile.gcc all COMPILE_MODE=DEBUG PIE=0 LTO=0

That is an unoptimized build.  You can do this if you want optimization AND the
address sanitizer for clang:

   make -fMakefile.clang distclean
   make -fMakefile.clang all COMPILE_MODE=ASAN PIE=0 LTO=0

Some bugs only show up with ASAN, but debugging is harder with the optimized
code when you are in gdb.  If you are having trouble, I suggest trying DEBUG
first, and if that isn't enough then try ASAN.

If you are using some older version of clang, then to test the compiler on some
file BOGUS.BAS after you compiled with the sanitizer support, you would do this
(this style of setting environment variables will work in any reasonably
current version of bash shell):

   ASAN_SYMBOLIZER_PATH=/path/to/llvm-symbolizer \
   ASAN_OPTIONS="symbolize=1,detect_odr_violation=0" \
   UBSAN_OPTIONS=print_stacktrace=1 \
   ./ecma55 -v BOGUS.BAS

You will need to adjust the symbolizer path to match your system and clang
compiler version.

Any gcc older than 6.2 should really not be used.  Versions 6.2 and later have
good address sanitizer support and no environment variable settings are
required; it "just works".

LLVM/clang < 8.0.0 is not supported.

To build a production version with gcc, just do this (adjust the 4 to match
the number of cores in your CPU):

   make -Otarget -fMakefile.gcc distclean
   make -Otarget -j -l4 -fMakefile.gcc all PIE=1 LTO=1

Alternatively, to build a production version with clang, just do:

   make -Otarget -fMakefile.clang distclean
   make -Otarget -j -l4 -fMakefile.clang all PIE=1 LTO=1

The tcc compiler never has any actual releases.  They don't have snapshots
either.  If you want to use tcc, build tcc from git.  I have tested with
revision 48798969c558975a78f6441c2f287483436e12d9 successfully.  You will need
to use git, fetch that revision, and build tcc yourself and then install it.
Once you are sure tcc works, you can try using tcc with this project like this:

   make -Otarget -fMakefile.tcc distclean
   make -Otarget -j -l4 -fMakefile.tcc ecma55

Note that while tcc does not document the fact in any place I could find,
according to Michael Matz on the tinycc-devel mailing list, tcc does not
support any code model except small.

The tcc compiler does not support the address sanitizer features, but it is
good for ensuring that you haven't used any horribly non-portable features in
the C code for the compiler itself.

If you have clang, you can perform a static analysis of the C code with the
clang static analyzer like this:

   make -Otarget -fMakefile.clang distclean
   scan-build make -fMakefile.clang ecma55 PIE=0 LTO=0

Do not specify a COMPILE_MODE, and remember that scan-build only works with
clang.  If problems are found, the analyzer program output tells you how to
read the test results.

Another option is to use cppcheck, but you need at least version 1.81, and
on some systems that means you must build it yourself and install it in
/usr/local.  Once you have a new enough version of cppcheck, you can run the
cppcheck static analyzer like this:

   make -Otarget -fMakefile.gcc distclean
   cppcheck --force *.[ch]

The COMPILE_MODE= switch can be any of these:

  ASAN      optimized build with address and undefined behavior sanitizers
  DEBUG     unoptimized build with address and undefined behavior sanitizers
            for gcc/llvm, bounds checking for tcc
  DEBUG2    unoptimized build without sanitizers for use with gdb and valgrind
  DEBUG3    unoptimized build with custom memory interceptor for tracing every
            single allocate/deallocate - this makes the programs run extremely
            slowly, but can be helpful when you just cannot track down a
            memory problem.  In addition to detecting memory leaks, it will
            also hexdump every single byte of leaked memory if the program
            terminates successfully.

If you do not set COMPILE_MODE, you get an optimized build with no debugging
information and no sanitizers.

The distclean target in the Makefiles removes all generated files.  If you
change compilers or the COMPILE_MODE, you need to rebuild everything, and
distclean makes that easy.  There is no default compiler, so you need to
specify which Makefile you want with the -f flag.  For most people, gcc
is the reasonable choice and you just do this:

make -Otarget -j -l4 -fMakefile.gcc all

SOFTWARE USED FOR BUILDING AND TESTING

On Ubuntu-21.04 on a 64 bit x86-64 Linux system:

GNU sed 4.7, Linux kernel 5.11.0-22, git 2.30.2, ghostscript 9.53.3,
GNU binutils-2.36.1, cppcheck-2.3, gcc 10.3.0, bash-5.1.4, clang 12.0.0,
grep 3.6, qpdf-10.3.1, gzip 1.10, mandoc 1.14.5, and texlive 2020.

On a custom from-scratch 64 bit x86-64 Linux system:

GNU sed 4.9, Linux kernel 6.6.7 (vanilla+ipset-7.19), git 2.43.0, GNU
binutils-2.41, ghostscript 10.02.1, cppcheck-2.12.1, gcc-13.2.0, bash 5.2.21,
clang 17.0.6, grep 3.11, qpdf-11.6.3, gzip 1.13, mandoc 1.14.6, and texlive
2023.

OTHER TOOLS USED

glibc 2.38, GNU make 4.4.1, groff 1.23.0, valgrind-3.22.0,
and patched tcc from git 48798969c558975a78f6441c2f287483436e12d9.

NOTES

The gcc-11.2.0 compiler's static analyzer reports memory leaks in optimizer.c
that, while genuine, are on a fatal error path.  In the case of a fatal error,
the compiler does not attempt to free all dynamically allocated memory but
instead just aborts, _by design_.  The analyzer also does not stop at noreturn
functions, which is arguably an imperfection in the static analyzer.  It is
safe to ignore these reported warnings:

optimizer.c:442:5: warning: leak of 'xstrdup ("optimizer.c", &__func__, 441, &tbuf)' [CWE-401]
optimizer.c:389:5: warning: leak of 'xstrdup ("optimizer.c", &__func__, 388, &tbuf)' [CWE-401]
optimizer.c:273:5: warning: leak of 'xstrdup ("optimizer.c", &__func__, 272, &tbuf)' [CWE-401]
optimizer.c:131:5: warning: leak of 'xstrdup ("optimizer.c", &__func__, 130, &tbuf)' [CWE-401]

BUILD TWEAKS

When using gcc or clang, to get a PIE (position independent executable) program, use
PIE=1 on your command line.  To use LTO (link time optimization), use LTO=1 on
your command line.  The linker used for gcc is now gold by default, and the linker
for clang is lld.  The tcc compiler does not support PIE or LTO.  Some
examples will help:

make -Otarget -fMakefile.gcc PIE=1 LTO=1

That will use gcc, build a PIE executable, and use link time
optimization.

make -Otarget -fMakefile.clang PIE=0 LTO=1

That will use clang, build a normal executable, and will use link time
optimization.  WARNING: The clang compiler generates bad PIE code for versions
before 12.0.

The default value of PIE depends on how your C compiler was built.  If it was
configured to generate PIE by default, PIE defaults to 1; otherwise it defaults
to 0.  LTO always defaults to 0 for both gcc and clang.
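If you are not sure which default your compiler has, you can check with a scratch C file that tests the __PIE__ and __pie__ macros. gcc and clang predefine these whenever they generate position-independent executable code, including when PIE is the configured default. This is a minimal sketch that assumes a gcc- or clang-compatible compiler. The function names are made up for illustration and are not part of ecma55.

```c
#include <stdio.h>

/* Returns 1 if this translation unit is being compiled as position-
 * independent executable code, otherwise 0.  gcc and clang predefine
 * __PIE__ and __pie__ in that case, including when PIE is the compiler's
 * configured default, so compile this WITHOUT -fPIE, -fpie, or -fno-pie. */
int built_as_pie(void)
{
#if defined(__PIE__) || defined(__pie__)
    return 1;
#else
    return 0;
#endif
}

/* Prints the PIE= value that matches the compiler's default. */
void report_pie_default(void)
{
    printf("this compiler defaults to PIE=%d\n", built_as_pie());
}
```

To use it, add a main() that calls report_pie_default(), compile with no -fPIE, -fpie, or -fno-pie flag, and run the result. For gcc, the "Configured with:" line printed by gcc -v also shows whether --enable-default-pie was used.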

The mold linker works with gcc, as long as you have mold version 1.11.0 or newer.
However, it works better with mold version 2.3.0 or newer.

IMPLEMENTATION-DEFINED FEATURES

  ACCURACY is about 15 digits of precision
    IEEE754 double, as implemented by Intel/AMD CPUs.
    With -s switch, about 7 digits of precision
      IEEE754 single, as implemented by Intel/AMD CPUs.
  END OF LINE = ASCII value 10
  SIGNIFICANCE-WIDTH = 7
    With -w switch, 18
  EXRAD-WIDTH = 2
    With -w switch, 3
  INITIAL VALUE OF VARIABLES
    numeric variables are initialized to SNaN (signaling Not-A-Number) and will
      force an exception if they are read before they are written.
    string variables are initialized to an ASCII 21 byte, followed by
      "uninitialized", and then 4 ASCII 0 bytes.  The 21 will force an
      exception if they are read before they are written.
  INPUT-PROMPT = "? "
  LONGEST STRING THAT CAN BE RETAINED = 18
  VALUE OF MACHINE INFINITESIMAL = 2^-1074 (about 4.9E-324, denormal),
    2^-1022 (about 2.2E-308, normal)
    With -s switch,
      2^-149 (about 1.4E-45, denormal), 2^-126 (about 1.2E-38, normal)
  VALUE OF MACHINE INFINITY = +/- Infinity (Intel CPU has special values for
    this)
  MARGIN = 80
    with -w switch, 132
  INPUT_WIDTH = 72
    This can be changed with "make -fMakefile.gcc distclean all
      CPPFLAGS='-DINPUT_WIDTH=256'" where MAXCOLUMN=INPUT_WIDTH+1, but this
      breaks NBS test #202 and makes the resulting compiler not strictly
      ECMA-55 compliant.
  PRECISION is 15 digits of precision
    IEEE754 double, as implemented by Intel/AMD CPUs.
    With -s switch, about 7 digits of precision
      IEEE754 single, as implemented by Intel/AMD CPUs.
  PRINT ZONE WIDTH = 15
    With -w switch, 26
  PSEUDO-RANDOM NUMBER SEQUENCE is from ISAAC-64, see robert1.c for details.
  BATCH MODE INPUT uses standard UNIX redirection of STDIN
  OUTPUT WIDTH = 80 columns
    With -w switch, 132 columns
  MAXIMUM ARRAY SUBSCRIPT VALUE = 10000000

  NOTES:
    1) The implementation-defined numeric functions use doubles, not singles,
       in their internal representation, even with the -s switch.
    2) OUTPUT WIDTH is sometimes called 'margin' in the ECMA-55 standard.
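The numeric limits above correspond to standard C11 <float.h> constants, so you can double-check them on your own machine. The sketch below is illustrative and not taken from ecma55. Note that FLT_DIG is 6 rather than 7. It counts the decimal digits guaranteed to survive a decimal-to-binary-to-decimal round trip, while binary32's 24-bit significand carries about 7.2 decimal digits, which is where "about 7" comes from.

```c
#include <float.h>
#include <stdio.h>

/* Machine infinitesimals and precision for IEEE754 binary64 (the default)
 * and binary32 (the -s switch).  DBL_TRUE_MIN and FLT_TRUE_MIN are the C11
 * names for the smallest positive denormal (subnormal) values. */
struct fp_limits {
    const char *name;
    double smallest_denormal;
    double smallest_normal;
    int decimal_digits;   /* digits guaranteed to survive a round trip */
};

static const struct fp_limits fp_table[] = {
    { "double (default)", DBL_TRUE_MIN, DBL_MIN, DBL_DIG },
    { "single (-s)",      FLT_TRUE_MIN, FLT_MIN, FLT_DIG },
};

/* Prints each limit in hex-float form (%a, exact powers of two) and decimal. */
void print_fp_limits(void)
{
    for (size_t i = 0; i < sizeof fp_table / sizeof fp_table[0]; i++)
        printf("%-16s denormal %a (%.3g)  normal %a (%.3g)  DIG %d\n",
               fp_table[i].name,
               fp_table[i].smallest_denormal, fp_table[i].smallest_denormal,
               fp_table[i].smallest_normal, fp_table[i].smallest_normal,
               fp_table[i].decimal_digits);
}
```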

DOCUMENTED BEHAVIOR

1.  Attempts to use the value of uninitialized variables will result in
    a fatal exception 'READ OF UNINITIALIZED VARIABLE'.
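The C sketch below gives a rough illustration of the sentinel values described under IMPLEMENTATION-DEFINED FEATURES. It builds a signaling NaN and the uninitialized-string byte pattern, then recognizes them. The particular sNaN payload (0x7FF4000000000000) and all function names are assumptions for illustration; the README does not say which payload ecma55 uses. These helpers only inspect bit patterns. According to the README, the exception itself is forced when the value is read.

```c
#include <stdint.h>
#include <string.h>

/* A binary64 value with all exponent bits set, the quiet bit (bit 51) clear,
 * and a nonzero payload is a signaling NaN.  Arithmetic on it raises the
 * IEEE754 invalid-operation exception.  0x7FF4000000000000 is just one such
 * encoding, chosen for illustration. */
double make_snan(void)
{
    const uint64_t bits = UINT64_C(0x7FF4000000000000);
    double d;
    memcpy(&d, &bits, sizeof d);      /* type-pun without aliasing problems */
    return d;
}

int is_signaling_nan(double d)
{
    const uint64_t exponent_mask = UINT64_C(0x7FF0000000000000);
    const uint64_t quiet_bit     = UINT64_C(0x0008000000000000);
    const uint64_t payload_mask  = UINT64_C(0x0007FFFFFFFFFFFF);
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (bits & exponent_mask) == exponent_mask
        && (bits & quiet_bit) == 0
        && (bits & payload_mask) != 0;
}

/* The uninitialized string value described earlier: ASCII 21 (octal 025),
 * the text "uninitialized", then four NUL bytes (the literal's three plus
 * its terminator), 18 bytes in all. */
static const char uninit_string[] = "\025uninitialized\0\0\0";

int is_uninitialized_string(const char *s)
{
    return (unsigned char)s[0] == 21;
}
```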

DOCUMENTED EXTENSIONS ACTIVATED WITH -X OPTION

1.  Lower-case letters, backslash, and the characters in "[]{}|@\~`" are
    permitted within a quoted string.
2.  Lower-case letters, backslash, and the characters in "[]{}|@\~`" are
    permitted in a REM statement after the REM keyword.
3.  Support for AND, OR, and NOT in conditional expressions.
4.  Support for EXIT FOR statement.
5.  If both -X and -O3 are specified, DAG optimization is used on
    expressions.  This might alter numerical results slightly because some
    redundant subexpressions are evaluated only once, but it should provide
    better runtime performance and work well in most cases.  Because of this,
    it is protected by the -X switch, and the default behavior of the compiler
    remains conservative.
6.  LEN() function which takes a string variable or string literal value and
    returns the number of ASCII characters as an integer (but stored in the
    usual floating point format used for all numeric values) is supported.
7.  String comparison is extended to support { '<', '<=', '>', '>=' }.
8.  ACOS(), ASIN(), CEIL(), DEG(), FP(), IP(), LOG2(), LOG10(), MAX(), MIN(),
    MOD(), PI, RAD(), MAXNUM, REMAINDER(), COSH(), SINH(), TANH(), SEC(),
    CSC(), COT(), ROUND(), TRUNCATE(), and ANGLE() functions from the ECMA-116
    Full BASIC standard are supported.
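MOD() and REMAINDER() are easy to confuse. As I read ECMA-116, MOD(X,Y) is X-Y*INT(X/Y), where INT rounds toward minus infinity, and REMAINDER(X,Y) is X-Y*IP(X/Y), where IP truncates toward zero. The sketch below only illustrates those definitions. It is not ecma55's runtime code, and it ignores Y = 0. C's fmod() computes the same quantity as REMAINDER. C's remainder() is different from both, because it rounds X/Y to the nearest integer: remainder(8, 3) is -1, while REMAINDER(8,3) is 2.

```c
#include <stdint.h>

/* IP(): integer part, truncating toward zero.  Converting through int64_t
 * keeps this sketch free of libm, so it assumes |v| < 2^63. */
static double ip(double v)
{
    return (double)(int64_t)v;
}

/* INT(): the greatest integer <= v (floor), derived from ip(). */
static double int_floor(double v)
{
    double t = ip(v);
    return (t > v) ? t - 1.0 : t;
}

/* MOD(X,Y) = X - Y*INT(X/Y): a nonzero result has the sign of Y. */
double basic_mod(double x, double y)
{
    return x - y * int_floor(x / y);
}

/* REMAINDER(X,Y) = X - Y*IP(X/Y): a nonzero result has the sign of X. */
double basic_remainder(double x, double y)
{
    return x - y * ip(x / y);
}
```

For example, MOD(-7,3) is 2 but REMAINDER(-7,3) is -1.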

TEST MACHINE INFORMATION

Original development of the 1.X versions was done with Fedora 20 64bit on an
Intel(R) Core(TM) 2 Duo E4700 @ 2.6GHz machine.  Most modern testing is done
on a Linux-from-scratch descended 64bit machine with an AMD(R) Ryzen(TM) 9 3900X
CPU @ 3.8GHz.  Occasional testing is done with Ubuntu 22.04 inside both QEMU
and LVM2 virtualization on a Windows 11 machine.  Surprisingly, testing Ubuntu
inside LVM2 on Windows exposed some issues with stdout/stderr, so that testing
was indeed valuable.  The ecma55 code is regularly tested with gcc, clang, and
tcc.

OBTAINING SOFTWARE

You need at least one working compiler and the GNU binutils.  Almost every
Linux distribution will work as is if you choose the gcc compiler.  If you want
to use clang, many distributions have packages you can install.  For tcc,
you really need to build it from source with the version noted
elsewhere in this file.  For the manual pages you can use mandoc as an
alternative to groff.  If you really want to modify or rebuild the included
book, you will need a complete TeXlive installation.  The shell scripts in this
project require bash 4.x or newer, but you know you really should be using the
current bash 5.x, right?

+------------+-----------------------------------------------------------------+
|software    |                  Where to get sources                           |
+------------+-----------------------------------------------------------------+
|bash        |  https://ftp.gnu.org/gnu/bash/                                  |
|binutils    |  https://ftp.gnu.org/gnu/binutils/                              |
|            |  This has the required assembler and recommended linker         |
|            +-----------------------------------------------------------------+
|Boost Software License                                                        |
|      http://www.boost.org/LICENSE_1_0.txt                                    |
|            +-----------------------------------------------------------------+
|clang       |  http://llvm.org/                                               |
|coreutils   |  https://ftp.gnu.org/gnu/coreutils/                             |
|            |  for cut, head, sort, wc, etc.                                  |
|cppcheck    |  http://sourceforge.net/projects/cppcheck/                      |
|            +-----------------------------------------------------------------+
|Creative Commons Zero License                                                 |
|      http://creativecommons.org/publicdomain/zero/1.0/legalcode              |
|Actual upstream text file of license is _very_ hard to find on their website: |
|      https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt         |
|            +-----------------------------------------------------------------+
|diffutils   |  https://ftp.gnu.org/gnu/diffutils/                             |
|dtoa/g_fmt  |  http://www.netlib.org/fp/                                      |
|FDL 1.3     |  http://www.gnu.org/licenses/                                   |
|file        |  ftp://ftp.astron.com/pub/file/                                 |
|gcc         |  https://ftp.gnu.org/gnu/gcc/                                   |
|ghostscript |  https://github.com/ArtifexSoftware/ghostpdl-downloads/releases |
|git         |  https://www.git-scm.org/                                       |
|glibc       |  https://ftp.gnu.org/gnu/glibc/                                 |
|GPLv2       |  http://www.gnu.org/licenses/                                   |
|grep        |  https://ftp.gnu.org/gnu/grep/                                  |
|groff       |  https://ftp.gnu.org/gnu/groff/                                 |
|gzip        |  https://ftp.gnu.org/gnu/gzip/                                  |
|ISAAC64     |  http://burtleburtle.net/bob/rand/isaacafa.html                 |
|make        |  https://ftp.gnu.org/gnu/make/                                  |
|mandoc      |  http://mdocml.bsd.lv/                                          |
|mold        |  https://github.com/rui314/mold                                 |
|musl        |  http://www.musl-libc.org/                                      |
|qpdf        |  http://qpdf.sourceforge.net/                                   |
|            |  https://github.com/qpdf/qpdf/                                  |
|sed         |  https://ftp.gnu.org/gnu/sed/                                   |
|SLEEF       |  https://github.com/shibatch/sleef                              |
|tar         |  https://ftp.gnu.org/gnu/tar/                                   |
|tcc         |  git://repo.or.cz/tinycc.git                                    |
|            |  Yeah, you have to pull from git for this.  They NEVER have any |
|            |  formal releases, ever.                                         |
|texlive     |  http://tug.org/texlive/                                        |
|tzcode      |  ftp://ftp.iana.org:/tz/releases/                               |
|            |            or                                                   |
|            |  http://www.iana.org/time-zones                                 |
|unifdef     |  http://dotat.at/prog/unifdef                                   |
|valgrind    |  http://www.valgrind.org/                                       |
|zlib        |  http://www.zlib.org/                                           |
+------------+-----------------------------------------------------------------+

TESTING

To run a complete regression test, use the 'canrelease' target and specify
the C compiler you want to use, like this for gcc on a 4-core machine:

make -Otarget -j -l4 -fMakefile.gcc canrelease PIE=0 LTO=0 2>&1 | \
tee logfile.gcc.nopie.nolto

and just change Makefile.gcc to Makefile.clang or Makefile.tcc to try either
of the other two supported compilers (leave out the PIE and LTO switches for
tcc, and make sure PIE=0 for clang versions older than 12.0.0).

You need to have valgrind (3.11.0 or newer) installed if you want the leak
testing to work.  You need a modern gcc (>= 10.x) and a modern llvm/clang
(>= 11.0.0) for their address and undefined behavior sanitizers to work.  The
llvm/clang developers actually modified the assembly dialect, so you must use
the llvm-mc assembler with clang.  Makefile.clang takes care of this for
versions 8 through 17, but for any other version you would need to modify
Makefile.clang yourself.  You may need to adjust ASAN_SYMBOLIZER_PATH to be
correct for your clang installation, since it can vary depending on which
version of clang and which Linux distribution you use.

TESTING OPTIMIZATIONS

The compiler now includes some simple optimizations.  To see their effect, one
needs a long-running program with some loops.  Optimization level zero means
disable optimizations.  Optimization level one does constant folding on the
expression tree.  Optimization level two switches to a DAG and removes common
sub-expressions.  Optimization level three, only available when also specifying
the -X switch, will do some simple algebraic simplifications on the DAG like
detecting C-C and replacing it with zero.  At this time, all optimizations are
local to an individual arithmetic expression.  Still, speedups can be seen
using this simple sequence:

cp tests/ADDBENCH.BAS .
./ecma55 -O0 ADDBENCH.BAS -o ADDBENCH.BAS.s.O0
./ecma55 -O1 ADDBENCH.BAS -o ADDBENCH.BAS.s.O1
./ecma55 -O2 ADDBENCH.BAS -o ADDBENCH.BAS.s.O2
./ecma55 -O3 -X ADDBENCH.BAS -o ADDBENCH.BAS.s.O3
as ADDBENCH.BAS.s.O0 -o ADDBENCH0.o
as ADDBENCH.BAS.s.O1 -o ADDBENCH1.o
as ADDBENCH.BAS.s.O2 -o ADDBENCH2.o
as ADDBENCH.BAS.s.O3 -o ADDBENCH3.o
ld -nostdlib -z defs -z nodefaultlib -z nodlopen -z noexecstack -Bstatic \
  --no-omagic -m elf_x86_64 -o ADDBENCH0 ADDBENCH0.o
ld -nostdlib -z defs -z nodefaultlib -z nodlopen -z noexecstack -Bstatic \
  --no-omagic -m elf_x86_64 -o ADDBENCH1 ADDBENCH1.o
ld -nostdlib -z defs -z nodefaultlib -z nodlopen -z noexecstack -Bstatic \
  --no-omagic -m elf_x86_64 -o ADDBENCH2 ADDBENCH2.o
ld -nostdlib -z defs -z nodefaultlib -z nodlopen -z noexecstack -Bstatic \
  --no-omagic -m elf_x86_64 -o ADDBENCH3 ADDBENCH3.o
time -p ./ADDBENCH0
time -p ./ADDBENCH1
time -p ./ADDBENCH2
time -p ./ADDBENCH3

On an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, the times for the benchmark
programs are as follows:

$ time -p ./ADDBENCH0
*** TEST PASSED ***
real 261.00
user 260.71
sys 0.01
$ time -p ./ADDBENCH1
*** TEST PASSED ***
real 129.84
user 129.84
sys 0.00
$ time -p ./ADDBENCH2
*** TEST PASSED ***
real 104.05
user 104.04
sys 0.00
$ time -p ./ADDBENCH3
*** TEST PASSED ***
real 91.31
user 91.31
sys 0.00

The constant folding at optimization level one is the most effective (here it
roughly halves the run time), but each increasing level provides some
improvement, and -O3 -X runs in about 35% of the unoptimized time.  The
benchmark in question is rather contrived, and real-world programs would not
see such dramatic improvements, but they should still run faster than
unoptimized programs.
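The folding done at level one can be sketched on a toy expression tree. The children are folded first, then any operator whose operands are both constants is replaced with a single constant. The types, the allow_algebra flag, and the V - V rule are illustrative assumptions, not ecma55's actual code. The V - V rule stands in for the C-C simplification that level three performs when -X is given. Division by a constant zero is deliberately left unfolded so that the error can still be reported at run time.

```c
#include <stdlib.h>

/* A tiny expression tree: constants, single-letter variables, and binary
 * operators.  These types are illustrative, not ecma55's AST. */
enum kind { K_CONST, K_VAR, K_ADD, K_SUB, K_MUL, K_DIV };

struct node {
    enum kind kind;
    double value;               /* K_CONST only */
    char name;                  /* K_VAR only, 'A'..'Z' */
    struct node *left, *right;  /* operators only */
};

struct node *mk(enum kind k, double v, char name, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    *n = (struct node){ k, v, name, l, r };
    return n;
}

/* Turns operator node n into a constant; its children are leaves here. */
static struct node *make_const(struct node *n, double v)
{
    free(n->left);
    free(n->right);
    n->kind = K_CONST;
    n->value = v;
    n->left = n->right = NULL;
    return n;
}

/* Post-order folding: simplify the children first, then collapse an
 * operator whose operands are both constants into a single constant. */
struct node *fold(struct node *n, int allow_algebra)
{
    if (n->kind == K_CONST || n->kind == K_VAR)
        return n;
    n->left = fold(n->left, allow_algebra);
    n->right = fold(n->right, allow_algebra);
    struct node *l = n->left, *r = n->right;

    /* -O3 -X style simplification: V - V -> 0.  Not IEEE754-safe, because
     * V - V is NaN when V holds Infinity or NaN, hence the opt-in flag. */
    if (allow_algebra && n->kind == K_SUB && l->kind == K_VAR
        && r->kind == K_VAR && l->name == r->name)
        return make_const(n, 0.0);

    if (l->kind != K_CONST || r->kind != K_CONST)
        return n;
    switch (n->kind) {
    case K_ADD: return make_const(n, l->value + r->value);
    case K_SUB: return make_const(n, l->value - r->value);
    case K_MUL: return make_const(n, l->value * r->value);
    case K_DIV:
        if (r->value == 0.0)
            return n;           /* leave it so the runtime reports it */
        return make_const(n, l->value / r->value);
    default:
        return n;
    }
}
```

For example, folding 2*3+X leaves 6+X. The constant part is computed once at compile time instead of on every loop iteration.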

IMPLEMENTATION NOTES

To use extensions, the -X option must be specified.  When using the easy
wrappers, you can specify the switches with the ECMA55FLAGS environment
variable.  For instance, to specify that you want extensions and you want
SSE4.1 instructions to be used when compiling a file DEMO.BAS, you would
do this:

ECMA55FLAGS='-X -4' ./BASICC DEMO.BAS

The scanner adjusts itself automatically if necessary for accepting extensions.

The parser is then run in a second pass to create an abstract syntax tree
(AST), calling for tokens from the scanner as required.

This tree is then walked once to populate line number information, again to
convert that to a DAG (just for jump targets), and again to populate the symbol
table with information about the variables.  An optional optimization pass
performs constant folding and/or DAG conversion, and finally another traversal
generates the assembly code.
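Converting expressions to a DAG is commonly done by interning nodes, also called hash-consing or value numbering. A node is created only if an identical (operator, left, right) node does not already exist. The sketch below shows the idea, with a linear search standing in for a hash table. It illustrates the general technique only; the README does not describe how ecma55 implements its DAG conversion.

```c
/* Interning sketch for common-subexpression elimination: each distinct
 * (op, left, right) triple is stored exactly once, so building (A*B)+(A*B)
 * yields a single shared A*B node.  A linear search stands in for the hash
 * table a real implementation would use.  Leaves use op 'v' with the
 * variable letter in `left`.  Illustrative types, not ecma55's. */
#define MAX_NODES 256

struct dag_node { char op; int left, right; };

static struct dag_node nodes[MAX_NODES];
static int node_count;

/* Returns the index of node (op, left, right), creating it only when no
 * identical node exists yet; returns -1 if the table is full. */
int intern(char op, int left, int right)
{
    for (int i = 0; i < node_count; i++)
        if (nodes[i].op == op && nodes[i].left == left
            && nodes[i].right == right)
            return i;
    if (node_count == MAX_NODES)
        return -1;
    nodes[node_count] = (struct dag_node){ op, left, right };
    return node_count++;
}

int leaf(char var)
{
    return intern('v', var, -1);
}
```

A fuller version might also sort the operands of + and * into a canonical order so that A*B and B*A share a node. IEEE754 addition and multiplication are commutative, although not associative.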

Register allocation for arithmetic expressions is done expression by
expression as part of code generation.  It is not sophisticated.
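For comparison, a classic way to allocate registers one expression at a time is Sethi-Ullman numbering. It computes how many registers each subtree needs, so the code generator can evaluate the more demanding subtree first. The README does not say that ecma55 uses this algorithm. The sketch below shows only the labeling step, using a bare-bones tree type made up for the example.

```c
#include <stddef.h>

/* Sethi-Ullman labeling: the number of registers needed to evaluate a binary
 * expression tree without spilling, when every leaf must be loaded into a
 * register.  Evaluating the child with the larger label first lets the other
 * child reuse registers.  Leaves have both children NULL. */
struct expr { const struct expr *left, *right; };

int su_label(const struct expr *e)
{
    if (e->left == NULL)           /* leaf: one register to hold it */
        return 1;
    int l = su_label(e->left);
    int r = su_label(e->right);
    if (l == r)
        return l + 1;              /* one result must be held while the other is computed */
    return l > r ? l : r;
}
```

For (A+B)*(C+D) the label is 3, while the left-leaning (A+B)+C needs only 2.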

If the -P option is used, then instead of the tree walks just described,
a different walk is done that regenerates the source code.

If the -X option is used, extensions are accepted.  These include allowing
lower-case letters in comments and strings, supporting AND, OR, and NOT in
conditional expressions, and supporting the EXIT FOR statement and many
mathematical functions from ECMA-116 Full BASIC.

On any internal compiler error (ICE), the compiler will abort.  This compiler
does not attempt to continue after an error is encountered.

PUBLICATIONS

The file computers-03-00069.pdf contains a PDF version of the file from

   http://www.mdpi.com/2073-431X/3/3/69

which documented version 1.7 of the compiler.  The paper appeared in the MDPI
Computers journal:

Ham, John G. 2014. "An ECMA-55 Minimal BASIC Compiler for x86-64 Linux®."
Computers 3, no. 3: 69-116.

MISCELLANEOUS

make -Otarget -j -l4 -fMakefile.gcc distclean all COMPILE_MODE=DEBUG2 PIE=0 LTO=0

This compiles for full debugging without the address sanitizer.  Use this
if you plan to use gdb or valgrind.

make -Otarget -j -l4 -fMakefile.gcc distclean all COMPILE_MODE=DEBUG3 PIE=0 LTO=0

This compiles like DEBUG2, but switches to the intercepted mymalloc(),
myfree(), etc. in globals.c for heavy-duty debugging.  Using the -v option
to ecma55 will trigger output of every single allocation and deallocation,
print a list of anything that was not deallocated, and dump (in hexadecimal)
every byte of that leaked memory.  The output is huge and the run is slow,
so you should redirect the output to a file when you run, like this:

./ecma55 -v WHATEVER.BAS >logfile 2>&1

This mode is very slow, so it should only be used if you are working on ecma55
itself and have memory leak problems detected by ASAN or valgrind that you
could not find by just staring at the code.  In other words, this is a last
resort.  Remember that this compiler does not even attempt to clean up memory
if it has to abort, which occurs when you do something silly like using
lower-case letters in a program without specifying the -X switch, or forgetting
the LET keyword on an assignment statement.
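Here is a minimal C sketch of an allocation interceptor. It records each allocation with the file and line that requested it and matches frees against those records. At the end it reports and hexdumps every block still live. The names trace_malloc(), trace_free(), trace_report(), TMALLOC, and TFREE are hypothetical; this is not the mymalloc()/myfree() code in globals.c.

```c
#include <stdio.h>
#include <stdlib.h>

#define TRACE_MAX 1024   /* hypothetical fixed capacity for live records */

struct trace_rec { void *ptr; size_t size; const char *file; int line; };

static struct trace_rec live[TRACE_MAX];
static size_t live_count;

/* Allocates and logs a block; aborts if malloc fails or the table is full. */
void *trace_malloc(size_t size, const char *file, int line)
{
    void *p = (live_count < TRACE_MAX) ? malloc(size) : NULL;
    if (p == NULL) {
        fprintf(stderr, "%s:%d: cannot allocate %zu bytes\n", file, line, size);
        abort();
    }
    live[live_count++] = (struct trace_rec){ p, size, file, line };
    fprintf(stderr, "alloc %p (%zu bytes) at %s:%d\n", p, size, file, line);
    return p;
}

/* Frees a tracked block; aborts on a pointer that was never handed out. */
void trace_free(void *p, const char *file, int line)
{
    if (p == NULL)
        return;
    for (size_t i = 0; i < live_count; i++) {
        if (live[i].ptr == p) {
            fprintf(stderr, "free  %p at %s:%d\n", p, file, line);
            live[i] = live[--live_count];  /* record order does not matter */
            free(p);
            return;
        }
    }
    fprintf(stderr, "%s:%d: free of untracked pointer %p\n", file, line, p);
    abort();
}

/* Lists every block still live and hexdumps its contents, 16 bytes per row.
 * Returns the number of leaked blocks. */
size_t trace_report(void)
{
    for (size_t i = 0; i < live_count; i++) {
        const unsigned char *bytes = live[i].ptr;
        fprintf(stderr, "LEAK %zu bytes allocated at %s:%d", live[i].size,
                live[i].file, live[i].line);
        for (size_t j = 0; j < live[i].size; j++)
            fprintf(stderr, "%s%02x", (j % 16 == 0) ? "\n  " : " ", bytes[j]);
        fputc('\n', stderr);
    }
    return live_count;
}

#define TMALLOC(n) trace_malloc((n), __FILE__, __LINE__)
#define TFREE(p)   trace_free((p), __FILE__, __LINE__)
```

Everything goes to stderr, which is why redirecting both streams to a log file, as shown above, is the practical way to read the output.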

If you need more debugging information than the -v option provides, recompile
with CPPFLAGS=-DDEEP_DEBUG, but be aware that -v is then even more verbose.
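Both CPPFLAGS=-DDEEP_DEBUG and the CPPFLAGS='-DINPUT_WIDTH=256' override described under IMPLEMENTATION-DEFINED FEATURES rely on a common C preprocessor pattern: a macro with a default that -D can replace or enable. The sketch below shows that pattern. INPUT_WIDTH and MAXCOLUMN = INPUT_WIDTH + 1 follow the INPUT_WIDTH entry; DEBUG_TRACE and input_line_fits() are hypothetical and not taken from ecma55's sources.

```c
#include <stdio.h>

/* Default used when no -DINPUT_WIDTH=... is given on the command line. */
#ifndef INPUT_WIDTH
#define INPUT_WIDTH 72            /* ECMA-55 compliant default */
#endif
#define MAXCOLUMN (INPUT_WIDTH + 1)

/* Extra tracing that exists only when compiled with -DDEEP_DEBUG. */
#ifdef DEEP_DEBUG
#define DEBUG_TRACE(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_TRACE(...) ((void)0)   /* compiled out entirely */
#endif

/* Returns nonzero if an input line of `length` columns is acceptable. */
int input_line_fits(int length)
{
    DEBUG_TRACE("checking a line of %d columns against %d\n",
                length, INPUT_WIDTH);
    return length <= INPUT_WIDTH;
}
```

Compiled without any -D flags, DEBUG_TRACE expands to nothing and INPUT_WIDTH is 72. Adding -DDEEP_DEBUG turns the trace on without touching the code.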

FREQUENTLY ASKED QUESTIONS

* My whatever.bas file won't compile.  Why not?

  You must use an upper-case '.BAS' suffix on the file name.  You must not have
  any spaces in the file name.  You really should ensure the file name uses only
  7-bit ASCII characters.
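  If you want to check names before calling the compiler, a helper along these lines encodes the rules above. It also applies the filename assumptions listed later in this FAQ: no tabs or punctuation other than periods and underscores, and no leading period. It is a hypothetical sketch for a bare file name without a directory part, not ecma55's own validation, which may differ.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check of a bare BASIC source file name: upper-case ".BAS"
 * suffix, no leading period, and only ASCII letters, digits, '.', and '_'.
 * Spaces, tabs, other punctuation, and any byte >= 128 are rejected. */
bool plausible_bas_name(const char *name)
{
    size_t len = strlen(name);
    if (len <= 4 || strcmp(name + len - 4, ".BAS") != 0 || name[0] == '.')
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
               || (c >= '0' && c <= '9') || c == '.' || c == '_';
        if (!ok)
            return false;
    }
    return true;
}
```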

* How can I renumber my program?

  If a program WHATEVER.BAS compiles and runs without errors, then you can
  generate a semantically equivalent renumbered version like this:

  ./ecma55 -R -o WHATEVER2.BAS WHATEVER.BAS

  The renumbered program is in the WHATEVER2.BAS file.
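  Conceptually, renumbering maps each existing line number to a new one and rewrites every jump target through the same mapping. The sketch below shows the lookup with bsearch(), which works because line numbers are in ascending order. The starting number and step (10 and 10 in the usage) are assumptions for illustration; the README does not say what numbering -R produces.

```c
#include <stdlib.h>

/* Comparison callback for bsearch() over ascending int line numbers. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Line old_lines[i] becomes start + i*step.  Returns the new number for old
 * line `target`, or -1 if no such line exists (a real compiler would have
 * rejected a jump to a missing line already). */
int renumber_target(const int *old_lines, int count, int target,
                    int start, int step)
{
    const int *hit = bsearch(&target, old_lines, (size_t)count,
                             sizeof *old_lines, cmp_int);
    if (hit == NULL)
        return -1;
    return start + (int)(hit - old_lines) * step;
}
```

  With old lines 5, 17, 100, and 9999, a GOTO 100 becomes GOTO 30 when numbering starts at 10 with a step of 10.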

* Lower-case letters in my source code don't work!

  The ECMA-55 standard for Minimal BASIC does not permit lower-case letters.
  Now you know why there is a caps lock key on your keyboard.  You can use
  lower-case letters inside of quoted strings or in REM statements if you
  specify the -X option to the compiler to enable extensions, but even then
  all keywords must be in upper case.

* I want to support the whatever compiler.  What do I do?

  Copy Makefile.gcc to Makefile.whatever, update the Makefile, and then use
  -fMakefile.whatever instead of -fMakefile.gcc when you use the make program.
  You should expect to work a little at finding the right combination of
  options to the compiler, assembler, and linker for your toolchain.  Also,
  please be aware that you may have trouble with things like __attribute__(),
  inline assembly, etc.  If you attempt to use an unsupported toolchain, be
  aware of these assumptions this software makes:

   1. text files are 7-bit ASCII, use UNIX newlines (0x0A), and do not start
      with a BOM (byte order mark)
   2. paths and filenames do not include any spaces, tabs, or punctuation
      except for periods and underscores, and cannot begin with a period.
   3. the ulimits must not be unreasonably small
   4. your toolchain can process the C11 dialect of C and is for Linux
      (POSIX support, etc.)
   5. your toolchain provides command-line tools
   6. your assembler can process GNU's version of AT&T syntax
   7. your assembler includes a macro processor compatible with GNU gas's
      macro processor
   8. your linker supports ELF64
   9. you really use GNU make with a version >= 4.x
  10. you really use bash with a version >= 4.x, not csh, tcsh, ash, dash,
      pdksh, etc.  As of version 2.28, you now __can__ build ecma55 using dash
      or ksh, but keep in mind the self-tests absolutely require a modern
      version of bash.

* I want to build a static version of the compiler with musl-libc.  What do I
  need to do?

  # make sure Makefile.gcc is using the gold linker
  make -Otarget -fMakefile.gcc distclean
  make -Otarget -fMakefile.gcc PIE=0 LTO=0 COMPILE_MODE=DEBUG2 CC=musl-gcc \
    LDFLAGS=-static
  strip ecma55
  strip -R .comment ecma55
  strip -R .note.gnu.gold-version ecma55

  This has been tested with binutils-2.40, gcc-13.1.0, and musl-1.2.4 versions.
  Note that without the LDFLAGS=-static, it doesn't work on Ubuntu 19.10...

* I want a PDF of the manual page.  What do I need to do?

  1.  If you are using groff, try this:

  groff -man -T pdf -P-pa4 ecma55.1 >ecma55.1.pdf

  2.  If you are using mandoc, try this:

  mandoc -man -T pdf -O paper=a4 ecma55.1 >ecma55.1.pdf

  Obviously if you use letter paper (U.S.A.), change a4 to letter instead.  In
  my opinion, the groff output looks better, but the mandoc output is
  serviceable.

* String support is awful.  What do I need to do?

  This compiler implements the ECMA-55 Minimal BASIC standard, which does not
  support strings well.  This is a problem with the BASIC dialect, and not the
  compiler implementation.  If you need reasonable string data support for your
  program, then ECMA-55 Minimal BASIC is the wrong language to use for
  implementing that program.