Mysterious compiled (BASIC?) legacy program

0 basic legacy-code

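I have a legacy program that I assume was written in BASIC (a file ending in .bas). It appears to have been compiled, though! When I open it in a hex editor, the strings and comments are readable, but the parts that do the calculations are not. AFAIK, BASIC is an interpreted language.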

Questions:

  • Were there ever BASIC compilers or runtime environments that stored their compiled output as .bas files?
  • Are there any decompilers?

Jer*_*ton 5

For old-school BASIC programs, there was a difference between compilation and tokenization. Compilation converted the BASIC code to machine code, and the result would usually be stored with a file extension indicating that the code should be run directly rather than interpreted; often, this extension was some variation of ".BIN". On personal computers at least, compilation usually required third-party software to convert the BASIC statements to machine code.


While BASIC programs could usually be saved as straight ASCII, with BASIC statements fully represented by their text, most BASICs tokenized saved programs by default. Tokenized files were usually saved with some variation of a .BAS file extension.


Tokenization was generally a one-to-one translation of BASIC statement/function to the one- or two-byte code for that statement or function. This saved space on the system; both disk space and RAM were limited on older personal computers. But it also made it much easier for the system to run the code on the fly (that is, to interpret it) and made the interpretation much faster.
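The one-to-one translation can be sketched in a few lines of Python. This is only an illustration: the table `TOKENS` and the function `tokenize` are hypothetical names, the table holds just the two Extended Color BASIC token values quoted in this answer, and the sketch ignores details a real tokenizer handles (string literals, REM comments, two-byte tokens, line numbers).

```python
# Hypothetical two-entry token table; a real BASIC has many more keywords.
# The byte values are the ones this answer quotes for Extended Color BASIC.
TOKENS = {"RESET": 0x9D, "RESTORE": 0x8F}


def tokenize(statement):
    """Replace each known keyword with its one-byte token; copy the rest as ASCII."""
    out = bytearray()
    # Try longer keywords first so a short keyword never shadows a longer one.
    keywords = sorted(TOKENS, key=len, reverse=True)
    i = 0
    while i < len(statement):
        for kw in keywords:
            if statement.startswith(kw, i):
                out.append(TOKENS[kw])
                i += len(kw)
                break
        else:
            # Not a keyword: store the character's ASCII value unchanged.
            out.append(ord(statement[i]))
            i += 1
    return bytes(out)


print(tokenize("RESET(14,15)").hex(" "))  # 9d 28 31 34 2c 31 35 29
```

The five-character keyword shrinks to a single byte, while the parentheses, digits, and comma pass through as plain ASCII.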


Without tokenization, the difference between RESET and RESTORE in the Radio Shack Color Computer's Extended Color BASIC, for example, won't show up until comparing the fourth character. With tokenization, the difference shows up on comparing the first character: 9D vs. 8F.
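A quick way to check that claim, as a small Python sketch (`first_difference` is a hypothetical helper written for this illustration, not part of any BASIC tool):

```python
def first_difference(a, b):
    """Return the zero-based index of the first position where a and b differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a prefix of the other (or they are equal).
    return min(len(a), len(b))


# Textual keywords: "RES" matches, so the mismatch is at the fourth character.
print(first_difference("RESET", "RESTORE") + 1)  # 4

# Tokenized keywords: the single token bytes differ immediately.
print(first_difference(bytes([0x9D]), bytes([0x8F])) + 1)  # 1
```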


For example, this archiveteam.org page lists the tokenization numbers for GW-BASIC.


Detokenization, or conversion from the tokens to the textual representation of the statement, simply reverses the process. The interpreter performed this reversal every time the user listed the program. On a modern computer, a detokenizer should be easy to write in just about any modern scripting language. As long as you know the format, detokenization is just a matter of going through the tokenized file byte by byte and converting tokens back to their equivalent BASIC statement or function.


For example, bascat claims to detokenize GW-BASIC.


Here's an example of tokenization; I'm using the TRS-80 Color Computer's Extended Color BASIC because I have tools readily available to tokenize it, but the basic idea is the same for most old-school BASICs.


The (somewhat nonsensical) BASIC program:

    10 RESET(14,15)
    20 RESTORE

A hex dump of the tokenized file:

    00000000  26 0b 00 0a 9d 28 31 34  2c 31 35 29 00 26 11 00
    00000010  14 8f 00 00 00
    00000015

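The first two bytes are the address of the next line; if your particular BASIC uses these addresses, you will probably ignore them when detokenizing from a file. (They're mainly for use in running the code: if a line contains GOTO 60, for example, the interpreter can find line 60 without having to interpret the tokens to get there.)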
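To illustrate how an interpreter might use those addresses, here is a rough Python sketch that follows the next-line pointers in the hex dump above to locate a line without looking at any tokens. The names `IMAGE`, `BASE`, and `find_line` are hypothetical. `BASE` is an assumption inferred from the dump (the first pointer, 0x260B, minus the 13 bytes the first line occupies), and the sketch assumes the two bytes after each pointer hold the line number, both stored high byte first.

```python
import struct

# The 21 bytes from the hex dump above.
IMAGE = bytes.fromhex(
    "26 0b 00 0a 9d 28 31 34 2c 31 35 29 00 26 11 00 14 8f 00 00 00"
)

# Assumed memory address of the first byte of the program (inferred, not documented here).
BASE = 0x25FE


def find_line(image, base, target):
    """Follow next-line pointers until the target line number is found.

    Returns the offset of the line within image, or None if there is no such line.
    """
    offset = 0
    while True:
        (next_addr,) = struct.unpack_from(">H", image, offset)
        if next_addr == 0:
            return None  # a zero pointer marks the end of the program
        (line_no,) = struct.unpack_from(">H", image, offset + 2)
        if line_no == target:
            return offset
        # Jump straight to the next line; the tokens in between are never read.
        offset = next_addr - base


print(find_line(IMAGE, BASE, 20))  # 13
print(find_line(IMAGE, BASE, 60))  # None
```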


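The next two bytes are the line number: 000A is 10. The next byte, 9D, is the token for RESET. Then 28 is the ASCII value of an open parenthesis, 31 is "1", and 34 is "4" (that is, 14 as the first argument to RESET); 2C is a comma, 31 and 35 are the 1 and 5 of the second argument to RESET, 29 is the close parenthesis, and 00 ends the line.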


The next two bytes are the address of the following line, and 0014 is the second line's line number: 14 is hexadecimal for 20. Finally, 8F is RESTORE, a 00 ends the line, and the final two zeroes end the program.
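Putting the walkthrough together, a minimal detokenizer for this particular dump could look like the Python sketch below. It is not a complete tool: `DUMP`, `TOKENS`, and `detokenize` are hypothetical names, the token table contains only the two values discussed here, and the sketch assumes one-byte tokens, high-byte-first line numbers, and a 00 terminator on every line, exactly as described above. Other BASICs (GW-BASIC, for instance) use different layouts and token values.

```python
import struct

# The 21 bytes from the hex dump above.
DUMP = bytes.fromhex(
    "26 0b 00 0a 9d 28 31 34 2c 31 35 29 00 26 11 00 14 8f 00 00 00"
)

# Hypothetical, deliberately tiny token table (values taken from this answer).
TOKENS = {0x9D: "RESET", 0x8F: "RESTORE"}


def detokenize(data):
    """Convert a tokenized program image back into a list of text lines."""
    lines = []
    pos = 0
    while True:
        # Two-byte address of the next line; zero marks the end of the program.
        (next_addr,) = struct.unpack_from(">H", data, pos)
        if next_addr == 0:
            break
        # Two-byte line number follows the address.
        (line_no,) = struct.unpack_from(">H", data, pos + 2)
        pos += 4
        end = data.index(0, pos)  # each line ends with a 00 byte
        parts = []
        for b in data[pos:end]:
            if b in TOKENS:
                parts.append(TOKENS[b])
            elif b >= 0x80:
                parts.append(f"<{b:02X}>")  # unknown token: show its hex value
            else:
                parts.append(chr(b))  # plain ASCII passes through unchanged
        lines.append(f"{line_no} {''.join(parts)}")
        pos = end + 1
    return lines


for line in detokenize(DUMP):
    print(line)
# 10 RESET(14,15)
# 20 RESTORE
```

The next-line addresses are read only to detect the end of the program; the line boundaries come from the 00 terminators, which is why the addresses can otherwise be ignored when working from a file.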


  • I don't care about BASIC, but this was a really interesting read! (2 upvotes)