Reading data stored in S-expressions into memory in another Common Lisp program

Bit*_*inu 1 lisp parsing common-lisp

I have a Common Lisp program that reads data from an S-expression file. Originally the program was intended to be written in C with a CSV file that was parsed into a struct in-memory, but I switched to Common Lisp and S-expressions to remove abstractions between programmer and user. If I have a struct defined in program.lisp as such:

(defstruct flashcard
  front
  back
)

...and a file data.lisp:

(:virology
  (
     (:card
        :front "To what sort of cells does Epstain-Barr virus attach?"
        :back "B-cells"      
     )
     (:card
        :front "For how long does the virus of herpes simplex persist in tissues?"
        :back "lifetime"
     )
     (:card
        :front "What T-cell receptors are recognised by HIV?"
        :back CD4
     )
  )
)


...with several lists within it much like ":virology".

How would I read data.lisp into memory so that each individual card can be accessed by program.lisp?

For context, I'm looking to iterate over each card in a queue based on filters supplied by the user (i.e. they list :virology :biochemistry :homeostasis, etc.), and quiz the user on the card. I can't see how this specific use case would affect how I load the file into memory, though.

(NOTE: As it currently stands, a simple struct with just two fields would probably be better suited to a CSV file; however, as I develop the program I intend to add metadata, etc., so storing data in Lisp S-expressions that are then read by program.lisp is far more useful in the long run.)

cor*_*ump 7

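Reading the file

Assuming you start your Lisp implementation (e.g. sbcl, ecl) in the directory where data.lisp is located, this should produce a tree of values (here * is the REPL prompt, and what follows the form is the value that was read):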

* (with-open-file (in "data.lisp")
    (read in))

(:VIROLOGY
 ((:CARD :FRONT "To what sort of cells does Epstain-Barr virus attach?" :BACK
   "B-cells")
  (:CARD :FRONT
   "For how long does the virus of herpes simplex persist in tissues?" :BACK
   "lifetime")
  (:CARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4)))

Typically, you would wrap this in a function:

(defun cards (&optional (file "data.lisp"))
  (with-open-file (in file) (read in)))

This assumes that if you have multiple groups of cards, you write them all in a single list, as follows:

(:virology (...) :biology (....) :physics (...))

If instead you write multiple top-level lists, as follows:

(:virology (...))
(:biology (...))
(:physics (...))

then cards above needs to loop:

(defun cards (&optional (file "data.lisp"))
  (with-open-file (in file)
    (loop 
      for item = (read in nil in)
      until (eq item in)
      collect item)))

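There is a trick above: you want to collect values until there are no more, but you do not want an error to be signaled at end of file. That is why read is given nil as its second argument (no error), while the third argument is the value it should return when it reaches end of file. Here that value is the stream object in itself: this gives you a unique value that cannot possibly be produced by read (unlike, say, nil). In your case this is not strictly necessary and you could use nil, but in general it is good practice.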
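To see why the sentinel matters, here is a minimal sketch that reads from a string instead of a file. The function name read-all-forms and the sample input are made up for illustration; the input deliberately contains a literal nil, which a nil sentinel could not tell apart from end of file.

```lisp
(defun read-all-forms (stream)
  "Read every form from STREAM until end of file; return them as a list."
  ;; Second argument NIL: do not signal an error at end of file.
  ;; Third argument STREAM: the value returned at end of file.
  (loop for form = (read stream nil stream)
        until (eq form stream)
        collect form))

;; The literal NIL in the middle is collected like any other value.
(with-input-from-string (in "(:a 1) nil (:b 2)")
  (read-all-forms in))
;; => ((:A 1) NIL (:B 2))
```

With nil as the end-of-file value and a test such as "while form", the loop would have stopped at the literal nil and silently dropped (:b 2).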

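Accessing cards

Whichever way you choose, you will have to adapt how you query your values. Let's assume you use the loop above, so that your data is a list of the individual entries collected by the loop: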

((:virology (...))
 (:biology (...))
 (:physics (...)))

It is shaped like an association list, where each element is a cons cell whose car is a key and whose cdr is a value. If you call (assoc :virology (cards)), you get:

(:VIROLOGY
 ((:CARD :FRONT "To what sort of cells does Epstain-Barr virus attach?" :BACK
   "B-cells")
  (:CARD :FRONT
   "For how long does the virus of herpes simplex persist in tissues?" :BACK
   "lifetime")
  (:CARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4)))

The cdr of that entry is the associated value, namely a list that contains the list of cards.

You could simplify your data format and use the following:

(:virology (:card ...) (:card ...) (:card ...)) 

instead of:

(:virology ((:card ...) (:card ...) (:card ...)))

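This removes one level of nesting: instead of a list containing a single list of cards, you get the list of cards directly as the cdr of your entry. So let's assume you edit data.lisp to remove one level of nesting; then: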

(defun find-cards (cards key)
  (cdr (assoc key cards)))

* (find-cards (cards) :virology)
((:CARD :FRONT "To what sort of cells does Epstain-Barr virus attach?" :BACK
  "B-cells")
 (:CARD :FRONT
  "For how long does the virus of herpes simplex persist in tissues?" :BACK
  "lifetime")
 (:CARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4))

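So far so good.

To recap, we have an association list that maps keys to lists of cards.

If you want to use a hash table instead, you can either populate one yourself or use a library like alexandria. For the latter, you should probably set up Quicklisp first: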

* (ql:quickload :alexandria)
To load "alexandria":
  Load 1 ASDF system:
    alexandria
; Loading "alexandria"

(:ALEXANDRIA)

Then you can call:

* (alexandria:alist-hash-table (cards))
#<HASH-TABLE :TEST EQL :COUNT 1 {1015268CE3}>

If you get this far, you can read about how hash tables work, for example in Chapter 11, "Collections", of Peter Seibel's Practical Common Lisp.
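If you would rather populate the table yourself than load a library, a minimal sketch could look like the following. The names alist-to-table and *table* and the shortened sample data are hypothetical; the data uses the flattened format in which the cdr of each entry is the list of cards.

```lisp
(defun alist-to-table (alist)
  "Return a fresh EQL hash table mapping each car in ALIST to its cdr."
  (let ((table (make-hash-table)))   ; EQL test is fine for keyword keys
    (loop for (key . value) in alist
          do (setf (gethash key table) value))
    table))

;; Sample data standing in for the result of (cards).
(defparameter *table*
  (alist-to-table '((:virology (:card :front "Q1" :back "A1"))
                    (:biology  (:card :front "Q2" :back "A2")))))

(gethash :virology *table*)
;; => ((:CARD :FRONT "Q1" :BACK "A1")), T

(gethash :physics *table*)
;; => NIL, NIL   (the second value tells you the key is absent)
```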

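Making flashcard structures

In any case, whether you access cards with find-cards or with GETHASH, you will end up with a list of cards in a specific format. If you want to convert them to instances of your struct, you first need to define a way to convert a card from the list format to a struct.

Each card is stored in a list that starts with :card, and the rest is a property list of values. A property list is a flat sequence of alternating keys and values: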

(:a 0 :b 1 :c 2)
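As a quick illustration of how such a property list is queried, the standard function GETF looks up one key at a time. The variable *plist* and the sample card are made up for this sketch.

```lisp
(defparameter *plist* (list :a 0 :b 1 :c 2))

(getf *plist* :b)            ; => 1
(getf *plist* :z)            ; => NIL (key absent)
(getf *plist* :z :missing)   ; => :MISSING (explicit default)

;; A card is the keyword :card followed by a property list, so the
;; property list is the REST of the card.
(getf (rest '(:card :front "Q" :back "A")) :back) ; => "A"
```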

Fortunately, you can use DESTRUCTURING-BIND to match against a known format (for more complex formats, there are pattern-matching libraries):

(defun parse-card (list)
  ;; This is the expected format, the list starts with `:card`, so
  ;; I add an assertion here.
  (assert (eq :card (first list)))  
  ;; The rest of the list is a property list, let's bind front and back
  ;; to the values associated with keys :front and :back
  (destructuring-bind (&key front back) (rest list)
    ;; this is a function generated by "defstruct"
    (make-flashcard :front front :back back)))

For example:

* (parse-card '(:card :front 0 :back 1))
#S(FLASHCARD :FRONT 0 :BACK 1)

Once this works, you can use mapcar to convert a list of cards:

* (mapcar #'parse-card (find-cards (cards) :virology))
(#S(FLASHCARD
    :FRONT "To what sort of cells does Epstain-Barr virus attach?"
    :BACK "B-cells")
 #S(FLASHCARD
    :FRONT "For how long does the virus of herpes simplex persist in tissues?"
    :BACK "lifetime")
 #S(FLASHCARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4))
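For the use case in the question (a queue built from several user-selected topics), one possible sketch is to append the converted cards of each requested key. The function cards-for-topics and the variable *sample* are hypothetical; the struct and parse-card are repeated from this page only so that the block runs on its own, and the data is assumed to be an association list in the flattened format.

```lisp
(defstruct flashcard front back)

(defun parse-card (list)
  (assert (eq :card (first list)))
  (destructuring-bind (&key front back) (rest list)
    (make-flashcard :front front :back back)))

(defun cards-for-topics (alist topics)
  "Return the flashcards for each key in TOPICS, in the order requested.
Topics that are not present in ALIST contribute no cards."
  (loop for topic in topics
        append (mapcar #'parse-card (cdr (assoc topic alist)))))

;; Shortened sample data standing in for the result of (cards).
(defparameter *sample*
  '((:virology (:card :front "V1" :back "a") (:card :front "V2" :back "b"))
    (:biology  (:card :front "B1" :back "c"))))

(mapcar #'flashcard-front
        (cards-for-topics *sample* '(:biology :virology :physics)))
;; => ("B1" "V1" "V2")
```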

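I know structures are easy to define, but they are not easy to change in a live system: if you want to add a new slot, you typically need to restart your Lisp (this is so that they can be compiled efficiently, as in statically typed languages, whereas classes are more dynamic).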
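If that flexibility matters to you, a class-based variant is one option. The following is only a sketch: the class name card-object, its accessors, the tags slot and parse-card-object are all hypothetical names chosen so as not to clash with the flashcard struct.

```lisp
(defclass card-object ()
  ((front :initarg :front :accessor card-front)
   (back  :initarg :back  :accessor card-back)))

(defun parse-card-object (list)
  "Convert a (:card :front ... :back ...) list into a CARD-OBJECT."
  (assert (eq :card (first list)))
  (destructuring-bind (&key front back) (rest list)
    (make-instance 'card-object :front front :back back)))

(defparameter *card* (parse-card-object '(:card :front "Q" :back "A")))

(card-back *card*) ; => "A"

;; Re-evaluating DEFCLASS with an extra slot does not require a restart:
;; existing instances are updated, and the new slot gets its initform.
(defclass card-object ()
  ((front :initarg :front :accessor card-front)
   (back  :initarg :back  :accessor card-back)
   (tags  :initarg :tags  :initform '() :accessor card-tags)))

(card-tags *card*) ; => NIL
(card-back *card*) ; => "A"
```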

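Caveat

When a piece of data you read is a symbol, like CD4, it is interned in whatever package is current (the value of *package*) at the time read is called. This can pollute your packages and/or cause difficulties. You may prefer to use strings.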
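If you do keep symbols in the data, one way to limit the damage is to control the reader variables around the call to read. The sketch below is an assumption about what you might want, not a complete safety measure: the function name read-card-data is made up, *read-eval* is disabled so that #. forms in the file cannot run code, and *package* is bound to the KEYWORD package so that unqualified symbols are read as keywords instead of landing in your own package.

```lisp
(defun read-card-data (stream)
  "Read one form from STREAM with a restricted reader setup.
Returns NIL at end of file."
  (with-standard-io-syntax
    (let ((*read-eval* nil)                      ; reject #.(...) forms
          (*package* (find-package :keyword)))   ; CD4 is read as :CD4
      (read stream nil nil))))

(with-input-from-string (in "(:card :front \"Q\" :back CD4)")
  (read-card-data in))
;; => (:CARD :FRONT "Q" :BACK :CD4)
```

Reading everything as keywords is a design choice made for this sketch; using strings in the data file, as suggested above, avoids the question entirely.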

Conclusion

This is not a complete solution, but you should now have several tools to make progress, depending on where you want to go.

  • Might `uiop:with-safe-io-syntax` help? "Establish safe CL reader options around the evaluation of BODY" (2 upvotes)