Bit*_*inu 1 lisp parsing common-lisp
I have a Common Lisp program that reads data from an S-expression file. Originally the program was intended to be written in C with a CSV file that was parsed into a struct in-memory, but I switched to Common Lisp and S-expressions to remove abstractions between programmer and user. Suppose I have a struct defined in program.lisp like this:
(defstruct flashcard
  front
  back
)
...and a file data.lisp:
(:virology
  (
    (:card
      :front "To what sort of cells does Epstain-Barr virus attach?"
      :back "B-cells"
    )
    (:card
      :front "For how long does the virus of herpes simplex persist in tissues?"
      :back "lifetime"
    )
    (:card
      :front "What T-cell receptors are recognised by HIV?"
      :back CD4
    )
  )
)
...with several lists within it much like ":virology".
How would I read data.lisp into memory so that each individual card can be accessed by program.lisp?
For context, I'm looking to iterate over each card in a queue based on filters supplied by the user (i.e. they list :virology :biochemistry :homeostasis, etc.), and quiz the user on the card. I can't see how this specific use case would affect how I load the file into memory, though.
(NOTE: As it currently stands, a simple struct with just two fields would probably be better suited to a CSV file. However, as I develop the program I intend to add metadata, etc., so storing the data as Lisp S-expressions that are then read by program.lisp is far more useful in the long run.)
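Assuming you start your Lisp implementation (e.g. sbcl, ecl) in the directory where data.lisp is located, this should produce a tree of values (here * is the REPL prompt, and what follows the form is the value that was read):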
* (with-open-file (in "data.lisp")
    (read in))
(:VIROLOGY
 ((:CARD :FRONT "To what sort of cells does Epstain-Barr virus attach?" :BACK
   "B-cells")
  (:CARD :FRONT
   "For how long does the virus of herpes simplex persist in tissues?" :BACK
   "lifetime")
  (:CARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4)))
Typically, you would wrap this in a function:
(defun cards (&optional (file "data.lisp"))
  (with-open-file (in file) (read in)))
This assumes that, if you have several topics, you write them all inside a single list, as follows:
(:virology (...) :biology (....) :physics (...))
If instead you write several top-level lists, like this:
(:virology (...))
(:biology (...))
(:physics (...))
then the cards function above needs a loop:
(defun cards (&optional (file "data.lisp"))
  (with-open-file (in file)
    (loop
      for item = (read in nil in)
      until (eq item in)
      collect item)))
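There is a trick above: you want to collect values until there are none left, but you do not want an error to be signaled at end of file. That is why read is given nil as its second argument (no error), while the third argument is the value it should return when it reaches the end of the file. Here that value is the stream object in itself: this gives a unique value that cannot possibly be produced by read (unlike, say, nil). In your case this is not really necessary and you could use nil, but in general it is good practice.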
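To see why a unique end-of-file marker matters, here is a minimal sketch that reads from a string instead of a file. The helper name read-all-forms and the sample input are made up for illustration; they are not part of the program above.

```lisp
;; Hypothetical helper: read every form from a stream, using the stream
;; object itself as the end-of-file marker.
(defun read-all-forms (stream)
  (loop for item = (read stream nil stream)
        until (eq item stream)
        collect item))

;; The input contains the symbol NIL as ordinary data.
(with-input-from-string (in "(:a 1) nil (:b 2)")
  (read-all-forms in))
;; => ((:A 1) NIL (:B 2))

;; With NIL as the end-of-file value, the loop stops at the NIL datum.
(with-input-from-string (in "(:a 1) nil (:b 2)")
  (loop for item = (read in nil nil)
        while item
        collect item))
;; => ((:A 1))
```

The second loop silently drops everything after the first nil (or empty list) in the data, which is the failure the stream-as-marker idiom avoids.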
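Whichever way you choose, you have to adapt how you query your values. Let's assume you use the loop above, so that your data is the list of individual entries collected by the loop: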
((:virology (...))
 (:biology (...))
 (:physics (...)))
It is shaped like an association list, where each element is a cons cell whose car is a key and whose cdr is a value. If you call (assoc :virology (cards)), you get:
(:VIROLOGY
 ((:CARD :FRONT "To what sort of cells does Epstain-Barr virus attach?" :BACK
   "B-cells")
  (:CARD :FRONT
   "For how long does the virus of herpes simplex persist in tissues?" :BACK
   "lifetime")
  (:CARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4)))
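The cdr of that entry is the associated value: here, a list whose single element is the list of cards.
You could simplify your data format so that it uses the following: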
(:virology (:card ...) (:card ...) (:card ...))
instead of:
(:virology ((:card ...) (:card ...) (:card ...)))
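This removes one level of nesting: instead of a list that contains a list of cards, you can access the list of cards directly as the cdr of your entry. So let's assume you edit data.lisp to remove one level of nesting, then: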
(defun find-cards (cards key)
  (cdr (assoc key cards)))

* (find-cards (cards) :virology)
((:CARD :FRONT "To what sort of cells does Epstain-Barr virus attach?" :BACK
  "B-cells")
 (:CARD :FRONT
  "For how long does the virus of herpes simplex persist in tissues?" :BACK
  "lifetime")
 (:CARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4))
So far so good.
To recap, we have an association list that maps keys to lists of cards.
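As a small sketch of that recap, here is the same lookup on inline sample data standing in for the file contents. The variable *sample* and the shortened card texts are illustrative only.

```lisp
;; Sample data in the flattened format, standing in for what (cards) returns.
(defparameter *sample*
  '((:virology (:card :front "Q1" :back "A1")
               (:card :front "Q2" :back "A2"))
    (:biology  (:card :front "Q3" :back "A3"))))

;; ASSOC returns the whole entry, key included.
(assoc :biology *sample*)
;; => (:BIOLOGY (:CARD :FRONT "Q3" :BACK "A3"))

;; The CDR of the entry is the list of cards.
(cdr (assoc :biology *sample*))
;; => ((:CARD :FRONT "Q3" :BACK "A3"))

;; An unknown key yields NIL, since (cdr nil) is NIL.
(cdr (assoc :physics *sample*))
;; => NIL
```

Note that a missing topic and a topic with no cards both come back as NIL with this representation.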
If you want to use a hash table instead, you can populate one yourself or use a library like alexandria. For that, you should probably set up Quicklisp first:
* (ql:quickload :alexandria)
To load "alexandria":
  Load 1 ASDF system:
    alexandria
; Loading "alexandria"
(:ALEXANDRIA)
Then you can call:
* (alexandria:alist-hash-table (cards))
#<HASH-TABLE :TEST EQL :COUNT 1 {1015268CE3}>
If you get this far, you can read about how hash tables work in, for example, chapter 11 ("Collections") of Peter Seibel's Practical Common Lisp.
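If you would rather populate the table yourself, a minimal sketch could look like the following. The helper name alist-to-table and the sample data are assumptions made for this example; only make-hash-table and gethash are standard.

```lisp
;; Hypothetical helper: build a hash table from an association list
;; such as the one returned by (cards).
(defun alist-to-table (alist)
  (let ((table (make-hash-table :test #'eql)))
    (loop for (key . value) in alist
          do (setf (gethash key table) value))
    table))

(defparameter *table*
  (alist-to-table '((:virology (:card :front "Q1" :back "A1"))
                    (:biology  (:card :front "Q2" :back "A2")))))

(gethash :virology *table*)
;; => ((:CARD :FRONT "Q1" :BACK "A1")), T

(gethash :physics *table*)
;; => NIL, NIL
```

The second return value of gethash tells a missing key apart from a key whose value is NIL. Unlike assoc, which returns the first matching entry, this sketch lets a later duplicate key overwrite an earlier one.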
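In any case, whether you access the cards with find-cards or with GETHASH, you end up with a list of cards in a specific format. If you want to convert them to instances of your structure, you first need to define a way to convert a card from the list format to a struct.
Each card is stored in a list that starts with :card, and the rest is a property list of values. A property list is a flat sequence of alternating keys and values: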
(:a 0 :b 1 :c 2)
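For a single lookup in such a list, the standard GETF function is enough. A small sketch, with the variable *plist* and the sample card made up for illustration:

```lisp
(defparameter *plist* (list :a 0 :b 1 :c 2))

(getf *plist* :b)            ; => 1
(getf *plist* :z)            ; => NIL (key absent)
(getf *plist* :z :missing)   ; => :MISSING (explicit default)

;; The part of a card after :card is a property list too.
(getf (rest '(:card :front "Q" :back "A")) :front)   ; => "Q"
```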
Fortunately, you can use DESTRUCTURING-BIND to match a known format (for more complex formats, there are pattern-matching libraries):
(defun parse-card (list)
  ;; This is the expected format, the list starts with `:card`, so
  ;; I add an assertion here.
  (assert (eq :card (first list)))
  ;; The rest of the list is a property list, let's bind front and back
  ;; to the values associated with keys :front and :back
  (destructuring-bind (&key front back) (rest list)
    ;; this is a function generated by "defstruct"
    (make-flashcard :front front :back back)))
For example:
* (parse-card '(:card :front 0 :back 1))
#S(FLASHCARD :FRONT 0 :BACK 1)
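Since the question mentions adding metadata later, note that a lambda list of the form (&key front back) accepts only those two keys; implementations such as SBCL signal an error when the card contains any other key. One possible variation, sketched here under the assumption that unknown keys should simply be ignored, adds &allow-other-keys. The name parse-card-lenient and the :tags key are hypothetical.

```lisp
(defstruct flashcard front back)

;; Hypothetical variant of parse-card that ignores keys it does not know.
(defun parse-card-lenient (list)
  (assert (eq :card (first list)))
  (destructuring-bind (&key front back &allow-other-keys) (rest list)
    (make-flashcard :front front :back back)))

(parse-card-lenient '(:card :front "Q" :back "A" :tags (:exam)))
;; => #S(FLASHCARD :FRONT "Q" :BACK "A")
```

Whether silently ignoring extra keys is desirable depends on the program; keeping the strict version catches typos in the data file earlier.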
Once this works, you can use mapcar to convert a list of cards:
* (mapcar #'parse-card (find-cards (cards) :virology))
(#S(FLASHCARD
    :FRONT "To what sort of cells does Epstain-Barr virus attach?"
    :BACK "B-cells")
 #S(FLASHCARD
    :FRONT "For how long does the virus of herpes simplex persist in tissues?"
    :BACK "lifetime")
 #S(FLASHCARD :FRONT "What T-cell receptors are recognised by HIV?" :BACK CD4))
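For the use case in the question, where the user lists several topics, the same pieces can be combined. This is only a sketch: the helper cards-for-topics and the variable *data* are hypothetical, and the inline data stands in for the flattened file contents.

```lisp
(defstruct flashcard front back)

(defun parse-card (list)
  (assert (eq :card (first list)))
  (destructuring-bind (&key front back) (rest list)
    (make-flashcard :front front :back back)))

;; Hypothetical helper: collect the cards of every requested topic, in order.
(defun cards-for-topics (data topics)
  (loop for topic in topics
        append (mapcar #'parse-card (cdr (assoc topic data)))))

(defparameter *data*
  '((:virology (:card :front "Q1" :back "A1")
               (:card :front "Q2" :back "A2"))
    (:biology  (:card :front "Q3" :back "A3"))
    (:physics  (:card :front "Q4" :back "A4"))))

(mapcar #'flashcard-front (cards-for-topics *data* '(:physics :virology)))
;; => ("Q4" "Q1" "Q2")
```

The result is a plain list of structures, which can then be consumed one by one as a quiz queue.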
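Be aware that structures are easy to define, but they are not easy to change in a live system: if you want to add a new slot, you may need to restart your Lisp (this is what allows efficient compilation, as in statically typed languages, whereas classes are more dynamic).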
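To illustrate the more dynamic behaviour of classes, here is a minimal sketch of a class-based counterpart. The class name card, its accessors and the tags slot are hypothetical and not part of the program above.

```lisp
;; Hypothetical class-based counterpart of the flashcard structure.
(defclass card ()
  ((front :initarg :front :accessor card-front)
   (back  :initarg :back  :accessor card-back)))

(defparameter *card* (make-instance 'card :front "Q" :back "A"))

;; Redefining the class with an extra slot updates existing instances.
(defclass card ()
  ((front :initarg :front :accessor card-front)
   (back  :initarg :back  :accessor card-back)
   (tags  :initarg :tags  :accessor card-tags :initform nil)))

(card-front *card*)   ; => "Q"
(card-tags *card*)    ; => NIL, the new slot was filled from its :initform
```

The instance created before the redefinition keeps its old slot values and gains the new slot without a restart.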
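When the data you read is a symbol, like CD4, it is interned in whichever package is current (the value of *package*) at the time read is called. This may pollute your package and/or cause difficulties. You may prefer to use strings.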
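If you do keep symbols in the data, one option is to control which package is current while reading. The sketch below binds *package* to the KEYWORD package so that unqualified symbols become keywords; the helper name read-data-form is hypothetical, and binding *read-eval* to nil (which disables the #. reader macro) is an extra precaution added here, not something discussed above.

```lisp
;; Read one form with a chosen package current, and with #. disabled.
(defun read-data-form (string)
  (let ((*package* (find-package :keyword))
        (*read-eval* nil))
    (read-from-string string)))

(read-data-form "(:card :front \"Q\" :back CD4)")
;; => (:CARD :FRONT "Q" :BACK :CD4)

;; Or normalise a symbol to a string after reading.
(string 'cd4)   ; => "CD4"
```

The same two bindings can be wrapped around the read call inside cards when reading from a file.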
This is not a complete solution, but you should now have several tools to make progress, depending on where you want to go.