How to insert JSON containing escape codes into a JSONB column in PostgreSQL using GORM

Jan*_*hie 2 postgresql byte json go go-gorm

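I'm trying to store JSON bytes in PostgreSQL, but I've run into a problem:

"\u0000 cannot be converted to text."

As shown below, the JSON contains escape sequences such as \u0000, and PostgreSQL seems to interpret them as Unicode characters rather than as part of a JSON string.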

    err := raws.SaveRawData(data, url)
    // if there is "\u0000" in the bytes
    if err != nil && err.Error() == "ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" {
        // try to remove \u0000, but this does not work
        data = bytes.Trim(data, "\u0000")
        e := raws.SaveRawData(data, url) // save data again
        if e != nil {
            return e // returns the same error
        }
        return nil
    }

The original API data can be accessed here. It contains \u0000:

    {
      "code": 0,
      "message": "0",
      "ttl": 1,
      "data": {
        "bvid": "BV1jb411C7m3",
        "aid": 42443484,
        "videos": 1,
        "tid": 172,
        "tname": "手机游戏",
        "copyright": 1,
        "pic": "http://i0.hdslb.com/bfs/archive/c76ee4798bf2ba0efc8449bcb3577d508321c6c5.jpg",
        "title": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
        "pubdate": 1549100438,
        "ctime": 1549100438,
        "desc": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
        "desc_v2": [
          {
            "raw_text": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
            "type": 1,
            "biz_id": 0
          }
        ],
        "state": 0,
        "duration": 265,
        "rights": {
          "bp": 0,
          "elec": 0,
          "download": 1,
          "movie": 0,
          "pay": 0,
          "hd5": 0,
          "no_reprint": 1,
          "autoplay": 1,
          "ugc_pay": 0,
          "is_cooperation": 0,
          "ugc_pay_preview": 0,
          "no_background": 0,
          "clean_mode": 0,
          "is_stein_gate": 0
        },
        "owner": {
          "mid": 39699039,
          "name": "明眸-雅望",
          "face": "http://i0.hdslb.com/bfs/face/240f74f8706955119575ea6c6cb1d31892f93800.jpg"
        },
        "stat": {
          "aid": 42443484,
          "view": 1107,
          "danmaku": 7,
          "reply": 22,
          "favorite": 5,
          "coin": 4,
          "share": 0,
          "now_rank": 0,
          "his_rank": 0,
          "like": 10,
          "dislike": 0,
          "evaluation": "",
          "argue_msg": ""
        },
        "dynamic": "#崩坏3#",
        "cid": 74479750,
        "dimension": {
          "width": 1280,
          "height": 720,
          "rotate": 0
        },
        "no_cache": false,
        "pages": [
          {
            "cid": 74479750,
            "page": 1,
            "from": "vupload",
            "part": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
            "duration": 265,
            "vid": "",
            "weblink": "",
            "dimension": {
              "width": 1280,
              "height": 720,
              "rotate": 0
            }
          }
        ],
        "subtitle": {
          "allow_submit": false,
          "list": []
        },
        "user_garb": {
          "url_image_ani_cut": ""
        }
      }
    }

The struct being saved is:

    type RawJSONData struct {
        ID        uint64         `gorm:"primarykey" json:"id"`
        CreatedAt time.Time      `json:"-"`
        DeletedAt gorm.DeletedAt `json:"-" gorm:"index"`
        Data      datatypes.JSON `json:"data"`
        URL       string         `gorm:"index" json:"url"`
    }

datatypes.JSON comes from gorm.io/datatypes. It appears to be just a json.RawMessage, which is (derived from?) a []byte.

I use PostgreSQL's JSONB type to store this data.

Table:

    create table raw_json_data
    (
        id         bigserial not null constraint raw_json_data_pke primary key,
        created_at timestamp with time zone,
        deleted_at timestamp with time zone,
        data       jsonb,
        url        text
    );

bla*_*een 6

Postgres simply does not support the \u0000 Unicode escape sequence in TEXT and JSONB columns:

"The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type)"

You can change the column type to JSON:

    create table Foo (test JSON);
    insert into Foo (test) values ('{"text": "明天再水\u0000绿塔的"}');
    -- works

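"The json data type stores an exact copy of the input text"

This has the advantage of keeping the data identical to what you received from the API, in case the escape sequence carries some meaning you need to preserve.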

It also lets you query with the Postgres JSON operators (e.g. ->>), although converting a JSON field that contains \u0000 to text will still fail:

    select test->>'text' from Foo
    -- ERROR:  unsupported Unicode escape sequence

Columns of type BYTEA also accept any byte sequence without manipulating the data. In Gorm, use the type:bytea tag:

    type RawJSONData struct {
        // ... other fields
        Data      string `gorm:"type:bytea" json:"data"`
    }

If none of the above is acceptable to you, then you must sanitize the input string...

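Rather than matching on the text of the database error, it may be easier to check a payload for the problem character before writing it. The following is a minimal sketch, not part of GORM or gorm.io/datatypes; the names containsNullChar and hasNullChar are hypothetical. It relies on encoding/json decoding a \u0000 escape into a NUL character inside the resulting Go string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// hasNullChar reports whether any string key or value in a decoded JSON
// tree contains the NUL character (U+0000).
func hasNullChar(v interface{}) bool {
	switch t := v.(type) {
	case string:
		return strings.ContainsRune(t, 0)
	case []interface{}:
		for _, e := range t {
			if hasNullChar(e) {
				return true
			}
		}
	case map[string]interface{}:
		for k, e := range t {
			if strings.ContainsRune(k, 0) || hasNullChar(e) {
				return true
			}
		}
	}
	return false
}

// containsNullChar decodes data and reports whether it holds a NUL
// character anywhere. It returns an error if data is not valid JSON.
func containsNullChar(data []byte) (bool, error) {
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return false, err
	}
	return hasNullChar(v), nil
}

func main() {
	bad, _ := containsNullChar([]byte(`{"desc":"a\u0000b"}`))
	ok, _ := containsNullChar([]byte(`{"desc":"a\\u0000b"}`))
	fmt.Println(bad) // true
	fmt.Println(ok)  // false: escaped backslash followed by plain text
}
```

Decoding the whole document costs more than a byte scan, but it avoids false positives on \\u0000 and validates the JSON at the same time.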