Jan*_*hie 2 postgresql byte json go go-gorm
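I'm trying to store JSON bytes in PostgreSQL, but I'm running into a problem:

> `\u0000` cannot be converted to text.

As shown below, the JSON contains escape sequences such as `\u0000`, which PostgreSQL seems to interpret as a Unicode character rather than as part of a JSON string.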
```go
err := raws.SaveRawData(data, url)
// if there is "\u0000" in the bytes
// (check err for nil first: calling Error() on a nil error panics)
if err != nil && err.Error() == "ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" {
	// try to remove \u0000, but it doesn't work
	data = bytes.Trim(data, "\u0000")
	e := raws.SaveRawData(data, url) // save data again
	if e != nil {
		return e // returns the same error
	}
	return nil
}
```

The original API data can be accessed here. It contains `\u0000`:
```json
{
  "code": 0,
  "message": "0",
  "ttl": 1,
  "data": {
    "bvid": "BV1jb411C7m3",
    "aid": 42443484,
    "videos": 1,
    "tid": 172,
    "tname": "手机游戏",
    "copyright": 1,
    "pic": "http://i0.hdslb.com/bfs/archive/c76ee4798bf2ba0efc8449bcb3577d508321c6c5.jpg",
    "title": "冰塔：我连你的大招都敢硬抗，所以告诉我谁才是生物女王？！单s冰塔怒砍档案女王巴德尔，谁，才是生物一姐？（手动滑稽）",
    "pubdate": 1549100438,
    "ctime": 1549100438,
    "desc": "bgm：逮虾户\n今天先水一期冰塔的，明天再水\u0000绿塔的，后天就可以下红莲啦，计划通嘿嘿嘿(º﹃º )",
    "desc_v2": [
      {
        "raw_text": "bgm：逮虾户\n今天先水一期冰塔的，明天再水\u0000绿塔的，后天就可以下红莲啦，计划通嘿嘿嘿(º﹃º )",
        "type": 1,
        "biz_id": 0
      }
    ],
    "state": 0,
    "duration": 265,
    "rights": {
      "bp": 0,
      "elec": 0,
      "download": 1,
      "movie": 0,
      "pay": 0,
      "hd5": 0,
      "no_reprint": 1,
      "autoplay": 1,
      "ugc_pay": 0,
      "is_cooperation": 0,
      "ugc_pay_preview": 0,
      "no_background": 0,
      "clean_mode": 0,
      "is_stein_gate": 0
    },
    "owner": {
      "mid": 39699039,
      "name": "明眸-雅望",
      "face": "http://i0.hdslb.com/bfs/face/240f74f8706955119575ea6c6cb1d31892f93800.jpg"
    },
    "stat": {
      "aid": 42443484,
      "view": 1107,
      "danmaku": 7,
      "reply": 22,
      "favorite": 5,
      "coin": 4,
      "share": 0,
      "now_rank": 0,
      "his_rank": 0,
      "like": 10,
      "dislike": 0,
      "evaluation": "",
      "argue_msg": ""
    },
    "dynamic": "#崩坏3#",
    "cid": 74479750,
    "dimension": {
      "width": 1280,
      "height": 720,
      "rotate": 0
    },
    "no_cache": false,
    "pages": [
      {
        "cid": 74479750,
        "page": 1,
        "from": "vupload",
        "part": "冰塔：我连你的大招都敢硬抗，所以告诉我谁才是生物女王？！单s冰塔怒砍档案女王巴德尔，谁，才是生物一姐？（手动滑稽）",
        "duration": 265,
        "vid": "",
        "weblink": "",
        "dimension": {
          "width": 1280,
          "height": 720,
          "rotate": 0
        }
      }
    ],
    "subtitle": {
      "allow_submit": false,
      "list": []
    },
    "user_garb": {
      "url_image_ani_cut": ""
    }
  }
}
```

The struct I save it into is:
```go
import (
	"time"

	"gorm.io/datatypes"
	"gorm.io/gorm"
)

type RawJSONData struct {
	ID        uint64         `gorm:"primarykey" json:"id"`
	CreatedAt time.Time      `json:"-"`
	DeletedAt gorm.DeletedAt `json:"-" gorm:"index"`
	Data      datatypes.JSON `json:"data"`
	URL       string         `gorm:"index" json:"url"`
}
```

`datatypes.JSON` comes from `gorm.io/datatypes`. It appears to be just `json.RawMessage`, which is itself defined as a `[]byte`.
I store this data in a PostgreSQL `JSONB` column.

The table:

```sql
create table raw_json_data
(
    id         bigserial not null constraint raw_json_data_pke primary key,
    created_at timestamp with time zone,
    deleted_at timestamp with time zone,
    data       jsonb,
    url        text
);
```
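Postgres does not support the Unicode escape sequence `\u0000` at all in `TEXT` and `JSONB` columns:

> The `jsonb` type also rejects `\u0000` (because that cannot be represented in PostgreSQL's `text` type)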
You can change the column type to `JSON`:

```sql
create table Foo (test JSON);
insert into Foo (test) values ('{"text": "明天再水\u0000绿塔的"}');
-- works
```

> The `json` data type stores an exact copy of the input text
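The advantage is that the data stays exactly as it was received from the API, in case the escape sequences carry some meaning that needs to be preserved.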
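It also lets you query the data with Postgres JSON operators such as `->>`, although converting a JSON field that contains `\u0000` to text will still fail:

```sql
select test->>'text' from Foo
-- ERROR: unsupported Unicode escape sequence
```

A column of type `BYTEA` also accepts any byte sequence without manipulating the data. In Gorm, use the `type:bytea` tag: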
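```go
type RawJSONData struct {
	// ... other fields

	// []byte maps directly to bytea; note that encoding/json
	// marshals a []byte field as base64.
	Data []byte `gorm:"type:bytea" json:"data"`
}
```

If none of the above is acceptable to you, you will have to sanitize the input string...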
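One way to sanitize is to drop the `\u0000` escapes from the raw bytes before saving them to the `jsonb` column. The sketch below assumes the input is valid JSON (so backslashes only appear inside strings); `stripNULEscapes` is a hypothetical helper, not part of GORM or any driver. It walks escape sequences instead of doing a blind `bytes.ReplaceAll`, because the text `\\u0000` is an escaped backslash followed by `u0000` and must be left alone.

```go
package main

import (
	"bytes"
	"fmt"
)

// stripNULEscapes returns a copy of raw JSON with every \u0000 escape
// removed. Other escapes (including \\ followed by "u0000") are copied
// unchanged. Assumes valid JSON input.
func stripNULEscapes(raw []byte) []byte {
	nul := []byte(`\u0000`)
	out := make([]byte, 0, len(raw))
	for i := 0; i < len(raw); {
		if raw[i] != '\\' || i+1 >= len(raw) {
			out = append(out, raw[i])
			i++
			continue
		}
		if bytes.HasPrefix(raw[i:], nul) {
			i += len(nul) // drop the NUL escape entirely
			continue
		}
		// Any other escape: copy the backslash and the escaped character
		// together, so an escaped backslash is never mistaken for the
		// start of a new escape.
		out = append(out, raw[i], raw[i+1])
		i += 2
	}
	return out
}

func main() {
	in := []byte(`{"desc": "明天再水\u0000绿塔的"}`)
	fmt.Println(string(stripNULEscapes(in))) // {"desc": "明天再水绿塔的"}
}
```

Dropping the character does change the data; if you would rather keep a visible marker, replacing the escape with `\ufffd` (the Unicode replacement character) is a common alternative.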