I am trying to store the GDELT dataset in a MySQL database (MySQL 8.0, RHEL 7), but the load fails with ERROR 1265 (01000) because a FLOAT column contains empty values:
CREATE TABLE event (
GlobalEventID INT NOT NULL,
Day INT NOT NULL,
MonthYear MEDIUMINT NOT NULL,
Year SMALLINT NOT NULL,
FractionDate FLOAT NOT NULL,
Actor1Code TINYTEXT NULL,
Actor1Name TINYTEXT NULL,
Actor1CountryCode TINYTEXT NULL,
Actor1KnownGroupCode TINYTEXT NULL,
Actor1EthnicCode TINYTEXT NULL,
Actor1Religion1Code TINYTEXT NULL,
Actor1Religion2Code TINYTEXT NULL,
Actor1Type1Code TINYTEXT NULL,
Actor1Type2Code TINYTEXT NULL,
Actor1Type3Code TINYTEXT NULL,
Actor2Code TINYTEXT NULL,
Actor2Name TINYTEXT NULL,
Actor2CountryCode TINYTEXT NULL,
Actor2KnownGroupCode TINYTEXT NULL,
Actor2EthnicCode TINYTEXT NULL,
Actor2Religion1Code TINYTEXT NULL,
Actor2Religion2Code TINYTEXT NULL,
Actor2Type1Code TINYTEXT NULL,
Actor2Type2Code TINYTEXT NULL,
Actor2Type3Code TINYTEXT NULL,
IsRootEvent TINYINT NOT NULL,
EventCode TINYTEXT NOT NULL,
EventBaseCode TINYTEXT NOT NULL,
EventRootCode TINYTEXT NOT NULL,
QuadClass TINYINT NOT NULL,
GoldsteinScale FLOAT NOT NULL,
NumMentions INT NOT NULL,
NumSources INT NOT NULL,
NumArticles INT NOT NULL,
AvgTone FLOAT NOT NULL,
Actor1Geo_Type TINYINT NULL,
Actor1Geo_Fullname TEXT NULL,
Actor1Geo_CountryCode TINYTEXT NULL,
Actor1Geo_ADM1Code TINYTEXT NULL,
Actor1Geo_ADM2Code TINYTEXT NULL,
Actor1Geo_Lat FLOAT NULL,
Actor1Geo_Long FLOAT NULL,
Actor1Geo_FeatureID TINYTEXT NULL,
Actor2Geo_Type TINYINT NULL,
Actor2Geo_Fullname TEXT NULL,
Actor2Geo_CountryCode TINYTEXT NULL,
Actor2Geo_ADM1Code TINYTEXT NULL,
Actor2Geo_ADM2Code TINYTEXT NULL,
Actor2Geo_Lat FLOAT NULL,
Actor2Geo_Long FLOAT NULL,
Actor2Geo_FeatureID TINYTEXT NULL,
ActionGeo_Type TINYINT NULL,
ActionGeo_Fullname TEXT NULL,
ActionGeo_CountryCode TINYTEXT NULL,
ActionGeo_ADM1Code TINYTEXT NULL,
ActionGeo_ADM2Code TINYTEXT NULL,
ActionGeo_Lat FLOAT NULL,
ActionGeo_Long FLOAT NULL,
ActionGeo_FeatureID TINYTEXT NULL,
DATEADDED BIGINT NOT NULL,
SOURCEURL TEXT NOT NULL,
PRIMARY KEY ( GlobalEventID )
);
Query OK, 0 rows affected (0.01 sec)
LOAD DATA INFILE 'event.csv'
INTO TABLE event
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
;
ERROR 1265 (01000): Data truncated for column 'Actor2Geo_Lat' at row 5
The raw data can be retrieved here. It looks like this:
4 Baghdad, Baghdad, Iraq IZ IZ07 36785 33.3386 44.3939 -3103581 4 Baghdad, Baghdad, Iraq
1 Russia RS RS 60 100 RS 1 Russia
1 Ukraine UP UP 49 32 UP 1 Ukraine
4 Sydney, New South Wales, Australia AS AS02 154637 -33.8833 151.217 -1603135 4 Sydney, New South Wales, Australia
0 3 Los Angeles, California, United States
4 Sydney, New South Wales, Australia AS AS02 154637 -33.8833 151.217 -1603135 4 Sydney, New South Wales, Australia
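The fifth row contains empty values, and the columns before Actor2Geo_Lat accept them without complaint.
What should I do to load the data correctly? Many thanks!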
Answer (score 5):
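LOAD DATA INFILE does not treat empty fields as NULL. Each field must contain a value that is valid for the column's data type, or else the sequence \N to represent NULL.
See http://bugs.mysql.com/bug.php?id=64603
To work around this, you can use sed (or any equivalent text-replacement tool if you are on Windows) to replace the empty entries with \N.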
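As a rough sketch of that replacement step, the following uses awk rather than sed, because a single-pass sed substitution on two adjacent tabs skips every other field in a run of empty ones. It assumes the file is tab-separated with no quoting. The function name empty_to_null and the output file name event_nulls.csv are made up for illustration. Note that it rewrites every empty field, including those in NOT NULL columns, which MySQL may still reject.

```shell
#!/bin/sh
# Replace every empty tab-separated field with \N, which LOAD DATA reads as NULL.
# Assumption: fields are separated by single tabs and are never quoted.
empty_to_null() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             # Walk every field; an empty one becomes the two characters \N.
             for (i = 1; i <= NF; i++)
                 if ($i == "") $i = "\\N"
             print
         }' "$@"
}

# Usage (hypothetical file names):
#   empty_to_null event.csv > event_nulls.csv
```

The converted file can then be loaded with the original LOAD DATA INFILE statement, pointing it at the new file.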
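Alternatively, you can read the fields into user variables and then set the actual column to NULL whenever the variable ends up holding an empty string. If any of the fields may be empty, read them all into variables and use multiple SET assignments, like this: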
LOAD DATA INFILE 'event.csv'
INTO TABLE event
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(@GlobalEventID, @Day, @MonthYear, ...)
SET
GlobalEventID = NULLIF(@GlobalEventID,''),
Day = NULLIF(@Day,''),
MonthYear = NULLIF(@MonthYear,'')
...
;
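Writing out a variable and a NULLIF assignment for every column by hand is tedious for a table this wide. One possible shortcut, sketched below, is to generate the statement from a list of column names. The helper name gen_load_data is hypothetical, and the sketch assumes the column names are passed in the same order as the fields appear in the file. It applies NULLIF to every column, so an empty field in a NOT NULL column will surface as an error or warning from MySQL instead of being silently coerced.

```shell
#!/bin/sh
# Print a LOAD DATA statement that reads each field into a user variable
# and maps empty strings to NULL via NULLIF.
# Arguments: file path, table name, then column names in file order.
gen_load_data() {
    file=$1
    table=$2
    shift 2
    nl='
'
    vars=""
    sets=""
    for col in "$@"; do
        # Build "@a, @b, ..." and "a = NULLIF(@a, ''),<newline>  b = ..."
        vars="${vars:+$vars, }@$col"
        sets="${sets:+$sets,$nl  }$col = NULLIF(@$col, '')"
    done
    # In an unquoted here-document, \\ is emitted as a single backslash.
    cat <<EOF
LOAD DATA INFILE '$file'
INTO TABLE $table
FIELDS TERMINATED BY '\\t'
LINES TERMINATED BY '\\n'
($vars)
SET
  $sets;
EOF
}

# Usage (first three columns only, for brevity):
#   gen_load_data event.csv event GlobalEventID Day MonthYear
```

The printed statement can be reviewed and then pasted into the mysql client.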