use*_*649 5 python java csv libsvm
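I'm working on a project that uses libsvm, and I'm preparing my data for the library. How can I convert a CSV file into LIBSVM-compatible data?
CSV file: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/data/iris.csv
From the FAQ:
How do I convert other data formats to LIBSVM format?
It depends on your data format. A simple way is to use libsvmwrite in the libsvm matlab/octave interface. Take a CSV (comma-separated values) file from the UCI machine learning repository as an example. We download SPECTF.train. Labels are in the first column. The following steps produce a file in the libsvm format.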
matlab> SPECTF = csvread('SPECTF.train'); % read a csv file
matlab> labels = SPECTF(:, 1); % labels from the 1st column
matlab> features = SPECTF(:, 2:end);
matlab> features_sparse = sparse(features); % features must be in a sparse matrix
matlab> libsvmwrite('SPECTFlibsvm.train', labels, features_sparse);
The transformed data are stored in SPECTFlibsvm.train.
Alternatively, you can use convert.c to convert CSV format to libsvm format.
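But I don't want to use MATLAB; I use Python.
I also found this solution that uses Java.
Can anyone recommend a way to solve this problem?
You can use csv2libsvm.py to convert the CSV to libsvm data: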
python csv2libsvm.py iris.csv libsvm.data 4 True
where 4 is the target index (the zero-based column that holds the label) and True means the CSV has a header row.
Finally, you get libsvm.data like
0 1:5.1 2:3.5 3:1.4 4:0.2
0 1:4.9 2:3.0 3:1.4 4:0.2
0 1:4.7 2:3.2 3:1.3 4:0.2
0 1:4.6 2:3.1 3:1.5 4:0.2
...
from iris.csv
150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
4.6,3.1,1.5,0.2,0
...
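To make the mapping between the two listings concrete, here is a minimal sketch of the same conversion using only the standard library. It is not the csv2libsvm.py source: `csv_to_libsvm` and its parameters are hypothetical names chosen for illustration, and it assumes every feature value is numeric.

```python
import csv
import io


def csv_to_libsvm(csv_text, label_index, has_header):
    """Return libsvm-formatted lines for the rows in csv_text (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # drop the header row
    lines = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        label = row.pop(label_index)
        # Feature indices are 1-based; zero-valued features are omitted
        # because the libsvm format is sparse.
        features = [f"{i}:{v}" for i, v in enumerate(row, start=1) if float(v) != 0.0]
        lines.append(" ".join([label] + features))
    return lines


# The first rows of iris.csv as shown above (assumed sample input).
sample = """150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
"""
for line in csv_to_libsvm(sample, label_index=4, has_header=True):
    print(line)
```

The label is popped out of the row first, so the remaining columns are numbered from 1 regardless of where the label sat in the CSV.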
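csv2libsvm.py does not work with Python 3, and it also doesn't support label targets (string targets), so I modified it slightly. It should now work with Python 3 and with string labels as well.
I'm very new to Python, so my code may not follow best practices, but I hope it is good enough to help someone.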
#!/usr/bin/env python3

"""
Convert CSV file to libsvm format. Features must be numeric; labels may be
numeric or strings (string labels are mapped to integer ids).
Put -1 as label index (argv[3]) if there are no labels in your file.
Expecting no headers. If present, headers can be skipped with argv[4] == 1.
"""

import csv
import sys


def construct_line(label, line, labels_dict):
    new_line = []
    try:
        # Numeric label: keep it as-is, writing any zero value as "0".
        if float(label) == 0.0:
            label = "0"
    except ValueError:
        # String label: assign ids 0, 1, 2, ... in order of first appearance.
        if label not in labels_dict:
            labels_dict[label] = str(len(labels_dict))
        label = labels_dict[label]
    new_line.append(label)

    for i, item in enumerate(line):
        # libsvm format is sparse: empty and zero-valued features are omitted.
        if item == '' or float(item) == 0.0:
            continue
        elif item == 'NaN':
            item = "0.0"
        new_item = "%s:%s" % (i + 1, item)
        new_line.append(new_item)
    return " ".join(new_line) + "\n"

# ---

input_file = sys.argv[1]
try:
    output_file = sys.argv[2]
except IndexError:
    output_file = input_file + ".out"

try:
    label_index = int(sys.argv[3])
except IndexError:
    label_index = 0

try:
    # Accept "1" or "True" (case-insensitive); anything else means no header.
    skip_headers = sys.argv[4].lower() in ("1", "true")
except IndexError:
    skip_headers = False

with open(input_file, 'rt', newline='') as i, open(output_file, 'w', newline='\n') as o:
    reader = csv.reader(i)

    if skip_headers:
        next(reader)

    labels_dict = {}
    for line in reader:
        if label_index == -1:
            label = '1'
        else:
            label = line.pop(label_index)

        o.write(construct_line(label, line, labels_dict))
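The string-label handling in the script comes down to giving each distinct label an integer id the first time it is seen. A small sketch of just that step, in case it helps; `encode_labels` is a hypothetical helper written for this illustration and is not part of csv2libsvm.py.

```python
def encode_labels(labels):
    """Map each distinct label to a string id in order of first appearance."""
    mapping = {}
    encoded = []
    for label in labels:
        if label not in mapping:
            # The next free id is simply the number of labels seen so far.
            mapping[label] = str(len(mapping))
        encoded.append(mapping[label])
    return encoded, mapping


encoded, mapping = encode_labels(["setosa", "versicolor", "setosa", "virginica"])
print(encoded)
print(mapping)
```

Because the ids depend on the order in which labels first appear, converting a training file and a test file separately can assign different ids to the same label. Keeping the mapping from the first file and reusing it for the second avoids that.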