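I am using CountVectorizer in scikit-learn to vectorize a sequence of features. I got stuck when it raised the following error: ValueError: np.nan is an invalid document, expected byte or unicode string.
I am using a sample CSV dataset with two columns, CONTENT and sentiment. My code is as follows: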
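# Imports are assumed here; they were not shown in the original post.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv", encoding="ISO-8859-1")
X, y = df.CONTENT, df.sentiment
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print X_train, y_train
vect = CountVectorizer(ngram_range=(1, 3), analyzer='word', encoding="ISO-8859-1")
print vect
# The ValueError is raised by the next line.
X=vect.fit_transform(X_train, y_train)
y=vect.fit(X_test)
print vect.get_feature_names()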
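The ValueError quoted above usually means that at least one entry in the text column is a missing value (NaN) rather than a string. A minimal sketch of one possible fix follows, assuming the rows without text can simply be dropped. The small DataFrame is a hypothetical stand-in for train.csv, and the variable names X_train_counts and X_test_counts are illustrative only. The sketch also fits the vectorizer on the training split and only transforms the test split, which is the usual pattern.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for train.csv; the NaN mimics an empty CONTENT cell.
df = pd.DataFrame({
    "CONTENT": ["good movie", float("nan"), "bad movie",
                "great plot", "boring plot", "good plot"],
    "sentiment": [1, 0, 0, 1, 0, 1],
})

# Drop rows whose text is missing (assumption: losing those rows is acceptable).
df = df.dropna(subset=["CONTENT"])

X_train, X_test, y_train, y_test = train_test_split(
    df.CONTENT, df.sentiment, random_state=42)

vect = CountVectorizer(ngram_range=(1, 3), analyzer="word")
# Learn the vocabulary from the training split only...
X_train_counts = vect.fit_transform(X_train)
# ...and reuse that vocabulary for the test split.
X_test_counts = vect.transform(X_test)

print(X_train_counts.shape)
print(X_test_counts.shape)
```

If the rows must be kept, replacing the missing values with empty strings via df.CONTENT.fillna("") is another option.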
The error I get is:
File "C:/Users/HP/cntVect.py", line 28, in <module>
X=vect.fit_transform(X_train, y_train)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\feature_extraction\text.py", line 839, in fit_transform
self.fixed_vocabulary_)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\feature_extraction\text.py", line 762, in _count_vocab
for feature in analyze(doc):
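File …
I am trying to detect faces in a video recorded with a camcorder. With webcam video it works fine. However, with the camcorder recording, the video appears rotated by -90 degrees. Please suggest how I can get the video in its actual orientation for face detection.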
import cv2
import sys
cascPath = sys.argv[1]
faceCascade = cv2.CascadeClassifier('C:/Users/HP/Anaconda2/pkgs/opencv-3.2.0-np112py27_204/Library/etc/haarcascades/haarcascade_frontalface_default.xml')
#video_capture = cv2.videoCapture(0)
video_capture = cv2.VideoCapture('C:/Users/HP/sample1.mp4')
w=int(video_capture.get(3))
h=int(video_capture.get(4))
#output = cv2.VideoWriter('output_1.avi',cv2.VideoWriter_fourcc('M','J','P','G'), 60,frameSize = (w,h))
while True:
    ret, frame = video_capture.read()
    # Stop when no more frames can be read (end of file or read error).
    if not ret:
        break
    # rotateImage is not defined in the posted snippet.
    frame = rotateImage(frame,90)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray, 1.3, 5)
    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        #cv2.imshow('face',i)
        #output.write(frame)
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
…
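Since the snippet calls rotateImage without showing it, here is a minimal sketch of how a frame could be turned by a multiple of 90 degrees before detection. The helper name rotate_frame and its parameter are hypothetical, and the direction needed for a given recording is an assumption to verify by eye. It uses NumPy's rot90 on the frame array; OpenCV's cv2.rotate with a flag such as cv2.ROTATE_90_CLOCKWISE would be an alternative.

```python
import numpy as np


def rotate_frame(frame, quarter_turns_clockwise=1):
    """Rotate an image array clockwise by a multiple of 90 degrees."""
    # np.rot90 turns counter-clockwise for positive k, so negate it.
    rotated = np.rot90(frame, k=-quarter_turns_clockwise)
    # np.rot90 returns a view; copy it into contiguous memory so that
    # later drawing calls receive an ordinary array.
    return np.ascontiguousarray(rotated)


# Tiny stand-in for a video frame: 2 rows x 3 columns, 3 colour channels.
frame = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)
upright = rotate_frame(frame, 1)
print(upright.shape)
```

In the loop above, the call would take the place of rotateImage(frame,90), for example frame = rotate_frame(frame, 1); passing 3 instead of 1 turns the frame the other way.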