could not broadcast input array from shape (20,310,310) into shape (20)

Question

could not broadcast input array from shape (20,310,310) into shape (20)

I'm trying to detect lung cancer nodules using DICOM files. The main steps in cancer detection included following steps.

1) Preprocessing
  * Converting the pixel values to Hounsfield Units (HU)
  * Resampling to an isomorphic resolution to remove variance in scanner resolution
  *Lung segmentation
2) Training the data set using preprocessed images in Tensorflow CNN
3) Testing and validation

Run Code Online (Sandbox Code Playgroud)

I followed few online tutorials to do this.

I need to combine the given solutions in

1) https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial
2) https://www.kaggle.com/sentdex/first-pass-through-data-w-3d-convnet.

I could implement the example in link two. But since it is lack ok lung segmentation and few other preprocessing steps I need to combine the steps in link one with link two. But I'm getting number of errors while doing it. Since I'm new to python can someone please help me in solving it.

有20个患者文件夹,每个患者文件夹都有多个切片,这些切片是dicom文件.

对于process_data方法,发送每个患者的slices_path和患者编号.

def process_data(slices,patient,labels_df,img_px_size,hm_slices):

try:
    label=labels_df.get_value(patient,'cancer')

    patient_pixels = get_pixels_hu(slices)

    segmented_lungs2, spacing = resample(patient_pixels, slices, [1,1,1])

    new_slices=[]

    segmented_lung = segment_lung_mask(segmented_lungs2, False)
    segmented_lungs_fill = segment_lung_mask(segmented_lungs2, True)
    segmented_lungs=segmented_lungs_fill-segmented_lung


    #This method returns smallest integer not less than x.
    chunk_sizes =math.ceil(len(segmented_lungs)/HM_SLICES)


    for slice_chunk in chunks(segmented_lungs,chunk_sizes):

            slice_chunk=list(map(mean,zip(*slice_chunk))) #list - []

            #print (slice_chunk)
            new_slices.append(slice_chunk)

    print(len(segmented_lungs), len(new_slices))

    if len(new_slices)==HM_SLICES-1:
            new_slices.append(new_slices[-1])

    if len(new_slices)==HM_SLICES-2:
            new_slices.append(new_slices[-1])
            new_slices.append(new_slices[-1])


    if len(new_slices)==HM_SLICES+2:
            new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
            del new_slices[HM_SLICES]
            new_slices[HM_SLICES-1]=new_val

    if len(new_slices)==HM_SLICES+1:
            new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
            del new_slices[HM_SLICES]
            new_slices[HM_SLICES-1]=new_val

    print('LENGTH ',len(segmented_lungs), len(new_slices))    
except Exception as e:
    # again, some patients are not labeled, but JIC we still want the error if something
    # else is wrong with our code
    print(str(e))


#print(len(new_slices))   

if label==1: label=np.array([0,1])
elif label==0: label=np.array([1,0])
return np.array(new_slices),label

Run Code Online (Sandbox Code Playgroud)

主要方法

    # Some constants 
    #data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'
    #patients = os.listdir(data_dir)
    #labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)
    #patients.sort()
    #print (labels_df.head())

    much_data=[]
    much_data2=[]
    for num,patient in enumerate(patients):
        if num%100==0:
            print (num)

        try:

            slices = load_scan(data_dir + patients[num])
            img_data,label=process_data(slices,patients[num],labels_df,IMG_PX_SIZE,HM_SLICES)
            much_data.append([img_data,label])
            #much_data2.append([processed,label])
        except:
            print ('This is unlabeled data')

    np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data)
    #np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data2)

Run Code Online (Sandbox Code Playgroud)

预处理部分工作正常,但是当我试图将最终输出输入到卷积NN并训练数据集时,以下是我收到的错误,包括我提出的一些评论

    0
    shape hu
    (113, 512, 512)
    Resize factor
    [ 2.49557522  0.6015625   0.6015625 ]
    shape
    (282, 308, 308)
    chunk size 
    15
    282 19
    LENGTH  282 20
    Tensor("Placeholder:0", dtype=float32)
    ..........1.........
    ..........2.........
    ..........3.........
    ..........4.........
    WARNING:tensorflow:From C:\Research\Python_installation\lib\site-packages\tensorflow\python\util\tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
    Instructions for updating:
    Use `tf.global_variables_initializer` instead.
    ..........5.........
    ..........6.........
    Epoch 1 completed out of 20 loss: 0
    ..........7.........
    Traceback (most recent call last):
    File "C:\Research\LungCancerDetaction\sendbox2.py", line 436, in <module>
    train_neural_network(x)
    File "C:\Research\LungCancerDetaction\sendbox2.py", line 424, in train_neural_network
    print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
    File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\framework\ops.py", line 606, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
    File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\framework\ops.py", line 3928, in _eval_using_default_session
    return session.run(tensors, feed_dict)
    File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
    run_metadata_ptr)
    File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\client\session.py", line 968, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
    File "C:\Research\Python_installation\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
    ValueError: could not broadcast input array from shape (20,310,310) into shape (20)

Run Code Online (Sandbox Code Playgroud)

我认为这是'segmented_lungs = segmented_lungs_fill-segmented_lung'的问题

在工作示例中,

segmented_lungs=[cv2.resize(each_slice,(IMG_PX_SIZE,IMG_PX_SIZE)) for each_slice in patient_pixels]

Run Code Online (Sandbox Code Playgroud)

请帮我解决这个问题.一段时间以后我无法继续.如果有什么不清楚请告诉我.

以下是尝试过的整个代码.

    import numpy as np # linear algebra
    import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
    import dicom
    import os
    import scipy.ndimage
    import matplotlib.pyplot as plt
    import cv2
    import math
    import tensorflow as tf


    from skimage import measure, morphology
    from mpl_toolkits.mplot3d.art3d import Poly3DCollection

     # Some constants 
    data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'
    patients = os.listdir(data_dir)
    labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)
    patients.sort()
    print (labels_df.head())

    #Image pixel array watching

    for patient in patients[:10]:
        #label is to get the label  of the patient. This is what done in the .get_value method.
        label=labels_df.get_value(patient,'cancer')
        path=data_dir+patient
        slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
        #You have dicom files and they have attributes. 
        slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
        print (len(slices),slices[0].pixel_array.shape)

    #If u need to see many slices and resize the large pixelated 2D images into 150*150 pixelated images

    IMG_PX_SIZE=50
    HM_SLICES=20

    for patient in patients[:1]:
        #label is to get the label of the patient. This is what done in the .get_value method.
        label=labels_df.get_value(patient,'cancer')
        path=data_dir+patient
        slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
        #You have dicom files and they have attributes. 
        slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
        #This shows the pixel arrayed image related to the second slice of each patient

        #subplot
        fig=plt.figure()
        for num,each_slice in enumerate(slices[:16]):
            print (num)
            y=fig.add_subplot(4,4,num+1)
            #down sizing everything. Resize the imag size as their pixel values are 512*512
            new_image=cv2.resize(np.array(each_slice.pixel_array),(IMG_PX_SIZE,IMG_PX_SIZE))
            y.imshow(new_image)
        plt.show()

    print (len(patients))

    ###################################################################################

    def get_pixels_hu(slices):
        image = np.array([s.pixel_array for s in slices])
        # Convert to int16 (from sometimes int16), 
        # should be possible as values should always be low enough (<32k)
        image = image.astype(np.int16)

        # Set outside-of-scan pixels to 0
        # The intercept is usually -1024, so air is approximately 0
        image[image == -2000] = 0

        # Convert to Hounsfield units (HU)
        for slice_number in range(len(slices)):

            intercept = slices[slice_number].RescaleIntercept
            slope = slices[slice_number].RescaleSlope

            if slope != 1:
                image[slice_number] = slope * image[slice_number].astype(np.float64)
                image[slice_number] = image[slice_number].astype(np.int16)

            image[slice_number] += np.int16(intercept)

        return np.array(image, dtype=np.int16)


    #The next problem is each patient is got different number of slices . This is a performance issue.
    # Take the slices and put that into a list of slices and chunk that list of slices into fixed numer of
    #chunk of slices and averaging those chunks.

    #yield is like 'return'. It returns a generator

    def chunks(l,n):
        for i in range(0,len(l),n):
            #print ('Inside yield')
            #print (i)
            yield l[i:i+n]

    def mean(l):
        return sum(l)/len(l)


    def largest_label_volume(im, bg=-1):
        vals, counts = np.unique(im, return_counts=True)

        counts = counts[vals != bg]
        vals = vals[vals != bg]

        if len(counts) > 0:
            return vals[np.argmax(counts)]
        else:
            return None


    def segment_lung_mask(image, fill_lung_structures=True):

        # not actually binary, but 1 and 2. 
        # 0 is treated as background, which we do not want
        binary_image = np.array(image > -320, dtype=np.int8)+1
        labels = measure.label(binary_image)


        # Pick the pixel in the very corner to determine which label is air.
        #   Improvement: Pick multiple background labels from around the patient
        #   More resistant to "trays" on which the patient lays cutting the air 
        #   around the person in half
        background_label = labels[0,0,0]
        #Fill the air around the person
        binary_image[background_label == labels] = 2

        # Method of filling the lung structures (that is superior to something like 
        # morphological closing)

        if fill_lung_structures:
            # For every slice we determine the largest solid structure

            for i, axial_slice in enumerate(binary_image):
                axial_slice = axial_slice - 1
                labeling = measure.label(axial_slice)
                l_max = largest_label_volume(labeling, bg=0)

                if l_max is not None: #This slice contains some lung
                    binary_image[i][labeling != l_max] = 1

        binary_image -= 1 #Make the image actual binary
        binary_image = 1-binary_image # Invert it, lungs are now 1
        # Remove other air pockets insided body
        labels = measure.label(binary_image, background=0)
        l_max = largest_label_volume(labels, bg=0)

        if l_max is not None: # There are air pockets
            binary_image[labels != l_max] = 0

        return binary_image


    #Loading the files
    #Load the scans in given folder path
    def load_scan(path):
        slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
        slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
        try:
            slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
        except:
            slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)

        for s in slices:
            s.SliceThickness = slice_thickness

        return slices



    def resample(image, scan, new_spacing=[1,1,1]):
        # Determine current pixel spacing

        spacing = np.array([scan[0].SliceThickness] + scan[0].PixelSpacing, dtype=np.float32)


        resize_factor = spacing / new_spacing
        new_real_shape = image.shape * resize_factor
        new_shape = np.round(new_real_shape)
        real_resize_factor = new_shape / image.shape
        new_spacing = spacing / real_resize_factor

        print ('Resize factor')
        print (real_resize_factor)

        image = scipy.ndimage.interpolation.zoom(image, real_resize_factor, mode='nearest')
        print ('shape')
        print (image.shape)

        return image, new_spacing

    '''def chunks(l,n):
        for i in range(0,len(l),n):
            #print ('Inside yield')
            #print (i)
            yield l[i:i+n]

    def mean(l):
        return sum(l)/len(l)'''

    #processing data
    def process_data(slices,patient,labels_df,img_px_size,hm_slices):
         #for patient in patients[:10]:
                #label is to get the label of the patient. This is what done in the .get_value method.
        try:
            label=labels_df.get_value(patient,'cancer')
            print ('label process data')
            print (label)
            #path=data_dir+patient
            #slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
            #You have dicom files and they have attributes. 
            slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
            #This shows the pixel arrayed image related to the second slice of each patient
            patient_pixels = get_pixels_hu(slices)
            print ('shape hu')
            print (patient_pixels.shape)
            segmented_lungs2, spacing = resample(patient_pixels, slices, [1,1,1])
            #print ('Pix shape')
            #print (segmented_lungs2.shape)

            #segmented_lungs=np.array(segmented_lungs2).tolist()

            new_slices=[]

            segmented_lung = segment_lung_mask(segmented_lungs2, False)
            segmented_lungs_fill = segment_lung_mask(segmented_lungs2, True)
            segmented_lungs=segmented_lungs_fill-segmented_lung


            #print ('length of segmented lungs')
            #print (len(segmented_lungs))
            #print ('Shape of segmented lungs......................................')
            #print (segmented_lungs.shape)
            #print ('hiiii')
            #segmented_lungs=[cv2.resize(each_slice,(IMG_PX_SIZE,IMG_PX_SIZE)) for each_slice in segmented_lungs3]
            #print ('bye')
            #print ('length of slices')
            #print (len(slices))
            #print ('shape of slices')
            #print (slices.shape)


            #print (each_slice.pixel_array)

            #This method returns smallest integer not less than x.
            chunk_sizes =math.ceil(len(segmented_lungs)/HM_SLICES)

            print ('chunk size ')
            print (chunk_sizes)

            for slice_chunk in chunks(segmented_lungs,chunk_sizes):

                    slice_chunk=list(map(mean,zip(*slice_chunk))) #list - []

                    #print (slice_chunk)
                    new_slices.append(slice_chunk)

            print(len(segmented_lungs), len(new_slices))

            if len(new_slices)==HM_SLICES-1:
                    new_slices.append(new_slices[-1])

            if len(new_slices)==HM_SLICES-2:
                    new_slices.append(new_slices[-1])
                    new_slices.append(new_slices[-1])

            if len(new_slices)==HM_SLICES-3:
                    new_slices.append(new_slices[-1])
                    new_slices.append(new_slices[-1])
                    new_slices.append(new_slices[-1])

            if len(new_slices)==HM_SLICES+2:
                    new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
                    del new_slices[HM_SLICES]
                    new_slices[HM_SLICES-1]=new_val

            if len(new_slices)==HM_SLICES+1:
                    new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
                    del new_slices[HM_SLICES]
                    new_slices[HM_SLICES-1]=new_val

            if len(new_slices)==HM_SLICES+3:
                    new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
                    del new_slices[HM_SLICES]
                    new_slices[HM_SLICES-1]=new_val

            print('LENGTH ',len(segmented_lungs), len(new_slices))    
        except Exception as e:
            # again, some patients are not labeled, but JIC we still want the error if something
            # else is wrong with our code
            print(str(e))


        #print(len(new_slices))   

        if label==1: label=np.array([0,1])
        elif label==0: label=np.array([1,0])
        return np.array(new_slices),label



    # Some constants 
    #data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'
    #patients = os.listdir(data_dir)
    #labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)
    #patients.sort()
    #print (labels_df.head())

    much_data=[]
    much_data2=[]
    for num,patient in enumerate(patients):
        if num%100==0:
            print (num)

        try:

            slices = load_scan(data_dir + patients[num])
            img_data,label=process_data(slices,patients[num],labels_df,IMG_PX_SIZE,HM_SLICES)
            much_data.append([img_data,label])
            #much_data2.append([processed,label])
        except:
            print ('This is unlabeled data')

    np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data)
    #np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data2)

    IMG_SIZE_PX = 50
    SLICE_COUNT = 20

    n_classes=2
    batch_size=10

    x = tf.placeholder('float')
    y = tf.placeholder('float')

    keep_rate = 0.8

    def conv3d(x, W):
        return tf.nn.conv3d(x, W, strides=[1,1,1,1,1], padding='SAME')

    def maxpool3d(x):
        #                        size of window         movement of window as you slide about
        return tf.nn.max_pool3d(x, ksize=[1,2,2,2,1], strides=[1,2,2,2,1], padding='SAME')

    def convolutional_neural_network(x):
        #                # 5 x 5 x 5 patches, 1 channel, 32 features to compute.
        weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,3,1,32])),
                   #       5 x 5 x 5 patches, 32 channels, 64 features to compute.
                   'W_conv2':tf.Variable(tf.random_normal([3,3,3,32,64])),
                   #                                  64 features
                   'W_fc':tf.Variable(tf.random_normal([54080,1024])),
                   'out':tf.Variable(tf.random_normal([1024, n_classes]))}

        biases = {'b_conv1':tf.Variable(tf.random_normal([32])),
                   'b_conv2':tf.Variable(tf.random_normal([64])),
                   'b_fc':tf.Variable(tf.random_normal([1024])),
                   'out':tf.Variable(tf.random_normal([n_classes]))}

        #                            image X      image Y        image Z
        x = tf.reshape(x, shape=[-1, IMG_SIZE_PX, IMG_SIZE_PX, SLICE_COUNT, 1])

        conv1 = tf.nn.relu(conv3d(x, weights['W_conv1']) + biases['b_conv1'])
        conv1 = maxpool3d(conv1)


        conv2 = tf.nn.relu(conv3d(conv1, weights['W_conv2']) + biases['b_conv2'])
        conv2 = maxpool3d(conv2)

        fc = tf.reshape(conv2,[-1, 54080])
        fc = tf.nn.relu(tf.matmul(fc, weights['W_fc'])+biases['b_fc'])
        fc = tf.nn.dropout(fc, keep_rate)

        output = tf.matmul(fc, weights['out'])+biases['out']

        return output


    much_data = np.load('muchdata-50-50-20.npy')
    # If you are working with the basic sample data, use maybe 2 instead of 100 here... you don't have enough data to really do this
    train_data = much_data[:-4]
    validation_data = much_data[-4:]


    def train_neural_network(x):
        print ('..........1.........')
        prediction = convolutional_neural_network(x)
        print ('..........2.........')
        #cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(prediction,y) )
        cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
        print ('..........3.........')
        optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)
        print ('..........4.........')
        hm_epochs = 20
        with tf.Session() as sess:
            sess.run(tf.initialize_all_variables())

            successful_runs = 0
            total_runs = 0
            print ('..........5.........')
            for epoch in range(hm_epochs):
                epoch_loss = 0
                for data in train_data:
                    total_runs += 1
                    try:
                        X = data[0]
                        Y = data[1]
                        _, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})
                        epoch_loss += c
                        successful_runs += 1
                    except Exception as e:
                        # I am passing for the sake of notebook space, but we are getting 1 shaping issue from one 
                        # input tensor. Not sure why, will have to look into it. Guessing it's
                        # one of the depths that doesn't come to 20.
                        pass
                        #print(str(e))
                print ('..........6.........')
                print('Epoch', epoch+1, 'completed out of',hm_epochs,'loss:',epoch_loss)
                print ('..........7.........')
                correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
                accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

                print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))

            print('Done. Finishing accuracy:')
            print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))

            print('fitment percent:',successful_runs/total_runs)


    print (x)


    # Run this locally:
    train_neural_network(x)

Run Code Online (Sandbox Code Playgroud)

PS:resample(),segment_lung_mask()方法可以从链接1中找到.

Answer 1

Ste*_*hen 4

对于培训，您有

for data in train_data:
    total_runs += 1
    try:
        X = data[0]
        Y = data[1]
        _, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})

Run Code Online (Sandbox Code Playgroud)

因此 x 和 y 分别是的单行的前两个元素train_data。

但是，在计算准确性时，您有

print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))

Run Code Online (Sandbox Code Playgroud)

因此 x 是的所有行的第一个元素validation_data，它的维度为(20,310,310)，它不能广播到维度的占位符(20)。y 也是如此。（广播意味着，如果你给它一个维度张量，(20, 310)它就会知道取出 310 列中的每一列并将其分别提供给占位符。它无法弄清楚如何处理的张量(20, 310, 310)。）

顺便说一句，当您声明占位符时，最好指定它们的尺寸，None根据单独示例的数量使用尺寸。这样，当尺寸不匹配时，程序可以向您发出警告。

/sf/answers/2934124651/ 中的一些建议可能会有所帮助。 (2认同)

归档时间：	7 年，11 月前
查看次数：	1569 次
最近记录：	7 年，11 月前