标签: gpu

片段着色器颜色插值：细节和硬件支持

我知道使用一个非常简单的顶点着色器

attribute vec3 aVertexPosition;
attribute vec4 aVertexColor;
uniform mat4 uMVMatrix;
uniform mat4 uPMatrix;
varying vec4 vColor;

void main(void) {
    gl_Position = uPMatrix * uMVMatrix * vec4(aVertexPosition, 1.0);
    vColor = aVertexColor;
}

Run Code Online (Sandbox Code Playgroud)

和一个非常简单的片段着色器，比如

precision mediump float;
varying vec4 vColor;

void main(void) {
    gl_FragColor = vColor;
}

Run Code Online (Sandbox Code Playgroud)

绘制一个带有红色、蓝色和绿色顶点的三角形最终会得到一个像这样的三角形

内插三角形

我的问题是：

对属于一个三角形（或图元）的片段颜色进行插值的计算是否在 GPU 上并行进行？
在三角形内插入片段颜色的算法和硬件支持是什么？

hardware opengl gpu glsl fragment-shader

Far*_*zad

2014 02-22

0
推荐指数

1
解决办法

2124
查看次数

如何获取整数值的 NVIDIA 核心温度？

我正在参加 Arduino 微控制器课程，并且正在研究我的期末项目：根据外壳温度工作的自动化计算机冷却系统。

我无法使用以下来源获取 NVIDIA GPU 核心温度：此 MSDN 链接或此 NVIDIA 链接。如何获取 GPU 的温度值？

我的 C# 知识很基础，我对 MSDN 中的手册或代码示例摸不着头脑。

c# gpu arduino nvidia temperature

Men*_*kes

2014 04-08

0
推荐指数

1
解决办法

2006
查看次数

如何有效地将向量重复到 cuda 中的矩阵？

我想在cuda中重复一个向量来形成一个矩阵，避免太多的memcopy。向量和矩阵都在 GPU 上分配。

例如：

我有一个向量：

a = [1 2 3 4]

Run Code Online (Sandbox Code Playgroud)

将其展开为矩阵：

b = [1 2 3 4;
     1 2 3 4;
     .......
     1 2 3 4]

Run Code Online (Sandbox Code Playgroud)

我尝试过的是分配 b 的每个元素。但这涉及到大量的 GPU 内存到 GPU 内存的复制。

我知道这在 matlab 中很容易（使用 repmat），但是如何在 cuda 中有效地做到这一点？我没有在 cublas 中找到任何常规。

c++ cuda gpu gpgpu matrix

rea*_*ead

2014 08-23

0
推荐指数

1
解决办法

909
查看次数

gpugems3 中的前缀扫描 CUDA 示例代码是否正确？

我在 GPU Gems 3, Chapter 39: Parallel Prefix Sum (Scan) with CUDA一书中写了一段代码来调用内核。

然而，我得到的结果是一堆负数而不是前缀扫描。

我的内核调用是错误的还是 GPU Gems 3 书中的代码有问题？

这是我的代码：

#include <stdio.h>
#include <sys/time.h>
#include <cuda.h>

__global__ void kernel(int *g_odata, int  *g_idata, int n, int dim)
{
    extern __shared__ int temp[];// allocated on invocation
    int thid = threadIdx.x;
    int offset = 1;

    temp[2*thid] = g_idata[2*thid]; // load input into shared memory
    temp[2*thid+1] = g_idata[2*thid+1];
    for (int d = n>>1; d > 0; d >>= 1) // build sum in place …

Run Code Online (Sandbox Code Playgroud)

cuda gpu nvidia prefix-sum

dib*_*bid

2019 11-23

0
推荐指数

1
解决办法

1753
查看次数

不能在 Tensorflow 中使用 GPU

我已经安装了 CUDA 7.5 和 cuDNN 5.0 的 tensorflow。我的显卡是 NVIDIA Geforce 820M，功能为 2.1。但是，我收到此错误。

Ignoring visible gpu device (device: 0, name: GeForce 820M, pci bus id: 0000:08:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
Device mapping: no known devices.

Run Code Online (Sandbox Code Playgroud)

有没有办法在 2.1 功能上运行 GPU？我在网上搜了一下，发现是cuDNN需要这个能力，那么安装较早版本的cuDNN是否可以让我使用GPU？

gpu compute-capability tensorflow

Kee*_*nan

2020 12-31

0
推荐指数

1
解决办法

3717
查看次数

安装 Tensorflow 期间 cuda 内核不匹配

我正在按照官方页面上的说明和“验证您的安装”步骤安装 Tensorflow 。

>>> sess = tf.Session()
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU …

Run Code Online (Sandbox Code Playgroud)

python cuda gpu tensorflow cudnn

use*_*495

lucky-day

0
推荐指数

1
解决办法

1790
查看次数

Pyomo 解决了 NVIDIA Cuda 问题

我想知道是否有办法使用 NVIDIA Cuda 通过 GPU 求解 Pyomo 具体模型。

我查看了https://developer.nvidia.com/how-to-cuda-python，并看到了有关它的视频。事实证明你的输入参数是否可以被 numpy 识别，例如；np.float32、np.float64 等...可以通过 GPU 进行编译/求解，

我们使用一个函数来创建所有模型并用以下方法求解：

optim = SolverFactory('glpk')
optim = setup_solver(optim, logfile=log_filename)
result = optim.solve(prob, tee=True)

Run Code Online (Sandbox Code Playgroud)

在这种情况下，我们求解函数的输入将是 prob（一个 pyomo 具体模型）。有没有办法通过 GPU 而不是 CPU 来解决这个问题？

谢谢你！

python cuda gpu numpy pyomo

oak*_*kca

2017 08-27

0
推荐指数

1
解决办法

1547
查看次数

为什么这个基数排序 CUDA 代码仅对 32 个元素进行排序？

当我在 Visual Studio 15 中执行此代码时，它仅对 32 个元素进行排序。如果我将 WSIZE 设置为大于 32 或小于 32，它会显示与输出相同的未排序元素。谁能帮帮我吗？

我的系统信息。

处理器 - Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz、2400 Mhz、2 个核心、4 个逻辑处理器

内存 - 8GB

专用显卡 - NVIDIA GeForce 940M 4GB（384 个 CUDA 核心）

这是 WSIZE 设置为 32 的输出https://i.stack.imgur.com/mjgZI.jpg

这是 WSIZE 设置为 19 的输出https://i.stack.imgur.com/pXiG5.jpg

这是 WSIZE 设置为 50 的输出https://i.stack.imgur.com/8rPYo.jpg

#pragma once
#ifdef __INTELLISENSE__
void __syncthreads();
#endif

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <Windows.h>
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <chrono>

using namespace std;
using namespace …

Run Code Online (Sandbox Code Playgroud)

sorting cuda gpu radix-sort visual-studio

作者

2018 07-31

0
推荐指数

1
解决办法

1286
查看次数

OpenCL 一个平台中只有一台设备

我从一开始就在学习 OpenCL，但对平台 = 主机 + 设备的想法感到困惑。在我的工作 PC 中，检测到 2 个平台：平台 0 只有 CPU，平台 1 只有 NVIDIA GPU。检测到的平台

这台电脑实际上也有一个英特尔 GPU，但我认为它不支持 OpenCL，所以它没有在这里显示。我的问题是：在 OpenCL 应用程序中，有主机和设备。根据我从书中的理解，通常主机和设备来自一个平台。然后在我的 PC 中，我需要使用 CPU 或 NVIDIA GPU 作为 OpenCL 应用程序的主机和设备。这是真的？

我试着自己搜索，一些答案确实帮助我了解更多关于这个主题的内容：什么是 opencl 中的主机？. 但是我的搜索并没有回答或确认关于主机和设备使用一种硬件的问题。

host gpu opencl

作者

lucky-day

0
推荐指数

1
解决办法

473
查看次数

Pytorch 运行时错误：设备类型为 cuda 的预期对象，但在调用 _th_index_select 时获得了参数 #1 'self' 的设备类型 cpu

我正在训练一个模型，该模型采用标记化的字符串，然后通过嵌入层和 LSTM。但是，输入中似乎存在错误，因为它没有通过嵌入层。

class DrugModel(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, drug_embed_dim,
            lstm_layer, lstm_dropout, bi_lstm, linear_dropout, char_vocab_size,
            char_embed_dim, char_dropout, dist_fn, learning_rate,
            binary, is_mlp, weight_decay, is_graph, g_layer,
            g_hidden_dim, g_out_dim, g_dropout):

        super(DrugModel, self).__init__()

        # Save model configs
        self.drug_embed_dim = drug_embed_dim
        self.lstm_layer = lstm_layer
        self.char_dropout = char_dropout
        self.dist_fn = dist_fn
        self.binary = binary
        self.is_mlp = is_mlp
        self.is_graph = is_graph
        self.g_layer = g_layer
        self.g_dropout = g_dropout
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # For one-hot encoded SMILES
        if not is_mlp:
            self.char_embed = nn.Embedding(char_vocab_size, char_embed_dim,
                                           padding_idx=0) …

Run Code Online (Sandbox Code Playgroud)

gpu runtime-error embedding pytorch

Dan*_*iel

2019 12-16

0
推荐指数

1
解决办法

9732
查看次数