Posts by zin*_*rim

Can't read a JSON file with Python. Getting TypeError: JSON object is 'TextIOWrapper'

I am trying to read from a JSON file.

This is how I create the file:

import requests
import json
import time
from pprint import pprint

BASE_URL = "https://www.wikiart.org/en/api/2/UpdatedArtists"
artist_json_data = requests.get(BASE_URL).json()

with open('artistdata.json', 'w') as outfile:
    while artist_json_data['hasMore']:
        print(artist_json_data['paginationToken'])
        url = BASE_URL + "?paginationToken=" +artist_json_data['paginationToken']
        artist_json_data = requests.get(url).json()
        json.dump(artist_json_data, outfile, indent=4)
        time.sleep(1)

This is the beginning of my output:

{
    "data": [
        {
            "id": "57726da5edc2cb3880b4ca54",
            "artistName": "Paul Feeley",
            "url": "paul-feeley",
            "lastNameFirst": "Feeley Paul",
            "birthDay": "/Date(-1893456000000)/",
            "deathDay": "/Date(-126230400000)/",

When I try to read the same file with the following code:

from pprint import pprint

with open('artistdata.json', 'r', encoding='utf-8') as data_file:    
    data = json.loads(data_file)
    pprint(data)

I get the error

TypeError: the JSON object must be …
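A likely reading of this error, sketched under assumptions: `json.loads` parses a string, while `json.load` reads from an open file object, so handing `data_file` to `json.loads` raises the TypeError. The sample payload and the temporary file path below are hypothetical stand-ins for the real API response and `artistdata.json`.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for one page of the API response.
sample = {
    "data": [{"artistName": "Paul Feeley", "url": "paul-feeley"}],
    "hasMore": False,
}

# Write a single JSON document to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "artistdata.json")
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(sample, outfile, indent=4)

# Variant 1: pass the file object to json.load.
with open(path, "r", encoding="utf-8") as data_file:
    data = json.load(data_file)

# Variant 2: read the text first, then pass the string to json.loads.
with open(path, "r", encoding="utf-8") as data_file:
    same = json.loads(data_file.read())

print(data["data"][0]["artistName"])
```

Separately, the writing loop in the question calls `json.dump` once per page, so the file ends up holding several JSON documents back to back. `json.load` on such a file raises `json.JSONDecodeError` ("Extra data"); collecting the pages in a list and dumping that list once would avoid this.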

json python-3.x

Score: 12 | Answers: 2 | Views: 10k

Adding StandardScaler() of a value as a new column to a DataFrame returns partly NaNs

I have a pandas DataFrame:

df['total_price'].describe()

which returns

count    24895.000000
mean       216.377369
std        161.246931
min          0.000000
25%        109.900000
50%        174.000000
75%        273.000000
max       1355.900000
Name: total_price, dtype: float64

When I apply preprocessing.StandardScaler():

x = df[['total_price']]
standard_scaler = preprocessing.StandardScaler()
x_scaled = standard_scaler.fit_transform(x)
df['new_col'] = pd.DataFrame(x_scaled)   

My new column with the standardized values contains some NaNs:

df[['total_price', 'new_col']].head()

    total_price new_col
0   241.95      0.158596
1   241.95      0.158596
2   241.95      0.158596
3   81.95      -0.833691
4   81.95      -0.833691

df[['total_price', 'new_col']].tail()

        total_price new_col
28167   264.0       NaN
28168   264.0       NaN
28176   94.0        NaN
28177   166.0       NaN
28178   166.0       NaN …
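One plausible explanation, offered as a sketch rather than a confirmed diagnosis: `describe()` reports 24895 rows while the tail shows index labels up to 28178, so the index has gaps. `pd.DataFrame(x_scaled)` gets a fresh 0..n-1 index, and pandas aligns on index labels when assigning, which leaves NaN wherever the labels do not match. The small frame below is hypothetical and only mimics that situation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical frame whose index has gaps, e.g. after rows were filtered out.
df = pd.DataFrame(
    {"total_price": [241.95, 81.95, 264.0, 94.0, 166.0]},
    index=[0, 3, 28167, 28176, 28178],
)

# fit_transform returns a NumPy array of shape (5, 1).
x_scaled = StandardScaler().fit_transform(df[["total_price"]])

# Reproduces the problem: the new DataFrame has index 0..4, and assignment
# aligns on labels, so only labels 0 and 3 find a match (and label 3 even
# receives the value that belongs to a different row).
df["misaligned"] = pd.DataFrame(x_scaled)

# Option 1: assign the flattened array, which is purely positional.
df["new_col"] = x_scaled.ravel()

# Option 2: keep a DataFrame but give it the original index.
df["new_col_2"] = pd.DataFrame(x_scaled, index=df.index)

print(df)
```

Either option keeps the scaled values attached to the rows they were computed from, regardless of what the index looks like.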

python nan pandas scikit-learn

Score: 9 | Answers: 1 | Views: 1638

from google.cloud import storage fails: ImportError: No module named google.cloud

I am having trouble accessing Google Storage from Python 3.6. I install the package with:

pip install --upgrade google-cloud-storage

This is my Python script:

from google.cloud import storage

def main():
    client = storage.Client()
    bucket = client.get_bucket('my_bucket')
    blob1 = bucket.blob('my_file.json')
    blob1.upload_from_filename(filename='my_file.json')

if __name__ == "__main__":
    main()

pip show google-cloud-storage gives me the following output:

Name: google-cloud-storage
Version: 1.6.0
Summary: Python Client for Google Cloud Storage
Home-page: https://github.com/GoogleCloudPlatform/google-cloud-python
Author: Google Cloud Platform
Author-email: googleapis-publisher@google.com
License: Apache 2.0
Location: /usr/local/lib/python3.6/dist-packages
Requires: google-api-core, google-auth, google-cloud-core, requests, google-resumable-media

Any idea what is wrong here?
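The post does not show which interpreter runs the script, so the following is only a diagnostic sketch. A frequent cause of this kind of ImportError is that `pip` installed the package for one interpreter (here `/usr/local/lib/python3.6/dist-packages`) while the script runs under another. The helper name `describe_environment` is made up for illustration; it reports the running interpreter and whether a module can be found from it.

```python
import importlib.util
import sys


def describe_environment(module_name="google.cloud.storage"):
    """Return the running interpreter and whether module_name is importable."""
    try:
        found = importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. "google") is itself missing.
        found = False
    return {
        "interpreter": sys.executable,
        "version": sys.version_info[:3],
        "found": found,
    }


info = describe_environment()
print(info)
```

If `found` is False, installing with the same interpreter that runs the script (for example `python3 -m pip install --upgrade google-cloud-storage`) ensures the package lands where that interpreter looks.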

python-3.x google-cloud-storage

Score: 7 | Answers: 1 | Views: 1325

scikit-learn StratifiedShuffleSplit KeyError with index

This is my pandas DataFrame lots_not_preprocessed_usd:

<class 'pandas.core.frame.DataFrame'>
Index: 78718 entries, 2017-09-12T18-38-38-076065 to 2017-10-02T07-29-40-245031
Data columns (total 20 columns):
created_year              78718 non-null float64
price                     78718 non-null float64
........
decade                    78718 non-null int64
dtypes: float64(8), int64(1), object(11)
memory usage: 12.6+ MB

head(1):

artist_name_normalized  house   created_year    description exhibited_in    exhibited_in_museums    height  images  max_estimated_price min_estimated_price price   provenance  provenance_estate_of    sale_date   sale_id sale_title  style   title   width   decade
    key                                                                             
    2017-09-12T18-38-38-076065  NaN c11 1862.0  An Album and a small Quantity of unframed Draw...   NaN NaN NaN NaN 535.031166  267.515583  845.349242  NaN …
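The excerpt is cut off before the failing code, so the actual cause is not visible here. One common source of a KeyError in this setup, assumed for the sketch below, is that `StratifiedShuffleSplit.split` yields integer positions, which cannot be used as labels on a string index like the one shown. The small frame and its keys are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical frame with a string index, loosely modelled on the one above.
df = pd.DataFrame(
    {
        "price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
        "decade": [1860, 1860, 1860, 1860, 1900, 1900, 1900, 1900],
    },
    index=[f"key-{i}" for i in range(8)],
)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(split.split(df, df["decade"]))

# Label-based lookup with integer positions fails on a string index.
label_lookup_failed = False
try:
    df.loc[train_idx]
except KeyError:
    label_lookup_failed = True

# Position-based lookup works regardless of the index type.
train = df.iloc[train_idx]
test = df.iloc[test_idx]

print(len(train), len(test))
```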

python-3.x pandas scikit-learn

Score: 5 | Answers: 1 | Views: 2253

Repeating rows of a DataFrame

I am trying to repeat rows of a DataFrame. This is my original data:

pd.DataFrame([
        {'col1': 1, 'col2': 11, 'col3': [1, 2] },
        {'col1': 2, 'col2': 22, 'col3': [1, 2, 3] },
        {'col1': 3, 'col2': 33, 'col3': [1] },
        {'col1': 4, 'col2': 44, 'col3': [1, 2, 3, 4] },
    ])

which gives me

   col1  col2          col3
0     1    11        [1, 2]
1     2    22     [1, 2, 3]
2     3    33           [1]
3     4    44  [1, 2, 3, 4]

I want to repeat each row according to the length of the array in col3, i.e. I want to get a DataFrame like this:

   col1  col2
0     1    11
1     1    11
2     2    22
3     2    22
4     2 …
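A minimal sketch of one way to get this result, assuming the truncated target output continues in the same pattern (each row repeated `len(col3)` times, with col3 dropped). It relies on `Index.repeat` with a per-row repeat count.

```python
import pandas as pd

df = pd.DataFrame([
    {"col1": 1, "col2": 11, "col3": [1, 2]},
    {"col1": 2, "col2": 22, "col3": [1, 2, 3]},
    {"col1": 3, "col2": 33, "col3": [1]},
    {"col1": 4, "col2": 44, "col3": [1, 2, 3, 4]},
])

# Number of repetitions per row: the length of each list in col3.
repeats = df["col3"].map(len).to_numpy()

# Repeat the index labels, select those rows, then renumber from 0.
result = (
    df.loc[df.index.repeat(repeats), ["col1", "col2"]]
    .reset_index(drop=True)
)

print(result)
```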

python pandas python-3.6

Score: 5 | Answers: 1 | Views: 210

Azure Functions not running on M1

Running

import logging

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')

    if name:
        return func.HttpResponse(f"Hello, {name}. This HTTP triggered function executed successfully.")
    else:
        return func.HttpResponse(
             "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
             status_code=200
        )


in VS Code


pyenv shell 3.9.12 …

python azure-functions apple-silicon

Score: 5 | Answers: 1 | Views: 3253

Changing the Scrapy/Splash user agent

How can I set the user agent for Scrapy with Splash in a way that is equivalent to the following:

import requests
from bs4 import BeautifulSoup

ua = {"User-Agent":"Mozilla/5.0"}
url = "http://www.example.com"
page = requests.get(url, headers=ua)
soup = BeautifulSoup(page.text, "lxml")

My spider looks like this:

import scrapy
from scrapy_splash import SplashRequest


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 0.5}
            )

splash-screen scrapy web-scraping python-3.x

Score: 4 | Answers: 2 | Views: 4690

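Using a Comparator with a list of objects in Java

I want to sort objects of type AVLNode by their key (of type String). I instantiated a Comparator and want to apply the compareTo method to the String attribute. However, my IDE shows the error Cannot resolve method compareTo. I don't understand why I can't use the compareTo method on a String.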

import java.util.*;

public class AVLTreeTest {
    public static void main(String[] args){

        Comparator<AVLNode>myComp2 = new Comparator<AVLNode>() {
            @Override public int compare(AVLNode n1, AVLNode n2) {
                return n1.getKey().compareTo(n2.getKey());
            }
        };

        AVLNode<String, AVLNode> a1 = new AVLNode( "test3", new Cuboid (2,3,4,5,6,7) );
        AVLNode<String, AVLNode> a2 = new AVLNode( "test2", new Cuboid (2,3,4,5,6,7) );
        AVLNode<String, AVLNode> a3 = new AVLNode( "test8", new Cuboid (2,3,4,5,6,7) );
        AVLNode<String, AVLNode> a4 = new AVLNode( "test1", new …

java generics comparator java-8

Score: 4 | Answers: 1 | Views: 124

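Append ColumnTransformer() result to original data in a pipeline?

This is my input data:

[image]

This is the desired output, where the columns r, f and m are transformed and the results are appended to the original data:

[image]

This is the code: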

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PowerTransformer    

df = pd.DataFrame(np.random.randint(0,100,size=(10, 3)), columns=list('rfm'))
column_trans = ColumnTransformer(
    [('r_std', StandardScaler(), ['r']),
     ('f_std', StandardScaler(), ['f']),
     ('m_std', StandardScaler(), ['m']),
     ('r_boxcox', PowerTransformer(method='box-cox'), ['r']),
     ('f_boxcox', PowerTransformer(method='box-cox'), ['f']),
     ('m_boxcox', PowerTransformer(method='box-cox'), ['m']),
    ])

transformed = column_trans.fit_transform(df)
new_cols = ['r_std', 'f_std', 'm_std', 'r_boxcox', 'f_boxcox', 'm_boxcox']

transformed_df = pd.DataFrame(transformed, columns=new_cols)
pd.concat([df, transformed_df], axis = 1)

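I will also need additional transformers, so I need to keep the original columns in the pipeline. Is there a better way to handle this? In particular, can the concatenation and the column naming be done inside the pipeline?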
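One possible approach, sketched under assumptions rather than offered as the definitive answer: `ColumnTransformer` accepts the string `'passthrough'` in place of a transformer, so the original columns can be emitted alongside the transformed ones in a single step. The output columns follow the order of the transformer list, which is why the column names below are written out by hand. The sample data is drawn from 1..99 because Box-Cox requires strictly positive values.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Hypothetical sample data; strictly positive so that Box-Cox is defined.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=(10, 3)), columns=list("rfm"))

cols = ["r", "f", "m"]
column_trans = ColumnTransformer(
    [
        ("orig", "passthrough", cols),                        # original columns
        ("std", StandardScaler(), cols),                      # per-column scaling
        ("boxcox", PowerTransformer(method="box-cox"), cols),
    ]
)

transformed = column_trans.fit_transform(df)

# Output order follows the transformer list: originals, then std, then boxcox.
new_cols = cols + [f"{c}_std" for c in cols] + [f"{c}_boxcox" for c in cols]
result = pd.DataFrame(transformed, columns=new_cols, index=df.index)

print(result.head())
```

Because the originals are part of the transformer output, the step can sit inside a `Pipeline` and later steps still see the untransformed columns. Only the naming happens outside, when the array is wrapped back into a DataFrame.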

python pipeline pandas scikit-learn

Score: 3 | Answers: 1 | Views: 2912

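How to alternate table cell background color in every row and column

I want to format the CSS as shown in the picture: in odd rows the first-column cell, and in even rows the second-column cell, should get a different background color.

Example table

I know how to alternate entire rows or columns: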

tr {
border-top: 0px solid $input-border-col;
&:first-child {
    border-top: none;
}
&:nth-child(even) {background: #CCC;}
}

but I have not found a way to alternate within each row.

css

Score: 2 | Answers: 1 | Views: 2607