Related questions (0)

RegEx match open tags except XHTML self-contained tags

I need to match all of these opening tags:

<p>
<a href="foo">

But not these:

<br />
<hr class="foo" />

I came up with this and wanted to make sure I've got it right. I am only capturing the a-z.

<([a-z]+) *[^/]*?>

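I believe it says:

  • Find a less-than, then
  • Find (and capture) a-z one or more times, then
  • Find zero or more spaces, then
  • Find any character zero or more times, greedy, except /, then
  • Find a greater-than

Do I have that right? And more importantly, what do you think?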
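
One way to check the reading above is to run the pattern against the four sample tags. The sketch below assumes Python's `re` module; the extra URL example is an addition and is not part of the question. Note that `*?` is a lazy (non-greedy) quantifier in `re`, not a greedy one, although that does not change which of these samples match.

```python
import re

# The pattern from the question, unchanged.
OPEN_TAG = re.compile(r"<([a-z]+) *[^/]*?>")

samples = ['<p>', '<a href="foo">', '<br />', '<hr class="foo" />']

# fullmatch requires the pattern to consume the whole sample string.
results = {s: bool(OPEN_TAG.fullmatch(s)) for s in samples}
for tag, matched in results.items():
    print(f"{tag!r}: {matched}")

# Added example (not in the question): a "/" inside an attribute value
# also blocks the match, because [^/] excludes every slash.
url_tag_matches = bool(OPEN_TAG.fullmatch('<a href="http://example.com">'))
print(url_tag_matches)
```

The pattern separates the four samples as intended, but it also rejects an opening tag whose attribute value contains a slash, which is one limit of this approach.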

html regex xhtml

1323
votes
36
answers
2.7 million
views

Pandas read_xml() method test strategies

Interestingly, pandas I/O tools do not provide a read_xml() method or its counterpart to_xml(). However, read_json shows that tree-like structures can be imported into a dataframe, and read_html does the same for markup formats.

Now, if the pandas team does consider such a read_xml method for a future pandas version, what implementation would they pursue: parsing with the built-in xml.etree.ElementTree and its iterfind() or iterparse() functions, or with the third-party module lxml and its XPath 1.0 and XSLT 1.0 methods?

Below are my test runs of four method types on a simple, flat, element-centric XML input. All are set up for generalized parsing of any second-level children of root, and each method should produce the exact same pandas dataframe. All but the last call pd.DataFrame() on a list of dictionaries. The XSLT approach transforms XML to CSV for casting to StringIO() in …
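
As a rough illustration of the "list of dictionaries" idea described above, here is a minimal sketch using only `xml.etree.ElementTree`. The sample XML and the helper name `children_to_records` are hypothetical and are not the data or code from the question; the resulting list is the kind of object one could hand to `pd.DataFrame()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat, element-centric sample (not the question's data).
XML = """<root>
  <item><id>1</id><name>alpha</name></item>
  <item><id>2</id><name>beta</name></item>
</root>"""


def children_to_records(xml_text):
    """Return one dict per child of the root, keyed by grandchild tag."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in record} for record in root]


records = children_to_records(XML)
print(records)
```

Because the helper never names a tag, it works for any flat document of this shape; all values come back as strings and would need type conversion afterwards.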

python xml xslt xpath pandas

77
votes
1
answers
5525
views

How to convert an xml file to a data frame or csv output in python

I have this xml file and want to convert its content to a data frame or csv file in python:

<?xml version="1.0" encoding="utf-8"?>
<dashboardreport name="jvm_report" version="7.0.21.1017" reportdate="2018-08-08T10:37:01.510-04:00" description="">
  <source name="CORP_GTM">
    <filters summary="from Jul-30 23:40 to Jul-31 02:40">
      <filter>tf:CustomTimeframe?1533008450802:1533019250802</filter>
    </filters>
  </source>
  <reportheader>
    <reportdetails>
      <user>test1</user>
    </reportdetails>
  </reportheader>
  <data>
    <chartdashlet name="jvm_mem_percent" description="" showabsolutevalues="false">
      <measures structuretype="tree">
        <measure measure="Memory Utilization - Memory Utilization (split by Agent)" color="#800080" aggregation="Maximum" unit="%" thresholds="false" drawingorder="1">
          <measure measure="Memory Utilization - test@server1" color="#7aebd0" aggregation="Maximum" unit="%" thresholds="false">
            <measurement timestamp="1533008460000" avg="11.116939544677734" min="11.007165908813477" max="11.143875122070312" sum="66.7016372680664" count="6"></measurement>
            <measurement timestamp="1533008520000" avg="11.204706827799479" min="11.144883155822754" max="11.268420219421387" sum="67.22824096679688" count="6"></measurement>
          </measure>
          <measure measure="Memory Utilization - test@server2" color="#a6f2e0" aggregation="Maximum" unit="%" thresholds="false"> …
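
A possible way to flatten this kind of nested report is sketched below with the standard library only. The XML is a trimmed copy of the excerpt above with fewer attributes, and the closing tags after the first agent are assumed, since the original is truncated. Each row pairs a `measurement` element's attributes with the name of its parent `measure`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt; closing tags are assumed because the
# original listing is cut off after the second agent begins.
XML = """<dashboardreport name="jvm_report">
  <data>
    <chartdashlet name="jvm_mem_percent">
      <measures structuretype="tree">
        <measure measure="Memory Utilization - Memory Utilization (split by Agent)">
          <measure measure="Memory Utilization - test@server1">
            <measurement timestamp="1533008460000" avg="11.116939544677734" count="6"></measurement>
            <measurement timestamp="1533008520000" avg="11.204706827799479" count="6"></measurement>
          </measure>
        </measure>
      </measures>
    </chartdashlet>
  </data>
</dashboardreport>"""

root = ET.fromstring(XML)
rows = []
for measure in root.iter("measure"):
    # findall without a path prefix only sees direct children, so each
    # reading is attributed to the agent-level <measure> that owns it.
    for reading in measure.findall("measurement"):
        rows.append({"measure": measure.get("measure"), **reading.attrib})

# Write the rows as CSV text; the same list could feed a DataFrame instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```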

python

6
votes
1
answers
1006
views

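Importing data from .iqy files into Pandas

I have several .iqy files that query data from Sharepoint. I need to combine and process them in Python Pandas. Is there a way to do this in Python? I know that Python Sharepoint libraries exist, but I am trying to avoid setting up my own connection through Python and would rather rely on the .iqy files. Any ideas?

For the sake of answering this question, assume the table looks like this: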

+------+------+
| col1 | col2 |
+------+------+
|    1 |    2 |
|    3 |    4 |
+------+------+

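Also, I am open to non-Python solutions for automatically running the .iqy queries and converting the data to a Python-readable format (e.g. .csv), but I am not sure what such an approach would look like.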
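
As a starting point only, the sketch below assumes that an .iqy file is plain text and that the query URL sits on a line of its own. The helper name `iqy_url` and the sample file contents are hypothetical. It does not run the query; fetching the URL would still need whatever authentication the Sharepoint site requires.

```python
import tempfile
from pathlib import Path


def iqy_url(path):
    """Return the first line that looks like an http(s) URL, or None."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(("http://", "https://")):
            return stripped
    return None


# Hypothetical file contents, written to a temp directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "query.iqy"
    sample.write_text("WEB\n1\nhttps://example.com/list?format=xml\n", encoding="utf-8")
    url = iqy_url(sample)

print(url)
```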

python sharepoint pandas

5
votes
1
answers
5949
views

Python: Extracting XML to DataFrame (Pandas)

There is an XML file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<comments>
<row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
<row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
<row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>

What I want to do is extract the ID, Text and CreationDate columns into a Pandas DF, and I tried the following:

import xml.etree.cElementTree as et
import pandas as pd
path = '/.../...'
dfcols = ['ID', 'Text', 'CreationDate']
df_xml = pd.DataFrame(columns=dfcols)

root = et.parse(path)
rows = root.findall('.//row')
for row in rows:
    ID = row.find('Id')
    text = row.find('Text')
    date = row.find('CreationDate')
    print(ID, text, date) …
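
In the sample XML, Id, Text and CreationDate are attributes of each `row` element rather than child elements, so `find()`, which searches for child elements, returns None for them. A minimal sketch of reading the attributes instead, using the sample from the question (the XML declaration is left out so the text can be parsed from a Python string):

```python
import xml.etree.ElementTree as ET

XML = """<comments>
<row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
<row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
<row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>"""

root = ET.fromstring(XML)
wanted = ["Id", "Text", "CreationDate"]

# Element.get reads an attribute and returns None when it is missing,
# as it is for CreationDate on the third row.
records = [{name: row.get(name) for name in wanted} for row in root.findall(".//row")]
print(records)
```

The resulting list of dictionaries has one entry per row and could be passed to `pd.DataFrame(records)`.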

python xml dataframe pandas

5
votes
1
answers
10 thousand
views

Reading an XML file into a Pandas DataFrame

Can someone help convert the following XML file to a Pandas dataframe:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
	<bathrooms type="dict">
		<n35237 type="number">1.0</n35237>
		<n32238 type="number">3.0</n32238>
		<n44699 type="number">nan</n44699>
	</bathrooms>
	<price type="dict">
		<n35237 type="number">7020000.0</n35237>
		<n32238 type="number">10000000.0</n32238>
		<n44699 type="number">4128000.0</n44699>
	</price>
	<property_id type="dict">
		<n35237 type="number">35237.0</n35237>
		<n32238 type="number">32238.0</n32238>
		<n44699 type="number">44699.0</n44699>
	</property_id>
</root>

It should look like this:

Output

This is the code I have written:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('real_state.xml')
root = tree.getroot()

dfcols = ['property_id', 'price', 'bathrooms']
df_xml = pd.DataFrame(columns=dfcols)

for node in root:
    property_id = node.attrib.get('property_id')
    price = node.attrib.get('price')
    bathrooms = node.attrib.get('bathrooms')

    df_xml = df_xml.append(
            pd.Series([property_id, …
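
In this XML the column names are the element tags under the root, and each row is identified by the tag of a grandchild such as `n35237`; nothing is stored in attributes apart from `type`. The sketch below pivots that layout into one record per row key. The expected output in the question was an image, so the target shape (one row per property, one column per top-level element) is an assumption.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <bathrooms type="dict">
    <n35237 type="number">1.0</n35237>
    <n32238 type="number">3.0</n32238>
    <n44699 type="number">nan</n44699>
  </bathrooms>
  <price type="dict">
    <n35237 type="number">7020000.0</n35237>
    <n32238 type="number">10000000.0</n32238>
    <n44699 type="number">4128000.0</n44699>
  </price>
  <property_id type="dict">
    <n35237 type="number">35237.0</n35237>
    <n32238 type="number">32238.0</n32238>
    <n44699 type="number">44699.0</n44699>
  </property_id>
</root>"""

root = ET.fromstring(XML)
by_row = {}
for column in root:          # <bathrooms>, <price>, <property_id>
    for cell in column:      # <n35237>, <n32238>, <n44699>
        # float() also accepts the literal text "nan".
        by_row.setdefault(cell.tag, {})[column.tag] = float(cell.text)

records = list(by_row.values())
print(records)
```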

elementtree pandas

5
votes
1
answers
20 thousand
views

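Converting an xml file to a dataframe more efficiently

I am trying to load a large (53MB) XML file into a Pandas dataframe. Here are 3 rows of actual data (from the NTSB's public database of aviation accident reports), but the actual file has 77257 rows: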

<?xml version="1.0"?>
<DATA xmlns="http://www.ntsb.gov">
<ROWS>
    <ROW EventId="20150901X74304" InvestigationType="Accident" AccidentNumber="GAA15CA244" EventDate="09/01/2015" Location="Truckee, CA" Country="United States" Latitude="" Longitude="" AirportCode="" AirportName="" InjurySeverity="" AircraftDamage="" AircraftCategory="" RegistrationNumber="N786AB" Make="JOE SALOMONE" Model="SUPER CUB SQ2" AmateurBuilt="" NumberOfEngines="" EngineType="" FARDescription="" Schedule="" PurposeOfFlight="" AirCarrier="" TotalFatalInjuries="" TotalSeriousInjuries="" TotalMinorInjuries="" TotalUninjured="" WeatherCondition="" BroadPhaseOfFlight="" ReportStatus="Preliminary" PublicationDate=""/>
    <ROW EventId="20150901X92332" InvestigationType="Accident" AccidentNumber="CEN15LA392" EventDate="08/31/2015" Location="Houston, TX" Country="United States" Latitude="29.809444" Longitude="-95.668889" AirportCode="IWS" AirportName="WEST HOUSTON" InjurySeverity="Non-Fatal" AircraftDamage="Substantial" AircraftCategory="Airplane" RegistrationNumber="N452CS" Make="CESSNA" Model="T240" AmateurBuilt="No" NumberOfEngines="" EngineType="" FARDescription="Part 91: General …
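
For a large attribute-centric file like this, one option is to stream it with `iterparse` and copy each row's attributes as soon as the element is complete. The sketch below uses a shortened two-row sample with only three of the attributes. The helper name `iter_rows` is hypothetical, and whether this is faster than other approaches on the real 53MB file is not something this example measures.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample: two rows, three attributes each.
XML = b"""<?xml version="1.0"?>
<DATA xmlns="http://www.ntsb.gov">
<ROWS>
    <ROW EventId="20150901X74304" InvestigationType="Accident" Location="Truckee, CA"/>
    <ROW EventId="20150901X92332" InvestigationType="Accident" Location="Houston, TX"/>
</ROWS>
</DATA>"""

# The default namespace becomes part of every tag name in ElementTree.
ROW_TAG = "{http://www.ntsb.gov}ROW"


def iter_rows(source):
    """Yield one attribute dict per ROW element while streaming the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == ROW_TAG:
            yield dict(elem.attrib)  # copy before clearing
            elem.clear()             # drop the element's attributes and text


records = list(iter_rows(io.BytesIO(XML)))
print(records)
```

With a real file, a path or an open binary file object can be passed to `iter_rows` in place of the `BytesIO` wrapper.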

python xml dataframe

2
votes
1
answers
2107
views

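Convert xml to pandas dataframe in python

I have to convert an xml file to a pandas dataframe. I tried many approaches, but the result is always the same: None, None... What am I doing wrong? Would another library be better? Could it be because of my XML format? The xml file looks like this: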

<Document xmlns="xxx/zzz/yyy">  
 <Header>    
  <DocumentName>GXXXXXXXXXX</DocumentName>    
  <DocumentType>G10</DocumentType>    
  <Version>2.0.0.0</Version>    
  <Created>2018-12-11T09:00:02.987777+00:00</Created>    
  <TargetProcessingDate>2019-02-11</TargetProcessingDate>    
  <Part>      
  <CurrentPage>1</CurrentPage>      
  <TotalPages>1</TotalPages>    
  </Part>  
 </Header> 
 <Body>    
  <Accounts>      
    <Account>        
     <Type>20WE</Type>        
     <OldType>19WE</OldType>        
     <Kids>          
      <Kid>            
       <Name>marc</Name>            
       <BirthDate>2000-02-06</BirthDate>                       
       <Year>19</Year>            
       <Email>marc@xxx.com</Email>                         
      </Kid>           
     </Kids>      
    </Account>
   </Accounts> 
  </Body>  
</Document>  

One of the code attempts I tried:

import xml.etree.ElementTree as ET
import pandas as pd
class XML2DataFrame:

    def __init__(self, xml_data):
        self.root = ET.XML(xml_data)

    def parse_root(self, root):
        """Return a list of dictionaries from the text and attributes of the
        children under this XML root."""
        return [parse_element(child) for child in root.getchildren()]

    def …
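
One possible cause of the None results is the default namespace declared on `Document`: ElementTree stores every tag as `{xxx/zzz/yyy}Name`, so unqualified searches find nothing. The sketch below is not the asker's class. It uses a trimmed copy of the sample without the `Header` element, and the prefix `d` and the helper `local` are arbitrary names chosen for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<Document xmlns="xxx/zzz/yyy">
 <Body>
  <Accounts>
    <Account>
     <Type>20WE</Type>
     <OldType>19WE</OldType>
     <Kids>
      <Kid>
       <Name>marc</Name>
       <BirthDate>2000-02-06</BirthDate>
       <Year>19</Year>
       <Email>marc@xxx.com</Email>
      </Kid>
     </Kids>
    </Account>
   </Accounts>
  </Body>
</Document>"""

# Map an arbitrary prefix to the document's default namespace.
NS = {"d": "xxx/zzz/yyy"}


def local(tag):
    """Strip the '{namespace}' part from a tag name."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XML)

# Without the namespace, nothing is found.
unqualified = root.findall(".//Kid")

records = []
for account in root.findall(".//d:Account", NS):
    base = {
        "Type": account.findtext("d:Type", namespaces=NS),
        "OldType": account.findtext("d:OldType", namespaces=NS),
    }
    # One record per Kid, repeating the account-level fields.
    for kid in account.findall("d:Kids/d:Kid", NS):
        records.append({**base, **{local(field.tag): field.text for field in kid}})

print(unqualified, records)
```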

python xml pandas

0
votes
1
answers
7043
views

Tag statistics

python ×6

pandas ×5

xml ×4

dataframe ×2

elementtree ×1

html ×1

regex ×1

sharepoint ×1

xhtml ×1

xpath ×1

xslt ×1