Posts by fut*_*110

How do I find overlapping matches with a regular expression?

>>> match = re.findall(r'\w\w', 'hello')
>>> print match
['he', 'll']


Since \w\w means two word characters, 'he' and 'll' are expected. But why don't 'el' and 'lo' match the regex?

>>> match1 = re.findall(r'el', 'hello')
>>> print match1
['el']
>>>
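
re.findall scans left to right and returns non-overlapping matches, so after consuming 'he' it resumes at the first 'l'. A minimal sketch of one common workaround, wrapping the pattern in a zero-width lookahead with a capture group. The helper name overlapping_matches is an assumption made for illustration; if the pattern contains groups of its own, findall returns tuples instead of strings.

```python
import re

def overlapping_matches(pattern, text):
    """Return every match of `pattern`, including ones that overlap."""
    # The lookahead consumes no characters, so the scan advances one position
    # at a time; the inner capture group records what would have matched.
    return re.findall(r'(?=(' + pattern + r'))', text)

print(re.findall(r'\w\w', 'hello'))           # non-overlapping: ['he', 'll']
print(overlapping_matches(r'\w\w', 'hello'))  # ['he', 'el', 'll', 'lo']
```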


python regex overlapping

fut*_*110

2012 07-11

62
votes

3
answers

30k
views

Group by, rank, and aggregate a Spark DataFrame using PySpark

I have a DataFrame that looks like this:

A     B    C
---------------
A1    B1   0.8
A1    B2   0.55
A1    B3   0.43

A2    B1   0.7
A2    B2   0.5
A2    B3   0.5

A3    B1   0.2
A3    B2   0.3
A3    B3   0.4


How can I convert column 'C' into a relative rank within each value of column A (higher score -> better rank)? Expected output:

A     B    Rank
---------------
A1    B1   1
A1    B2   2
A1    B3   3

A2    B1   1
A2    B2   2
A2    B3   2

A3    B1   3
A3    B2   2
A3    B3   1


The final state I want to reach is to aggregate by column B and store the rank from each A:

Example:

B    Ranks
B1   [1,1,3]
B2   [2,2,2]
B3   [3,2,1]
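
The two steps above (rank C within each A, then collect the ranks per B) can be sketched in plain Python to make the intended logic concrete. This is only an illustration of the computation, not a Spark implementation; the function names are assumptions. In PySpark the same idea would likely use dense_rank() over Window.partitionBy("A").orderBy(desc("C")), followed by groupBy("B") with collect_list, keeping in mind that collect_list does not guarantee element order.

```python
from collections import defaultdict

rows = [
    ("A1", "B1", 0.8), ("A1", "B2", 0.55), ("A1", "B3", 0.43),
    ("A2", "B1", 0.7), ("A2", "B2", 0.5), ("A2", "B3", 0.5),
    ("A3", "B1", 0.2), ("A3", "B2", 0.3), ("A3", "B3", 0.4),
]

def dense_rank_within_a(rows):
    """Replace C by its dense rank within each A (highest score gets rank 1)."""
    scores = defaultdict(set)
    for a, _, c in rows:
        scores[a].add(c)
    # Distinct scores per group, best first; tied scores share a rank.
    ordered = {a: sorted(s, reverse=True) for a, s in scores.items()}
    return [(a, b, ordered[a].index(c) + 1) for a, b, c in rows]

def ranks_by_b(ranked):
    """Collect the ranks for each B, ordered by A."""
    result = defaultdict(list)
    for a, b, rank in sorted(ranked):
        result[b].append(rank)
    return dict(result)

ranked = dense_rank_within_a(rows)
print(ranks_by_b(ranked))
```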


apache-spark pyspark spark-dataframe

fut*_*110

2017 01-15

10
votes

1
answer

20k
views

Create a dictionary-type column in a DataFrame

Consider the following DataFrame:

+------------+--------------------+
|          id|values              |
+------------+--------------------+
|          39|a,a,b,b,c,c,c,c,d   |
|         520|a,b,c               |
|         832|a,a                 |
+------------+--------------------+


I want to convert it into the following DataFrame:

+------------+--------------------------------+
|          id|values                          |
+------------+--------------------------------+
|          39|{"a": 2, "b": 2, "c": 4, "d": 1}|
|         520|{"a": 1, "b": 1, "c": 1}        |
|         832|{"a": 2}                        |
+------------+--------------------------------+


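I tried two approaches:

Converting the DataFrame to an RDD and mapping the values column to a frequency-counter function. However, I get errors when converting the RDD back to a DataFrame.
Using a UDF to do essentially the same thing as above.

The reason I want a dictionary column is to load it as JSON in one of my Python applications.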
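
The per-row transformation described above can be sketched in plain Python. The function name count_values is an assumption; in PySpark a function like this could plausibly be wrapped in a udf with a MapType(StringType(), IntegerType()) return type, but that part is not shown or tested here.

```python
import json
from collections import Counter

def count_values(values):
    """Turn a comma-separated string such as 'a,a,b' into {'a': 2, 'b': 1}."""
    return dict(Counter(values.split(",")))

rows = [(39, "a,a,b,b,c,c,c,c,d"), (520, "a,b,c"), (832, "a,a")]
for row_id, values in rows:
    # json.dumps gives the JSON text that another application could load.
    print(row_id, json.dumps(count_values(values)))
```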

python pyspark spark-dataframe

fut*_*110


7
votes

2
answers

7629
views

Python: importing a module

Let's say I have a Python module fibo.py defined as follows:

#Fibonacci numbers module
print "This is a statement"
def fib(n):
    a,b = 0,1
    while b < n:
        print b
        a, b = b, a+b

def fib2(n):
    a,b = 0,1
    result= []
    while(b < n):
        result.append(b)
        a, b = b, a+b
    return result


In my interpreter session, I do the following:

>>> import fibo
This is a statement
>>> fibo.fib(10)
1
1
2
3
5
8

>>> fibo.fib2(10)
[1, 1, 2, 3, 5, 8]
>>> fibo.__name__
'fibo'
>>>
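
The session above shows the module-level print running at import time. As a general illustration, Python executes a module's body the first time it is imported in a process and serves later imports from sys.modules. A minimal sketch, assuming a throwaway module named fibo_demo written to a temporary directory (both names are made up for this example, and the code uses Python 3 syntax):

```python
import contextlib
import importlib
import io
import pathlib
import sys
import tempfile

MODULE_SOURCE = '''print("This is a statement")

def fib2(n):
    a, b = 0, 1
    result = []
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result
'''

def import_twice():
    """Import a throwaway module twice and capture what it prints."""
    captured = io.StringIO()
    with tempfile.TemporaryDirectory() as folder:
        pathlib.Path(folder, "fibo_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, folder)
        importlib.invalidate_caches()
        try:
            with contextlib.redirect_stdout(captured):
                first = importlib.import_module("fibo_demo")   # runs the module body
                second = importlib.import_module("fibo_demo")  # reuses sys.modules
        finally:
            sys.path.remove(folder)
            sys.modules.pop("fibo_demo", None)
    return captured.getvalue(), first is second, first.fib2(10)

output, same_object, numbers = import_twice()
print(repr(output), same_object, numbers)
```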


So far, so good... In the interpreter:

>>> from fibo import fib,fib2
This is a …


python

fut*_*110


6
votes

1
answer

521
views

Django: update all records using bulk_update

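As of Django 2.2, bulk updates are available: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-update

I have a model with millions of rows, and I want to update all records efficiently. I am trying to use bulk_update, but that means I still have to load all the model objects into memory, modify the field one by one, and then call bulk_update:

What I am doing: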

def migrate_event_ip_to_property(apps, schema_editor):
    Event = apps.get_model('app', 'Event')
    events = Event.objects.all()

    for event in events:
        if event.ip:
            event.properties["$ip"] = event.ip

    Event.objects.bulk_update(events, ['properties'], 10000)


Since there are millions of records, can I avoid calling Event.objects.all() and loading all the objects into memory, even when using bulk_update?
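
One possible direction, sketched under assumptions, is to stream the rows and update them in fixed-size batches so that only one batch is held in memory at a time. The helper batched below is a hypothetical name and is plain Python; the Django calls appear only in comments because they need a configured project. QuerySet.iterator(chunk_size=...) and bulk_update(objs, fields) are existing Django APIs, but whether this pattern suits a given database backend is not verified here.

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items without materialising the input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Hypothetical use inside the migration (not run here, requires Django):
#   events = Event.objects.iterator(chunk_size=10000)
#   for batch in batched(events, 10000):
#       changed = [e for e in batch if e.ip]
#       for event in changed:
#           event.properties["$ip"] = event.ip
#       Event.objects.bulk_update(changed, ["properties"])

print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```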

django django-models

fut*_*110


6
votes

1
answer

5126
views

Django: syntax error in urls.py

from django.conf.urls import patterns, include, url
from django.contrib import admin

admin.autodiscover()

urlpatterns = patterns('',
    (r'^admin/', include(admin.site.urls)),
    (r'^events/', include('events.urls')),
)


Here is my events.urls:

from django.conf.urls.defaults import *
from events import views


urlpatterns = patterns('',
    url(r'^tonight/$', views.tonight, name='ev_tonight'),
)


After running the server, I get the following error:

Exception Type: SyntaxError Exception Value:

invalid syntax (urls.py, line 8)

Am I missing something here?

Edit: traceback attached

Environment:

Request Method: GET
Request URL: http://127.0.0.1:8000/admin

Django Version: 1.4
Python Version: 2.7.3
Installed Applications:
('django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.sites',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'django.contrib.admin',
 'events')
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware')


Traceback:
File …


django django-urls

fut*_*110

2012 07-26

3
votes

1
answer

10k
views

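Probability basics for machine learning

I recently started studying machine learning and found that I need to refresh my probability basics, such as conditional probability and Bayes' theorem.

I am looking for online resources where I can quickly review the probability concepts needed for machine learning.

The online resources I have stumbled upon are either very basic or too advanced.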

machine-learning probability

fut*_*110


3
votes

1
answer

7535
views

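Design patterns for exception handling

I am writing a library class that can be used by several other classes. I have simplified the example to make the point. Suppose I have three classes: A, B, and C: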

public class B
{
    public static string B_Method()
    {
        string bstr = String.Empty;

        try
        {
            // Do something
        }
        catch
        {
            // Do something
        }

        return bstr;
    }
}


B is the library class I am writing. Now suppose there are two other classes, A and C:

public class A
{
    public void A_Method()
    {
        string astr = B.B_Method();
    }
}

public class C
{
    public void C_Method()
    {
        string cstr = B.B_Method();
    }
}


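The question is about exception handling. I want the respective methods of the two classes A and C to handle exceptions that occur in B_Method in their own different ways.

I looked for framework design patterns, but did not find them useful.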
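
One way to read the requirement above is that the library should not swallow the exception itself, but translate it into its own exception type and let each caller decide. A minimal sketch of that idea, written in Python rather than C# and using hypothetical names (LibraryError, b_method, a_method, c_method) that only mirror the classes in the question:

```python
class LibraryError(Exception):
    """Hypothetical error type raised by the library function."""

def b_method(raw):
    """Library function: convert the input, wrapping low-level failures."""
    try:
        return str(int(raw))
    except ValueError as exc:
        # Translate into the library's own exception and let the caller decide.
        raise LibraryError(f"cannot process {raw!r}") from exc

def a_method(raw):
    """Caller A: fall back to a default value."""
    try:
        return b_method(raw)
    except LibraryError:
        return ""

def c_method(raw):
    """Caller C: report the problem instead of hiding it."""
    try:
        return b_method(raw)
    except LibraryError as exc:
        return f"error: {exc}"

print(repr(a_method("x")), repr(c_method("x")))
```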

c# design-patterns

fut*_*110


3
votes

1
answer

2198
views

Get the latest n records per group

Suppose I have the following table:

id  column_id  value    date
1      10      'a'     2016-04-01
1      11      'b'     2015-10-02
1      12      'a'     2016-07-03
1      13      'a'     2015-11-11
2      11      'c'     2016-01-10
2      23      'd'     2016-01-11
3      11      'c'     2016-01-09
3      111     'd'     2016-01-11
3      222     'c'     2016-01-10
3      333     'd'     2016-01-11


For n = 3, I want to get the latest n records (at most 3) for each id. So I would get the following output:

id  column_id  value    date
1      10        'a'     2016-04-01
1      12        'a'     2016-07-03
1      13        'a'     2015-11-11
2      11        'c'     2016-01-10
2      23        'd'     2016-01-11
3      111       'd' …
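
The selection rule described above (keep at most n rows per id, newest date first) can be sketched in plain Python to pin down the expected result. This is an illustration of the logic only, not a MySQL query; the function name is an assumption, and rows that tie on date keep their input order.

```python
from collections import defaultdict
from heapq import nlargest

rows = [
    (1, 10, "a", "2016-04-01"), (1, 11, "b", "2015-10-02"),
    (1, 12, "a", "2016-07-03"), (1, 13, "a", "2015-11-11"),
    (2, 11, "c", "2016-01-10"), (2, 23, "d", "2016-01-11"),
    (3, 11, "c", "2016-01-09"), (3, 111, "d", "2016-01-11"),
    (3, 222, "c", "2016-01-10"), (3, 333, "d", "2016-01-11"),
]

def latest_n_per_id(rows, n):
    """Group rows by id and keep the n rows with the most recent date."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
    return {key: nlargest(n, group, key=lambda r: r[3])
            for key, group in groups.items()}

latest = latest_n_per_id(rows, 3)
for key, group in latest.items():
    print(key, [row[1] for row in group])
```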


mysql sql greatest-n-per-group

fut*_*110

2016 07-23

2
votes

1
answer

1357
views

How to handle index out of range in a better way

My code throws an index-out-of-range exception for certain inputs. Below is the problematic code:

string[] snippetElements = magic_string.Split('^');

string a = snippetElements[10] == null ? "" : "hello";
string b = snippetElements[11] == null ? "" : "world";


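For that particular input, the array snippetElements contains only one element, so trying to index the 10th and 11th elements throws an exception.

For now, I have introduced the following check: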

// Index 11 is only valid when the array has at least 12 elements.
if (snippetElements.Length > 11)
{
    string a = snippetElements[10] == null ? "" : "hello";
    string b = snippetElements[11] == null ? "" : "world";
}


Can someone suggest a better way to write this check? Somehow the number 11 does not look good in the code.
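
One way to keep the magic number out of the calling code is a small bounds-checked accessor, so each lookup carries its own index and no separate length check is needed. A minimal sketch in Python rather than C#; get_or_default is a hypothetical helper name, and the input string is made up to reproduce the one-element case.

```python
def get_or_default(items, index, default=None):
    """Return items[index], or `default` when the index is out of range."""
    return items[index] if 0 <= index < len(items) else default

snippet_elements = "only-one-element".split("^")
a = "" if get_or_default(snippet_elements, 10) is None else "hello"
b = "" if get_or_default(snippet_elements, 11) is None else "world"
print(repr(a), repr(b))
```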

fut*_*110

2013 02-08

1
votes

2
answers

10k
views

Ways to check for null in an if condition

I have the following line of code, which is giving me trouble with an NPE:

   ServeUrl = ((NameValueCollection)ConfigurationManager.GetSection("Servers")).Get(ment);


When I write it the following way, I no longer get the NPE:

  if (ConfigurationManager.GetSection("Servers") != null && ((NameValueCollection)ConfigurationManager.GetSection("Servers")).Get(ment) != null)
  {
      ServeUrl = ((NameValueCollection)ConfigurationManager.GetSection("Servers")).Get(ment);
  }


Somehow, the above does not look good to my eyes. How can I write this in a better way?
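
Much of the noise above comes from fetching and casting the same section three times. A minimal sketch of the alternative, in Python rather than C#: fetch the section once into a local variable, then check each level. The nested dict stands in for ConfigurationManager and the function name lookup_server_url is an assumption. In C# itself, a local variable (and, from C# 6, the null-conditional operator) would likely serve the same purpose.

```python
def lookup_server_url(config, name):
    """Return config["Servers"][name], or None if either level is missing."""
    servers = config.get("Servers")  # fetch the section only once
    if servers is None:
        return None
    return servers.get(name)

config = {"Servers": {"prod": "https://example.invalid/prod"}}
print(lookup_server_url(config, "prod"))
print(lookup_server_url(config, "staging"))
print(lookup_server_url({}, "prod"))
```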

fut*_*110


0
votes

1
answer

4229
views