Posts by Kev*_*vin

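String similarity: how exactly does Bitap work?

I'm trying to wrap my head around the Bitap algorithm, but I can't understand the reasoning behind its steps.

I understand the basic premise of the algorithm, which is (correct me if I'm wrong):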

Two strings:     PATTERN (the desired string)
                 TEXT (the String to be perused for the presence of PATTERN)

Two indices:     i (currently processing index in PATTERN), 1 <= i < PATTERN.SIZE
                 j (arbitrary index in TEXT)

Match state S(x): S(PATTERN(i)) = S(PATTERN(i-1)) && PATTERN[i] == TEXT[j], S(0) = 1

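In plain English: PATTERN.substring(0, i) matches a substring of TEXT if the previous substring PATTERN.substring(0, i-1) matched successfully and the character at PATTERN[i] is the same as the character at TEXT[j].

What I don't understand is the bit-shifting implementation of this. The official paper describing the algorithm basically lays it out, but I can't seem to visualize what is supposed to happen. The algorithm specification is only the first two pages of the paper, but I'll highlight the important parts:

Here is the bit-shifting version of the concept:

[image not reproduced]

Here is T[text] for a sample search string:

[image not reproduced]

Here is a trace of the algorithm:

[image not reproduced]

Specifically, I don't understand what the T table signifies, or why an entry from it is ORed with the current state.

I would be grateful if anyone could help me understand exactly what is going on.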
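For readers who want to experiment with the question, here is a minimal Python sketch of the shift-or form of Bitap for exact matching. It assumes the common convention in which a 0 bit means "match" and may differ in detail from the paper's notation; the names build_table and shift_or_search are illustrative, not taken from the paper.

```python
def build_table(pattern):
    """Return (T, all_ones). T[c] has bit i cleared iff pattern[i] == c."""
    all_ones = (1 << len(pattern)) - 1
    table = {}
    for i, ch in enumerate(pattern):
        # Start from all ones and clear the bit for every position holding ch.
        table[ch] = table.get(ch, all_ones) & ~(1 << i)
    return table, all_ones


def shift_or_search(text, pattern):
    """Return the start index of the first exact match, or -1 if none."""
    m = len(pattern)
    if m == 0:
        return 0
    table, all_ones = build_table(pattern)
    # Bit i of state is 0 iff pattern[0..i] matches the text ending at the
    # current character. Initially nothing matches, so every bit is 1.
    state = all_ones
    for j, ch in enumerate(text):
        # Shift: move every partial match one position along; the bit shifted
        # in at position 0 is 0, meaning "the empty prefix always matches".
        # OR: T[ch] sets bit i back to 1 wherever pattern[i] != ch.
        state = ((state << 1) | table.get(ch, all_ones)) & all_ones
        if (state & (1 << (m - 1))) == 0:
            return j - m + 1
    return -1


if __name__ == "__main__":
    print(shift_or_search("xabcab", "abc"))
```

Under this convention the recurrence S(i) = S(i-1) && PATTERN[i] == TEXT[j] is evaluated for every i at once: the shift moves S(i-1) into position i, and because 0 stands for "true", the logical AND becomes a bitwise OR with the table entry for TEXT[j].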

algorithm bit-manipulation similarity

Kev*_*vin

lucky-day

18 votes

2 answers

4594 views

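Google fuzzy search (a.k.a. "suggestions"): what techniques are in use?

I'm implementing search-suggestion functionality in my web app and have been looking at the techniques used by existing implementations.

It seems that most of the major sites (Amazon, Bing, etc.) implement fuzzy search in the following way: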

Tokenize search string in to terms
processingSearchStringSet = {}
For each term
    if exact term is NOT in index
        Get possible terms (fuzzyTerms) from levenshtein(term, 1 (or 2))
        For each term in fuzzyTerms
            if term is in index
                processingSearchStringSet.intersect(stringsIndexedByTermsSet)
    else
        processingSearchStringSet.intersect(stringsIndexedByTermsSet)

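The members of the result set are then presumably ranked by metrics (e.g. term-order preservation, absolute term position, search popularity) and kept or eliminated, based on that ranking and a predetermined result-set size, before being delivered back to the user.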
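The lookup step in the pseudocode above can be sketched in Python as follows. This is one reading of the pseudocode, not a description of any site's actual implementation: the index is assumed to be a plain dict mapping a term to the set of indexed strings that contain it, matches for the fuzzy variants of a single term are unioned before intersecting across terms, and the names edits1 and suggest are hypothetical.

```python
import string


def edits1(term, alphabet=string.ascii_lowercase):
    """All strings within Levenshtein distance 1 of term, including term."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | substitutions | inserts | {term}


def suggest(query, index):
    """Return the indexed strings matching every term, allowing 1 error per term."""
    candidates = None
    for term in query.lower().split():
        if term in index:
            matches = index[term]
        else:
            # Fall back to every indexed term at edit distance 1.
            matches = set()
            for fuzzy in edits1(term):
                matches |= index.get(fuzzy, set())
        # Intersect across terms: a suggestion must match all of them.
        candidates = matches if candidates is None else candidates & matches
    return candidates or set()


if __name__ == "__main__":
    # Tiny hand-built index, for illustration only.
    index = {
        "fifty": {"fifty shades of grey"},
        "shades": {"fifty shades of grey", "shades of blue"},
        "grey": {"fifty shades of grey"},
    }
    print(suggest("fiftyy shades", index))
    print(suggest("fiftyyyy shades", index))
```

With a per-term threshold of 1, a term that is three edits away from anything in the index empties the whole result set, which is the failure mode described for terms with more than one error.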

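Google's implementation, on the other hand, differs quite a bit from this.

Specifically, it allows more than 1 error in the search string's constituent terms. The error threshold seems to depend on where the term of interest sits in the string, though it never exceeds 7.

The interesting thing is that:

1. Conducting a Levenshtein search with a threshold of 5 over the entire term space, for each term in the user's string, would be extremely expensive
2. Even if #1 is what is done, it still wouldn't explain the absence of erroneous suggestions

N-grams also don't appear to be in use: modifying a term so that it no longer contains a bigram present in the original term does not seem to affect the results.

Here's an example to illustrate my findings: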

Example term: "Fiftyyyy shades of grey"

Amazon suggestions: none 
(if the error count exceeds 1 on any term, the search fails)

Bing suggestions: none
(if the error count exceeds …

language-agnostic algorithm search fuzzy-search autocomplete

Kev*_*vin

lucky-day

15 votes

1 answer

4420 views

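Java Calendar: why are UTC offsets being reversed?

I'm trying to get to grips with time handling, and I've stumbled upon something in Java that has left me somewhat baffled. Take this sample code: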

public static void main(String[] args)
{
    //Calendar set to 12:00 AM of the current day (Eastern Daylight Time)
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("GMT-4"));
    cal.set(Calendar.HOUR_OF_DAY, 0);
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    /////

    //Calendar set to 12:00 AM of the current day (UTC time)
    Calendar utcCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
    utcCal.set(Calendar.HOUR_OF_DAY, 0);
    utcCal.set(Calendar.MINUTE, 0);
    utcCal.set(Calendar.SECOND, 0);
    utcCal.set(Calendar.MILLISECOND, 0);
    /////

    long oneHourMilliseconds = 3600000;
    System.out.println((cal.getTimeInMillis() - utcCal.getTimeInMillis()) / oneHourMilliseconds);
}

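I picture the algorithm that calculates the time represented by cal taking 1 of 2 forms:

1. Calculate the number of milliseconds from the Epoch, then add the offset (add -4)
2. Calculate the number of milliseconds from (Epoch + offset), i.e. the number of milliseconds from (Epoch - 4 * oneHourMilliseconds).

Both of these algorithms should yield a result that is 4 hours behind …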
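One way to see why the sign comes out positive: a wall-clock time in a zone equals UTC plus the zone's offset, so local midnight at UTC-4 is the instant 04:00 UTC on the same date, which is four hours after UTC midnight. The sketch below illustrates that arithmetic in Python rather than Java. It uses a fixed example date (an assumption; the code above uses the current day in each zone, which can differ near midnight) and a hypothetical helper named midnight_epoch_ms.

```python
from datetime import datetime, timedelta, timezone

ONE_HOUR_MS = 3_600_000


def midnight_epoch_ms(year, month, day, utc_offset_hours):
    """Epoch milliseconds of 00:00 wall-clock time in a fixed-offset zone."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    local_midnight = datetime(year, month, day, tzinfo=zone)
    return int(local_midnight.timestamp() * 1000)


if __name__ == "__main__":
    # Arbitrary example date, chosen only for illustration.
    eastern = midnight_epoch_ms(2024, 6, 1, -4)
    utc = midnight_epoch_ms(2024, 6, 1, 0)
    # Midnight at UTC-4 happens later than midnight UTC, so this is positive.
    print((eastern - utc) // ONE_HOUR_MS)
```

The epoch value always counts milliseconds since 1970-01-01T00:00 UTC; the offset only changes which instant the wall-clock fields refer to, so the correction is subtracted from the local time rather than added to it.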

java timezone datetime

Kev*_*vin

lucky-day

5 votes

1 answer

1350 views

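Internet Explorer: official status of the userData behavior?

I'm having trouble using the userData behavior in IE9, which I use as a fallback when I encounter versions of IE that don't support the Web Storage spec.

Specifically, values don't seem to be saved when the .save() function is executed (in fact, there isn't even a userData folder at C:\Users\USERNAME\AppData\Roaming\Microsoft\Internet Explorer\UserData, nor is one created when .save() is executed).

I did some research, and the consensus from a few comments on various sites is that it appears to have been disabled in IE9.

Is there an official statement to this effect? If so, is there a way to test whether a given IE version supports it (without browser sniffing)?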

javascript css internet-explorer

Kev*_*vin

lucky-day

4 votes

1 answer

1742 views

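Denormalization: how much is too much?

I've designed the database for the web app I'm building "by the book". That is, I've:

1. Created an E-R diagram containing the app's entities, attributes, and relationships
2. Translated the E-R diagram into a schema
3. Translated the schema into a "schema-less" form to model the database with (the database is a Cassandra (NoSQL) database).

Everything is going well (so far). I've denormalized before with good results, and I'm now implementing a part of the app that will use data that hasn't been denormalized yet. I predict that doing so for this particular part will improve performance greatly (reading from 1 Column_Family ("table" in the relational world) instead of 7).

However, I'm afraid I may be denormalizing too much. If I were to do it for the part in question, it would cut the Column_Family/table count in my app by about 20%, and for some reason that feels like a lot of denormalization.

If the app ends up being successful and I'm able to bring a database designer or administrator on board, I'd like them to be able to conclude that the denormalization I've done is necessary for the performance I'm seeking (best case) or at least harmless (worst case).

When deciding whether to denormalize, are there specific things I should watch for that would indicate that doing so is a bad idea, or does it always come down to speed vs. maintainability?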

database database-design denormalization cassandra

Kev*_*vin

lucky-day

3 votes

1 answer

2883 views

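Why do calls to scanf behave this way? Is this standard?

I'm working through K&R exercises 7.4 and 7.5, and have come across an annoying "feature" that I don't believe the standard states.

According to K&R, the conversion specification "%c" acts in the following way:

"The next input characters (default 1) are placed at the indicated spot. The normal skip over white space is suppressed; to read the next non-white space character, use %1s"

My question is, should that statement read like this:

"The next input characters (default 1) are placed at the indicated spot. Then, when %c is used again in successive calls to scanf, the normal skip over white space is suppressed; to read the next non-white space character, use %1s"

...because this code: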

void test1()
{
   char t1, t2;

   scanf("%c %c", &t1, &t2);
   printf("%d\n", t1);
   printf("%d\n", t2);

   //INPUT is: "b d" (without quotes)
}

yields t1 = 98 (b) and t2 = 100 (d). (White space skipped.)

However, this code:

void test2()
{
   char t1, t2;

   scanf("%c", &t1);
   scanf("%c", &t2);
   printf("%d\n", t1);
   printf("%d\n", t2);

   //INPUT is: "b d" (without quotes)
}

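yields t1 = 98 (b) and t2 = 32 (' '). (White space not skipped.)

Reading the original quote, I think any reasonable person would take it to mean that white-space skipping is suppressed within a single call to scanf("%c"). However, that doesn't seem to be the case.

It seems that, to get the original behavior, one must empty stdin completely.

Is this how it is supposed to work? Is it documented anywhere? I've looked around and haven't seen much information on this.

For reference, I'm programming in C99.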

c stdin input

Kev*_*vin

2010 11-04

2 votes

1 answer

501 views

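C++: defined my own assignment operator for my type, and now .sort() won't work on vectors of my type?

I have a class (those who have read Accelerated C++ may find it familiar) defined as follows: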

class Student_info{
public:
    Student_info() : midterm(0.0), final(0.0) {};
    Student_info(std::istream& is){read(is);};

    Student_info(const Student_info& s);

    ~Student_info();

    Student_info& operator=(const Student_info& s);

    //Getters, setters, and other member functions omitted for brevity

    static int assignCount;
    static int copyCount;
    static int destroyCount;

private:
    std::string name;
    double midterm;
    double final;
    double finalGrade;
    std::vector<double> homework;

};

typedef std::vector<Student_info> stuContainer;


bool compare(const Student_info& x, const Student_info& y);

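The function calculator() uses objects of this type. As part of the function, a vector of (already declared) Student_info objects is sorted using the library's generic sort function. My program doesn't get past this point (though, according to NetBeans, no exceptions are thrown and the program exits correctly).

The sort function makes heavy use of the assignment operator of whatever type is held in the container, but I can't figure out what is wrong with the one I defined (the program ran correctly before I defined it). According to Accelerated C++ (or at least this is how I interpreted it), the proper way for an assignment operator to work is to first destroy the left operand and then construct it again with a value equal to the right operand. So this is my overloaded operator= definition: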

Student_info& Student_info::operator=(const Student_info& s)
{
    if(this != &s)
    {
        this->~Student_info();
        destroyCount++;

        *this = s;
    }

    return …

c++ sorting copy-constructor assignment-operator

Kev*_*vin

lucky-day

2 votes

1 answer

721 views

C++: why isn't my call to "std::uninitialized_copy" working?

I built a simple class that is supposed to mimic the functionality of the std::string class (as an exercise!):

#ifndef _STR12_1_H
#define _STR12_1_H

#include <string>
#include <iostream>

class Str12_1
{
public:

    typedef char* iterator;
    typedef const char* const_iterator;
    typedef long size_type;


    Str12_1();
    Str12_1(const Str12_1& str);
    Str12_1(const char *p);
    Str12_1(const std::string& s);

    size_type size() const;

    //Other member functions


private:
    iterator first;
    iterator onePastLast;
    iterator onePastAllocated;
};

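To avoid the overhead associated with "new" (and to become more familiar with the <memory> header), I've opted to use the library's allocator template class to allocate memory for my strings. Here is an example of how I use it in the copy constructor: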

#include <memory>
#include <algorithm>

using std::allocator;
using std::raw_storage_iterator;
using std::uninitialized_copy;


Str12_1::Str12_1(const Str12_1& str)
{
    allocator<char> charAlloc;
    first = charAlloc.allocate(str.size());
    onePastLast = onePastAllocated = first + str.size();
    *onePastLast = …

c++ initialization allocator

Kev*_*vin

2011 02-02

2 votes

1 answer

1105 views

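Servlet 3.0: can't send asynchronous responses?

I'm having trouble establishing AsyncContexts for users and using them to push notifications to those users. On page load, I have some jQuery code that sends a request: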

$.post("TestServlet",{
    action: "registerAsynchronousContext"
        },function(data, textStatus, jqXHR){
            alert("Server received async request"); //Placed here for debugging   
  }, "json");

In "TestServlet", I have this code in the doPost method:

HttpSession userSession = request.getSession();
String userIDString = userSession.getAttribute("id").toString();

String paramAction = request.getParameter("action");

if(paramAction.equals("registerAsynchronousContext"))
{              
    AsyncContext userAsyncContext = request.startAsync();

    HashMap<String, AsyncContext> userAsynchronousContextHashMap = (HashMap<String, AsyncContext>)getServletContext().getAttribute("userAsynchronousContextHashMap");
    userAsynchronousContextHashMap.put(userIDString, userAsyncContext);
    getServletContext().setAttribute("userAsynchronousContextHashMap", userAsynchronousContextHashMap);

    System.out.println("Put asynchronous request in global map");
}

    //userAsynchronousContextHashMap is created by a ContextListener on the start of the web-app

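However, according to Opera Dragonfly (a debugging tool like Firebug), the server sends an HTTP 500 response about 30000 ms after the request is sent.

Any response created with userAsyncContext.getResponse().getWriter().print(SOME_JSON) and sent before the HTTP 500 response is never received by the browser, and I don't know why. The browser only receives a response, sent with the regular response object (response.print(SOME_JSON)), when all of the code in the "if" statement that handles the AsyncContext is absent.

Can someone help me out? I have a feeling this is because I've misunderstood how the asynchronous API works. I thought I would be able to store these AsyncContexts in a global map, then retrieve them and use their response objects to push content to the clients. However, it seems that the AsyncContexts can't write back to the client.

Any help would be appreciated.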

java asynchronous comet java-ee servlet-3.0

Kev*_*vin

lucky-day

1 vote

1 answer

2713 views

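Self-reproducing program

I'm questioning my solution to the last exercise in Accelerated C++:

Write a self-reproducing program. Such a program is one that does no input and that, when run, writes a copy of its own source text on the standard output stream.

My solution: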

using std::string;
using std::cout;
using std::endl;
using std::ifstream;
using std::getline;

void selfReproduce16_1()
{
    ifstream thisFile("C:\\Users\\Kevin\\Documents\\NetBeansProjects\\Accelerated_C++_Exercises\\Chapter_16.cpp", ifstream::in);

    string curLine;

    bool foundHeader = false;

    while(getline(thisFile, curLine))
    {
        if(!curLine.compare("void selfReproduce16_1()") || foundHeader)
        {
            foundHeader = true;
            cout << curLine << endl;
        }

    }

}

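This prints out only the source text of the solution (this function). Is this the solution they had in mind?

I'd like a dynamic solution that doesn't require hard-coding the location of the source file. However, I'm not aware of a way to obtain a source file's location automatically at runtime.

A related point is the handling of "included" files and, when a function call is encountered, automatically obtaining the location of the source file in which that function is defined. To me, that would be a true "self-reproducing" program.

Is this possible in C++? If so, how?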
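A common approach to this kind of exercise (usually called a quine) avoids reading the source file at all: the program stores a template of its own text and prints that template filled in with a quoted copy of itself. The idea is language-independent; the sketch below uses Python rather than C++ purely for brevity, and the names template, program, and run_and_capture are illustrative assumptions. It builds a two-line self-reproducing program as a string, runs it, and checks that what it prints equals its own source.

```python
import contextlib
import io

# Program text with a hole (%r) that receives a quoted copy of this same text.
template = 'template = %r\nprint(template %% template)'

# Filling the hole with the template itself yields the complete two-line program.
program = template % template


def run_and_capture(source):
    """Execute source in a fresh namespace and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


if __name__ == "__main__":
    output = run_and_capture(program)
    # print() appends one newline, so compare against the source plus "\n".
    print(output == program + "\n")
```

The same structure carries over to a language without a built-in quoting format, at the cost of writing the escaping by hand.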

c++ runtime dynamic quine

Kev*_*vin

2014 01-01

0 votes

1 answer

3689 views