没有cookie或本地存储的用户识别

sla*_*197 126 javascript php fingerprinting http-headers

我正在构建一个分析工具,我现在可以从他们的用户代理获取用户的IP地址,浏览器和操作系统.

我想知道是否有可能在不使用cookie或本地存储的情况下检测同一个用户?我不期待这里的代码示例; 只是一个简单的提示,在哪里看得更远.

如果它是同一台计算机/设备,忘记提及它需要跨浏览器兼容.基本上我是在设备识别之后并不是用户.

Bab*_*aba 378

Introduction

If I understand you correctly, you need to identify a user for whom you don't have a Unique Identifier, so you want to figure out who they are by matching Random Data. You can't store the user's identity reliably because:

  • Cookies Can be deleted
  • IP address Can change
  • Browser Can Change
  • Browser Cache may be deleted

A Java Applet or Com Object would have been an easy solution using a hash of hardware information, but these days people are so security-aware that it would be difficult to get people to install these kinds of programs on their system. This leaves you stuck with using Cookies and other, similar tools.

Cookies and other, similar tools

您可以考虑构建数据配置文件,然后使用概率测试来识别可能的用户.可以通过以下某些组合生成对此有用的配置文件:

  1. IP地址
    • 真实的IP地址
    • 代理IP地址(用户经常重复使用相同的代理)
  2. 饼干
  3. Web错误(不太可靠,因为错误得到修复,但仍然有用)
    • PDF错误
    • Flash Bug
    • Java Bug
  4. 浏览器
    • 点击跟踪(许多用户在每次访问时访问同一系列的页面)
    • 浏览器指纹 - 已安装的插件(人们通常拥有各种各样独特的插件)
    • 缓存图像(人们有时删除他们的cookie但留下缓存的图像)
    • 使用Blob
    • URL(浏览器历史记录或Cookie可能包含URL中的唯一用户ID,例如/sf/users/85882611/http://www.facebook.com/barackobama?fref=ts)
    • 系统字体检测(这是一个鲜为人知但通常是唯一的密钥签名)
  5. HTML5和Javascript
    • HTML5 LocalStorage
    • HTML5地理位置API和反向地理编码
    • 架构,操作系统语言,系统时间,屏幕分辨率等
    • 网络信息API
    • 电池状态API

当然,我列出的项目只是用户可以唯一识别的几种可能方式.还有更多.

使用这组随机数据元素来构建数据配置文件,下一步是什么?

下一步是开发一些模糊逻辑,或者更好的是,开发一个人工神经网络(使用模糊逻辑).在任何一种情况下,我们的想法是训练您的系统,然后将其训练与贝叶斯推理相结合,以提高结果的准确性.

Artificial Neural Network

PHP 的NeuralMesh库允许您生成人工神经网络.要实现贝叶斯推理,请查看以下链接:

在这一点上,你可能会想:

为什么数学和逻辑看起来如此简单?

基本上,因为它不是一项简单的任务.实际上,你想要实现的是纯概率.例如,给定以下已知用户:

User1 = A + B + C + D + G + K
User2 = C + D + I + J + K + F
Run Code Online (Sandbox Code Playgroud)

当您收到以下数据时:

B + C + E + G + F + K
Run Code Online (Sandbox Code Playgroud)

你基本上问的问题是:

接收数据(B + C + E + G + F + K)实际上是User1还是User2的概率是多少?哪两场比赛有可能?

In order to effectively answer this question, you need to understand Frequency vs Probability Format and why Joint Probability might be a better approach. The details are too much to get into here (which is why I'm giving you links), but a good example would be a Medical Diagnosis Wizard Application, which uses a combination of symptoms to identify possible diseases.

Think for a moment of the series of data points which comprise your Data Profile (B + C + E + G + F + K in the example above) as Symptoms, and Unknown Users as Diseases. By identifying the disease, you can further identify an appropriate treatment (treat this user as User1).

Obviously, a Disease for which we have identified more than 1 Symptom is easier to identify. In fact, the more Symptoms we can identify, the easier and more accurate our diagnosis is almost certain to be.

Are there any other alternatives?

Of course. As an alternative measure, you might create your own simple scoring algorithm, and base it on exact matches. This is not as efficient as probability, but may be simpler for you to implement.

As an example, consider this simple score chart:

+-------------------------+--------+------------+
|        Property         | Weight | Importance |
+-------------------------+--------+------------+
| Real IP address         |     60 |          5 |
| Used proxy IP address   |     40 |          4 |
| HTTP Cookies            |     80 |          8 |
| Session Cookies         |     80 |          6 |
| 3rd Party Cookies       |     60 |          4 |
| Flash Cookies           |     90 |          7 |
| PDF Bug                 |     20 |          1 |
| Flash Bug               |     20 |          1 |
| Java Bug                |     20 |          1 |
| Frequent Pages          |     40 |          1 |
| Browsers Finger Print   |     35 |          2 |
| Installed Plugins       |     25 |          1 |
| Cached Images           |     40 |          3 |
| URL                     |     60 |          4 |
| System Fonts Detection  |     70 |          4 |
| Localstorage            |     90 |          8 |
| Geolocation             |     70 |          6 |
| AOLTR                   |     70 |          4 |
| Network Information API |     40 |          3 |
| Battery Status API      |     20 |          1 |
+-------------------------+--------+------------+

For each piece of information which you can gather on a given request, award the associated score, then use Importance to resolve conflicts when scores are the same.

Proof of Concept

For a simple proof of concept, please take a look at Perceptron. Perceptron is a RNA Model that is generally used in pattern recognition applications. There is even an old PHP Class which implements it perfectly, but you would likely need to modify it for your purposes.

Despite being a great tool, Perceptron can still return multiple results (possible matches), so using a Score and Difference comparison is still useful to identify the best of those matches.

Assumptions

  • Store all possible information about each user (IP, cookies, etc.)
  • Where result is an exact match, increase score by 1
  • Where result is not an exact match, decrease score by 1

Expectation

  1. Generate RNA labels
  2. Generate random users emulating a database
  3. Generate a single Unknown user
  4. Generate Unknown user RNA and Values
  5. The system will merge RNA information and teach the Perceptron
  6. After training the Perceptron, the system will have a set of weightings
  7. You can now test the Unknown user's pattern and the Perceptron will produce a result set.
  8. Store all Positive matches
  9. Sort the matches first by Score, then by Difference (as described above)
  10. Output the two closest matches, or, if no matches are found, output empty results

Code for Proof of Concept

$features = array(
    'Real IP address' => .5,
    'Used proxy IP address' => .4,
    'HTTP Cookies' => .9,
    'Session Cookies' => .6,
    '3rd Party Cookies' => .6,
    'Flash Cookies' => .7,
    'PDF Bug' => .2,
    'Flash Bug' => .2,
    'Java Bug' => .2,
    'Frequent Pages' => .3,
    'Browsers Finger Print' => .3,
    'Installed Plugins' => .2,
    'URL' => .5,
    'Cached PNG' => .4,
    'System Fonts Detection' => .6,
    'Localstorage' => .8,
    'Geolocation' => .6,
    'AOLTR' => .4,
    'Network Information API' => .3,
    'Battery Status API' => .2
);

// Get RNA Lables
$labels = array();
$n = 1;
foreach ($features as $k => $v) {
    $labels[$k] = "x" . $n;
    $n ++;
}

// Create Users
$users = array();
for($i = 0, $name = "A"; $i < 5; $i ++, $name ++) {
    $users[] = new Profile($name, $features);
}

// Generate Unknown User
$unknown = new Profile("Unknown", $features);

// Generate Unknown RNA
$unknownRNA = array(
    0 => array("o" => 1),
    1 => array("o" => - 1)
);

// Create RNA Values
foreach ($unknown->data as $item => $point) {
    $unknownRNA[0][$labels[$item]] = $point;
    $unknownRNA[1][$labels[$item]] = (- 1 * $point);
}

// Start Perception Class
$perceptron = new Perceptron();

// Train Results
$trainResult = $perceptron->train($unknownRNA, 1, 1);

// Find matches
foreach ($users as $name => &$profile) {
    // Use shorter labels
    $data = array_combine($labels, $profile->data);
    if ($perceptron->testCase($data, $trainResult) == true) {
        $score = $diff = 0;

        // Determing the score and diffrennce
        foreach ($unknown->data as $item => $found) {
            if ($unknown->data[$item] === $profile->data[$item]) {
                if ($profile->data[$item] > 0) {
                    $score += $features[$item];
                } else {
                    $diff += $features[$item];
                }
            }
        }
        // Ser score and diff
        $profile->setScore($score, $diff);
        $matchs[] = $profile;
    }
}

// Sort bases on score and Output
if (count($matchs) > 1) {
    usort($matchs, function ($a, $b) {
        // If score is the same use diffrence
        if ($a->score == $b->score) {
            // Lower the diffrence the better
            return $a->diff == $b->diff ? 0 : ($a->diff > $b->diff ? 1 : - 1);
        }
        // The higher the score the better
        return $a->score > $b->score ? - 1 : 1;
    });

    echo "<br />Possible Match ", implode(",", array_slice(array_map(function ($v) {
        return sprintf(" %s (%0.4f|%0.4f) ", $v->name, $v->score,$v->diff);
    }, $matchs), 0, 2));
} else {
    echo "<br />No match Found ";
}
Run Code Online (Sandbox Code Playgroud)

Output:

Possible Match D (0.7416|0.16853),C (0.5393|0.2809)
Run Code Online (Sandbox Code Playgroud)

Print_r of "D":

echo "<pre>";
print_r($matchs[0]);


Profile Object(
    [name] => D
    [data] => Array (
        [Real IP address] => -1
        [Used proxy IP address] => -1
        [HTTP Cookies] => 1
        [Session Cookies] => 1
        [3rd Party Cookies] => 1
        [Flash Cookies] => 1
        [PDF Bug] => 1
        [Flash Bug] => 1
        [Java Bug] => -1
        [Frequent Pages] => 1
        [Browsers Finger Print] => -1
        [Installed Plugins] => 1
        [URL] => -1
        [Cached PNG] => 1
        [System Fonts Detection] => 1
        [Localstorage] => -1
        [Geolocation] => -1
        [AOLTR] => 1
        [Network Information API] => -1
        [Battery Status API] => -1
    )
    [score] => 0.74157303370787
    [diff] => 0.1685393258427
    [base] => 8.9
)
Run Code Online (Sandbox Code Playgroud)

If Debug = true you would be able to see Input (Sensor & Desired), Initial Weights, Output (Sensor, Sum, Network), Error, Correction and Final Weights.

+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| o  | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 | Bias | Yin | Y  | deltaW1 | deltaW2 | deltaW3 | deltaW4 | deltaW5 | deltaW6 | deltaW7 | deltaW8 | deltaW9 | deltaW10 | deltaW11 | deltaW12 | deltaW13 | deltaW14 | deltaW15 | deltaW16 | deltaW17 | deltaW18 | deltaW19 | deltaW20 | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 | W16 | W17 | W18 | W19 | W20 | deltaBias |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| 1  | 1  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1    | 0   | -1 | 0       | -1      | -1      | -1      | -1      | -1      | -1      | 1       | 1       | 1        | 1        | 1        | 1        | 1        | -1       | -1       | -1       | -1       | 1        | 1        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -1 | -1 | 1  | 1  | 1  | 1  | 1  | 1  | -1 | -1 | -1  | -1  | -1  | -1  | -1  | 1   | 1   | 1   | 1   | -1  | -1  | 1    | -19 | -1 | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --   | --  | -- | --      | --      | --      | --      | --      | --      | --      | --      | --      | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --        |
| 1  | 1  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1    | 19  | 1  | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -1 | -1 | 1  | 1  | 1  | 1  | 1  | 1  | -1 | -1 | -1  | -1  | -1  | -1  | -1  | 1   | 1   | 1   | 1   | -1  | -1  | 1    | -19 | -1 | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --   | --  | -- | --      | --      | --      | --      | --      | --      | --      | --      | --      | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --        |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
Run Code Online (Sandbox Code Playgroud)

x1 to x20 represent the features converted by the code.

// Get RNA Labels
$labels = array();
$n = 1;
foreach ( $features as $k => $v ) {
    $labels[$k] = "x" . $n;
    $n ++;
}
Run Code Online (Sandbox Code Playgroud)

Here is an online demo

Class Used:

class Profile {
    public $name, $data = array(), $score, $diff, $base;

    function __construct($name, array $importance) {
        $values = array(-1, 1); // Perception values
        $this->name = $name;
        foreach ($importance as $item => $point) {
            // Generate Random true/false for real Items
            $this->data[$item] = $values[mt_rand(0, 1)];
        }
        $this->base = array_sum($importance);
    }

    public function setScore($score, $diff) {
        $this->score = $score / $this->base;
        $this->diff = $diff / $this->base;
    }
}
Run Code Online (Sandbox Code Playgroud)

Modified Perceptron Class

class Perceptron {
    private $w = array();
    private $dw = array();
    public $debug = false;

    private function initialize($colums) {
        // Initialize perceptron vars
        for($i = 1; $i <= $colums; $i ++) {
            // weighting vars
            $this->w[$i] = 0;
            $this->dw[$i] = 0;
        }
    }

    function train($input, $alpha, $teta) {
        $colums = count($input[0]) - 1;
        $weightCache = array_fill(1, $colums, 0);
        $checkpoints = array();
        $keepTrainning = true;

        // Initialize RNA vars
        $this->initialize(count($input[0]) - 1);
        $just_started = true;
        $totalRun = 0;
        $yin = 0;

        // Trains RNA until it gets stable
        while ($keepTrainning == true) {
            // Sweeps each row of the input subject
            foreach ($input as $row_counter => $row_data) {
                // Finds out the number of columns the input has
                $n_columns = count($row_data) - 1;

                // Calculates Yin
                $yin = 0;
                for($i = 1; $i <= $n_columns; $i ++) {
                    $yin += $row_data["x" . $i] * $weightCache[$i];
                }

                // Calculates Real Output
                $Y = ($yin <= 1) ? - 1 : 1;

                // Sweeps columns ...
                $checkpoints[$row_counter] = 0;
                for($i = 1; $i <= $n_columns; $i ++) {
                    /** DELTAS **/
                    // Is it the first row?
                    if ($just_started == true) {
                        $this->dw[$i] = $weightCache[$i];
                        $just_started = false;
                        // Found desired output?
                    } elseif ($Y == $row_data["o"]) {
                        $this->dw[$i] = 0;
                        // Calculates Delta Ws
                    } else {
                        $this->dw[$i] = $row_data["x" . $i] * $row_data["o"];
                    }

                    /** WEIGHTS **/
                    // Calculate Weights
                    $this->w[$i] = $this->dw[$i] + $weightCache[$i];
                    $weightCache[$i] = $this->w[$i];

                    /** CHECK-POINT **/
                    $checkpoints[$row_counter] += $this->w[$i];
                } // END - for

                foreach ($this->w as $index => $w_item) {
                    $debug_w["W" . $index] = $w_item;
                    $debug_dw["deltaW" . $index] = $this->dw[$index];
                }

                // Special for script debugging
                $debug_vars[] = array_merge($row_data, array(
                    "Bias" => 1,
                    "Yin" => $yin,
                    "Y" => $Y
                ), $debug_dw, $debug_w, array(
                    "deltaBias" => 1
                ));
            } // END - foreach

            // Special for script debugging
             $empty_data_row = array();
            for($i = 1; $i <= $n_columns; $i ++) {
                $empty_data_row["x" . $i] = "--";
                $empty_data_row["W" . $i] = "--";
                $empty_data_row["deltaW" . $i] = "--";
            }
            $debug_vars[] = array_merge($empty_data_row, array(
                "o" => "--",
                "Bias" => "--",
                "Yin" => "--",
                "Y" => "--",
                "deltaBias" => "--"
            ));

            // Counts training times
            $totalRun ++;

            // Now checks if the RNA is stable already
            $referer_value = end($checkpoints);
            // if all rows match the desired output ...
            $sum = array_sum($checkpoints);
            $n_rows = count($checkpoints);
            if ($totalRun > 1 && ($sum / $n_rows) == $referer_value) {
                $keepTrainning = false;
            }
        } // END - while

        // Prepares the final result
        $result = array();
        for($i = 1; $i <= $n_columns; $i ++) {
            $result["w" . $i] = $this->w[$i];
        }

        $this->debug($this->print_html_table($debug_vars));

        return $result;
    } // END - train
    function testCase($input, $results) {
        // Sweeps input columns
        $result = 0;
        $i = 1;
        foreach ($input as $column_value) {
            // Calculates teste Y
            $result += $results["w" . $i] * $column_value;
            $i ++;
        }
        // Checks in each class the test fits
        return ($result > 0) ? true : false;
    } // END - test_class

    // Returns the html code of a html table base on a hash array
    function print_html_table($array) {
        $html = "";
        $inner_html = "";
        $table_header_composed = false;
        $table_header = array();

        // Builds table contents
        foreach ($array as $array_item) {
            $inner_html .= "<tr>\n";
            foreach ( $array_item as $array_col_label => $array_col ) {
                $inner_html .= "<td>\n";
                $inner_html .= $array_col;
                $inner_html .= "</td>\n";

                if ($table_header_composed == false) {
                    $table_header[] = $array_col_label;
                }
            }
            $table_header_composed = true;
            $inner_html .= "</tr>\n";
        }

        // Builds full table
        $html = "<table border=1>\n";
        $html .= "<tr>\n";
        foreach ($table_header as $table_header_item) {
            $html .= "<td>\n";
            $html .= "<b>" . $table_header_item . "</b>";
            $html .= "</td>\n";
        }
        $html .= "</tr>\n";

        $html .= $inner_html . "</table>";

        return $html;
    } // END - print_html_table

    // Debug function
    function debug($message) {
        if ($this->debug == true) {
            echo "<b>DEBUG:</b> $message";
        }
    } // END - debug
} // END - class
Run Code Online (Sandbox Code Playgroud)

Conclusion

Identifying a user without a Unique Identifier is not a straight-forward or simple task. it is dependent upon gathering a sufficient amount of Random Data which you are able to gather from the user by a variety of methods.

Even if you choose not to use an Artificial Neural Network, I suggest at least using a Simple Probability Matrix with priorities and likelihoods - and I hope the code and examples provided above give you enough to go on.


poz*_*ozs 27

这种技术(用于检测没有cookie的相同用户 - 甚至没有ip地址)称为浏览器指纹识别.基本上,您可以抓取有关浏览器的信息 - 使用javascript,flash或java(f.ex.安装的扩展,字体等)可以获得更好的结果.之后,如果需要,可以存储结果哈希值.

这不是绝对可靠的,但是:

83.6%的浏览器具有独特的指纹; 在启用Flash或Java的用户中,有94.2%.这不包括cookies!

更多信息:


Jus*_*der 5

上面提到的拇指印刷工作,但仍然可能遭受崩溃.

一种方法是将UID添加到与用户的每次交互的URL中.

http://someplace.com/12899823/user/profile

网站中的每个链接都使用此修饰符进行调整.它类似于ASP.Net在页面之间使用FORM数据工作的方式.


Ale*_*ler 5

你看过Evercookie吗?它可能适用于浏览器,也可能不适用.来自他们网站的摘录.

"如果用户在一个浏览器上进行烹饪并切换到另一个浏览器,只要他们仍然拥有本地共享对象cookie,cookie就会在两个浏览器中重现."