Google crawler in Search Console cannot find routes of a React app hosted on GitHub Pages

huy*_*mha 10 google-crawlers github-pages reactjs react-router react-redux

My problem is that Google Search Console cannot find the sub-routes of my React app.

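The URL is https://huynhsamha.github.io/crypto. The crawler can fetch and render the home page (route /) and static files such as /robots.txt and /favicon.ico, but it cannot find sub-routes that are rendered by React (an SPA using Redux), such as /algorithm/sha256. For example, the crawler cannot find https://huynhsamha.github.io/crypto/algorithm/sha256, although you can visit it in a browser.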

This is a screenshot I took in Google Search Console.


Can anyone explain why this happens and how to fix it? I am using react-router-dom and react-redux, hosted on GitHub Pages. My repository is here.

Edit 1

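I also tried the answer in this question, /sf/answers/3777643691/, but it did not work. I added the script to index.html (https://github.com/huynhsamha/crypto/blob/gh-pages/index.html), but Search Console still cannot find the route, so it does not render any error on screen either.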

Edit 2

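I also tried the answers /sf/answers/3782852181/ and /sf/answers/3783368361/, but they did not help. I created a 404.html file and added the script as the answers instruct, but that does not work either.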

Edit 3

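I also tried the answer /sf/answers/3783090391/ and created a simple sitemap.xml. Googlebot can find this file and discovers all the URLs I defined in the sitemap, but it still cannot fetch and render those URLs.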

chr*_*sep 5

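I noticed that when I open https://huynhsamha.github.io/crypto/algorithm/sha256, I actually receive a 404 as the response. I think the workaround you are using to host an SPA on GitHub Pages, serving the app from 404.html, is the problem here. We humans see your app running correctly in our browsers, but Googlebot does not care: it only looks at the response code and sees that it received a 404. You will need a different workaround, one that does not use 404.html directly as the entry point of the application.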
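
One way to confirm what a crawler sees is to look at the status code rather than at the rendered page. The sketch below is only an illustration: looksFound and reportStatus are hypothetical names, and it assumes an environment with a global fetch (a modern browser or Node 18+). The fetch function can be passed in, so the logic can be tried without network access.

```javascript
// Hypothetical helper: only a 2xx status counts as "found".
// A 404 whose body happens to boot the app is still a 404.
function looksFound(status) {
  return status >= 200 && status < 300;
}

// Hypothetical helper: request a URL and report the status a client receives.
// `fetchImpl` defaults to the global fetch (an assumption about the runtime).
async function reportStatus(url, fetchImpl) {
  const doFetch = fetchImpl || globalThis.fetch;
  const response = await doFetch(url);
  return { url: url, status: response.status, found: looksFound(response.status) };
}

// Usage (performs a real request, so it is left commented out):
// reportStatus('https://huynhsamha.github.io/crypto/algorithm/sha256')
//   .then(function (result) { console.log(result); });
```

If the reported status for a sub-route is 404 while the home page returns 200, that matches the behaviour described above.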

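Try following rafrex's workaround, which uses 404.html to redirect the browser to index.html while preserving the original route. It claims that Googlebot registers this as a 301 rather than a 404. In your case that means adding the changes below to your site. Note the scripts under the comment <!-- ------Single Page Apps GitHub Pages Workaround------ --> in each file.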

<!-- 404.html -->

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Cryptography</title>

    <!-- ------Single Page Apps GitHub Pages Workaround------ -->
    <script type="text/javascript">
      // Single Page Apps for GitHub Pages
      // https://github.com/rafrex/spa-github-pages
      // Copyright (c) 2016 Rafael Pedicini, licensed under the MIT License
      // ----------------------------------------------------------------------
      // This script takes the current url and converts the path and query
      // string into just a query string, and then redirects the browser
      // to the new url with only a query string and hash fragment,
      // e.g. http://www.foo.tld/one/two?a=b&c=d#qwe, becomes
      // http://www.foo.tld/?p=/one/two&q=a=b~and~c=d#qwe
      // Note: this 404.html file must be at least 512 bytes for it to work
      // with Internet Explorer (it is currently > 512 bytes)
      // If you're creating a Project Pages site and NOT using a custom domain,
      // then set segmentCount to 1 (enterprise users may need to set it to > 1).
      // This way the code will only replace the route part of the path, and not
      // the real directory in which the app resides, for example:
      // https://username.github.io/repo-name/one/two?a=b&c=d#qwe becomes
      // https://username.github.io/repo-name/?p=/one/two&q=a=b~and~c=d#qwe
      // Otherwise, leave segmentCount as 0.
      var segmentCount = 1;
      var l = window.location;
      l.replace(
        l.protocol + '//' + l.hostname + (l.port ? ':' + l.port : '') +
        l.pathname.split('/').slice(0, 1 + segmentCount).join('/') + '/?p=/' +
        l.pathname.slice(1).split('/').slice(segmentCount).join('/').replace(/&/g, '~and~') +
        (l.search ? '&q=' + l.search.slice(1).replace(/&/g, '~and~') : '') +
        l.hash
      );
    </script>
  </head>
  <body>
  </body>
</html>
<!-- index.html -->

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <meta name="theme-color" content="#000000">
  <meta name="description" content="Cryptography Algorithms: Secure Hash Algorithm (sha256, sha512, ...), Message Digest Algorithm (md5, ripemd160), HMAC-SHA, HMAC-MD, pbkdf2, Advanced Encryption Standard (AES), Triple Data Encryption Standard, (TripleDES, DES), RC4, Rabbit, ...">
  <meta name="keywords" content="crypto, algorithms, secure hash, sha, sha512, sha256, message digest, md5, hmac-sha, aes, des, tripledes, pbkdf2, rc4, rabbit, encryption, decryption">
  <meta name="author" content="huynhsamha">

  <!-- Open Graph -->
  <meta property="fb:app_id" content="440168923127908">
  <meta property="og:url" content="https://huynhsamha.github.io/crypto">
  <meta property="og:title" content="Cryptography Algorithms">
  <meta property="og:description" content="Cryptography Algorithms: Secure Hash Algorithm (sha256, sha512, ...), Message Digest Algorithm (md5, ripemd160), HMAC-SHA, HMAC-MD, pbkdf2, Advanced Encryption Standard (AES), Triple Data Encryption Standard, (TripleDES, DES), RC4, Rabbit, ...">
  <meta property="og:type" content="website">
  <meta property="og:image" content="%PUBLIC_URL%/img/main.jpeg">
  <meta property="og:site_name" content="Cryptography">
  <meta property="og:locale" content="vi_VN">

  <!-- Twitter Card -->
  <meta name="twitter:card" content="summary">
  <meta name="twitter:site" content="@huynhsamha">
  <meta name="twitter:creator" content="@huynhsamha">
  <meta name="twitter:url" content="https://huynhsamha.github.io/crypto">
  <meta name="twitter:title" content="Cryptography Algorithms">
  <meta name="twitter:description" content="Cryptography Algorithms: Secure Hash Algorithm (sha256, sha512, ...), Message Digest Algorithm (md5, ripemd160), HMAC-SHA, HMAC-MD, pbkdf2, Advanced Encryption Standard (AES), Triple Data Encryption Standard, (TripleDES, DES), RC4, Rabbit, ...">
  <meta name="twitter:image:src" content="%PUBLIC_URL%/img/main.jpeg">

  <!--
      manifest.json provides metadata used when your web app is added to the
      homescreen on Android. See https://developers.google.com/web/fundamentals/engage-and-retain/web-app-manifest/
    -->
  <link rel="manifest" href="%PUBLIC_URL%/manifest.json">
  <link rel="shortcut icon" href="%PUBLIC_URL%/favicon.ico">
  <link rel="author" href="//github.com/huynhsamha">
  <link rel="canonical" href="//huynhsamha.github.io/crypto">
  <!--
      Notice the use of %PUBLIC_URL% in the tags above.
      It will be replaced with the URL of the `public` folder during the build.
      Only files inside the `public` folder can be referenced from the HTML.
      Unlike "/favicon.ico" or "favicon.ico", "%PUBLIC_URL%/favicon.ico" will
      work correctly both with client-side routing and a non-root public URL.
      Learn how to configure a non-root public URL by running `npm run build`.
    -->
  <link href="//fonts.googleapis.com/css?family=Open+Sans:400,600,700&amp;subset=vietnamese" rel="stylesheet">
  <link rel="stylesheet" href="%PUBLIC_URL%/css/bootstrap.min.css">
  <link rel="stylesheet" href="%PUBLIC_URL%/lib/font-awesome/css/font-awesome.min.css">

  <!-- ------Single Page Apps GitHub Pages Workaround------ -->
  <script type="text/javascript">
    // Single Page Apps for GitHub Pages
    // https://github.com/rafrex/spa-github-pages
    // Copyright (c) 2016 Rafael Pedicini, licensed under the MIT License
    // ----------------------------------------------------------------------
    // This script checks to see if a redirect is present in the query string
    // and converts it back into the correct url and adds it to the
    // browser's history using window.history.replaceState(...),
    // which won't cause the browser to attempt to load the new url.
    // When the single page app is loaded further down in this file,
    // the correct url will be waiting in the browser's history for
    // the single page app to route accordingly.
    (function(l) {
      if (l.search) {
        var q = {};
        l.search.slice(1).split('&').forEach(function(v) {
          var a = v.split('=');
          q[a[0]] = a.slice(1).join('=').replace(/~and~/g, '&');
        });
        if (q.p !== undefined) {
          window.history.replaceState(null, null,
            l.pathname.slice(0, -1) + (q.p || '') +
            (q.q ? ('?' + q.q) : '') +
            l.hash
          );
        }
      }
    }(window.location))
   </script>


  <title>Cryptography</title>

</head>

<body>
  <noscript>
    You need to enable JavaScript to run this app.
  </noscript>

  <div id="root"></div>

  <!--
      This HTML file is a template.
      If you open it directly in the browser, you will see an empty page.
      You can add webfonts, meta tags, or analytics to this file.
      The build step will place the bundled scripts into the <body> tag.
      To begin the development, run `npm start` or `yarn start`.
      To create a production bundle, use `npm run build` or `yarn build`.
    -->

  <script src="%PUBLIC_URL%/js/jquery-3.3.1.slim.min.js" type="text/javascript"></script>
  <script src="%PUBLIC_URL%/js/popper.min.js" type="text/javascript"></script>
  <script src="%PUBLIC_URL%/js/bootstrap.min.js" type="text/javascript"></script>

  <!-- Google Adsense -->
  <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>

</body>

</html>

For more information and discussion on GitHub's support for single-page apps, see here.
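
To make the round trip easier to follow, here is a sketch that restates the two scripts above as pure functions. The names toRedirectUrl and restorePath are hypothetical and not part of rafrex's project. The string logic is copied from the scripts; only the calls to location.replace and history.replaceState are left out so that it can run outside a browser.

```javascript
// `loc` is anything shaped like window.location; a URL object works too.

// What 404.html does: fold the route into a query string under the app's base path.
function toRedirectUrl(loc, segmentCount) {
  return (
    loc.protocol + '//' + loc.hostname + (loc.port ? ':' + loc.port : '') +
    loc.pathname.split('/').slice(0, 1 + segmentCount).join('/') + '/?p=/' +
    loc.pathname.slice(1).split('/').slice(segmentCount).join('/').replace(/&/g, '~and~') +
    (loc.search ? '&q=' + loc.search.slice(1).replace(/&/g, '~and~') : '') +
    loc.hash
  );
}

// What index.html does: rebuild the original path that is handed to
// history.replaceState. Returns null when there is no redirect to undo.
function restorePath(loc) {
  if (!loc.search) return null;
  var q = {};
  loc.search.slice(1).split('&').forEach(function (v) {
    var a = v.split('=');
    q[a[0]] = a.slice(1).join('=').replace(/~and~/g, '&');
  });
  if (q.p === undefined) return null;
  return loc.pathname.slice(0, -1) + (q.p || '') + (q.q ? '?' + q.q : '') + loc.hash;
}

// Example with the URL from the question (segmentCount = 1 for a project site).
var redirected = toRedirectUrl(
  new URL('https://huynhsamha.github.io/crypto/algorithm/sha256'), 1);
console.log(redirected);
// https://huynhsamha.github.io/crypto/?p=/algorithm/sha256
console.log(restorePath(new URL(redirected)));
// /crypto/algorithm/sha256
```

With segmentCount left at 0 on a project site, the first function would drop the /crypto prefix and redirect to the root of huynhsamha.github.io, which is why the value matters here.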


Mor*_*gan 4

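I looked through your source code and did not see anything alarming. However, I found a couple of posts about similar issues (1) (2). The second one seems particularly helpful, so I will repeat it here. Shout-out to @Zerotorescue on Reddit.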

Open Google Search Console, go to Crawl -> Fetch as Google, and do a Fetch and Render.

Add this to your site, either in a script tag in your HTML file or as part of your bundle:

https://gist.github.com/mstijak/715fa2dd3f495a98386c3ebbadbabb8c

I recommend the former, since it makes it easier to change things if you need to make the output more readable (no need to recompile your app).
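
The contents of the gist are not reproduced here. As a rough sketch of the same idea, assuming the goal is simply to print uncaught errors onto the page so that they show up in the rendered screenshot, something like the following could go in a script tag near the top of the HTML file. describeError is a hypothetical helper name, and the code sticks to ES5 so that an older rendering engine can still parse it.

```javascript
// Hypothetical helper: turn the arguments of window.onerror into one readable line.
function describeError(message, source, line, column) {
  return message + ' (' + (source || 'unknown source') + ':' +
    (line || 0) + ':' + (column || 0) + ')';
}

// Install the handler only when running in a browser.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  window.onerror = function (message, source, line, column) {
    var box = document.createElement('pre');
    // Large red text, because the rendered screenshot has a low resolution.
    box.style.cssText = 'font-size:24px;color:red;white-space:pre-wrap;';
    box.textContent = describeError(message, source, line, column);
    // document.body may not exist yet if the error fires while <head> is parsing.
    (document.body || document.documentElement).appendChild(box);
    return false; // do not suppress the browser's default error logging
  };
}
```

Putting it before the application bundle matters: a handler installed after the failing script would never see the error.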

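Push it to your site and do another Fetch and Render. It will now show the errors that prevent Google from running your app. The Search Console rendering has a very low resolution, so you may have to increase the font size of the errors and fetch again. Don't worry, Google does not mind repeated fetches.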

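You may find that Google's crawler cannot handle your code because you are using ES6 features it does not support. You can fix this by polyfilling. I tried things such as https://polyfill.io/, which turned out not to really support Googlebot; it may work sometimes, but it is very unreliable. Instead, I recommend babel-polyfill. It increases your bundle size slightly for everyone, but in my experience it gives the widest browser support without headaches. Just turn it on and you are done.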

If you are using create-react-app, here is the polyfills.js file I use, which you can copy:

https://github.com/WoWAanalyzer/WoWAnalyzer/blob/2c67a970f8bd9026fa816d31201c42eb860fe2a3/config/polyfills.js#L1

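Note that it contains a lot of comments explaining all the problems introduced by the polyfill service, which you do not have to deal with if you use babel-polyfill.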
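
If it is unclear which features are actually missing in the crawler's environment, a small feature-detection check can be printed on the page in the same way as the errors. This is only a sketch: missingFeatures is a hypothetical name and the list of checks is an arbitrary sample, to be extended with whatever your bundle relies on. It detects missing built-ins only; untranspiled syntax such as arrow functions fails at parse time and shows up as a script error instead.

```javascript
// Hypothetical check: return the names of built-ins that `g` (a global object) lacks.
function missingFeatures(g) {
  var checks = {
    'Promise': typeof g.Promise === 'function',
    'Symbol': typeof g.Symbol === 'function',
    'Map': typeof g.Map === 'function',
    'Set': typeof g.Set === 'function',
    'Object.assign': !!g.Object && typeof g.Object.assign === 'function',
    'Array.from': !!g.Array && typeof g.Array.from === 'function',
    'Array.prototype.includes':
      !!g.Array && !!g.Array.prototype && typeof g.Array.prototype.includes === 'function',
    'String.prototype.startsWith':
      !!g.String && !!g.String.prototype && typeof g.String.prototype.startsWith === 'function'
  };
  var missing = [];
  for (var name in checks) {
    if (!checks[name]) missing.push(name);
  }
  return missing;
}

// In a browser, pass `window`; an empty array means nothing on the list is missing.
// console.log(missingFeatures(window));
```

Running it once before and once after enabling the polyfill shows whether the polyfill covers the gaps.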