Wednesday, May 1, 2019

10 Ways to Get Google to Index Your Site (That Actually Work)

If Google doesn’t index your website, then you’re pretty much invisible. You won’t show up for any search queries, and you won’t get any organic traffic whatsoever. Zilch. Nada. Zero. Given that you’re here, I’m guessing this isn’t news to you. So let’s get straight down to business. This article teaches you how to fix any of these three problems:
  1. Your entire website isn’t indexed.
  2. Some of your pages are indexed, but others aren’t.
  3. Your newly‐published web pages aren’t getting indexed fast enough.
But first, let’s make sure we’re on the same page and fully understand this indexing malarkey.
 
Google discovers new web pages by crawling the web, and then they add those pages to their index. They do this using a web spider called Googlebot. Confused? Let’s define a few key terms.
  • Crawling: The process of following hyperlinks on the web to discover new content.
  • Indexing: The process of storing every web page in a vast database.
  • Web spider: A piece of software designed to carry out the crawling process at scale.
  • Googlebot: Google’s web spider.
Here’s a video from Google that explains the process in more detail:
https://www.youtube.com/watch?v=BNHR6IQJGZs

When you Google something, you’re asking Google to return all relevant pages from their index. Because there are often millions of pages that fit the bill, Google’s ranking algorithm does its best to sort the pages so that you see the best and most relevant results first. The critical point I’m making here is that indexing and ranking are two different things. Indexing is showing up for the race; ranking is winning. You can’t win without showing up for the race in the first place.
 
Go to Google, then search for site:yourwebsite.com.

The number of results shows roughly how many of your pages Google has indexed. If you want to check the index status of a specific URL, use the same operator: site:yourwebsite.com/web-page-slug. No results will show up if the page isn’t indexed.

Now, it’s worth noting that if you’re a Google Search Console user, you can use the Coverage report to get a more accurate insight into the index status of your website. Just go to: Google Search Console > Index > Coverage

Look at the number of valid pages (with and without warnings). If these two numbers total anything but zero, then Google has at least some of the pages on your website indexed. If not, then you have a severe problem because none of your web pages are indexed.
Sidenote. Not a Google Search Console user? Sign up. It’s free. Everyone who runs a website and cares about getting traffic from Google should use Google Search Console. It’s that important.
You can also use Search Console to check whether a specific page is indexed. To do that, paste the URL into the URL Inspection tool. If that page is indexed, it’ll say “URL is on Google.” If the page isn’t indexed, you’ll see the words “URL is not on Google.”
 
Found that your website or web page isn’t indexed in Google? Try this:
  1. Go to Google Search Console
  2. Navigate to the URL inspection tool
  3. Paste the URL you’d like Google to index into the search bar.
  4. Wait for Google to check the URL
  5. Click the “Request indexing” button
This process is good practice when you publish a new post or page. You’re effectively telling Google that you’ve added something new to your site and that they should take a look at it. However, requesting indexing is unlikely to solve underlying problems preventing Google from indexing old pages. If that’s the case, follow the checklist below to diagnose and fix the problem. Here’s the full list of tactics, in case you’ve already tried some:
  1. Remove crawl blocks in your robots.txt file
  2. Remove rogue noindex tags
  3. Include the page in your sitemap
  4. Remove rogue canonical tags
  5. Check that the page isn’t orphaned
  6. Fix nofollow internal links
  7. Add “powerful” internal links
  8. Make sure the page is valuable and unique
  9. Remove low‐quality pages (to optimize “crawl budget”)
  10. Build high‐quality backlinks

1) Remove crawl blocks in your robots.txt file

Is Google not indexing your entire website? It could be due to a crawl block in something called a robots.txt file. To check for this issue, go to yourdomain.com/robots.txt. Look for either of these two snippets of code:
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /
Both of these tell Googlebot that it isn’t allowed to crawl any pages on your site. To fix the issue, remove them. It’s that simple.

A crawl block in robots.txt could also be the culprit if Google isn’t indexing a single web page. To check whether this is the case, paste the URL into the URL inspection tool in Google Search Console. Click on the Coverage block to reveal more details, then look for the “Crawl allowed? No: blocked by robots.txt” error. This indicates that the page is blocked in robots.txt. If so, recheck your robots.txt file for any “disallow” rules relating to the page or its subsection.
(Screenshot: an important page blocked by robots.txt.)

Remove where necessary.
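If you want to test rules before editing a live robots.txt file, Python’s standard library ships a robots.txt parser, urllib.robotparser. The sketch below feeds it rule text directly (no network request) and asks whether Googlebot may fetch a given URL. Treat it as an approximation, not Google’s own parser: Python’s module applies the first matching rule rather than Google’s most-specific-rule precedence, and it doesn’t understand * or $ wildcards inside paths, so the URL inspection tool remains the final word. The function name and sample URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if these robots.txt rules stop user_agent from crawling url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no HTTP request is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The two site-wide blocks described above, plus a rule scoped to one subsection.
BLOCK_GOOGLEBOT = "User-agent: Googlebot\nDisallow: /\n"
BLOCK_EVERYONE = "User-agent: *\nDisallow: /\n"
BLOCK_SUBSECTION = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    page = "https://yourdomain.com/private/page/"
    for name, rules in [("Googlebot only", BLOCK_GOOGLEBOT),
                        ("everyone", BLOCK_EVERYONE),
                        ("subsection", BLOCK_SUBSECTION)]:
        status = "blocked" if is_crawl_blocked(rules, page) else "allowed"
        print(f"{name}: {status}")
```

A rule like Disallow: /private/ only blocks URLs whose path starts with /private/, which is why it’s worth checking the specific page rather than just the homepage.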

2) Remove rogue noindex tags

Google won’t index pages if you tell them not to. This is useful for keeping some web pages private. There are two ways to do it:

Method 1: meta tag

Pages with either of these meta tags in their <head> section won’t be indexed by Google:
<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">
This is a meta robots tag, and it tells search engines whether they can or can’t index the page.
Sidenote. The key part is the “noindex” value. If you see that, then the page is set to noindex.
To find all pages with a noindex meta tag on your site, run a crawl with Ahrefs’ Site Audit. Go to the Internal pages report and look for “Noindex page” warnings. Click through to see all affected pages. Remove the noindex meta tag from any pages where it doesn’t belong.
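If you’d rather spot-check a single page’s HTML yourself, the robots and googlebot meta tags are easy to detect with Python’s built-in html.parser. The sketch below assumes you already have the page’s HTML as a string (how you fetch it is up to you) and treats “none” as equivalent to “noindex, nofollow,” as Google documents. The class and function names are hypothetical, and it ignores meta tags aimed at other crawlers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            # content="noindex, nofollow" becomes ["noindex", "nofollow"]
            content = attr_map.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_meta_noindex(html):
    """Return True if a robots/googlebot meta tag tells Google not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow"
    return "noindex" in parser.directives or "none" in parser.directives


if __name__ == "__main__":
    print(has_meta_noindex('<head><meta name="robots" content="noindex"></head>'))
```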

Method 2: X‐Robots‐Tag

Crawlers also respect the X‐Robots‐Tag HTTP response header. You can implement this using a server‐side scripting language like PHP, in your .htaccess file, or by changing your server configuration.

The URL inspection tool in Search Console tells you whether Google is blocked from indexing a page because of this header. Just enter your URL, then look for the “Indexing allowed? No: ‘noindex’ detected in ‘X‐Robots‐Tag’ http header” message.

If you want to check for this issue across your site, run a crawl in Ahrefs’ Site Audit tool, then use the “Robots information in HTTP header” filter in the Data Explorer. Tell your developer to exclude pages you want indexing from returning this header.

Recommended reading: Using the X‐Robots‐Tag HTTP Header Specifications in SEO: Tips and Tricks
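The header version is just as easy to check once you have a response’s headers. The sketch below takes the X‐Robots‐Tag values as a list of strings (a response can send the header more than once) and returns True if any of them applies noindex, or its shorthand none, to Googlebot. It handles plain values such as “noindex, nofollow” and crawler-scoped values such as “googlebot: noindex.” The function name and the short list of value-carrying directives it skips over are assumptions for illustration, not a full implementation of Google’s parsing rules.

```python
# Directives that carry a value after a colon, so their name must not be
# mistaken for a crawler name (e.g. "max-snippet: 20").
VALUE_DIRECTIVES = {"unavailable_after", "max-snippet", "max-image-preview", "max-video-preview"}


def has_header_noindex(x_robots_tag_values, bot="googlebot"):
    """Return True if any X-Robots-Tag value tells `bot` not to index the page."""
    for raw in x_robots_tag_values:
        value = raw.strip().lower()
        scope, sep, rest = value.partition(":")
        scope = scope.strip()
        # "googlebot: noindex" applies only to the crawler named before the colon.
        if sep and "," not in scope and scope not in VALUE_DIRECTIVES:
            if scope != bot:
                continue
            value = rest
        directives = {d.strip() for d in value.split(",")}
        if "noindex" in directives or "none" in directives:
            return True
    return False


if __name__ == "__main__":
    # With urllib.request, the values could come from
    # urlopen(url).headers.get_all("X-Robots-Tag") or []
    print(has_header_noindex(["googlebot: noindex, nofollow"]))
```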

3) Include the page in your sitemap

A sitemap tells Google which pages on your site are important, and which aren’t. It may also give some guidance on how often they should be re‐crawled.

Google should be able to find pages on your website regardless of whether they’re in your sitemap, but it’s still good practice to include them. After all, there’s no point making Google’s life difficult.

To check if a page is in your sitemap, use the URL inspection tool in Search Console. If you see the “URL is not on Google” error and “Sitemap: N/A,” then it isn’t in your sitemap or indexed.

Not using Search Console? Head to your sitemap URL (usually yourdomain.com/sitemap.xml) and search for the page.

Or, if you want to find all the crawlable and indexable pages that aren’t in your sitemap, run a crawl in Ahrefs’ Site Audit. Go to Data Explorer and filter for pages that are crawlable and indexable but missing from your sitemap. These pages should be in your sitemap, so add them.

Once done, let Google know that you’ve updated your sitemap by pinging this URL:

http://www.google.com/ping?sitemap=http://yourwebsite.com/sitemap_url.xml

Replace that last part with your sitemap URL. You should then see a confirmation message. That should speed up Google’s indexing of the page.
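If your sitemap is a standard XML file, you can automate the “is this page in my sitemap?” check. The sketch below parses a sitemap using the sitemaps.org namespace and lists which of your indexable URLs it doesn’t contain. It assumes a single <urlset> sitemap rather than a sitemap index that points to several files, and the URL list would come from whatever crawl you’ve run; the function names and sample data are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def missing_from_sitemap(indexable_urls: list[str], sitemap_xml: str) -> list[str]:
    """Indexable URLs the sitemap doesn't list, in their original order."""
    listed = sitemap_urls(sitemap_xml)
    return [url for url in indexable_urls if url not in listed]


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourwebsite.com/</loc></url>
  <url><loc>https://yourwebsite.com/blog/</loc></url>
</urlset>"""

if __name__ == "__main__":
    crawled = ["https://yourwebsite.com/",
               "https://yourwebsite.com/blog/",
               "https://yourwebsite.com/blog/new-post/"]
    print(missing_from_sitemap(crawled, SAMPLE_SITEMAP))
```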

4) Remove rogue canonical tags

A canonical tag tells Google which is the preferred version of a page. It looks something like this:
<link rel="canonical" href="/page.html/">
Most pages either have no canonical tag, or what’s called a self‐referencing canonical tag. That tells Google the page itself is the preferred and probably the only version. In other words, you want this page to be indexed.

But if your page has a rogue canonical tag, then it could be telling Google about a preferred version of this page that doesn’t exist. In which case, your page won’t get indexed.

To check for a canonical, use Google’s URL inspection tool. You’ll see an “Alternate page with canonical tag” warning if the canonical points to another page. If this shouldn’t be there, and you want to index the page, remove the canonical tag.
IMPORTANT
Canonical tags aren’t always bad. Most pages with these tags will have them for a reason. If you see that your page has a canonical set, then check the canonical page. If this is indeed the preferred version of the page, and there’s no need to index the page in question as well, then the canonical tag should stay.
If you want a quick way to find rogue canonical tags across your entire site, run a crawl in Ahrefs’ Site Audit tool. Go to the Data Explorer and filter for pages in your sitemap with non‐self‐referencing canonical tags. Because you almost certainly want to index the pages in your sitemap, you should investigate further if this filter returns any results. It’s highly likely that these pages either have a rogue canonical or shouldn’t be in your sitemap in the first place.
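To check a single page’s canonical without a crawler, you can pull the rel="canonical" link out of its HTML and resolve it against the page’s own URL, since canonicals are often relative like the example above. The sketch below compares the two URLs exactly; a real audit tool would also normalize things like trailing slashes, letter case, and tracking parameters. The names and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag on the page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.href = attr_map.get("href", "")


def check_canonical(page_url, html):
    """Return None if there's no canonical, else (absolute canonical URL, is_self_referencing)."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.href is None:
        return None
    target = urljoin(page_url, parser.href)  # resolves relative hrefs such as "/page.html/"
    return target, target == page_url


if __name__ == "__main__":
    print(check_canonical("https://yourwebsite.com/page-b/",
                          '<link rel="canonical" href="/page-a/">'))
```

A result of (some other URL, False) doesn’t automatically mean the tag is rogue; as noted above, check whether that other page really is the preferred version.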

5) Check that the page isn’t orphaned

Orphan pages are those without internal links pointing to them. Because Google discovers new content by crawling the web, they’re unable to discover orphan pages through that process. Website visitors won’t be able to find them either.

To check for orphan pages, crawl your site with Ahrefs’ Site Audit. Next, check the Incoming links report for “Orphan page (has no incoming internal links)” errors. This shows all pages that are both indexable and present in your sitemap, yet have no internal links pointing to them.
IMPORTANT
This process only works when two things are true:
  1. All the pages you want indexing are in your sitemaps
  2. You checked the box to use the pages in your sitemaps as starting points for the crawl when setting up the project in Ahrefs’ Site Audit.
Not confident that all the pages you want to be indexed are in your sitemap? Try this:
  1. Download a full list of pages on your site (via your CMS)
  2. Crawl your website (using a tool like Ahrefs’ Site Audit)
  3. Cross‐reference the two lists of URLs
Any URLs not found during the crawl are orphan pages. You can fix orphan pages in one of two ways:
  1. If the page is unimportant, delete it and remove it from your sitemap.
  2. If the page is important, incorporate it into the internal link structure of your website.
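The cross-referencing steps above boil down to a set difference. The sketch below simulates the crawl as a breadth-first walk over a map of internal links (which page links to which), starting from the homepage, then reports CMS pages the walk never reached. The link map and page list are hypothetical stand-ins for your crawler export and CMS export. Note that it also flags pages whose only links come from other unreachable pages, which are just as invisible to a crawler starting at the homepage.

```python
from collections import deque


def crawl(link_graph, start):
    """Breadth-first walk over internal links, like a crawler starting at `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def find_orphans(cms_pages, link_graph, start="/"):
    """Pages that exist in the CMS but were never reached by following links."""
    return sorted(set(cms_pages) - crawl(link_graph, start))


# Hypothetical data: which pages link to which, and what the CMS says exists.
LINK_GRAPH = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/post-1/"],
}
CMS_PAGES = ["/", "/about/", "/blog/", "/blog/post-1/", "/blog/post-2/"]

if __name__ == "__main__":
    print(find_orphans(CMS_PAGES, LINK_GRAPH))
```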

6) Fix nofollow internal links

Nofollow links are links with a rel="nofollow" attribute. They prevent the transfer of PageRank to the destination URL. Google also doesn’t crawl nofollow links. Here’s what Google says about the matter:
Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap.
In short, you should make sure that all internal links to indexable pages are followed. To do this, use Ahrefs’ Site Audit tool to crawl your site. Check the Incoming links report for indexable pages with “Page has nofollow incoming internal links only” errors. Remove the nofollow attribute from these internal links, assuming that you want Google to index the page. If not, either delete the page or noindex it.

Recommended reading: What Is a Nofollow Link? Everything You Need to Know (No Jargon!)
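For a single page, you can list the internal links that carry nofollow using Python’s html.parser. The sketch below resolves each href against the page’s URL and keeps nofollow links that point to the same host. Because rel can hold several space-separated values (for example “nofollow noopener”), it checks for the token rather than the exact string. The class, function names, and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowLinkParser(HTMLParser):
    """Collects absolute URLs of <a> links whose rel attribute includes 'nofollow'."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        href = attr_map.get("href")
        rel_tokens = attr_map.get("rel", "").lower().split()
        if href and "nofollow" in rel_tokens:
            self.nofollow_links.append(urljoin(self.page_url, href))


def internal_nofollow_links(page_url, html):
    """Nofollow links on the page that point back to the same site."""
    parser = NofollowLinkParser(page_url)
    parser.feed(html)
    host = urlparse(page_url).netloc
    return [link for link in parser.nofollow_links if urlparse(link).netloc == host]


if __name__ == "__main__":
    html = ('<a href="/guide/" rel="nofollow">Guide</a>'
            '<a href="/blog/">Blog</a>'
            '<a href="https://example.org/" rel="nofollow noopener">External</a>')
    print(internal_nofollow_links("https://yourwebsite.com/post/", html))
```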

7) Add “powerful” internal links

Google discovers new content by crawling your website. If you neglect to internally link to the page in question, then they may not be able to find it.

One easy solution to this problem is to add some internal links to the page. You can do that from any other web page that Google can crawl and index. However, if you want Google to index the page as fast as possible, it makes sense to do so from one of your more “powerful” pages. Why? Because Google is likely to recrawl such pages faster than less important pages.

To do this, head over to Ahrefs’ Site Explorer, enter your domain, then visit the Best by links report. This shows all the pages on your website sorted by URL Rating (UR). In other words, it shows the most authoritative pages first.

Skim this list and look for relevant pages from which to add internal links to the page in question. For example, if we were looking to add an internal link to our guest posting guide, our link building guide would probably offer a relevant place from which to do so. And that page just so happens to be the 11th most authoritative page on our blog. Google will then see and follow that link next time they recrawl the page.
pro tip
Paste the page from which you added the internal link into Google’s URL inspection tool. Hit the “Request indexing” button to let Google know that something on the page has changed and that they should recrawl it as soon as possible. This may speed up the process of them discovering the internal link and consequently, the page you want indexing.
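URL Rating is Ahrefs’ own metric, but if you have a map of your internal links you can get a rough feel for which pages are “powerful” within your site using the classic PageRank power iteration. The sketch below is that textbook algorithm (damping factor 0.85) run over internal links only, so it ignores backlinks and is not how UR or Google actually scores pages. The link map is hypothetical.

```python
def internal_pagerank(link_graph, damping=0.85, iterations=50):
    """Textbook PageRank over an internal link map {page: [pages it links to]}."""
    pages = set(link_graph) | {t for targets in link_graph.values() for t in targets}
    if not pages:
        return []
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = link_graph.get(page, [])
            if targets:
                # ...plus an equal slice of the rank of each page linking to it.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages with no outgoing links spread their rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    # Highest-ranked ("most powerful") pages first
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)


LINK_GRAPH = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/", "/blog/link-building/"],
    "/about/": ["/"],
    "/blog/link-building/": ["/", "/blog/"],
}

if __name__ == "__main__":
    for page, score in internal_pagerank(LINK_GRAPH):
        print(f"{score:.3f}  {page}")
```

You would then pick the highest-ranked page that is also topically relevant, rather than simply the top result.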

8) Make sure the page is valuable and unique

Google is unlikely to index low‐quality pages because they hold no value for its users. Here’s what Google’s John Mueller said about indexing in 2018:

https://twitter.com/JohnMu/status/948544364090970112

He implies that if you want Google to index your website or web page, it needs to be “awesome and inspiring.” If you’ve ruled out technical issues for the lack of indexing, then a lack of value could be the culprit. For that reason, it’s worth reviewing the page with fresh eyes and asking yourself: Is this page genuinely valuable? Would a user find value in this page if they clicked on it from the search results? If the answer is no to either of those questions, then you need to improve your content.

You can find more potentially low‐quality pages that aren’t indexed using Ahrefs’ Site Audit tool and URL Profiler. To do that, go to Data Explorer in Ahrefs’ Site Audit and filter for “thin” pages that are indexable and currently get no organic traffic. In other words, there’s a decent chance they aren’t indexed. Export the report, then paste all the URLs into URL Profiler and run a Google Indexation check.

Source: https://urlprofiler.com/blog/google-indexation-checker-tutorial/

IMPORTANT
It’s recommended to use proxies if you’re doing this for lots of pages (i.e., over 100). Otherwise, you run the risk of your IP getting banned by Google. If you can’t do that, then another alternative is to search Google for a “free bulk Google indexation checker.” There are a few of these tools around, but most of them are limited to fewer than 25 pages at a time.
Check any non‐indexed pages for quality issues. Improve where necessary, then request reindexing in Google Search Console. You should also aim to fix issues with duplicate content. Google is unlikely to index duplicate or near‐duplicate pages. Use the Content quality report in Site Audit to check for these issues.
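Site Audit’s Content quality report handles this at scale, but the underlying idea of spotting near-duplicate pages can be sketched in a few lines. The example below uses a common textbook technique, comparing overlapping three-word “shingles” with Jaccard similarity. It is not a description of how Google or Ahrefs detect duplicates, and the 0.8 threshold and sample pages are arbitrary assumptions you’d tune for your own site.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Set of overlapping word sequences ("shingles") of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 0))}


def jaccard(a, b):
    """Share of shingles two pages have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0  # pages too short to shingle aren't compared
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.8, size=3):
    """Return (url1, url2, similarity) for every pair at or above the threshold."""
    signatures = {url: shingles(text, size) for url, text in pages.items()}
    pairs = []
    for (url1, s1), (url2, s2) in combinations(signatures.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((url1, url2, round(score, 2)))
    return pairs


if __name__ == "__main__":
    pages = {
        "/a/": "the quick brown fox jumps over the lazy dog",
        "/b/": "the quick brown fox jumps over the lazy dog today",
        "/c/": "completely different text about link building and seo",
    }
    print(near_duplicates(pages))
```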

9) Remove low‐quality pages (to optimize “crawl budget”)

Having too many low‐quality pages on your website serves only to waste crawl budget. Here’s what Google says on the matter:
Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.
Think of it like a teacher grading essays, one of which is yours. If they have ten essays to grade, they’re going to get to yours quite quickly. If they have a hundred, it’ll take them a bit longer. If they have thousands, their workload is too high, and they may never get around to grading your essay. Google does state that “crawl budget is not something most publishers have to worry about,” and that “if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.” Still, removing low‐quality pages from your website is never a bad thing. It can only have a positive effect on crawl budget. You can use our content audit template to find potentially low‐quality and irrelevant pages that can be deleted.

10) Build high‐quality backlinks

Backlinks tell Google that a web page is important. After all, if someone is linking to it, then it must hold some value. These are pages that Google wants to index. For full transparency, Google doesn’t only index web pages with backlinks. There are plenty (billions) of indexed pages with no backlinks. However, because Google sees pages with high‐quality links as more important, they’re likely to crawl—and re-crawl—such pages faster than those without. That leads to faster indexing. We have plenty of resources on building high‐quality backlinks on the blog. Take a look at a few of the guides below.
Further reading

Indexing ≠ ranking

Having your website or web page indexed in Google doesn’t equate to rankings or traffic. They’re two different things. Indexing means that Google is aware of your website. It doesn’t mean they’re going to rank it for any relevant and worthwhile queries. That’s where SEO comes in—the art of optimizing your web pages to rank for specific queries. In short, SEO involves:
  • Finding what your customers are searching for;
  • Creating content around those topics;
  • Optimizing those pages for your target keywords;
  • Building backlinks;
  • Regularly republishing content to keep it “evergreen.”
Here’s a video to get you started with SEO:
https://www.youtube.com/watch?v=DvwS7cV9GmQ

…and some articles:
Further reading

Final thoughts

There are only two possible reasons why Google isn’t indexing your website or web page:
  1. Technical issues are hindering them from doing so
  2. They see your site or page as low‐quality and worthless to their users.
It’s entirely possible that both of those issues exist. However, I would say that technical issues are far more common. Technical issues can also lead to the auto‐generation of indexable low‐quality content (e.g., problems with faceted navigation). That isn’t good. Still, running through the checklist above should solve the indexation issue nine times out of ten. Just remember that indexing ≠ ranking. SEO is still vital if you want to rank for any worthwhile search queries and attract a constant stream of organic traffic.  

https://www.businesscreatorplus.com/10-ways-to-get-google-to-index-your-site-that-actually-work/
