Opened at 2012-04-13T07:10:58Z
Last modified at 2012-05-31T21:55:22Z
#1719 new defect
Improve google search results for phrases like "tahoe file storage"
Reported by: | amiller | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | website | Version: | n/a |
Keywords: | transparency usability | Cc: | |
Launchpad Bug: |
Description
Tahoe-LAFS could benefit from some SEO.
If you search for "tahoe lafs", the first result is tahoe-lafs.org - straight to where you'd expect. However, if you search for "tahoe secure file storage", "tahoe secure", or other reasonable phrases (omitting "lafs"), the results are much less useful. The PyCon talk notes tend to show up as the first result; at least they're filled with allmydata.org links that correctly redirect to https://tahoe-lafs.org.
<zooko> I think we may be telling google not to index any of https://tahoe-lafs.org with our robots.txt, which would be the first thing to change for that.
<zooko> There might be a ticket about the terrible anti-SEO.
Beyond that, perhaps by helping web crawlers access the site, we can benefit from the external search engines when searching for tickets, code, etc. (See #1691 for trac search delays)
Change History (4)
comment:1 Changed at 2012-04-13T15:09:10Z by zooko
comment:2 Changed at 2012-05-09T19:08:36Z by zooko
Some of our content, such as https://tahoe-lafs.org/trac/tahoe-lafs/browser/docs/about.rst for example, is served up directly from the trac source browser. To let that stuff be indexable, at Tony Arcieri's suggestion, I removed the exclusion of trac from robots.txt. It now looks like this:
User-agent: *
Disallow: /source/
Disallow: /buildbot-tahoe-lafs
Disallow: /buildbot-zfec
Disallow: /buildbot-pycryptopp
Crawl-Delay: 60
This might impose too much CPU and disk-IO load on our server. We'll see.
comment:3 Changed at 2012-05-09T19:15:36Z by zooko
Brian pointed out that this might also clobber the trac.db, which contains cached information from darcs. Specifically, it caches the "annotate" results (a.k.a. "blame") from darcs. I don't know if it caches anything else.
It currently looks like this:
-rw-rw-r-- 1 trac source 408165376 2012-05-09 19:13 trac.db
But "annotate"/"blame" has been broken ever since I upgraded the darcs executable from v2.5 to v2.8, so maybe nothing will get cached.
comment:4 Changed at 2012-05-31T21:55:22Z by warner
Looking at the HTTP logs, I'm seeing hits with the Googlebot UA happening a lot faster than every 60 seconds, e.g. 18 hits in a 4-minute period. The "Crawl-Delay" wasn't changed, though, so I'm wondering if maybe that's the wrong field name.
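The claim above (hits arriving faster than the 60-second Crawl-Delay) can be checked mechanically by measuring the gaps between consecutive Googlebot requests in the access log. Here is a minimal, hedged sketch: the sample log lines, the `googlebot_intervals` helper name, and the assumption of Apache combined log format are all illustrative, not part of the actual tahoe-lafs.org setup.

```python
import re
from datetime import datetime

# Hypothetical sample lines in Apache combined log format (illustrative only;
# the real server's log format and paths may differ).
LOG_LINES = [
    '66.249.66.1 - - [31/May/2012:21:00:05 +0000] "GET /trac/tahoe-lafs/wiki HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [31/May/2012:21:00:35 +0000] "GET /trac/tahoe-lafs/timeline HTTP/1.1" 200 5678 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [31/May/2012:21:01:00 +0000] "GET / HTTP/1.1" 200 99 "-" "Mozilla/5.0"',
]

TS_RE = re.compile(r'\[([^\]]+)\]')

def googlebot_intervals(lines):
    """Return the seconds elapsed between consecutive Googlebot hits."""
    times = []
    for line in lines:
        if 'Googlebot' not in line:
            continue  # skip non-Googlebot traffic
        m = TS_RE.search(line)
        if m:
            # Drop the timezone offset for simplicity (all entries share one).
            ts = m.group(1).rsplit(' ', 1)[0]
            times.append(datetime.strptime(ts, '%d/%b/%Y:%H:%M:%S'))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

if __name__ == '__main__':
    gaps = googlebot_intervals(LOG_LINES)
    # Any gap under 60 seconds means the Crawl-Delay directive is not
    # being honored as written.
    print(gaps)
```

Running this over the real log (e.g. piping `grep Googlebot access.log` into it) would show directly whether the crawler is respecting the 60-second delay.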
The site feels slower than it did a few months ago, but I don't have any measurements to support it.
The trac.db file is currently (2012-05-31) at 567MB, up from 408MB three weeks ago.
I was wrong about robots.txt. https://tahoe-lafs.org/robots.txt currently says:
Which I think ought to allow search engines to index the wiki. I don't know what else is needed to get search engines to give useful results to people making those sorts of searches.