An internal search engine is a powerful tool from an SEO point of view. It lets us index relevant content (provided that the search engine works properly) and greatly increase the amount of content on the website through the engine's search result pages.
But we must handle it with care: we could generate many URLs with no content, with duplicated content, or for “useless” searches.
Thus, if the search engine is not properly configured, the best option is not to index any search result page (hereafter SRP).
These are some of the aspects to bear in mind, from an SEO perspective, when implementing the internal search engine.
Default robots metatag “noindex”
By default, every SRP should carry a robots meta tag with “noindex”. Only when certain criteria are met and the search is considered valid should the SRP for that specific search be served without the “noindex”. The problem with indexing SRPs indiscriminately is that it can generate a lot of useless content.
Data to gather before opening up indexing
Before opening the SRPs of the internal search engine to external search engines such as Google, it is useful to know which internal searches are the most frequent, how many results those frequent searches return, etc.
Number of results
In the beginning we would be more conservative, removing the “noindex” only from SRPs that return more than 10 or 15 (or even 20-25) results. This number could vary depending on the initial information about how the search engine is performing.
The more conservative we are, that is, the more results an SRP needs before its “noindex” is removed, the lower the chance that two different searches return the same results.
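As a minimal sketch of the rule above, the function below decides whether an SRP keeps its “noindex”. The names `robots_meta`, `MIN_RESULTS` and the `validated` flag are assumptions made for illustration, and the threshold of 15 is just one of the starting values mentioned above.

```python
MIN_RESULTS = 15  # assumed starting threshold; adjust once real search data is available


def robots_meta(result_count: int, validated: bool, min_results: int = MIN_RESULTS) -> str:
    """Return the robots meta tag for an SRP, or an empty string if it may be indexed.

    "noindex" is the default: it is dropped only when the search has been
    validated AND returns more than `min_results` results.
    """
    if validated and result_count > min_results:
        return ""  # SRP is served without the "noindex"
    return '<meta name="robots" content="noindex">'


print(robots_meta(result_count=4, validated=True))
print(robots_meta(result_count=30, validated=True))
```

Raising `min_results` makes the rule more conservative, at the cost of indexing fewer SRPs.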
Snippet including surrounding text
On the SRP it is also useful to show, in addition to the title of the post, product or whatever the result is, a short text containing the searched word(s) together with the text surrounding them. This gives us SRPs with more content and makes duplicated content less likely: even if two different searches in, say, a mobile phone shop (“Chinese mobile phones” and “cheap mobile phones”) return the same results, the texts of the snippets will be different.
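One possible way to build such a snippet is sketched below. The function name `snippet` and the `radius` parameter are hypothetical; the sketch looks for the whole phrase first, then for individual words, and simply falls back to the start of the text when nothing matches.

```python
import re


def snippet(text: str, query: str, radius: int = 40) -> str:
    """Return the first match of the query in `text` with `radius` characters of context."""
    query = query.strip()
    # Try the full phrase first, then each word on its own.
    candidates = [query] + query.split()
    match = None
    for candidate in candidates:
        if candidate:
            match = re.search(re.escape(candidate), text, flags=re.IGNORECASE)
            if match:
                break
    if match is None:
        return text[: 2 * radius]  # no match: fall back to the start of the text
    start = max(match.start() - radius, 0)
    end = min(match.end() + radius, len(text))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{text[start:end]}{suffix}"


description = (
    "We stock cheap mobile phones from local brands and "
    "Chinese mobile phones imported directly from the factory."
)
print(snippet(description, "cheap mobile phones", radius=10))
print(snippet(description, "Chinese mobile phones", radius=10))
```

The same product description yields a different snippet for each search, which is what keeps the two SRPs from being identical.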
Whenever possible there should be an automatic search validation system to control which searches get indexed, complemented by a periodic manual review.
There are different solutions depending on the resources and time you can allocate. One is to validate only the searches that have been made several times, which would mean that users are interested in them. Another is to require that searches have no more than X words, that they contain no forbidden words, etc.
The underlying idea is that we can automatically select the searches that may be valid using several criteria (number of times the search was performed, whether it contains forbidden words, etc.), but there must always be a periodic manual check to avoid indexing unwanted results.
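The criteria above could be combined along these lines. This is only a sketch: the thresholds, the forbidden-word list and the names `SearchStats` and `is_indexable_search` are all assumptions, and the `manually_rejected` flag stands in for the periodic manual review.

```python
from dataclasses import dataclass

# Hypothetical values; each site would choose its own.
FORBIDDEN_WORDS = {"free", "crack"}
MIN_SEARCH_COUNT = 5
MAX_WORDS = 4


@dataclass
class SearchStats:
    query: str
    times_searched: int
    manually_rejected: bool = False  # set during the periodic manual review


def is_indexable_search(stats: SearchStats) -> bool:
    """Apply the automatic criteria; a manual rejection always wins."""
    if stats.manually_rejected:
        return False
    words = stats.query.lower().split()
    if not words or len(words) > MAX_WORDS:
        return False
    if any(word in FORBIDDEN_WORDS for word in words):
        return False
    return stats.times_searched >= MIN_SEARCH_COUNT


print(is_indexable_search(SearchStats("chinese mobiles", times_searched=12)))
print(is_indexable_search(SearchStats("free chinese mobiles", times_searched=50)))
```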
Both the page title and the other SEO-relevant tags should be generated automatically from the search, but we could also consider writing tags manually. If an indexed search has not been manually optimized, the automatic optimization applies.
This way we can fine-tune the on-page SEO of the searches that get the most visits.
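A simple way to express this fallback is a table of manual overrides layered on top of automatically generated tags. The sketch below assumes a hypothetical `MANUAL_TAGS` dictionary, a made-up site name and illustrative tag templates; none of them come from a real system.

```python
# Hypothetical manual overrides, keyed by normalized search query.
MANUAL_TAGS = {
    "chinese mobiles": {"title": "Chinese Mobiles: Compare Models and Prices"},
}


def seo_tags(query: str, result_count: int, site_name: str = "Example Shop") -> dict:
    """Generate title and description automatically, then apply manual overrides."""
    key = " ".join(query.lower().split())  # lowercase and collapse whitespace
    automatic = {
        "title": f"{key.capitalize()} | {site_name}",
        "description": f'{result_count} results for "{key}" at {site_name}.',
    }
    # Any tag written by hand replaces the automatic one; the rest stay automatic.
    return {**automatic, **MANUAL_TAGS.get(key, {})}


print(seo_tags("cheap mobile phones", 23))
print(seo_tags("Chinese Mobiles", 40))
```

Only the most visited searches need an entry in the override table; everything else keeps the automatic tags.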
The structure of the search result pages of the internal search engine should be friendly, like:
This would correspond to the search “Chinese mobiles”. The generated URL that contains the search has to be processed so that it has no special characters, accents, uppercase letters, etc. As mentioned earlier, we would also make sure that searches with many words are not indexed.
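The clean-up of the search text could look like the following sketch. The function name `search_slug` and the choice of hyphens as separators are assumptions; the rest of the URL pattern is left to the site.

```python
import re
import unicodedata


def search_slug(query: str) -> str:
    """Turn a search query into a lowercase, accent-free, hyphen-separated slug."""
    # Decompose accented characters, then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", query)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


print(search_slug("Chinese Mobiles"))  # chinese-mobiles
print(search_slug("Móviles Chinos"))   # moviles-chinos
```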
The URLs of the most popular searches should be linked from different parts of the website (provided they are related).
We can automatically associate each search with a category, product, topic, etc., and show the most common searches for each of them. This way we give relevance both to the best searches and to the pages that link to them, since the content is related.
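Assuming the search log already records which category each search was associated with (how that association is made is outside this sketch), the most common searches per category can be computed as follows. The function name and the log format are hypothetical.

```python
from collections import Counter, defaultdict


def top_searches_by_category(search_log, limit=3):
    """Return the `limit` most frequent searches for each category.

    `search_log` is an iterable of (category, query) pairs.
    """
    counts = defaultdict(Counter)
    for category, query in search_log:
        counts[category][query.lower().strip()] += 1
    return {
        category: [query for query, _ in counter.most_common(limit)]
        for category, counter in counts.items()
    }


log = [
    ("phones", "chinese mobiles"),
    ("phones", "cheap mobiles"),
    ("phones", "Chinese mobiles"),
    ("tablets", "android tablet"),
]
print(top_searches_by_category(log, limit=2))
```

The resulting lists can then be rendered as links to the corresponding SRPs on each category, product or topic page.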