Setting up the GX WebManager search engine

Added by martinvm, last edited by William Breuer on Sep 24, 2008

Setup

  • Define a user in GX WebManager (in the authorization panel) and add it to a group with sufficient rights to reach every page.
  • Add the username and password of this user to george's credentials.xml.
  • Start george.
  • In the /web/setup tool, set development_mode to false (uncheck the checkbox).
  • The site can now be indexed.

In the example below, replace the two values 'georgeuser' and 'FillHereThePasswordForGeorgeUser' with the username and password of the user you created.

<credentials>

  <credential pattern=".*localhost.*" type="postform" username="georgeuser" password="FillHereThePasswordForGeorgeUser">

    <!-- indicate which input parameters in the login form correspond to the user and password -->
    <param name="userparam" value="user" />
    <param name="passwordparam" value="password" />
    <!-- the action url george needs to post the user/password to -->
    <param name="actionurl" value="http://localhost:8080/web/webmanager?source=login&amp;fromlogin=true" />

    <!-- This should be the URL of the page which contains the login form. (Needed for the Secure Forms.) -->
    <param name="sourceformurl" value="http://localhost:8080/web/webmanager/id=26101/mode=edit/fromlogin=true" />

    <formparam name="user" value="" />
    <formparam name="password" value="" />

    <formparam name="id" value="26101" />
    <formparam name="source" value="login" />
    <formparam name="fromlogin" value="true" />

  </credential>

</credentials>
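To make the relationship between the param and formparam elements concrete, here is a minimal Python sketch that reads a credential like the one above and builds the login POST request. It assumes, without confirmation from george's source, that the form fields named by userparam and passwordparam receive the credential's username and password attributes and that the other formparam values are sent as-is. The build_login_post helper is hypothetical, the password is a placeholder, and fetching sourceformurl to obtain the form signature is not modeled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Trimmed copy of the example above, with a placeholder password.
CREDENTIALS_XML = """<credentials>
  <credential pattern=".*localhost.*" type="postform" username="georgeuser" password="secret">
    <param name="userparam" value="user" />
    <param name="passwordparam" value="password" />
    <param name="actionurl" value="http://localhost:8080/web/webmanager?source=login&amp;fromlogin=true" />
    <param name="sourceformurl" value="http://localhost:8080/web/webmanager/id=26101/mode=edit/fromlogin=true" />
    <formparam name="user" value="" />
    <formparam name="password" value="" />
    <formparam name="id" value="26101" />
    <formparam name="source" value="login" />
    <formparam name="fromlogin" value="true" />
  </credential>
</credentials>"""


def build_login_post(xml_text):
    """Return (action URL, URL-encoded form body) for the first postform credential.

    Hypothetical helper: it mirrors the configuration, not george's actual code.
    """
    root = ET.fromstring(xml_text)
    cred = root.find("credential[@type='postform']")
    params = {p.get("name"): p.get("value") for p in cred.findall("param")}
    form = {f.get("name"): f.get("value") for f in cred.findall("formparam")}
    # Assumption: userparam/passwordparam name the form fields that receive the credentials.
    form[params["userparam"]] = cred.get("username")
    form[params["passwordparam"]] = cred.get("password")
    # The XML parser has already decoded "&amp;" in actionurl to a plain "&".
    return params["actionurl"], urlencode(form)


url, body = build_login_post(CREDENTIALS_XML)
print(url)   # http://localhost:8080/web/webmanager?source=login&fromlogin=true
print(body)  # user=georgeuser&password=secret&id=26101&source=login&fromlogin=true
```

The ampersand in actionurl has to be written as &amp; in credentials.xml because a bare & is not allowed in an XML attribute value; the parser turns it back into & when the file is read.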

If there are multiple web initiatives, use a credential pattern that matches all of their URLs, for example <credential pattern=".*-redactie.devel.gx.nl.*" ...
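The credential pattern is a regular expression matched against the URL being fetched. The Python sketch below shows how such patterns could select a credential, assuming (unverified) that the pattern must match the whole URL, as Java's String.matches does. The pattern list and the credential_for helper are illustrative. In regex syntax an unescaped dot matches any character, so the escaped dots used here are slightly stricter than the pattern in the text.

```python
import re

# Illustrative credential patterns, in the order they would appear in credentials.xml.
CREDENTIAL_PATTERNS = [
    (r".*localhost.*", "local development login"),
    (r".*-redactie\.devel\.gx\.nl.*", "shared editorial login"),
]


def credential_for(url):
    """Return the label of the first credential whose pattern matches the whole URL."""
    for pattern, label in CREDENTIAL_PATTERNS:
        if re.fullmatch(pattern, url):
            return label
    return None  # no credential pattern matches this URL


for url in (
    "http://localhost:8080/web/show/id=26101",
    "http://site1-redactie.devel.gx.nl/web/show",
    "http://www.gx.nl/",
):
    print(url, "->", credential_for(url))
```

A single pattern ending in the shared host suffix covers every web initiative's editorial environment, so one credential entry is enough.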

Search engine fails after upgrade or rebuild

Make sure the correct credentials.xml and properties.txt are present in both the \webmanager-searchengine\src and the \webmanager-searchengine\target directories, not only in the target directory. Otherwise the copies in target are overwritten during an upgrade or rebuild.
Also note that since WebManager 9.3 the sourceformurl parameter must be supplied and must point to the URL that hosts the editor login form. It is required because WebManager 9.3 requires a correct form signature, which is obtained from that login page.
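Both pitfalls can be caught before a rebuild with a small check. The Python sketch below is hypothetical (check_searchengine_config is not part of WebManager): it reports configuration files that are missing from src or differ between src and target, and credentials in the src copy of credentials.xml that lack a sourceformurl parameter. Which subdirectories hold the files is an assumption, so the two directories are passed in as arguments.

```python
import filecmp
import xml.etree.ElementTree as ET
from pathlib import Path

CONFIG_FILES = ("credentials.xml", "properties.txt")


def check_searchengine_config(src_dir, target_dir):
    """Return a list of human-readable problems (empty when everything looks fine)."""
    src, target = Path(src_dir), Path(target_dir)
    problems = []
    for name in CONFIG_FILES:
        src_file, target_file = src / name, target / name
        if not src_file.is_file():
            problems.append(f"{name} missing in src: the target copy will be overwritten")
        elif target_file.is_file() and not filecmp.cmp(src_file, target_file, shallow=False):
            problems.append(f"{name} differs between src and target")
    creds = src / "credentials.xml"
    if creds.is_file():
        for cred in ET.parse(creds).getroot().iter("credential"):
            names = {p.get("name") for p in cred.findall("param")}
            if "sourceformurl" not in names:
                problems.append(
                    f"credential {cred.get('pattern')} has no sourceformurl "
                    "(required since WebManager 9.3)"
                )
    return problems
```

Call it with the paths of the src and target directories and print each returned line; an empty list means both copies are in place and every credential has a sourceformurl.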

Furthermore, the search engine version must match the WebManager version. The search engine JAR files and binaries are not installed automatically, even if they are included in the deployment, so ask the system administrator to update the search engine.



Troubleshooting

Basedir error

If you get an error such as "Unable to set error log to 'C:SVNWM93WM9.3.1/webmanager-searchengine/target/classes/logs/error.log'", the webmanager.basedir in your settings.xml file is incorrect. You probably used backslashes instead of forward slashes, and the backslashes were dropped from the path. Instead of C:\SVN\WM93\WM9.3.1, use C:/SVN/WM93/WM9.3.1.
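The garbled path is consistent with the backslashes being consumed as escape characters somewhere along the way. The Python sketch below reproduces the symptom with a simple escape-stripping function (an illustration, not the actual code path in the build) and derives the forward-slash form of a Windows path with the standard pathlib module. Both helper names are hypothetical.

```python
import re
from pathlib import PureWindowsPath


def strip_escapes(value):
    """Mimic a naive escape processor: drop each backslash, keep the next character."""
    return re.sub(r"\\(.)", r"\1", value)


def basedir_for_settings(windows_path):
    """Return the path with forward slashes, suitable for webmanager.basedir."""
    return PureWindowsPath(windows_path).as_posix()


wrong = r"C:\SVN\WM93\WM9.3.1"
print(strip_escapes(wrong))         # C:SVNWM93WM9.3.1
print(basedir_for_settings(wrong))  # C:/SVN/WM93/WM9.3.1
```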

properties.txt

Possible problems in properties.txt:

Notes

Below are some notes concerning the search engine.

Indexing of media repository articles

Articles in the media repository are added to the standard INDEXER output by a stored procedure, wjGetContentForIndexerWithDisplayOn, which by default only retrieves articles from the last 5 days. To reindex a complete site, you must therefore change that stored procedure to return all articles; otherwise all older articles disappear from the search results.

If crontab.txt starts with Fullindex, the index is emptied before indexing (in one transaction), so on websites with a media repository only articles from the last 5 days can be found with the search function. Sites with a media repository should therefore have a crontab.txt starting with 'index'.

Do not forget to change the stored procedure back afterwards!
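The interaction between the 5-day window and Fullindex can be modeled in a few lines. The Python sketch below is a toy model, not the indexer: articles are a mapping from a hypothetical article id to its date, the stored procedure is represented by a function that returns only articles inside the window, and a full index starts from an empty index while a regular index adds to the existing one. The exact boundary of the 5-day window is an assumption.

```python
from datetime import date, timedelta


def procedure_output(articles, today, window_days=5):
    """Stand-in for wjGetContentForIndexerWithDisplayOn: only recent articles by default."""
    return {aid for aid, published in articles.items()
            if today - published <= timedelta(days=window_days)}


def run_index(index, articles, today, full):
    """Return the index contents after one indexing run."""
    fetched = procedure_output(articles, today)
    if full:
        return fetched          # Fullindex: the index is emptied first
    return index | fetched      # index: existing entries are kept


today = date(2008, 9, 24)
articles = {"old-article": date(2008, 6, 1), "new-article": date(2008, 9, 22)}
index = {"old-article", "new-article"}  # both were indexed earlier

print(sorted(run_index(index, articles, today, full=False)))  # ['new-article', 'old-article']
print(sorted(run_index(index, articles, today, full=True)))   # ['new-article']
```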

Page removal

Pages that are removed in GX WebManager are not automatically removed from the search index. To remove a page from the index, offer it to the search engine empty (for example, unpublished). This causes all references to the page to be removed from the search index.
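Why offering an empty page works can be illustrated with a toy inverted index in Python. In this sketch (an assumption about the general mechanism, not george's implementation), indexing a URL first discards every term previously recorded for it and then records the terms of the new content, so indexing empty content leaves no references to the URL. The ToyIndex class and the URL are hypothetical.

```python
from collections import defaultdict


class ToyIndex:
    """Minimal inverted index: term -> set of URLs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index_page(self, url, content):
        # Drop every existing reference to this URL before adding the new content.
        for urls in self.postings.values():
            urls.discard(url)
        for term in content.lower().split():
            self.postings[term].add(url)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


idx = ToyIndex()
idx.index_page("http://www.gx.nl/old-page", "Search engine setup")
print(idx.search("setup"))  # ['http://www.gx.nl/old-page']
idx.index_page("http://www.gx.nl/old-page", "")  # offered empty, e.g. unpublished
print(idx.search("setup"))  # []
```

A page that is deleted is simply never offered again, so in this model its old entries stay in the index, which matches the behavior described above.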

Restart

In which cases should the GX WebManager search engine be restarted?

- Changes to properties.txt require a restart.

- The files parser.txt, meta.txt, credentials.xml and crontab.txt are reread every minute, so changes to them take effect without a restart.
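The restart rule above can be captured in a small helper, for example in a deployment script that decides whether the search engine must be restarted after configuration files change. The helper is hypothetical; the file lists come from the text, and unknown files are rejected rather than guessed at.

```python
# Files whose changes only take effect after a restart.
RESTART_REQUIRED = {"properties.txt"}
# Files the search engine rereads every minute on its own.
RELOADED_AUTOMATICALLY = {"parser.txt", "meta.txt", "credentials.xml", "crontab.txt"}


def needs_restart(changed_files):
    """Return True if any changed file requires a search engine restart."""
    changed = set(changed_files)
    unknown = changed - RESTART_REQUIRED - RELOADED_AUTOMATICALLY
    if unknown:
        raise ValueError(f"no restart rule known for: {sorted(unknown)}")
    return bool(changed & RESTART_REQUIRED)


print(needs_restart(["meta.txt", "crontab.txt"]))
print(needs_restart(["properties.txt", "meta.txt"]))
```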

 

FAQ

Q: Is it possible to filter the search results based on media item terms?
A: Yes, by adding metadata to the indexed pages and prefixing the query string with a filter on that metadata. More info here: [GX WebManager search engine filtering on metadata and terms]

Q: Is it possible to find search results of other sites?
A: Yes. You can add the URLs of other sites to "crontab.txt" to make the search engine index the pages of those sites (make sure the owner of a site allows you to index it). To view search results from such a site, create entries in "meta.txt" as follows:

http://www.gx.nl/.*    webid    26098

Every line in "meta.txt" has the form "<URL pattern><tab><meta name><tab><value>". When matching search results, GX WebManager only returns results that belong to the current website; it does this by adding "(webid:26098^0) AND " to the search query, where 26098 is the webid of that website. The entry in "meta.txt" tells the indexer to add this meta field to every indexed URL that matches the pattern, so pages of the other site carry the same webid and pass the filter.
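The Python sketch below parses meta.txt content in the documented <URL pattern><tab><meta name><tab><value> form, collects the meta values the indexer would attach to a URL, and builds the filtered query. It assumes the URL pattern is a regular expression matched against the whole URL; that assumption and the helper names are illustrative, not taken from the search engine.

```python
import re

# One rule from the example above; the fields are separated by tabs.
META_TXT = "http://www.gx.nl/.*\twebid\t26098\n"


def parse_meta(text):
    """Parse meta.txt content into (compiled pattern, meta name, value) rules."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        pattern, name, value = line.split("\t")
        rules.append((re.compile(pattern), name, value))
    return rules


def meta_for(url, rules):
    """Return the meta fields that would be attached to this URL."""
    return {name: value for pattern, name, value in rules if pattern.fullmatch(url)}


def filtered_query(user_query, webid):
    """Prefix the query with the website filter, as described above."""
    return f"(webid:{webid}^0) AND {user_query}"


rules = parse_meta(META_TXT)
print(meta_for("http://www.gx.nl/products/index.html", rules))  # {'webid': '26098'}
print(filtered_query("search engine", "26098"))  # (webid:26098^0) AND search engine
```

In Lucene-style query syntax, ^0 is a boost of zero, so presumably the webid clause restricts the results without influencing their ranking.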

Q: Is it possible to influence the order of search results?

A: Yes. Query words are matched in fields, and the matches add up to a score. The search engine lets you set a weight factor per field to determine that field's importance; the default factor is 1. To change the importance of fields, edit "properties.txt" (changes to this file require a restart) and add lines like:

# Change the weight factors for different fields, the default factor is 1.
# This will boost matches in the title by a factor of 10, matches in the
# keyword tag by 5 and matches in the description tag by 3.
factor.title=10
factor.keyword=5
factor.description=3
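As a rough illustration of how the factors combine, the Python sketch below reads factor.* lines from properties.txt content and multiplies each field's match score by its factor, with 1 as the default. This is a simplified model assumed for illustration; the search engine's actual scoring formula is not described on this page, and the "body" field in the example is hypothetical.

```python
def parse_factors(properties_text):
    """Collect factor.<field>=<number> settings; comments and other keys are ignored."""
    factors = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key.startswith("factor."):
            factors[key[len("factor."):]] = float(value)
    return factors


def weighted_score(field_scores, factors):
    """Sum the per-field match scores, each multiplied by its factor (default 1)."""
    return sum(score * factors.get(field, 1.0) for field, score in field_scores.items())


PROPERTIES = """# Change the weight factors for different fields, the default factor is 1.
factor.title=10
factor.keyword=5
factor.description=3
"""

factors = parse_factors(PROPERTIES)
# A page matching once in the title and twice in the (hypothetical) body field:
print(weighted_score({"title": 1, "body": 2}, factors))  # 12.0
```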