Monday, November 4, 2013

Upgrading existing Solr installation to new version of Jetty

At work we have been running into a problem with Apache Solr crashing. Depending on how heavily it was used, we used to get several weeks out of it before it crashed. Now it only runs for five days at a time, so this fire is burning hot enough to be at the top of my to-do list.
When it crashes it throws errors saying "Too many open files". Running lsof showed that the open descriptors weren't actually files but thousands of orphaned sockets. The sockets looked like this in the lsof output:

java 2428 root 2173u sock 0,7 0t0 123291433 can't identify protocol
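A quick way to put a number on the problem is to count the lsof lines that carry the "can't identify protocol" text. This is only a sketch: the function name is made up for illustration, and the PID in the usage comment is the one from the sample line above.

```shell
#!/bin/sh
# Count lsof lines that look like orphaned sockets.
# Reads lsof output on stdin, so it also works on a saved copy of the output.
# "|| true" keeps the exit status at 0 when grep finds no matches (it still prints 0).
count_orphaned_sockets() {
    grep -c "can't identify protocol" || true
}

# Example (2428 is the java PID from the sample lsof line):
#   lsof -p 2428 | count_orphaned_sockets
```

Running this every so often and watching the number climb is a reasonable way to confirm you are seeing the same leak.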

Nothing will be listed for them in netstat, because these sockets don't have open connections to anything. The Solr log file will start showing errors similar to this:

SEVERE: java.io.FileNotFoundException: /usr/local/apache-solr-3.5.0/example/solr/data/index/_dgf.frq (Too many open files)

SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock

Initially we dealt with this problem by monitoring the number of open files for the java process and running a reindex when it got close to the limit. Not a great solution, but at the time there weren't enough hours in the day to put a bunch of effort into figuring this out. In my case Solr blew up at 4000 open sockets. Once it had that many sockets open it would just throw 500 errors.
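The kind of check described above can be sketched roughly like this. It is not the exact script we ran: the function names, the 3500 warning threshold, and the pgrep pattern in the usage comment are all assumptions, and it relies on the Linux /proc filesystem.

```shell
#!/bin/sh
# Print the number of open file descriptors for a PID (Linux /proc assumed).
# Prints 0 if the process does not exist or its fd directory is not readable.
open_fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Succeed (exit status 0) when the count in $1 has reached the threshold in $2.
over_threshold() {
    [ "$1" -ge "$2" ]
}

# Example usage (PID lookup and threshold are assumptions, adjust to taste):
#   pid=$(pgrep -f start.jar)
#   if over_threshold "$(open_fd_count "$pid")" 3500; then
#       echo "Solr is getting close to the open file limit"
#   fi
```

Dropped into cron, something like this gives a warning before Solr hits the limit and starts returning 500 errors.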

Usually the answer to a situation like this is to upgrade Solr to a newer version. Unfortunately I couldn't do that in this case because we have a Ruby gem that depends on Solr version 3.5. My research pointed to Jetty, not Solr, as the source of the problem. Once I found this post I knew for sure Jetty was causing the orphaned sockets. Solr 3.5.0 is packaged with Jetty 6.1.26, which has a bug that causes the orphaned sockets under certain conditions. Because Jetty 6 is fairly old, the developers are not going to fix it. At this point I set about upgrading Jetty to version 7.

The first thing I had to figure out was which parts were Solr and which parts were Jetty. Turns out most of the package is Jetty. Solr is contained in apache-solr-3.5.0/example/solr and apache-solr-3.5.0/example/webapps/solr.war. So I decided to try to stuff Solr 3.5.0 into Jetty 7.6.13. Later I may try moving to the latest version of Jetty 9, but right now I'm just trying to solve this orphaned socket problem and was worried the older version of Solr might have problems with a newer Jetty.

Upgrading Jetty

Here are the steps I took to upgrade Solr 3.5.0 to Jetty 7:

Download the latest Jetty 7 (jetty-distribution-7.6.13.v20130916.tar.gz at the time this was written) from http://download.eclipse.org/jetty/7.6.13.v20130916/dist/

Untar jetty-distribution-7.6.13.v20130916.tar.gz
tar xfvz jetty-distribution-7.6.13.v20130916.tar.gz

Create destination directory for all the new files
mkdir /usr/local/apache-solr-3.5.0-jetty-7.6.13
mkdir /usr/local/apache-solr-3.5.0-jetty-7.6.13/example

Copy the contents of jetty-distribution-7.6.13.v20130916 to the new directory
cp -a jetty-distribution-7.6.13.v20130916/* /usr/local/apache-solr-3.5.0-jetty-7.6.13/example

Copy the Solr files from the old Solr installation to the new Jetty directory
cp -a /usr/local/apache-solr-3.5.0/example/solr /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
cp -a /usr/local/apache-solr-3.5.0/example/webapps/solr.war /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/webapps/

Edit the jetty.xml config file to change the listening port
vi /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc/jetty.xml
Change this line
 <Set name="port"><Property name="jetty.port" default="8080"/></Set>
To this
 <Set name="port"><Property name="jetty.port" default="8983"/></Set>
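If you would rather script the port change than open vi, a sed one-liner along these lines should do it. This is a sketch, not something from the original procedure: the function name is made up, and it assumes the line in your jetty.xml matches the one shown above exactly.

```shell
#!/bin/sh
# Rewrite the default jetty.port in a jetty.xml file from 8080 to 8983.
# sed keeps a copy of the original file next to it with a .bak suffix.
set_jetty_port() {
    sed -i.bak 's/name="jetty.port" default="8080"/name="jetty.port" default="8983"/' "$1"
}

# Example:
#   set_jetty_port /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc/jetty.xml
```

Check the file afterwards with grep to make sure the substitution actually matched.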


At this point Solr will run, but there are some example war files and config files that Solr doesn't need and that should be cleaned up.

- Edit /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
   vi /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
   Comment out the line
   etc/jetty-testrealm.xml
   so it reads 
   #etc/jetty-testrealm.xml

- Clean up example war files
  cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/webapps
  mkdir BAK
  mv test.war spdy.war BAK

- Clean up example config files
  cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc
  mkdir BAK
  mv jetty-spdy.xml jetty-spdy-proxy.xml jetty-testrealm.xml BAK
  cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/contexts
  mkdir BAK
  mv test.xml BAK
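The cleanup steps above can be rolled into one small script if you have to repeat them on more than one box. Treat it as a sketch: the function names are invented for illustration, it skips files that aren't there instead of failing, and sed leaves a start.ini.bak copy behind.

```shell
#!/bin/sh
# Move each named file from the current directory into ./BAK,
# silently skipping any that do not exist.
stash() {
    mkdir -p BAK
    for f in "$@"; do
        if [ -e "$f" ]; then mv "$f" BAK/; fi
    done
}

# $1 = the example directory of the new installation.
cleanup_jetty_examples() {
    solr_example_dir=$1
    # Comment out the test realm config in start.ini.
    sed -i.bak 's|^etc/jetty-testrealm\.xml[[:space:]]*$|#&|' "$solr_example_dir/start.ini"
    # Each cd runs in a subshell so the caller's working directory is untouched.
    (cd "$solr_example_dir/webapps" && stash test.war spdy.war)
    (cd "$solr_example_dir/etc" && stash jetty-spdy.xml jetty-spdy-proxy.xml jetty-testrealm.xml)
    (cd "$solr_example_dir/contexts" && stash test.xml)
}

# Example:
#   cleanup_jetty_examples /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
```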

I use a symbolic link for the installation directory so the start script doesn't have to be modified. Before restarting I have to switch that symlink.
  service solr stop
  cd /usr/local
  rm solr
  ln -s apache-solr-3.5.0-jetty-7.6.13 solr
  service solr start
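As an alternative to the rm followed by ln -s, ln -sfn replaces the existing link in a single command. The sketch below wraps that in a function; the function name is made up, and it assumes an ln that supports -n (GNU coreutils does).

```shell
#!/bin/sh
# Point the "solr" symlink in directory $1 at the target named in $2.
# -f replaces an existing link; -n stops ln from following the old link
# and creating the new one inside the old installation directory.
switch_solr_link() {
    (cd "$1" && ln -sfn "$2" solr)
}

# Example (stop Solr first, start it again afterwards):
#   switch_solr_link /usr/local apache-solr-3.5.0-jetty-7.6.13
```

Switching back to the old installation is the same call with the old directory name, which makes rolling back cheap.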

Then you can test hitting the service locally.
  curl localhost:8983/solr/

It should return HTML that looks something like this:
  <title>Welcome to Solr</title>
  </head>

  <body>
  <h1>Welcome to Solr!</h1>
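To turn that manual check into something a script can act on, you can grep the response for the welcome text. A minimal sketch, with a made-up function name, assuming the welcome page contains the same text as the sample above:

```shell
#!/bin/sh
# Succeed (exit status 0) if the HTML on stdin looks like the Solr welcome page.
looks_like_solr() {
    grep -q "Welcome to Solr"
}

# Example:
#   curl -s localhost:8983/solr/ | looks_like_solr && echo "Solr is up"
```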

You will probably need to run a reindex if transactions took place while Solr was down for the upgrade.

Resources used to compile this post
http://comments.gmane.org/gmane.comp.ide.eclipse.jetty.user/919
https://github.com/umars/jetty-solr
http://stackoverflow.com/questions/6425759/how-to-upgrade-update-the-solr-jetty-ubuntu-package
https://jira.codehaus.org/browse/JETTY-1458
http://grokbase.com/t/lucene/solr-user/123e6et8e0/too-many-open-files-lots-of-sockets