When Solr crashes, it throws "Too many open files" errors. Running lsof showed that the problem wasn't actually open files but thousands of orphaned sockets that had been left open. They looked like this in the lsof output:
java 2428 root 2173u sock 0,7 0t0 123291433 can't identify protocol
Nothing shows up in netstat, because these sockets aren't connected to anything. The Solr log file will start showing errors similar to these:
SEVERE: java.io.FileNotFoundException: /usr/local/apache-solr-3.5.0/example/solr/data/index/_dgf.frq (Too many open files)
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock
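A quick way to count these orphaned sockets is to filter lsof output for the "can't identify protocol" marker shown in the sample line above. The helper below is a minimal sketch; `count_orphaned_sockets` is a hypothetical name, and it assumes your lsof version prints the same marker text.

```shell
# Count lsof lines for sockets whose protocol lsof cannot identify.
# Reads lsof output on stdin; prints 0 (and still succeeds) when none match.
count_orphaned_sockets() {
    grep -c "can't identify protocol" || true
}

# Example usage (assumes Solr was started with "java -jar start.jar"
# and exactly one such process is running):
#   lsof -p "$(pgrep -f start.jar)" | count_orphaned_sockets
```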
Initially we dealt with this problem by monitoring the number of open files for the java process and running a reindex when it got close to the limit. It wasn't a great solution, but at the time there weren't enough hours in the day to put much effort into figuring this out. In my case Solr blew up at 4000 open sockets; once it had that many open, it would just return HTTP 500 errors.
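On Linux, that kind of monitoring can be approximated by comparing the number of entries in /proc/<pid>/fd with the process's soft "Max open files" limit from /proc/<pid>/limits. This is a sketch under stated assumptions: it needs a Linux /proc filesystem (and root to inspect a root-owned java process), and the function names and the 90 percent default threshold are hypothetical choices for illustration.

```shell
# Number of file descriptors (files and sockets) held by a process.
fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Soft "Max open files" limit of a process (4th field of that line).
fd_soft_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Succeed while fd usage is below pct percent of the soft limit.
# The 90% default is an illustrative assumption.
fd_usage_ok() {
    pid=$1
    pct=${2:-90}
    used=$(fd_count "$pid")
    limit=$(fd_soft_limit "$pid")
    [ $((used * 100)) -lt $((limit * pct)) ]
}

# Example usage from a cron job (assumes Solr runs via start.jar):
#   fd_usage_ok "$(pgrep -f start.jar)" 90 || echo "Solr is near its open file limit"
```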
Usually the answer to a situation like this is to upgrade Solr to a newer version. Unfortunately I couldn't do that in this case, because we have a Ruby gem that depends on Solr 3.5. My research pointed to Jetty, not Solr, as the source of the problem, and once I found this post I knew for sure that Jetty was causing the orphaned sockets. Solr 3.5.0 is packaged with Jetty 6.1.26, which has a bug that leaves orphaned sockets under certain conditions. Because Jetty 6 is fairly old, its developers are not going to fix it. At this point I set about upgrading Jetty to version 7.
The first thing I had to figure out was which parts of the package were Solr and which were Jetty. It turns out most of the package is Jetty; Solr itself lives in apache-solr-3.5.0/example/solr and apache-solr-3.5.0/example/webapps/solr.war. So I decided to try putting Solr 3.5.0 into Jetty 7.6.13. Later I may try moving to the latest Jetty 9, but right now I'm just trying to solve the orphaned socket problem, and I was worried that the older version of Solr might have problems with a newer Jetty.
Upgrading Jetty
Here are the steps I took to move Solr 3.5.0 onto Jetty 7:
Download the latest Jetty 7 release (jetty-distribution-7.6.13.v20130916.tar.gz at the time of writing) from http://download.eclipse.org/jetty/7.6.13.v20130916/dist/
Untar jetty-distribution-7.6.13.v20130916.tar.gz
tar xfvz jetty-distribution-7.6.13.v20130916.tar.gz
Create a destination directory for all the new files
mkdir /usr/local/apache-solr-3.5.0-jetty-7.6.13
mkdir /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
Copy the contents of jetty-distribution-7.6.13.v20130916 to the new directory
cp -a jetty-distribution-7.6.13.v20130916/* /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
Copy the Solr files from the old Solr installation to the new Jetty directory
cp -a /usr/local/apache-solr-3.5.0/example/solr /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
cp -a /usr/local/apache-solr-3.5.0/example/webapps/solr.war /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/webapps/
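Before going further it can help to confirm that the copies landed where Jetty expects them. This hypothetical check (`check_solr_layout` is my own name) only tests that the Solr home directory, solr.war and Jetty's start.jar exist under the given example directory; it assumes start.jar sits at the top level of the Jetty 7 distribution.

```shell
# Verify that the new example directory has the pieces Solr on Jetty needs.
# Prints the first missing path to stderr and returns 1 if any is absent.
check_solr_layout() {
    base=$1
    for path in "$base/solr" "$base/webapps/solr.war" "$base/start.jar"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            return 1
        fi
    done
}

# Example usage:
#   check_solr_layout /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
```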
Edit the jetty.xml config file to change the listening port
vi /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc/jetty.xml
Change this line
<Set name="port"><Property name="jetty.port" default="8080"/></Set>
To this
<Set name="port"><Property name="jetty.port" default="8983"/></Set>
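If you'd rather not edit jetty.xml by hand, the same change can be made with sed. A minimal sketch, assuming the port line looks exactly like the one above; the function name is hypothetical, and `-i.bak` keeps a copy of the original file (that form is accepted by both GNU and BSD sed).

```shell
# Change Jetty's default listening port from 8080 to 8983 in jetty.xml,
# leaving the original file next to it as jetty.xml.bak.
set_jetty_port() {
    sed -i.bak 's|name="jetty.port" default="8080"|name="jetty.port" default="8983"|' "$1"
}

# Example usage:
#   set_jetty_port /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc/jetty.xml
```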
At this point Solr will run, but the Jetty distribution includes some example WAR files and config files that Solr doesn't need, and they should be cleaned up.
- Edit /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
vi /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
Comment out the line
etc/jetty-testrealm.xml
so it reads
#etc/jetty-testrealm.xml
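The start.ini edit can be scripted the same way. This sketch (hypothetical function name) only touches a line that is exactly `etc/jetty-testrealm.xml`, so running it a second time leaves the already-commented line alone.

```shell
# Comment out the jetty-testrealm.xml entry in start.ini, keeping a .bak copy.
# Lines that are already commented do not match the anchored pattern.
disable_testrealm() {
    sed -i.bak 's|^etc/jetty-testrealm\.xml$|#etc/jetty-testrealm.xml|' "$1"
}

# Example usage:
#   disable_testrealm /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
```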
- Clean up the example WAR files and config files
cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/webapps
mkdir BAK
mv test.war spdy.war BAK
cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc
mkdir BAK
mv jetty-spdy.xml jetty-spdy-proxy.xml jetty-testrealm.xml BAK
cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/contexts
mkdir BAK
mv test.xml BAK
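The mv commands above fail if one of the files is already gone, for example when the cleanup is run a second time. A hypothetical helper that moves only the files that are present into a BAK subdirectory:

```shell
# Move each named file from dir into dir/BAK, skipping files that don't exist.
stash_files() {
    dir=$1
    shift
    mkdir -p "$dir/BAK"
    for f in "$@"; do
        if [ -e "$dir/$f" ]; then
            mv "$dir/$f" "$dir/BAK/"
        fi
    done
}

# Example usage, mirroring the manual steps:
#   ex=/usr/local/apache-solr-3.5.0-jetty-7.6.13/example
#   stash_files "$ex/webapps" test.war spdy.war
#   stash_files "$ex/etc" jetty-spdy.xml jetty-spdy-proxy.xml jetty-testrealm.xml
#   stash_files "$ex/contexts" test.xml
```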
I use a symbolic link for the installation directory so the start script doesn't have to be modified. Before restarting, I have to switch that symlink.
service solr stop
cd /usr/local
rm solr
ln -s apache-solr-3.5.0-jetty-7.6.13 solr
service solr start
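The rm and ln pair can be collapsed into a single `ln -sfn`: -f replaces the existing link, and -n keeps ln from following the old link into the directory it points to and creating the new link inside it. Rolling back is then the same command with apache-solr-3.5.0 as the target. The helper name is hypothetical; stop and start Solr around it as above.

```shell
# Point the "solr" symlink inside parent_dir at target (a name relative to parent_dir).
switch_solr_link() {
    (cd "$1" && ln -sfn "$2" solr)
}

# Example usage:
#   service solr stop
#   switch_solr_link /usr/local apache-solr-3.5.0-jetty-7.6.13
#   service solr start
```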
Then test the service locally:
curl localhost:8983/solr/
It should return HTML that includes something like this:
<title>Welcome to Solr</title>
</head>
<body>
<h1>Welcome to Solr!</h1>
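Right after `service solr start`, Jetty may still be deploying solr.war, so an immediate curl can fail. A polling sketch, assuming curl is installed; the function name, the 30 attempts and the 2-second interval are arbitrary choices for illustration.

```shell
# Poll url until the response contains "Welcome to Solr" or tries run out.
# Returns 0 on success, 1 if the page never appeared.
wait_for_solr() {
    url=${1:-http://localhost:8983/solr/}
    tries=${2:-30}
    while :; do
        if curl -s "$url" | grep -q "Welcome to Solr"; then
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep 2
    done
}

# Example usage:
#   wait_for_solr http://localhost:8983/solr/ 30 || echo "Solr did not come up"
```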
If updates were made while Solr was down for the upgrade, you will probably need to run a reindex.
Resources used to compile this post
http://comments.gmane.org/gmane.comp.ide.eclipse.jetty.user/919
https://github.com/umars/jetty-solr
http://stackoverflow.com/questions/6425759/how-to-upgrade-update-the-solr-jetty-ubuntu-package
https://jira.codehaus.org/browse/JETTY-1458
http://grokbase.com/t/lucene/solr-user/123e6et8e0/too-many-open-files-lots-of-sockets