Troubleshooting

$Id: troubleshooting.html 22318 2014-11-12 23:07:38Z jmfee $
$URL: https://ghttrac.cr.usgs.gov/websvn/ProductDistribution/trunk/etc/documentation/userguide/troubleshooting.html $

Receive Mode

Database is locked

Example exception from log:

WARNING	thread=13	[receiver_pdl] exception during receiver cleanup
java.sql.SQLException: database locked
	at org.sqlite.DB.execute(DB.java:270)
	at org.sqlite.DB.executeUpdate(DB.java:281)
	at org.sqlite.PrepStmt.executeUpdate(PrepStmt.java:77)
	at gov.usgs.earthquake.distribution.JDBCNotificationIndex.removeNotification(JDBCNotificationIndex.java:513)
	at gov.usgs.earthquake.distribution.DefaultNotificationReceiver.removeExpiredNotifications(DefaultNotificationReceiver.java:301)
	at gov.usgs.earthquake.distribution.DefaultNotificationReceiver$2.run(DefaultNotificationReceiver.java:577)
	at java.util.TimerThread.mainLoop(Timer.java:512)
	at java.util.TimerThread.run(Timer.java:462)

Steps to fix:

  1. Stop the client, and comment out any cron entry that might restart it automatically.
  2. Find the locked database. For a receiver, this is likely data/receiver_index.db. There is usually also a file with the suffix -journal next to it. DO NOT DELETE THE JOURNAL.
  3. As the user who owns the file, open the database with SQLite 3: sqlite3 data/receiver_index.db. SQLite will automatically recover the database from the journal and unlock it.
  4. Exit SQLite: type .quit and press Enter.
  5. Remove the EIDS tracking file, usually data/receiver_pdl_tracking.dat. EIDS has already delivered notifications for products that were still queued, so removing the tracking file forces EIDS to reprocess those notifications and any missed products are processed.
  6. Restart the client, and uncomment any cron entry that was commented out in the first step.
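
Steps 2 through 5 above can be sketched as a few shell helpers. This is a minimal sketch, not part of PDL: the function names are hypothetical, the paths are the usual defaults named in the steps, and it assumes the sqlite3 command-line tool is installed. Running a statement such as PRAGMA integrity_check makes SQLite read the file, which is when it handles a leftover journal.

```shell
#!/bin/sh
# Hypothetical helpers for the recovery steps; run them only while the client is stopped.

# Print the path of the -journal file next to a database, if one exists.
journal_for() {
    db=$1
    if [ -f "${db}-journal" ]; then
        printf '%s\n' "${db}-journal"
    fi
}

# Open the database non-interactively and run one statement, so SQLite
# reads the file and processes any journal, then exits on its own.
recover_db() {
    db=$1
    sqlite3 "$db" 'PRAGMA integrity_check;'
}

# Remove the EIDS tracking file so EIDS reprocesses notifications.
remove_tracking() {
    tracking=$1
    rm -f -- "$tracking"
}

# Example (as the user who owns the files):
#   journal_for data/receiver_index.db
#   recover_db data/receiver_index.db
#   remove_tracking data/receiver_pdl_tracking.dat
```

The helpers never delete the journal. journal_for only reports it, so you can confirm it is gone after recover_db has run.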

Products are not being processed

Example log message:

FINE    thread=1762     [client_receiver] listener (indexer) has 1341 queued notifications

NOTE: Have you recently started the client for the first time, or restarted it after an outage or after clearing the EIDS tracking file? In these cases it is normal for notifications to queue, but the log should also show evidence that they are being processed.

Steps to fix:

  1. Stop the client, and comment out any cron entry that might restart it automatically. If the client will not stop gracefully, you may need to use kill -9. Afterwards, check that the database was not left locked, which usually happens during ungraceful shutdowns.
  2. Remove the EIDS tracking file, usually data/receiver_pdl_tracking.dat. EIDS has already delivered notifications for products that were still queued, so removing the tracking file forces EIDS to reprocess those notifications and any missed products are processed.
  3. Restart the client, and uncomment any cron entry that was commented out in the first step.
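
To judge whether a queue is draining or stuck, it can help to pull the queue sizes out of the log and compare them over time. The following is a minimal sketch that assumes log lines shaped like the example message above. The function name queued_counts is hypothetical, and the location of the log file depends on your installation.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print
# "<listener> <count>" for every queued-notifications message.
queued_counts() {
    sed -n 's/.*listener (\([^)]*\)) has \([0-9][0-9]*\) queued notifications.*/\1 \2/p'
}

# Example: show the most recent queue size that was logged.
#   queued_counts < /path/to/client.log | tail -n 1
```

If the count for a listener keeps falling between samples, the queue is being worked off and the steps above are probably unnecessary.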

Too many open files

Example exception from log:

 java.io.FileNotFoundException: data/htdocs/us_general-link_nc71810601-0_1341367815000.xml 
 (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at gov.usgs.util.StreamUtils.getInputStream(StreamUtils.java:60)
    at gov.usgs.earthquake.distribution.URLProductStorage.getProductSourceFormat(URLProductStorage.java:139)
    at gov.usgs.earthquake.distribution.FileProductStorage.getProductSource(FileProductStorage.java:470)
    at gov.usgs.earthquake.distribution.FileProductStorage.hasProduct(FileProductStorage.java:506)
    at gov.usgs.earthquake.distribution.EIDSNotificationSender.onBeforeProcessNotification(EIDSNotificationSender.java:79)
    ...
or

 Exception: data/htdocs/us_general-link_nc71810601-0_1341367815000.xml (Too many open files) 
 source=us, type=general-link, code=nc71810601-0, updateTime=Wed Jul 04 02:10:15 UTC 2012

    at org.sqlite.DB.execute(DB.java:270)
    at org.sqlite.DB.executeUpdate(DB.java:281)
    at org.sqlite.PrepStmt.executeUpdate(PrepStmt.java:77)
    at gov.usgs.earthquake.distribution.JDBCNotificationIndex.removeNotification(JDBCNotificationIndex.java:513)
    at gov.usgs.earthquake.distribution.DefaultNotificationReceiver.removeExpiredNotifications(DefaultNotificationReceiver.java:301)
    at gov.usgs.earthquake.distribution.DefaultNotificationReceiver$2.run(DefaultNotificationReceiver.java:577)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)

Steps to fix:

  1. Verify (for RHEL): as [pdluser], run ulimit -n. If it reports unlimited or a large limit, the cause is usually another problem, such as a failed yum auto-update, which a system reboot may resolve.
  2. If ulimit -n returns 8192 or less:
    1. Increase the open files limit in /etc/security/limits.conf. It can usually be set to 65536, or unlimited.
    2. Log in again as [pdluser] to inherit the new limit.
    3. Start the PDL process.
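
The decision in the steps above can be sketched as a small shell function. This is an illustration only: the name classify_nofile and its output labels are hypothetical, and the 8192 threshold is the one given in step 2. The commented limits.conf lines use pdluser as a placeholder for the account that runs PDL.

```shell
#!/bin/sh
# Hypothetical helper: classify the output of `ulimit -n`.
classify_nofile() {
    limit=$1
    case $limit in
        unlimited)
            echo "look-elsewhere" ;;   # limit is not the cause
        ''|*[!0-9]*)
            echo "unknown" ;;          # not a number
        *)
            if [ "$limit" -le 8192 ]; then
                echo "raise-limit"     # edit /etc/security/limits.conf
            else
                echo "look-elsewhere"
            fi ;;
    esac
}

# Example, run as the PDL user:
#   classify_nofile "$(ulimit -n)"
#
# Example /etc/security/limits.conf lines (pdluser is a placeholder):
#   pdluser  soft  nofile  65536
#   pdluser  hard  nofile  65536
```

After editing limits.conf, log in again and rerun ulimit -n to confirm that the new limit is in effect before starting PDL.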

No space left on device

Example exception from log:

INFO  thread=16 Exception executing task
java.io.FileNotFoundException: data/indexer_storage/origin/ci10204610/ci/1415644284160/quakeml.xml (No space left on device)
  at java.io.FileOutputStream.open(Native Method)
  at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
  at gov.usgs.util.StreamUtils.getOutputStream(StreamUtils.java:163)
  at gov.usgs.util.StreamUtils.getOutputStream(StreamUtils.java:185)
  at gov.usgs.earthquake.product.FileContent.<init>(FileContent.java:110)
  at gov.usgs.earthquake.product.io.DirectoryProductHandler.onContent(DirectoryProductHandler.java:54)
  at gov.usgs.earthquake.product.io.FilterProductHandler.onContent(FilterProductHandler.java:67)
  at gov.usgs.earthquake.product.io.ObjectProductSource.sendContents(ObjectProductSource.java:144)
  at gov.usgs.earthquake.product.io.ObjectProductSource.streamTo(ObjectProductSource.java:63)
  at gov.usgs.earthquake.distribution.FileProductStorage.storeProductSource(FileProductStorage.java:669)
  at gov.usgs.earthquake.distribution.FileProductStorage.storeProduct(FileProductStorage.java:651)
  at gov.usgs.earthquake.indexer.Indexer.onProduct(Indexer.java:489)
  at gov.usgs.earthquake.indexer.Indexer.onProduct(Indexer.java:461)
  at gov.usgs.earthquake.distribution.DefaultNotificationListener.onNotification(DefaultNotificationListener.java:115)
  at gov.usgs.earthquake.distribution.ExecutorListenerNotifier$1.call(ExecutorListenerNotifier.java:172)
  at gov.usgs.earthquake.distribution.ExecutorListenerNotifier$1.call(ExecutorListenerNotifier.java:169)
  at gov.usgs.util.ExecutorTask.run(ExecutorTask.java:181)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:701)

Steps to fix:

  1. Verify (for RHEL): check free disk space with df -h, and check free inodes with df -i.
  2. If the partition where PDL files are stored is full, or out of inodes: