Indexer


$Id: indexer.html 19047 2013-03-14 00:01:19Z jmfee $
$URL: https://ghttrac.cr.usgs.gov/websvn/ProductDistribution/trunk/etc/documentation/userguide/indexer.html $

The Indexer maintains an index of received products. It uses this index to associate related products into events, based on eventid or time, latitude, and longitude. When multiple sources submit information for the same event, the indexer determines which source is considered preferred for that type of information.

The Index

The index is typically a database, although it is not required to be. The default implementation uses JDBC, and should be able to maintain an index in any JDBC compliant database.

Archive Policies

Archive policies define rules for when the indexer should remove information from its index.

Search

Enabled by default. The indexer listens on a socket to allow external users to search and retrieve information from The Index.

See the command line client's --search option, or the SearchSocket API class.

Searches and results use an XML format. See etc/schema/indexer.xsd for details.

Indexer Events

When a product arrives and is added to the index, the indexer keeps track of the changes it makes. Each Indexer Event is a group of one or more changes that were made in response to one product arriving.

This tracking is performed through an onEventTrigger database trigger. For a technical description of this trigger and instructions for implementing it on a MySQL database, see Configuring the Product Index to Use MySQL.

Change Types

EVENT_ADDED
An event was added to the index. This occurs when a product arrives that cannot associate to an existing event, but has enough information (time, latitude, longitude) to create a new event.
EVENT_SPLIT
An event that was part of another event in the index is now considered a separate event. This usually occurs when a network updates its location far enough away from the "parent" event.

There may be several EVENT_SPLIT changes, but there will always also be an EVENT_UPDATED for the event that the split events were split from.
EVENT_UPDATED
An event that already existed in the index was updated. This occurs when a product arrives and associates to an existing event. This does not necessarily mean the preferred event properties (eventid, time, latitude, longitude, magnitude, depth) have changed, only that information associated to this event is different than before.
EVENT_DELETED
An event that already existed in the index was deleted. This effectively means the event did not occur. This occurs when a product arrives, associates to an existing event, and because of the new information the event no longer has a time, latitude, or longitude.
EVENT_MERGED
An event that already existed in the index merged with another event. This means this event still occurred, but is now part of another event (and is not preferred).

There may be several EVENT_MERGED changes, but there will always also be an EVENT_UPDATED for the event that the merged events were merged into.
EVENT_ARCHIVED
An event was removed from the index due to a configured archive policy. The event still occurred, but is no longer being tracked by this indexer.
PRODUCT_ADDED
A product arrived, was unable to associate to an event, and did not have enough information (time, latitude, longitude) to create a new event.
PRODUCT_UPDATED
An unassociated product was updated. If an update causes the product to associate, there will be an EVENT_UPDATED change instead of PRODUCT_UPDATED.
PRODUCT_DELETED
An unassociated product was deleted.
PRODUCT_ARCHIVED
An unassociated product was removed from the index due to a configured archive policy.
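
When writing a listener, the change types above can be modeled as an enumeration. The following is a minimal Python sketch and is not part of PDL: the ChangeType names mirror the list above, while the summarize_changes helper and the (type, identifier) tuple layout are hypothetical. It checks the rule stated above that EVENT_SPLIT and EVENT_MERGED changes are always accompanied by an EVENT_UPDATED change.

```python
from collections import Counter
from enum import Enum


class ChangeType(Enum):
    """Change types reported by the indexer, named as in the list above."""
    EVENT_ADDED = "EVENT_ADDED"
    EVENT_SPLIT = "EVENT_SPLIT"
    EVENT_UPDATED = "EVENT_UPDATED"
    EVENT_DELETED = "EVENT_DELETED"
    EVENT_MERGED = "EVENT_MERGED"
    EVENT_ARCHIVED = "EVENT_ARCHIVED"
    PRODUCT_ADDED = "PRODUCT_ADDED"
    PRODUCT_UPDATED = "PRODUCT_UPDATED"
    PRODUCT_DELETED = "PRODUCT_DELETED"
    PRODUCT_ARCHIVED = "PRODUCT_ARCHIVED"


def summarize_changes(changes):
    """Count the changes in one Indexer Event by type.

    `changes` is a list of (ChangeType, identifier) tuples. This layout is
    an assumption made for illustration, not the PDL data model.
    """
    counts = Counter(change_type for change_type, _ in changes)
    split_or_merged = counts[ChangeType.EVENT_SPLIT] + counts[ChangeType.EVENT_MERGED]
    if split_or_merged and not counts[ChangeType.EVENT_UPDATED]:
        raise ValueError("split or merged events require an EVENT_UPDATED change")
    return counts


# Hypothetical identifiers: one event merged into another, which was updated.
example = [(ChangeType.EVENT_MERGED, "event-a"), (ChangeType.EVENT_UPDATED, "event-b")]
example_counts = summarize_changes(example)
```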

Example Indexer Configuration File

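In this example, an indexer is configured to receive products from the production hubs, index only origin products, archive old events and products, and pass each change to an external listener that runs the echo command: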

; note this configuration does not include senders,
; which would be required for sending products.

receivers = receiver_pdl
listeners = indexer
enableTracker = false


; receive from production hubs
[receiver_pdl]
type = gov.usgs.earthquake.distribution.EIDSNotificationReceiver
storageDirectory = data/receiver_storage
indexFile = data/receiver_index.db
serverHost = ehppdl1.cr.usgs.gov
serverPort = 39977
alternateServers = ehppdl2.wr.usgs.gov:39977
cleanupInterval = 900000
storageage = 900000


; indexer is only listener
; currently it only receives origin messages
[indexer]
type = gov.usgs.earthquake.indexer.Indexer
listenerIndexFile = data/indexer_listener_index.db
storageDirectory = data/indexer_product_storage
indexfile = data/indexer_product_index.db
includeTypes = origin
listeners = indexerlistener_example
archivePolicy = policyOldEvents, policyOldProducts, policyOldProductVersions

[policyOldEvents]
; remove events after one month
type = gov.usgs.earthquake.indexer.ArchivePolicy
maxAge = 2592000000

[policyOldProducts]
; remove unassociated products after one week
type = gov.usgs.earthquake.indexer.ProductArchivePolicy
maxAge = 604800000
onlyUnassociated = true

[policyOldProductVersions]
; remove old versions of products after one hour
type = gov.usgs.earthquake.indexer.ProductArchivePolicy
maxAge = 3600000
onlySuperseded = true


; whenever the indexer makes a change, it calls this listener
; currently it only receives changes triggered by origin products
[indexerlistener_example]
type = gov.usgs.earthquake.indexer.ExternalIndexerListener
storageDirectory = data/indexerlistener_storage
command = echo
processPreferredOnly = true
includeTypes = origin
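
The maxAge values in the archive policies are given in milliseconds, which is easy to misread. As a rough check, the sketch below uses Python's standard configparser module to read the three policy sections from the example and convert each maxAge to hours. It is not part of PDL: the POLICY_CONFIG string and the policy_ages_in_hours helper are illustrative names, and the top-level keys (receivers, listeners, enableTracker) are left out because configparser requires every key to sit inside a section.

```python
import configparser

# The three archive policy sections, copied from the example above.
POLICY_CONFIG = """
[policyOldEvents]
; remove events after one month
type = gov.usgs.earthquake.indexer.ArchivePolicy
maxAge = 2592000000

[policyOldProducts]
; remove unassociated products after one week
type = gov.usgs.earthquake.indexer.ProductArchivePolicy
maxAge = 604800000
onlyUnassociated = true

[policyOldProductVersions]
; remove old versions of products after one hour
type = gov.usgs.earthquake.indexer.ProductArchivePolicy
maxAge = 3600000
onlySuperseded = true
"""

MILLISECONDS_PER_HOUR = 60 * 60 * 1000


def policy_ages_in_hours(config_text):
    """Return {section name: maxAge in hours} for every section in the text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. maxAge rather than maxage
    parser.read_string(config_text)
    return {
        section: parser.getint(section, "maxAge") / MILLISECONDS_PER_HOUR
        for section in parser.sections()
    }


ages = policy_ages_in_hours(POLICY_CONFIG)
```

Here the three policies come out at 720 hours (30 days), 168 hours (7 days), and 1 hour, matching the comments in the configuration.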

Indexer Summarization

As an aid to indexing, the Indexer maintains summaries of products, associating them to seismic events using time, latitude, and longitude. Using these three attributes, the Indexer assigns an eventID to the summaries, so that multiple products can be efficiently cross-referenced to a single event.

As part of the summarization process, the Indexer extracts a specific subset of properties from various products, so that important key aspects of an event are visible without having to interrogate the details of multiple products.

Summarized Properties

The following properties are extracted from products and are associated with summarizations of events:

region
The name of a particular geographic region. Initially the Indexer makes an attempt at obtaining the region directly from the origin or geoserve products. Failing that, it derives the region using the event's latitude and longitude. This derivation is performed by the feplus feature of the Indexer, where individual regions are defined by latitude/longitude within the etc/config/regions.xml file.
maxmmi
The maximum shaking intensity found in the shakemap product. The Indexer does not read it from shakemap directly: maxmmi is taken from the losspager product and, if not available there, from the dyfi product.
alertlevel
A categorized fatality or economic loss level, obtained from the losspager product:
Green
0 fatalities OR less than 1 million U.S. dollars economic loss.
Yellow
1-99 fatalities OR less than 100 million U.S. dollars economic loss.
Orange
100-999 fatalities OR less than 1 billion U.S. dollars economic loss.
Red
1000+ fatalities OR greater than 1 billion U.S. dollars economic loss.
review_status
Whether this event has been reviewed by a human, obtained from the origin product.
event_type
The type of event, such as earthquake or landslide, obtained from the origin product.
azimuthal_gap
Azimuthal Gap is obtained from the origin product.
magnitude
Magnitude is obtained from the origin product.
num_Resp
The number of individuals completing the DYFI web dialogue for this event, obtained from the nresponses attribute of the event_data.xml file included in the dyfi product.
tsunamiFlag
A [“true”|“false”] Boolean string indicating if the tsunami flag should be triggered automatically, obtained from the geoserve product.
utcOffset
Number of minutes between the epicenter timezone and UTC, obtained from the geoserve product.
significance
An integer value indicating the significance of an event, calculated from properties of the origin, losspager and dyfi products.

Significance is calculated from the following multi-step formula:

magnitude_significance
= (100 / 6.5) * magnitude^2
pager_significance
= 2000 if red, 1000 if orange, 500 if yellow
dyfi_significance
= MIN(num_Resp, 1000) * maxmmi * 0.10
significance
= MAX(magnitude_significance, pager_significance) + dyfi_significance
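
The multi-step formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the Indexer's implementation: the magnitude term is taken as magnitude squared, an alert level other than red, orange, or yellow (including a missing one) is assumed to contribute 0, and the result is rounded to the nearest integer because the property is described as an integer. The function and parameter names are hypothetical.

```python
# Pager contribution per alert level, from the formula above.
PAGER_SIGNIFICANCE = {"red": 2000, "orange": 1000, "yellow": 500}


def significance(magnitude, alertlevel=None, num_resp=0, maxmmi=0.0):
    """Compute event significance following the multi-step formula."""
    magnitude_significance = (100 / 6.5) * magnitude ** 2
    # Assumption: green or missing alert levels contribute nothing.
    pager_significance = PAGER_SIGNIFICANCE.get((alertlevel or "").lower(), 0)
    # The number of DYFI responses is capped at 1000.
    dyfi_significance = min(num_resp, 1000) * maxmmi * 0.10
    total = max(magnitude_significance, pager_significance) + dyfi_significance
    # Assumption: round to the nearest integer.
    return round(total)


# Magnitude 6.5, yellow alert, 2000 DYFI responses, maxmmi 5:
# max(650, 500) + min(2000, 1000) * 5 * 0.10 = 650 + 500
worked_example = significance(6.5, "yellow", num_resp=2000, maxmmi=5)
```

In the worked example the magnitude term (650) outweighs the yellow alert (500), and the capped DYFI term adds 500, giving 1150.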

Product Summarized Preferred Weight

Within each type of product, the summary with the largest preferred weight is considered preferred. This calculated weight is the sum of four components:

DEFAULT_PREFERRED_WEIGHT = 1
All product summaries have a preferred weight of at least 1.
SAME_SOURCE_WEIGHT = 5
Weight added when product source is same as event source.
AUTHORITATIVE_WEIGHT = 100
Weight added when product author is in the product's authoritative region.
AUTHORITATIVE_EVENT_WEIGHT = 50
Weight added when product refers to an authoritative event.
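
Summing the four components is straightforward arithmetic; the Python sketch below makes it concrete. The constant names and values come from the list above, while the preferred_weight function and its boolean parameters are hypothetical and only illustrate the sum. How the Indexer decides each condition is not shown here.

```python
DEFAULT_PREFERRED_WEIGHT = 1
SAME_SOURCE_WEIGHT = 5
AUTHORITATIVE_WEIGHT = 100
AUTHORITATIVE_EVENT_WEIGHT = 50


def preferred_weight(same_source, authoritative_region, authoritative_event):
    """Sum the weight components that apply to one product summary."""
    weight = DEFAULT_PREFERRED_WEIGHT  # every summary starts at 1
    if same_source:
        weight += SAME_SOURCE_WEIGHT
    if authoritative_region:
        weight += AUTHORITATIVE_WEIGHT
    if authoritative_event:
        weight += AUTHORITATIVE_EVENT_WEIGHT
    return weight


# A summary from the event's own source, outside its authoritative region.
same_source_only = preferred_weight(True, False, False)
# A summary for which all three conditions hold.
all_components = preferred_weight(True, True, True)
```

With these values, a summary that only matches the event source scores 6, and one that meets every condition scores 156.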

Indexer Components

Indexer SQL Dependencies

The Indexer is dependent on two SQL components: the feplus system and OnEventUpdate stored procedures:

mysql_feplus
Found in the schema/mysql_feplus directory, feplus implements region-identifying functionality based on latitude and longitude. It uses the definitions in the etc/config/regions.xml file to associate a region name with the latitude/longitude location of an event or product. The OnEventUpdate stored procedures use this functionality for origin and geoserve products, which ultimately determine properties such as event significance.
onEventTrigger Stored Procedures
Found in the schema/productIndexOnEventUpdateMysql.sql file, these procedures summarize products and events for efficient retrieval. The trigger is invoked when the Indexer's Java classes use time/latitude/longitude information in products to create or modify events.

Some Major Java Components

JDBCProductIndex
This class implements the ProductIndex interface to maintain events, product summaries, event summaries and properties. It contains and executes the SQL manipulations of the database.
Indexer
This key class uses JDBCProductIndex to maintain the database. It also adds and removes listeners, receives products, and sends notifications. It extends the DefaultNotificationListener class.

Indexer Modules

Specific products sometimes have special needs for indexing; the three existing product types of this nature are the shakemap, dyfi, and moment-tensor products. This special indexing is configured in config.ini, as documented in the Indexer Components section of the configuration documentation and illustrated below.

The following code snippet from config.ini shows the minimum entries necessary for requesting special indexing for the shakemap and dyfi products:

[indexer]
modules = indexer_module_shakemap, indexer_module_dyfi

[indexer_module_shakemap]
type = gov.usgs.earthquake.shakemap.ShakeMapIndexerModule

[indexer_module_dyfi]
type = gov.usgs.earthquake.dyfi.DYFIIndexerModule

[indexer_module_momenttensor]
type = gov.usgs.earthquake.momenttensor.MTIndexerModule
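
The snippet defines three module sections but lists only two of them in the modules property. A small check like the Python sketch below can flag that kind of mismatch. It is not part of PDL: it assumes that a module section takes effect only when it is named in modules, and it relies on the indexer_module_ prefix, which is just the naming convention used in this snippet. The helper name is hypothetical.

```python
import configparser

# The snippet above, copied verbatim.
MODULE_CONFIG = """
[indexer]
modules = indexer_module_shakemap, indexer_module_dyfi

[indexer_module_shakemap]
type = gov.usgs.earthquake.shakemap.ShakeMapIndexerModule

[indexer_module_dyfi]
type = gov.usgs.earthquake.dyfi.DYFIIndexerModule

[indexer_module_momenttensor]
type = gov.usgs.earthquake.momenttensor.MTIndexerModule
"""


def listed_and_unlisted_modules(config_text):
    """Return (modules named in [indexer], module sections not named there)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("indexer", "modules")
    listed = [name.strip() for name in raw.split(",") if name.strip()]
    # Assumption: module sections are recognized by their name prefix.
    defined = [s for s in parser.sections() if s.startswith("indexer_module_")]
    unlisted = [s for s in defined if s not in listed]
    return listed, unlisted


listed, unlisted = listed_and_unlisted_modules(MODULE_CONFIG)
```

For this snippet, the check reports indexer_module_momenttensor as defined but not listed, so it would need to be added to modules if special indexing of moment tensor products is wanted.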

As has been noted elsewhere in this documentation, the custom programming of these special indexing classes requires coordination between the product producer and the PDL web team at jmfee@usgs.gov.

gov.usgs.earthquake.shakemap.ShakeMapIndexerModule
This class provides the special indexing for shakemap products.
gov.usgs.earthquake.dyfi.DYFIIndexerModule
This class provides the special indexing for dyfi products.
gov.usgs.earthquake.momenttensor.MTIndexerModule
This class adjusts the weight of moment tensor products.