Monday, October 27, 2014

Solr_05_01_Field Types

  • General Properties

    Property
        Description
        Values

    name
        The name of the fieldType. This value is used in field definitions,
        in the "type" attribute. It is strongly recommended that names consist
        of alphanumeric or underscore characters only and do not start with a
        digit. This is not currently strictly enforced.

    class
        The class name used to store and index the data for this type. You may
        prefix included class names with "solr." and Solr will automatically
        figure out which packages to search for the class, so "solr.TextField"
        will work. If you are using a third-party class, you will probably
        need a fully qualified class name. The fully qualified equivalent of
        "solr.TextField" is "org.apache.solr.schema.TextField".

    positionIncrementGap
        For multivalued fields, specifies a distance between multiple values,
        which prevents spurious phrase matches.
        Values: integer

    autoGeneratePhraseQueries
        For text fields. If true, Solr automatically generates phrase queries
        for adjacent terms. If false, terms must be enclosed in double quotes
        to be treated as phrases.
        Values: true or false

    docValuesFormat
        Defines a custom DocValuesFormat to use for fields of this type. This
        requires that a schema-aware codec, such as the SchemaCodecFactory,
        has been configured in solrconfig.xml.
        Values: n/a

    postingsFormat
        Defines a custom PostingsFormat to use for fields of this type. This
        requires that a schema-aware codec, such as the SchemaCodecFactory,
        has been configured in solrconfig.xml.
        Values: n/a
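
    As a rough sketch of how the general properties above fit together, a
    fieldType declaration in schema.xml might look like the following. The
    type names, the analyzer chain, and the class com.example.MyFieldType
    are assumptions made up for illustration; only the attribute names come
    from the table.

```xml
<!-- schema.xml: sketch of a fieldType using the general properties.
     The name "text_general" and the analyzer chain are illustrative
     assumptions, not taken from the table above. -->
<fieldType name="text_general"
           class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- A third-party class needs its fully qualified name.
     "com.example.MyFieldType" is a hypothetical class. -->
<fieldType name="my_custom" class="com.example.MyFieldType"/>
```

    With positionIncrementGap="100", a phrase query should not match across
    the boundary between two values of a multivalued field unless its slop
    is large enough to bridge the gap.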

  • Field Element Default Properties

    Property
        Description
        Values

    indexed
        If true, the value of the field can be used in queries to retrieve
        matching documents.
        Values: true or false

    stored
        If true, the actual value of the field is stored and can be retrieved
        by queries.
        Values: true or false

    docValues
        If true, the value of the field will be put in a column-oriented
        DocValues structure.
        Values: true or false

    sortMissingFirst, sortMissingLast
        Control the placement of documents when a sort field is not present.
        As of Solr 3.5, these work for all numeric fields, including Trie and
        date fields.
        Values: true or false

    multiValued
        If true, indicates that a single document might contain multiple
        values for this field type.
        Values: true or false

    omitNorms
        If true, omits the norms associated with this field (this disables
        length normalization and index-time boosting for the field, and saves
        some memory). Defaults to true for all primitive (non-analyzed) field
        types, such as int, float, date, bool, and string. Only full-text
        fields or fields that need an index-time boost need norms.
        Values: true or false

    omitTermFreqAndPositions
        If true, omits term frequency, positions, and payloads from postings
        for this field. This can be a performance boost for fields that don't
        require that information. It also reduces the storage space required
        for the index. Queries that rely on positions and are issued on a
        field with this option will silently fail to find documents. This
        property defaults to true for all fields that are not text fields.
        Values: true or false

    omitPositions
        Similar to omitTermFreqAndPositions, but preserves term frequency
        information.
        Values: true or false

    termVectors, termPositions, termOffsets
        These options instruct Solr to maintain full term vectors for each
        document, optionally including the position and offset information
        for each term occurrence in those vectors. These can be used to
        accelerate highlighting and other ancillary functionality, but impose
        a substantial cost in terms of index size. They are not necessary for
        typical uses of Solr.
        Values: true or false

    required
        Instructs Solr to reject any attempt to add a document that does not
        have a value for this field. This property defaults to false. When
        importing from a database with one-to-many relations, leaving this
        set to false is recommended.
        Values: true or false
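
    To make the field properties more concrete, here is a hedged sketch of
    a few field declarations in schema.xml. The field names and the
    referenced types ("string", "text_general") are assumptions for
    illustration and would need to be defined elsewhere in the schema.

```xml
<!-- schema.xml: field names and the referenced types ("string",
     "text_general") are assumptions for illustration. -->

<!-- Searchable, retrievable, single-valued, and mandatory. -->
<field name="id" type="string" indexed="true" stored="true"
       required="true" multiValued="false"/>

<!-- Several values per document; not mandatory, e.g. for rows that come
     from a one-to-many join in a database import. -->
<field name="tags" type="string" indexed="true" stored="true"
       multiValued="true" required="false"/>

<!-- Full-text field that keeps norms and stores term vectors with
     positions and offsets, at the cost of a larger index. -->
<field name="body" type="text_general" indexed="true" stored="true"
       omitNorms="false"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- Indexed for search only: the original value is not returned. -->
<field name="keywords" type="text_general" indexed="true" stored="false"/>
```

    Properties set on a field override the defaults inherited from its
    fieldType, so only the attributes that differ need to be written out.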


  • Field Types Included with Solr

    The following table lists the field types currently available in Solr.

    All of these classes are in the org.apache.solr.schema package.

    Knowing these classes helps when defining your own fieldType elements.

    Class
        Description

    BCDIntField
        Binary-coded decimal (BCD) integer. BCD is a relatively inefficient
        encoding that offers the benefits of quick decimal calculations and
        quick conversion to a string. This field has been deprecated and will
        be removed in Solr 5.0, use TrieIntField instead.

    BCDLongField
        Binary-coded decimal long integer. This field has been deprecated and
        will be removed in Solr 5.0, use TrieLongField instead.

    BCDStrField
        Binary-coded decimal string. This field has been deprecated and will
        be removed in Solr 5.0, use TrieIntField instead.

    BinaryField
        Binary data.

    BoolField
        Contains either true or false. Values of "1", "t", or "T" in the
        first character are interpreted as true. Any other values in the
        first character are interpreted as false.

    ByteField
        Contains a byte (an 8-bit signed integer). This field has been
        deprecated and will be removed in Solr 5.0, use TrieIntField instead.

    CollationField
        Supports Unicode collation for sorting and range queries.
        ICUCollationField is a better choice if you can use ICU4J. See the
        section Unicode Collation.

    CurrencyField
        Supports currencies and exchange rates. See the section Working with
        Currencies and Exchange Rates.

    DateField
        Represents a point in time with millisecond precision. See the
        section Working with Dates. This field has been deprecated and will
        be removed in Solr 5.0, use TrieDateField instead.

    DoubleField
        Double (64-bit IEEE floating point). This field has been deprecated
        and will be removed in Solr 5.0, use TrieDoubleField instead.

    ExternalFileField
        Pulls values from a file on disk. See the section Working with
        External Files and Processes.

    EnumField
        Allows defining an enumerated set of values which may not be easily
        sorted by either alphabetic or numeric order (such as a list of
        severities, for example). This field type takes a configuration file,
        which lists the proper order of the field values. See the section
        Working with Enum Fields for more information.

    FloatField
        Floating point (32-bit IEEE floating point). This field has been
        deprecated and will be removed in Solr 5.0, use TrieFloatField
        instead.

    ICUCollationField
        Supports Unicode collation for sorting and range queries. See the
        section Unicode Collation.

    IntField
        Integer (32-bit signed integer). This field has been deprecated and
        will be removed in Solr 5.0, use TrieIntField instead.

    LatLonType
        Spatial Search: a latitude/longitude coordinate pair. The latitude is
        specified first in the pair.

    LongField
        Long integer (64-bit signed integer). This field has been deprecated
        and will be removed in Solr 5.0, use TrieLongField instead.

    PointType
        Spatial Search: An arbitrary n-dimensional point, useful for
        searching sources such as blueprints or CAD drawings.

    PreAnalyzedField
        Provides a way to send to Solr serialized token streams, optionally
        with independent stored values of a field, and have this information
        stored and indexed without any additional text processing. Useful if
        you want to submit field content that was already processed by some
        existing external text processing pipeline (e.g. tokenized,
        annotated, stemmed, inserted synonyms, etc.), while using all the
        rich attributes that Lucene's TokenStream provides via token
        attributes.

    RandomSortField
        Does not contain a value. Queries that sort on this field type will
        return results in random order. Use a dynamic field to use this
        feature.

    ShortField
        Short integer. This field has been deprecated and will be removed in
        Solr 5.0, use TrieIntField instead.

    SortableDoubleField
        The Sortable fields provide correct numeric sorting. This field has
        been deprecated and will be removed in Solr 5.0, use TrieDoubleField
        instead.

    SortableFloatField
        Numerically sorted floating point. This field has been deprecated and
        will be removed in Solr 5.0, use TrieFloatField instead.

    SortableIntField
        Numerically sorted integer. This field has been deprecated and will
        be removed in Solr 5.0, use TrieIntField instead.

    SortableLongField
        Numerically sorted long integer. This field has been deprecated and
        will be removed in Solr 5.0, use TrieLongField instead.

    SpatialRecursivePrefixTreeFieldType
        (RPT for short) Spatial Search: Accepts latitude comma longitude
        strings or other shapes in WKT format.

    StrField
        String (UTF-8 encoded string or Unicode).

    TextField
        Text, usually multiple words or tokens.

    TrieDateField
        Date field. Represents a point in time with millisecond precision.
        See the section Working with Dates. precisionStep="0" enables
        efficient date sorting and minimizes index size; precisionStep="8"
        (the default) enables efficient range queries.

    TrieDoubleField
        Double field (64-bit IEEE floating point). precisionStep="0" enables
        efficient numeric sorting and minimizes index size; precisionStep="8"
        (the default) enables efficient range queries.

    TrieField
        If this field type is used, a "type" attribute must also be
        specified, valid values are: integer, long, float, double, date.
        Using this field is the same as using any of the Trie fields.
        precisionStep="0" enables efficient numeric sorting and minimizes
        index size; precisionStep="8" (the default) enables efficient range
        queries.

    TrieFloatField
        Floating point field (32-bit IEEE floating point). precisionStep="0"
        enables efficient numeric sorting and minimizes index size;
        precisionStep="8" (the default) enables efficient range queries.

    TrieIntField
        Integer field (32-bit signed integer). precisionStep="0" enables
        efficient numeric sorting and minimizes index size; precisionStep="8"
        (the default) enables efficient range queries.

    TrieLongField
        Long field (64-bit signed integer). precisionStep="0" enables
        efficient numeric sorting and minimizes index size; precisionStep="8"
        (the default) enables efficient range queries.

    UUIDField
        Universally Unique Identifier (UUID). Pass in a value of "NEW" and
        Solr will create a new UUID. Note: configuring a UUIDField instance
        with a default value of "NEW" is not advisable for most users when
        using SolrCloud (and not possible if the UUID value is configured as
        the unique key field) since the result will be that each replica of
        each document will get a unique UUID value. Using
        UUIDUpdateProcessorFactory to generate UUID values when documents are
        added is recommended instead.
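
    The precisionStep trade-off described for the Trie classes can be
    sketched with a few fieldType declarations. The type names ("int",
    "tint", "tdate", "tlong", "random") and the dynamic field pattern are
    assumptions chosen for illustration; the classes and attributes are the
    ones listed in the table.

```xml
<!-- schema.xml: type names are illustrative assumptions. -->

<!-- precisionStep="0": smaller index, suited to sorting. -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0"/>

<!-- precisionStep="8": extra indexed terms, suited to range queries. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="8"/>

<!-- The generic TrieField needs an explicit "type" attribute. -->
<fieldType name="tlong" class="solr.TrieField" type="long" precisionStep="8"/>

<!-- RandomSortField holds no value and is exposed through a dynamic field. -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
```

    One possible convention is to declare both a precisionStep="0" and a
    precisionStep="8" variant of a numeric type, and to pick one per field
    depending on whether it is mostly sorted on or mostly range-queried.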

  • Field Properties by Use Case

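    The table below summarizes common use cases and the attributes a field or
    field type should have to support them. An entry of true or false means
    the attribute must be set to that value for the use case to work
    correctly. If no entry is given, the setting of that attribute has no
    effect on the use case.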
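    Attributes covered: indexed, stored, multiValued, omitNorms, termVectors,
    termPositions, docValues. Numbers in parentheses refer to the notes that
    follow the table.

    Use Case
        Required settings

    search within field
        indexed=true

    retrieve contents
        stored=true

    use as unique key
        indexed=true, multiValued=false

    sort on field
        indexed=true (7), multiValued=false, omitNorms=true (1),
        docValues=true (7)

    use field boosts (5)
        omitNorms=false

    document boosts affect searches within field
        omitNorms=false

    highlighting
        indexed=true (4), stored=true, termVectors=true (2),
        termPositions=true (3)

    faceting (5)
        indexed=true (7), docValues=true (7)

    add multiple values, maintaining order
        multiValued=true

    field length affects doc score
        omitNorms=false

    MoreLikeThis (5)
        termVectors=true (6)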

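    The footnote numbers in the table mean:

    1. Recommended but not necessary.
    2. Will be used if present, but not necessary.
    3. Only if termVectors=true.
    4. A tokenizer must be defined for the field, but the field does not need to be indexed.
    5. Described in Understanding Analyzers, Tokenizers, and Filters.
    6. Term vectors are not mandatory here. If not true, a stored field is analyzed instead. Term vectors are therefore recommended, but only required if stored=false.
    7. Either indexed or docValues must be true, but both are not required. DocValues can be more efficient in many cases.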
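
    A few of the use cases above, written out as schema.xml declarations.
    This is only a sketch: the field names and the referenced types
    ("string", "tfloat", "text_general") are assumptions and must exist in
    your own schema.

```xml
<!-- schema.xml: field names and types are illustrative assumptions. -->

<!-- Unique key: indexed and single-valued. -->
<field name="id" type="string" indexed="true" stored="true"
       multiValued="false"/>
<uniqueKey>id</uniqueKey>

<!-- Sorting and faceting: either indexed or docValues must be true
     (note 7); this sketch relies on docValues alone. -->
<field name="price" type="tfloat" indexed="false" stored="true"
       docValues="true" multiValued="false"/>

<!-- Highlighting: stored is required; term vectors with positions and
     offsets are optional accelerators (notes 2 and 3). -->
<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- MoreLikeThis on a field that is not stored: term vectors are
     required in this case (note 6). -->
<field name="summary" type="text_general" indexed="true" stored="false"
       termVectors="true"/>
```

    Attributes that the table leaves blank for a use case are simply left
    at their defaults here.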

