
Lucene and underscores

Help
Jim Murray
2023-08-29
2023-09-05
  • Jim Murray

    Jim Murray - 2023-08-29

    Hi Ulf,
    is there an easy way to change the Lucene analyzer used by Jtrac so that it allows underscores in words?
    The standard analyzer breaks words at underscores, but what if you want to keep words that contain underscores intact?
    Thanks,
    Jim

     
  • Ulf Dittmer

    Ulf Dittmer - 2023-08-29

    When I create a ticket with "under_score" in it, I can find it by both searching for "score" and "under_score" (which makes sense, as both indexing and searching should be using the same analyzer).

    What behavior would you like to see: not finding it when searching for "score"? Or something else?
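For illustration, here is a minimal plain-Java sketch (not Lucene's API) of why a tokenizer that breaks at underscores finds the ticket both ways: the same analysis runs at index time and at query time, so "under_score" becomes the tokens "under" and "score" on both sides. The `splittingAnalyze` tokenizer and the "every query token must be present" matching rule are simplifying assumptions, not how Jtrac or Lucene actually parse queries. The last line shows the flip side: an unrelated ticket that merely contains both component words also matches.

```java
import java.util.*;

// Plain-Java sketch, NOT Lucene's API: a hypothetical analyzer that,
// like the behavior described in this thread, breaks words at underscores.
public class SameAnalyzerDemo {

    // Lowercase, then split on any run of characters that are not letters or
    // digits, so '_' acts as a word separator.
    static List<String> splittingAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // skip a possible leading empty piece
        }
        return tokens;
    }

    // Simplified matching (an assumption): a document matches when it contains
    // every token the SAME analyzer produces for the query.
    static boolean matches(String docText, String query) {
        Set<String> indexed = new HashSet<>(splittingAnalyze(docText));
        return indexed.containsAll(splittingAnalyze(query));
    }

    public static void main(String[] args) {
        String ticket = "Crash in under_score handling";
        System.out.println(splittingAnalyze(ticket));       // [crash, in, under, score, handling]
        System.out.println(matches(ticket, "score"));       // true
        System.out.println(matches(ticket, "under_score")); // true
        // False positive: no compound word here, yet both parts are present.
        System.out.println(matches("Score is under review", "under_score")); // true
    }
}
```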

     
  • Ulf Dittmer

    Ulf Dittmer - 2023-08-29

    And to answer the actual question: no, there is no easy way to change the analyzer through a configuration change. But I wouldn't mind adding configurability for a good use case.

     
  • Jim Murray

    Jim Murray - 2023-08-30

    Thanks Ulf.
    It's more about the false positives than about eventually finding the hit you actually wanted.
    With technical and IT text it's not unusual to have compound words for object names etc.,
    e.g. 'This_is_my_very_special_routine_name'. (A lot of older source code does not use bumpy case or camel case for routine names, so when maintaining issues on such a code base, hyphenated and underscored words are common.)
    The only hit I'm interested in is one where the whole compound word/phrase matches; hits on any of the component words by themselves are of no interest. Also, although short words such as 'is' might be dropped as noise during indexing, they may be an important part of what you are really looking for: e.g. we don't want to see 'This_was_my_very_special_routine_name' as a hit, since it's an entirely different object.
    Trying fancy searches with weighting and proximity is tedious and error-prone, and if an important component word of the compound name is dropped as a noise word by the indexer, an exact match can't be guaranteed.

    A simple whitespace analyzer would, I think, suffice for a lot of simple searches on the issues in my case.
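To make the stop-word problem concrete, here is a plain-Java sketch (again not Lucene classes) contrasting two hypothetical analyzers: one that splits at non-alphanumerics, lowercases and drops a small illustrative stop-word list, and one that splits only on whitespace and keeps tokens exactly as written, roughly what a whitespace analyzer does. With the first, the two routine names collapse to the same tokens; with the second, each stays one distinct token. The stop-word set and method names are assumptions chosen for the example.

```java
import java.util.*;

// Plain-Java sketch contrasting two hypothetical analyzers (NOT Lucene classes).
public class WhitespaceVsSplitting {

    // Small illustrative stop-word list (an assumption; real lists are longer).
    static final Set<String> STOP_WORDS = Set.of("this", "is", "was", "a", "the");

    // Lowercase, split at anything that is not a letter or digit ('_' included),
    // then drop stop words.
    static List<String> splitting(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    // Split only on whitespace and keep tokens exactly as written, so compound
    // names stay intact (no lowercasing, no stop-word removal).
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String a = "This_is_my_very_special_routine_name";
        String b = "This_was_my_very_special_routine_name";
        // Both become [my, very, special, routine, name]: indistinguishable.
        System.out.println(splitting(a));
        System.out.println(splitting(a).equals(splitting(b)));   // true
        // Each name is a single, different token.
        System.out.println(whitespace(a).equals(whitespace(b))); // false
    }
}
```

One trade-off of whitespace-only tokenizing: punctuation stays attached, so a name followed by a comma or period in the ticket text is indexed as a different token than the bare name.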

    I tried to figure out how to use the Spring framework with its Lucene components, so I could build my own version of the indexer and search for Jtrac, but I have no experience in that area, so I failed miserably.

     

    Last edit: Jim Murray 2023-08-30
  • Ulf Dittmer

    Ulf Dittmer - 2023-09-05

    I see. I'm a bit swamped right now and about to go on vacation, but in a few weeks' time I'll look into what might be involved in adding a bit of configurability.

     
