lucene – Row Coding

How to use a Lucene Analyzer to tokenize a String?

September 26, 2023 by Tarik

Based off of the answer above, this is slightly modified to work with Lucene 4.0. public final class LuceneUtil { private LuceneUtil() {} public static List<String> tokenizeString(Analyzer analyzer, String string) { List<String> result = new ArrayList<String>(); try { TokenStream stream = analyzer.tokenStream(null, new StringReader(string)); stream.reset(); while (stream.incrementToken()) { result.add(stream.getAttribute(CharTermAttribute.class).toString()); } } catch (IOException e) { … Read more

SQL Server 2008 Full Text Search (FTS) versus Lucene.NET

September 24, 2023 by Tarik

SQL Server FTS is going to be easier to manage for a small deployment. Since FTS is integrated with the DB, the RDBMS handles updating the index automatically. The con here is that you don’t have an obvious scaling solution short of replicating databases. So if you don’t need to scale, SQL Server FTS is … Read more

Difference between BooleanClause.Occur.MUST and BooleanClause.Occur.SHOULD in Lucene

September 20, 2023 by Tarik

BooleanClause.Occur.SHOULD means that the clause is optional, whereas BooleanClause.Occur.MUST means that the clause is compulsory. However, if a boolean query only has optional clauses, at least one clause must match for a document to appear in the results. For better control over what documents match a BooleanQuery, there is also a minimumShouldMatch parameter which lets … Read more
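
The matching rules described above can be sketched in plain Java. This is a toy model, not Lucene's API: the Clause class, the matches method and the idea of passing in a precomputed "matched" flag are all hypothetical, and MUST_NOT and FILTER clauses are left out.

```java
import java.util.List;

/** Toy model of the MUST / SHOULD matching rules. Hypothetical names, not Lucene classes. */
final class BooleanMatchSketch {
    enum Occur { MUST, SHOULD }

    /** One clause: its Occur flag plus whether it matched the document (assumed to be known). */
    static final class Clause {
        final Occur occur;
        final boolean matched;

        Clause(Occur occur, boolean matched) {
            this.occur = occur;
            this.matched = matched;
        }
    }

    /** Returns true if a document with these clause outcomes would appear in the results. */
    static boolean matches(List<Clause> clauses, int minimumShouldMatch) {
        boolean hasMust = false;
        int shouldHits = 0;
        for (Clause c : clauses) {
            if (c.occur == Occur.MUST) {
                hasMust = true;
                if (!c.matched) {
                    return false; // a compulsory clause failed
                }
            } else if (c.matched) {
                shouldHits++;
            }
        }
        // With only optional clauses, at least one of them has to match.
        int required = Math.max(minimumShouldMatch, hasMust ? 0 : 1);
        return shouldHits >= required;
    }

    public static void main(String[] args) {
        List<Clause> onlyOptional = List.of(
                new Clause(Occur.SHOULD, false),
                new Clause(Occur.SHOULD, true));
        System.out.println(matches(onlyOptional, 0));
        System.out.println(matches(onlyOptional, 2));
    }
}
```

In the first call one optional clause matches, which is enough; in the second, minimumShouldMatch is set to 2, so a single optional hit no longer qualifies the document.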

ElasticSearch – Searching For Human Names

September 9, 2023 by Tarik

First, I recreated your current configuration in Play: https://www.found.no/play/gist/867785a709b4869c5543 If you go there, switch to the “Analysis” tab to see how the text is transformed. Note, for example, that Heaney ends up tokenized as [hn, heanei] with the search_analyzer and as [HN, heanei] with the index_analyzer. Note the case difference for the metaphone term. Thus, that one is … Read more

What is best and most active open source .Net search technology?

September 7, 2023 by Tarik

While there have been no ‘full-blown’ releases (i.e. full documentation, web site updates) of Lucene.Net for quite some time, there are still fresh commits to its SVN repository. The latest release (2.3.2), for example, was tagged on 07/24/09. Since development is still active, I would use it for new full-text search projects.

Search engine Lucene vs Database search

September 6, 2023 by Tarik

I suggest you read Full Text Search Engines vs. DBMS. A one-liner would be: If the bulk of your use case is full text search, use Lucene. If the bulk of your use case is joins and other relational operations, use a database. You may use a hybrid solution for a more complicated use case.

solr search for documents where a field doesn’t exist

September 6, 2023 by Tarik

Use -field:[* TO *]. In SolrNet, use a negated SolrHasValueQuery.
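
As a rough sketch of how that query might be built and URL-encoded for a request's q parameter, using only the JDK: the class and method names here are made up for illustration, and the field name is assumed to contain no characters that need escaping.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the "field has no value" query. Not part of Solr or SolrJ. */
final class MissingFieldQuery {

    /** Negated open-ended range query: matches documents with no value in the field. */
    static String missing(String field) {
        return "-" + field + ":[* TO *]";
    }

    /** Encodes the query for use as the q parameter of an HTTP request. */
    static String asQueryParam(String query) {
        return "q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = missing("field");
        System.out.println(query);
        System.out.println(asQueryParam(query));
    }
}
```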

How can I search on a list of values using Solr/Lucene?

September 4, 2023 by Tarik

Use field:(value1 value2 value3) or, if your default operator is AND, use field:(value1 OR value2 OR value3).
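
A small sketch of building the explicit OR form from a list of values, so the query does not depend on the default operator. The helper name is hypothetical, and the values are assumed to be simple terms: nothing here escapes query syntax characters or quotes values that contain spaces.

```java
import java.util.List;

/** Hypothetical helper that builds field:(v1 OR v2 OR ...). Not part of Solr or Lucene. */
final class ValueListQuery {

    /** Joins the values with an explicit OR inside one grouped field clause. */
    static String anyOf(String field, List<String> values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }

    public static void main(String[] args) {
        System.out.println(anyOf("field", List.of("value1", "value2", "value3")));
    }
}
```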

Entity Extraction/Recognition with free tools while feeding Lucene Index

August 26, 2023 by Tarik

Lucene Score results

August 20, 2023 by Tarik

The score includes the inverse document frequency (IDF). Suppose the term “John Smith” occurs 100 times in partition 0 and once in partition 1. A search for John Smith would then score higher in partition 1, as the term is scarcer there. To get round this you would either have to have your … Read more
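
To illustrate the effect with numbers, here is a minimal sketch using the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)). Two things are assumptions, not from the post: each partition is taken to hold 1,000 documents, and the "100 times" and "once" counts are treated as document frequencies.

```java
/** Illustrative IDF comparison across two partitions. Partition sizes are made up. */
final class IdfSketch {

    /** Classic TF-IDF style inverse document frequency: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long docsPerPartition = 1_000; // hypothetical size, same for both partitions

        double idfPartition0 = idf(100, docsPerPartition); // term is common here
        double idfPartition1 = idf(1, docsPerPartition);   // term is rare here

        System.out.printf("partition 0 idf: %.3f%n", idfPartition0);
        System.out.printf("partition 1 idf: %.3f%n", idfPartition1);
    }
}
```

Because each partition computes IDF from its own statistics, the same term gets a larger IDF factor where it is rarer, so scores from the two partitions are not directly comparable.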