Friday, March 23, 2012

Hidden Ignored Words?

We are getting the dreaded "A clause of the query contained only
ignored words." error when a user searches for the following term:
"3.0" (without quotes).
I've checked all the noise word lists and "3.0" is not in there. "3" is
however.
Also, when one searches for "3.1" it's fine - no error.
When one searches for "3.00" it's also fine.
How can one explain this and which other "hidden" noise words should we
add to our noise word filter (in the search form)?
Remove "3" and "0" from your noise word lists. You will need to rebuild
the catalogs afterwards!
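To illustrate the answer, here is a minimal Python sketch of the idea. It is not SQL Server's actual word breaker. It assumes "." is a token boundary and uses a hypothetical noise list containing only "3" and "0", chosen to match the behaviour described in the question. "3.0" breaks into two noise words, so the clause contains only ignored words, while "3.1" and "3.00" each leave a searchable token.

```python
import re

# Hypothetical subset of a noise word list, chosen to match the thread:
# "3" and "0" are noise words, "1" and "00" are not.
NOISE_WORDS = {"3", "0"}

# Assumption: "." and whitespace act as token boundaries
# (the real word breaker applies further rules).
BOUNDARY = re.compile(r"[.\s]+")


def tokenize(term: str) -> list[str]:
    """Split a search term on the assumed boundary characters, dropping empties."""
    return [t for t in BOUNDARY.split(term.lower()) if t]


def only_ignored_words(term: str) -> bool:
    """True if every token is a noise word (or the term has no tokens at all)."""
    return all(t in NOISE_WORDS for t in tokenize(term))


for term in ["3.0", "3.1", "3.00"]:
    print(term, tokenize(term), only_ignored_words(term))
```

Only "3.0" reports `True` here. With "3" and "0" taken out of `NOISE_WORDS`, it would report `False` as well, which matches the suggested fix.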
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Jack" <cawoodm@.gmail.com> wrote in message
news:1140526298.751202.15490@.f14g2000cwb.googlegroups.com...
|||Aha! So it's because "." is a word breaker, is that correct?
Shall I remove it from the noise_eng file? Or the enu...
I also noticed that a standalone "#" was being treated as a noise word
and throwing that ugly error. Can you explain that one?
Many thanks
|||Yes, "." is a word or token boundary (there are actually some rules about
this). "#" is not a noise word when preceded by a c, j or f.
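The "#" rule can be sketched in the same spirit. This is a hypothetical approximation, not the word breaker's real rules: runs of letters and digits form words, a trailing "#" is kept only after c, j or f (C#, J#, F#), and any other "#" is treated as a boundary. A lone "#" therefore yields no tokens at all, so nothing searchable is left. Treating single letters as noise words is an assumption made for the example.

```python
import re
import string

# Assumption: single letters are noise words (illustrative only).
NOISE_WORDS = set(string.ascii_lowercase)

# Assumption: a trailing "#" survives only after these letters (C#, J#, F#).
HASH_PREFIXES = {"c", "j", "f"}

# Letters/digits, optionally followed by "#"; everything else
# (".", spaces, a lone "#") is treated as a boundary.
WORD = re.compile(r"[a-z0-9]+#?")


def tokenize(term: str) -> list[str]:
    """Approximate tokenization with the special-case "#" rule."""
    tokens = []
    for token in WORD.findall(term.lower()):
        if token.endswith("#") and token[:-1] not in HASH_PREFIXES:
            token = token[:-1]  # drop "#" unless it follows c, j or f
        tokens.append(token)
    return tokens


def has_searchable_token(term: str) -> bool:
    """False when the term yields only noise words or no tokens at all."""
    return any(t not in NOISE_WORDS for t in tokenize(term))


for term in ["#", "c#", "C# code", "x#"]:
    print(repr(term), tokenize(term), has_searchable_token(term))
```

Here "c#" stays a distinct, searchable token, while "#" and "x#" leave nothing but ignored words.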
|||What I actually need to understand is where I can see a complete list of
noise words. "#" is not in the noise word list. Is it a token boundary?
Where is the complete list of these?
At the moment we are fighting fires as these errors occur.
|||The noise words are in the noise word lists. Some characters have special
significance, for example in C# and C++.
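Since the noise words live in the noise word files, one way to stop fighting fires is to check queries in the search form against the same list before submitting them. Below is a minimal sketch, assuming the file holds whitespace-separated words and that "." and whitespace are the only boundaries. The function names and the sample list are hypothetical, and the special handling of characters such as "#" and "+" is not modelled.

```python
import re
from pathlib import Path

# Assumption: whitespace and "." separate words; the real word breaker has
# further language-specific rules (e.g. for C#, C++) not modelled here.
BOUNDARY = re.compile(r"[.\s]+")


def parse_noise_words(text: str) -> set[str]:
    """Parse noise word file contents, assuming whitespace-separated words."""
    return {word.lower() for word in text.split()}


def load_noise_words(path: str) -> set[str]:
    """Read a noise word file from a path supplied by the caller."""
    return parse_noise_words(Path(path).read_text(encoding="utf-8", errors="ignore"))


def searchable_terms(query: str, noise_words: set[str]) -> list[str]:
    """Return the query tokens that are not noise words."""
    tokens = [t for t in BOUNDARY.split(query.lower()) if t]
    return [t for t in tokens if t not in noise_words]


# Made-up sample contents standing in for a real noise word file.
noise = parse_noise_words("about after all\n3 0 $\na b c")
print(searchable_terms("3.0", noise))
print(searchable_terms("SQL 3.0", noise))
```

An empty result (as for "3.0" above) means the form can warn the user instead of sending a clause that contains only ignored words.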
