Just a question (though possibly a feature request)... Is there any means (as with Special:Ask) to add a condition searching for content within entire articles (not just a marked up Has text block), but in a manner where the results might be displayed like [[Has text::]] search results per https://www.semantic-mediawiki.org/wiki/Help:Full-text_search/Searching#Search_highlighting .
I might be wrong, but I guess that SMW by design only stores annotated property values in its tables, thus making this impossible.
If you need full text searching and search result highlighting, you have to store the text as properties.
The recently added elastic search features might make it possible, but I haven't looked into those yet.
> The recently added elastic search features might make it possible, but I
> haven't looked into those yet.
So, yes, if we talk about unstructured text (text not explicitly
identified by a property assignment) then Elasticsearch [0], and thereby
the ElasticStore, is (and will be) the only Store that supports
querying unstructured text and structured annotations together.
For example, you can query for:
[[in:some text]] (search the entire index for some text)
[[in:some text]] [[Category:Foo]] [[Has number::123]] (find all pages that have some text in the raw text, contain Has number == 123, and are a member of the Foo category)
[[phrase:text that appears in the same order]]
[[not:text not to appear]]
All those will return correct results especially when paging (limit,
offset set) and ordering of results is required.
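The conditions listed above can also be sent to the `ask` module that SMW adds to the MediaWiki API. The following is only a minimal sketch in Python, assuming a wiki that runs the ElasticStore with raw text indexing enabled; the endpoint URL and the helper name `build_ask_url` are placeholders made up for illustration and are not part of SMW.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: replace with the api.php URL of your own wiki.
API_URL = "https://wiki.example.org/w/api.php"


def build_ask_url(conditions, printouts=(), limit=50, offset=0):
    """Build an action=ask request URL from #ask conditions and printouts."""
    # Conditions are concatenated; printouts and paging parameters are
    # appended with "|" separators, as in a regular #ask query.
    parts = ["".join(conditions)]
    parts += ["?" + name for name in printouts]
    parts += [f"limit={limit}", f"offset={offset}"]
    params = {"action": "ask", "query": "|".join(parts), "format": "json"}
    return API_URL + "?" + urlencode(params)


# Unstructured text (in:) combined with structured annotations, with paging.
url = build_ask_url(
    ["[[in:some text]]", "[[Category:Foo]]", "[[Has number::123]]"],
    printouts=["Has number"],
    limit=20,
    offset=40,
)
# Decode the URL again to show the query string the wiki would receive.
print(parse_qs(urlsplit(url).query)["query"][0])
```

The sketch only builds the request URL and performs no network call; whether the `in:` condition matches raw page text depends on the wiki's ElasticStore configuration.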
> Just a question (though possibly a feature request).
The support for unstructured text is only available in the
ElasticStore (with Elasticsearch as backend) given that it provides
the only technically feasible means to run text queries efficiently.
MySQL full-text index support is not equipped to handle free text in
the most efficient way [1], especially unstructured text, and while
Postgres may be a bit better in this regard [2, 3, 4], it isn't
implemented [5] in SMW.
As noted, the full-text support [6] in the SQLStore (for either
MySQL or SQLite) is only provided for structured elements to achieve a
better average performance on larger text segments that don't fit into
a VARCHAR index (> 255 chars; we can only search on fields that are
indexed).
Additional references on the topic of MySQL vs. Elasticsearch are
plentiful on Google search [7, 8].
[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/usage.md#unstructured-text
[1] https://makandracards.com/makandra/12813-performance-analysis-of-mysql-s-fulltext-indexes-and-like-queries-for-full-text-search
[2] https://blog.lateral.io/2015/05/full-text-search-in-milliseconds-with-postgresql/
[3] https://www.compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/
[4] https://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf
[5] https://github.com/SemanticMediaWiki/SemanticMediaWiki/projects/4?card_filter_query=postgres
[6] https://www.semantic-mediawiki.org/wiki/Help:Full-text_search
[7] https://scoutapp.com/blog/from-mysql-full-text-search-to-elasticsearch
[8] https://hackernoon.com/dont-waste-your-time-with-mysql-full-text-search-61f644a54dfa
Cheers
On 4/2/19, Bernhard Krabina wrote:
> I might be wrong, but I guess that SMW by design only stores annotated
> property values in its tables, thus making this impossible.
> If you need full text searching and search result highlighting, you have to
> store the text as properties.
> The recently added elastic search features might make it possible, but I
> haven't looked into those yet.
Indeed, the text must be stored as a property value and extending the full-text search to all text stored on a page disregarding an annotation is out of scope of SMW. Thus this is not even a feature request. Instead CirrusSearch should be used for search.
> full-text search to all text stored on a page disregarding an annotation is
> out of scope of SMW.
Generally, yes, except for ES as outlined above.
On 4/2/19, Karsten Hoffmeyer wrote:
> Indeed, the text must be stored as a property value and extending the
> full-text search to all text stored on a page disregarding an annotation is
> out of scope of SMW. Thus this is not even a feature request. Instead
> CirrusSearch should be used for search.
> Generally, yes, except for ES as outlined above.
So this appears to be reality already. I really should have a closer look at this.
> So this appears to be reality already. I really should have a closer look at
> this.
We had this enabled on the sandbox with the following settings:
{
    "indexer": {
        "raw.text": true,
        "experimental.file.ingest": true
    }
}
Here raw.text indexes the raw text of a page, and
experimental.file.ingest ingests the file content into ES (that is, the
entire file content stays in ES and just isn't replicated to MW), which
makes it possible to query that content like any other "text" using the
notations as outlined.
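To check whether a settings export like the one quoted above has the relevant indexer switches turned on, something along these lines could be used. This is a sketch only: it assumes the settings are available as JSON with an "indexer" section shaped like the sandbox excerpt, and the helper name `enabled_indexer_flags` is made up for illustration.

```python
import json

# Settings excerpt as a JSON string, shaped like the sandbox example.
SETTINGS_JSON = """
{
    "indexer": {
        "raw.text": true,
        "experimental.file.ingest": true
    }
}
"""

# The two switches discussed in this thread.
FLAGS = ("raw.text", "experimental.file.ingest")


def enabled_indexer_flags(settings):
    """Return the names of the indexer flags that are set to true."""
    indexer = settings.get("indexer", {})
    return [name for name in FLAGS if indexer.get(name) is True]


settings = json.loads(SETTINGS_JSON)
print(enabled_indexer_flags(settings))
```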
Cheers
On 4/2/19, Karsten Hoffmeyer wrote:
> > Generally, yes, except for ES as outlined above.
> So this appears to be reality already. I really should have a closer look at
> this.
Right, cool! I should have known.
Wow, this ability to search both structured and unstructured sounds incredible. Eager to see about getting this set up on our server.
But as far as the other part of my question... Is there an equivalent means of displaying results of in and phrase as with ?Has text#-hl such that the text that is found shows up highlighted along with some surrounding context, or is it only possible to get the page titles in the results?
If not implemented, I think this would be a very handy feature to have if possible (ElasticStore only).
Btw, I don't see these awesome capabilities documented at https://www.semantic-mediawiki.org/wiki/Help:Selecting_pages where I think it would make sense to advertise them (and possibly referenced as an aside under the likes of https://www.semantic-mediawiki.org/wiki/Help:Full-text_search )
> So this appears to be reality already. I really should have a closer look at this.
Indeed. It is hard to keep up with mwjames development pace :-)
> Btw, I don't see these awesome capabilities documented at https://www.semantic-mediawiki.org/wiki/Help:Selecting_pages where I think it would make sense to advertise them (and possibly referenced as an aside under the likes of https://www.semantic-mediawiki.org/wiki/Help:Full-text_search )
You are absolutely right. Please feel free to add the documentation. We are happy about any helping hands.
Regarding my question:
> Is there an equivalent means of displaying results of in and phrase as with ?Has text#-hl such that the text that is found shows up highlighted along with some surrounding context, or is it only possible to get the page titles in the results?
Can I file this as a new issue (for displaying results of in and phrase queries with highlighted search terms in context) if there is no current way to do this?
> Can I file this as a new issue (for displaying results of in and phrase
> queries with highlighted search terms in context) if there is no current way
> to do this?
On the matter of unstructured content (content that is not stored in a field of a SMW table), there is no default way to show highlighted excerpts from unstructured sources.
We make one exception: SMWSearch (#3238, #3116), when ES is used, has access to highlighted excerpts on unstructured text, but that doesn't make it a generally available feature for any other #ask query.
The support for unstructured content searches in tandem with ES is only provided as a means to extend the search horizon and enable users to broaden the query context, and by that enhance the filtering experience, but it is not seen as a replacement or "lazy" alternative to not structuring the content.
As for the question, "Can I file this as a new issue ...": of course you can, but without sounding too pessimistic, it is unlikely that I will have a look at it in the near future. Aside from the resourcing aspect, I would advise any person with an interest in this topic to look into implementing a specific result printer that can access the Excerpt class (#3238, #3116 contain the necessary components) and coalesces the output with the standard printrequests of a query.
Thank you very much for your replies, as well as your work on the project (and that of the others responsible)!
Regarding the "lazy" alternative to structuring content: I would think that, as with all good inventions, time-saving (aka "laziness") is indeed a worthy goal, except perhaps for sites which might wish to opt out, either because of having little structured content (e.g., being data-centric rather than document-centric) and/or because of concerns with the presumably larger indexes that on-by-default full-text indexing would create. This holds especially since I see no semantic difference between an explicitly marked up "full text" block and one applied by default (unless one wished to inherit from "full text" and make different semantic properties for it, in which case I see the need for this type, but that is a different matter). (I also don't know whether the full text tags have any inherent limitations on size or allowable characters; I seem to remember trying and finding a limit to how large the block could be as well as to the markup it could allow.)
This may be "above my pay grade" to take on the task, at least or especially given what I already have on my plate, but I would think such an issue may appeal to others, so with your acceptance, I've gratefully filed a separate issue to at least have a dedicated place for raising the idea and desire for it, regardless of when it may or may not be possible for someone to take a look at it: #3910.
I installed SMW with Elastic without problems but I don't know where to configure this setting: The "normal" Mediawiki-Search expects at least 4 letters and then a wildcard * is doing fine. With 3 letters and wildcard there are no search results. 3 letters without * are also fine.
Where can I change the settings so that the search string prefix can be shorter than 4 letters? I couldn't find it in the documentation [0]
> configure this setting: The "normal" Mediawiki-Search expects at least 4
> letters and then a wildcard * is doing fine. With 3 letters and wildcard
> there are no search results. 3 letters without * are also fine.
SMW doesn't impose any length restrictions on either the index or the
query side. So, given that you made a reference to "Mediawiki-Search"
I assume you mean Special:Search, and in order for SMW to work there,
you need to have SMWSearch [0, 1, 2] enabled to provide the interface
to SMW and Elasticsearch.
Did I misunderstand the issue? Can you demonstrate your issue on the
sandbox [3]? (PS: It uses Elasticsearch together with SMW)
[4] shows a search (+debug output) for a two letter search.
[0] https://www.semantic-mediawiki.org/wiki/Help:SMWSearch
[1] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/usage.md#specialsearch-integration
[2] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/faq.md#mediawiki-semantic-mediawiki-and-elasticsearch
[3] https://sandbox.semantic-mediawiki.org
[4] https://sandbox.semantic-mediawiki.org/w/index.php?title=Sp%C3%A9cial:Requ%C3%AAter&q=%5B%5B%7E%2Alo%2A%5D%5D&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow%2Fsearchlabel%3D-26hellip-3B-20autres-20r%C3%A9sultats%2Fclass%3Dsortable-20wikitable-20smwtable&sort=&order=asc&eq=yes&offset=0&limit=250&debug=true#search
On 7/9/19, m-art-in wrote:
> I installed SMW with Elastic without problems but I don't know where to
> configure this setting: The "normal" Mediawiki-Search expects at least 4
> letters and then a wildcard * is doing fine. With 3 letters and wildcard
> there are no search results. 3 letters without * are also fine.
> Where can I change the settings so that the search string prefix can be
> shorter than 4 letters?
> SMW doesn't impose any length restrictions on either the index or the query side. So, given that you made a reference to "Mediawiki-Search" I assume you mean Special:Search, and in order for SMW to work there, you need to have SMWSearch [0, 1, 2] enabled to provide the interface to SMW and Elasticsearch.
Yes I'm talking about Special:Search (not Special:Ask).
> Can you demonstrate your issue on the sandbox [3]? (PS: It uses Elasticsearch together with SMW)
I can't reproduce my issue there. The search string e.g. 'Exp' finds all expected sites with "Export", "Expert" etc. In my wiki I would have to type at least 4 letters and 'Exp' has no search results.
So it's not a bug and my config seems to be different. I'll dig into that and will read the listed documentation carefully.
> I can't reproduce my issue there. The search string e.g. 'Exp' finds all
> expected sites with "Export", "Expert" etc. In my wiki I would have to type
> at least 4 letters and 'Exp' has no search results.
Please keep in mind that using SMW in Special:Search requires using
either the standard #ask syntax [[~exp*]] (or [[Has foo::bar]]) or
one of the provided prefix tags (in:, phrase:, has:), as in in:exp,
because without them an SMW query is not triggered and the search
request is forwarded to the standard MediaWiki search engine.
PS: The sandbox also uses CirrusSearch as fallback engine which is
why just entering exp* will forward the request to CirrusSearch.
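The routing rule described above can be illustrated with a small Python sketch. It is not SMW's actual implementation, only a simplified model of the behaviour as described in this reply; the function names and the returned labels are made up for illustration, and only the three prefix tags mentioned here (in:, phrase:, has:) are considered.

```python
import re

# Prefix tags named in the reply above; SMW may support others.
SMW_PREFIXES = ("in:", "phrase:", "has:")


def is_smw_query(term):
    """Return True if the search term would trigger an SMW query."""
    term = term.strip()
    # Standard #ask syntax such as [[~exp*]] or [[Has foo::bar]].
    if "[[" in term and "]]" in term:
        return True
    # A prefix tag at the start of the term or after whitespace.
    return any(
        re.search(r"(?:^|\s)" + re.escape(prefix), term)
        for prefix in SMW_PREFIXES
    )


def route(term):
    """Name the engine that would handle the term in this simplified model."""
    return "SMWSearch" if is_smw_query(term) else "fallback search engine"


for example in ("[[~exp*]]", "in:exp", "exp*"):
    print(example, "->", route(example))
```

In this model a bare exp* is handed to the fallback engine, which matches the sandbox behaviour where CirrusSearch answers such requests.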
On 7/15/19, m-art-in wrote:
> > SMW doesn't impose any length restrictions on either the index or the query
> > side. So, given that you made a reference to "Mediawiki-Search" I assume
> > you mean Special:Search, and in order for SMW to work there, you need to
> > have SMWSearch [0, 1, 2] enabled to provide the interface to SMW and
> > Elasticsearch.
> Yes I'm talking about Special:Search (not Special:Ask).
> > Can you demonstrate your issue on the sandbox [3]? (PS: It uses
> > Elasticsearch together with SMW)
> I can't reproduce my issue there. The search string e.g. 'Exp' finds all
> expected sites with "Export", "Expert" etc. In my wiki I would have to type
> at least 4 letters and 'Exp' has no search results.
> So it's not a bug and my config seems to be different. I'll dig into that
> and will read the listed documentation carefully.
Thank you very much for your explanation. So in the Sandbox Wiki CirrusSearch invokes the different behavior. The SemanticSearch works fine in my wiki.
With Cirrus set up, [[in:text]] only seems to be searching for "text" if within the page title. Do I have something configured wrong?
> With Cirrus set up, [[in:text]] only seems to be searching for "text"
As outlined in [0, #3939], SMW and Cirrus don't share any indices; SMW and Elasticsearch work independently of "CirrusSearch". Now, using something like [[in:text]] as query syntax (provided by SMW) requires appropriate settings as outlined in [1], and for the support of unstructured text, [2] has an extended explanation of what can be expected and what needs to be configured.
Some example queries that are usable against the sandbox can be found in the [3] thread.
If you think the mentioned documents don't contain the information you were looking for, then please open a new ticket with a specific description of what information is missing.
[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/faq.md#mediawiki-semantic-mediawiki-and-elasticsearch
[1] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/config.md
[2] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/replication.md#structured-and-unstructured-data
[3] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/4488#issuecomment-581007052