Describe the bug
When writing a post in Chinese, the word count shown in the content structure is not accurate.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
When using the same content in a word processor like Pages, the word count is significantly larger than the one shown in Gutenberg. The expected behavior is an accurate word count regardless of the language used.
Screenshots
You can see a large amount of content, but it shows only 10 words.

Desktop (please complete the following information):
It looks like this happens because in some languages words are not separated by spaces. For example, 这是鸟 means "This is a bird": it is three words without a single space character.
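To illustrate the cause described above, here is a minimal TypeScript sketch of a counter that splits on whitespace. `countBySpaces` is a hypothetical name used only for this example; it is not Gutenberg's actual implementation, just an assumption about how a space-based counter behaves.

```typescript
// Hypothetical helper: counts words by splitting on runs of whitespace.
function countBySpaces(text: string): number {
  const trimmed = text.trim();
  // An empty or whitespace-only string has no words.
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

console.log(countBySpaces("This is a bird")); // 4
console.log(countBySpaces("这是鸟")); // 1, although the sentence has three words
```

Because the Chinese sentence contains no whitespace, the whole sentence is treated as a single word, which matches the undercount reported here.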
Counting words is a complex problem. For some languages, the best approach may be to count each character and apply a character-to-word ratio, but documents can mix languages, so we would need to identify the best method to use per segment.
The following external link describes an algorithm used to count words and may be helpful for this issue: https://docs.sdl.com/LiveContent/content/en-US/SDL%20WorldServer-v3/GUID-376E123B-1C7E-4D64-82B0-1D33F088ABD5
@jorgefilipecosta I think there may be two ways to fix this bug. One way is to do what atom-word-counter does: present both the word count (based on English words) and the character count (all kinds of characters, excluding whitespace). This requires only small changes to the UI.

Another way is to do what Microsoft Word does: in sentences that mix East Asian and Latin-script languages, such as "Hello 你好", it counts three words (one English word plus two Chinese characters). This requires a significant change to the count function, especially matchWords.
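A rough sketch of this second approach in TypeScript, under a few assumptions: each Han, Hiragana, or Katakana character counts as one word, and every other run of non-whitespace characters counts as one word. `countMixedWords` and the script list are hypothetical choices for illustration, not the existing matchWords logic and not a description of how Microsoft Word is implemented. Punctuation is not handled specially, so a real implementation would need more care.

```typescript
// Assumption: these three scripts are counted one character per word.
const CJK_SCRIPTS = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}";

// A token is either a single CJK character or a run of other non-whitespace characters.
const TOKEN_PATTERN = new RegExp(
  "[" + CJK_SCRIPTS + "]|[^\\s" + CJK_SCRIPTS + "]+",
  "gu"
);

// Hypothetical helper: counts words in text that may mix scripts.
function countMixedWords(text: string): number {
  const tokens = text.match(TOKEN_PATTERN);
  return tokens === null ? 0 : tokens.length;
}

console.log(countMixedWords("Hello 你好")); // 3
console.log(countMixedWords("这是鸟")); // 3
console.log(countMixedWords("This is a bird")); // 4
```

Counting one word per character is only an approximation for Chinese and Japanese, where many words span several characters, but it reproduces the "Hello 你好" result described above.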

Which one is better? Does anyone have any idea?
Thank you for summarizing and sharing your thoughts @Jackie6.
I am also not sure which option is better in a case like this, cc: @jasmussen, @mapk, @kjellr in case you have some thoughts on this.
Great ticket. The two options presented appear to be the "easy" version (count words and characters) and the "hard" version (be aware of the language when counting words).
It _seems_ like the latter is the better user experience, but it could be so difficult that, unless we get solid pull requests, it may take a while to appear. The former, by contrast, is probably easy to build, and a character count could be useful regardless of language.
Keeping in mind that we mean to merge the Document Outline tool with the Block Navigation tool, we could possibly build option 1 at the same time and then consider upgrading to option 2 later?
@sandymcfadden, #14589 has been opened with a proposal for resolving this issue, as suggested in the discussion above.
Related: #24823 was merged, but it seems this issue is still relevant.
cc @david-szabo97
@swissspidy Ugh, this is a difficult topic.
IMHO, the best approach would be to move this to the list view and do what Google Docs does:
show Words, Characters, and Characters excluding spaces. Even though Words is not useful information for languages like Chinese or Japanese, I don't think we can accurately handle every language. If we show all three variations, we can leave it to the writer to decide which information is useful to them.
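A minimal TypeScript sketch of what computing those three numbers could look like. The names `TextStats`, `countMatches`, and `textStats` are hypothetical, and the sketch makes two assumptions: "words" means whitespace-separated runs, and "excluding spaces" means excluding all whitespace. Characters are counted as Unicode code points rather than UTF-16 code units.

```typescript
interface TextStats {
  words: number;
  characters: number;
  charactersExcludingSpaces: number;
}

// Counts how many times a global pattern matches in the text.
function countMatches(text: string, pattern: RegExp): number {
  const matches = text.match(pattern);
  return matches === null ? 0 : matches.length;
}

// Hypothetical helper returning all three counts at once.
function textStats(text: string): TextStats {
  return {
    // Runs of non-whitespace characters.
    words: countMatches(text, /\S+/g),
    // Every code point, including whitespace (the "u" flag matches whole code points).
    characters: countMatches(text, new RegExp("[\\s\\S]", "gu")),
    // Every code point that is not whitespace.
    charactersExcludingSpaces: countMatches(text, new RegExp("\\S", "gu")),
  };
}

console.log(textStats("Hello 你好"));
// { words: 2, characters: 8, charactersExcludingSpaces: 7 }
```

For "这是鸟" this reports 1 word and 3 characters, so a writer working in Chinese could rely on the character counts while ignoring the word count.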
