Problem about auto-wrap in non-English MUD
It is now over 60 days since the last post. This thread is closed.
Posted by: Zhenzh (China, 68 posts)
Date: Tue 20 Mar 2018 06:29 AM (UTC)
I'm playing a Chinese MUD in which a single line may contain both Chinese and English text.
Each Chinese character occupies 2 columns, while each English character occupies only 1.
This means I cannot predict exactly where a line will be broken when it exceeds the specified wrap width, because breaking at a fixed column may split a Chinese character in half.
I have an idea for solving this:
1. Capture all lines from the MUD server.
2. Print lines that fit within the wrap width unchanged.
3. For lines over the limit, use string.byte() to count the bytes and find a break position that does not split a Chinese character.
4. Print the resulting pieces one after another in a loop.
This should solve the auto-wrap problem, but I'm worried about performance: it may need a lot of extra computation and slow down the whole MUD.
Does anyone have similar experience, or a better way to solve this?
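Steps 2 to 4 could be sketched in Lua roughly as follows. This is a minimal illustration, not MUSHclient code: `split_dbcs` is a hypothetical helper, and it assumes a GB2312/GBK/Big5-style encoding in which any byte from 0x81 to 0xFE starts a two-byte character and every other byte is a single-byte (ASCII) character. Widths are counted in bytes, which under that assumption equals the number of display columns.

```lua
-- Minimal sketch: split a GB2312/GBK/Big5 line into pieces of at most
-- `width` bytes without cutting a double-byte character in half.
-- Assumption: any byte 0x81-0xFE starts a two-byte character; every other
-- byte is a single-byte (ASCII) character. `width` must be at least 2.
local function split_dbcs (line, width)
  local pieces = {}
  local start = 1   -- byte index where the current piece begins
  local i = 1       -- byte index of the character being examined
  while i <= #line do
    local b = string.byte (line, i)
    local size = (b >= 0x81 and b <= 0xFE) and 2 or 1
    -- If this character would overflow the current piece, close the piece
    -- just before it, so the character starts the next piece intact.
    if i + size - start > width then
      table.insert (pieces, string.sub (line, start, i - 1))
      start = i
    end
    i = i + size
  end
  -- Whatever is left over (or the whole line, if it already fit).
  table.insert (pieces, string.sub (line, start))
  return pieces
end
```

It makes a single pass over the bytes of each line, so the extra work grows only linearly with line length; each returned piece would then be displayed on its own line.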

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #1 on Tue 20 Mar 2018 06:43 AM (UTC)
What do you mean by a separator? The program currently wraps at spaces, so it shouldn't split a word in two.
- Nick Gammon
www.gammon.com.au, www.mushclient.com

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #2 on Tue 20 Mar 2018 06:44 AM (UTC)
Do you mean that it wraps too early?

Posted by: Zhenzh (China, 68 posts)
Date: Reply #3 on Tue 20 Mar 2018 07:51 AM (UTC)
Yes, it may wrap too early, so that a Chinese character is split into two halves.
For example, suppose the auto-wrap width is set to 100 characters, so the 101st character is printed on a new line. If the 100th position holds a Chinese character, which occupies two columns (the 100th and 101st), wrapping after the 100th column splits that character.
Nick Gammon said:
The program currently wraps at spaces
Unlike English, Chinese text does not use spaces to separate words. What happens if a line doesn't contain any space characters?
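To make the example concrete, here is a hypothetical Lua check (`breaks_mid_char` is an illustrative name, not a MUSHclient function) for whether a break after a given byte column would land inside a two-byte character. It assumes any byte from 0x81 to 0xFE begins a two-byte character. Because trail bytes can also fall in that range, a single byte does not tell you whether it starts a character, so the scan has to start from the beginning of the line.

```lua
-- Hypothetical helper: would breaking a line after byte column `col` cut a
-- double-byte character in half? Assumes a GB2312/GBK/Big5-style encoding in
-- which any byte 0x81-0xFE starts a two-byte character. Trail bytes can fall
-- in that same range, so the scan must start at the beginning of the line.
local function breaks_mid_char (line, col)
  local i = 1
  while i <= col and i <= #line do
    local b = string.byte (line, i)
    local size = (b >= 0x81 and b <= 0xFE) and 2 or 1
    if i + size - 1 > col then
      return true            -- this character straddles the break column
    end
    i = i + size
  end
  return false
end

-- The example above: 99 ASCII characters followed by one Chinese character
-- (two bytes, here 0xB0 0xA1), with the wrap width set to 100.
local line = string.rep ("a", 99) .. "\176\161"
print (breaks_mid_char (line, 100))   --> true
```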

Posted by: Zhenzh (China, 68 posts)
Date: Reply #4 on Thu 22 Mar 2018 05:37 AM (UTC)
I guess the wrap will simply break at the specified column when there are no space characters in the line.
This really is a problem for all non-alphabetic languages, since spaces are not used as separators between words.

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #5 on Thu 22 Mar 2018 06:41 AM (UTC), amended on Thu 22 Mar 2018 06:45 AM (UTC) by Nick Gammon
This plugin may help:
To save and install the Split_long_Big5_lines plugin, do this:
1. Copy the plugin text below (to the Clipboard).
2. Open a text editor (such as Notepad) and paste the plugin into it.
3. Save it to disk on your PC, preferably in your plugins directory, as Split_long_Big5_lines.xml.
4. Go to the MUSHclient File menu -> Plugins.
5. Click "Add".
6. Choose the file Split_long_Big5_lines.xml (which you just saved in step 3) as a plugin.
7. Click "Close".
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE muclient>
<muclient>
<plugin
name="Split_long_Big5_lines"
author="Nick Gammon"
id="19e492fbb1ac8429ad1f9b1b"
language="Lua"
purpose="Splits long Big5 lines"
date_written="2018-03-22 17:11:23"
requires="4.97"
version="1.0"
>
<description trim="y">
<![CDATA[
Breaks long Big5 Chinese lines so that they are not split in the middle of a character.
]]>
</description>
</plugin>
<!-- Script -->
<script>
<![CDATA[
local LINE_LIMIT = 40
local big5 = "[\129-\254][\064-\126\161-\254]"
-- handle incoming packet
function OnPluginPacketReceived (sText)
return (string.gsub (sText, "(" .. string.rep (big5, LINE_LIMIT) .. ")", "%1 "))
end -- function
]]>
</script>
</muclient>
What this does is look for runs of 40 "Big5" characters (a lead byte from 0x81 to 0xFE, then a trail byte from 0x40 to 0x7E or 0xA1 to 0xFE) and insert a space after each run.
I'm not sure how well that will work for you: if you have non-Chinese characters in the middle, the count will be off. However, inserting a space should let MUSHclient wrap at the end of a character rather than in the middle of one.
Change the 40 (LINE_LIMIT) in the plugin to some other number if you want to tweak the splitting point.
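To see what the substitution does, the pattern can be tried outside MUSHclient in a plain Lua interpreter. In this sketch `insert_breaks` is a hypothetical wrapper around the same `string.gsub` call the plugin makes, and LINE_LIMIT is lowered to 3 only so that short test strings show the effect; the byte values are arbitrary pairs in the Big5 lead/trail ranges.

```lua
-- Stand-alone copy of the plugin's substitution, for trying it outside
-- MUSHclient. LINE_LIMIT is reduced from 40 to 3 here purely so the
-- effect is easy to see.
local LINE_LIMIT = 3
local big5 = "[\129-\254][\064-\126\161-\254]"   -- lead byte, then trail byte

-- Hypothetical wrapper around the same gsub call the plugin makes in
-- OnPluginPacketReceived: add a space after every run of LINE_LIMIT
-- Big5 characters.
local function insert_breaks (sText)
  return (string.gsub (sText, "(" .. string.rep (big5, LINE_LIMIT) .. ")", "%1 "))
end

-- Four Big5 characters in a row: a space goes in after the third.
local four = "\164\164\164\165\164\166\164\167"
print (insert_breaks (four) == "\164\164\164\165\164\166 \164\167")   --> true

-- An ASCII digit in the middle leaves two runs of two Big5 characters,
-- so nothing is inserted: the non-Chinese character throws the count off.
local mixed = "\164\164\164\165" .. "1" .. "\164\166\164\167"
print (insert_breaks (mixed) == mixed)   --> true
```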

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #6 on Thu 22 Mar 2018 10:14 AM (UTC)
It might be possible to make the client handle this better by improving the wrapping algorithm. At present, when the limit is reached, it wraps at the space closest to the end of the line. Conceivably, if no space was found, it could backtrack to the most recent double-byte character.
Would I be correct in thinking that:
- The lines in question have no spaces at all?
- You are using Big5 or GB2312 encoding?
- You are not using Unicode? (see the UTF-8 checkbox in the output window configuration)
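The fallback suggested above could be sketched as follows. This is an illustrative Lua function, not MUSHclient's actual wrapping code (which is not shown in this thread); `find_break_space_first` is a hypothetical name, and it assumes any byte from 0x81 to 0xFE starts a two-byte character and that the wrap width is at least 2 bytes.

```lua
-- Minimal sketch of the fallback idea (not MUSHclient's actual wrapping
-- code): prefer the last space that fits within `width` bytes; only if there
-- is none, fall back to the end of the last whole character that fits, so a
-- double-byte character that would straddle the limit moves to the next line.
-- Assumptions: bytes 0x81-0xFE start a two-byte character; `width` >= 2.
-- Returns how many bytes of `line` go on the current output line.
local function find_break_space_first (line, width)
  if #line <= width then return #line end
  local i, last_space, last_fit = 1, nil, 0
  while true do
    local b = string.byte (line, i)
    local size = (b >= 0x81 and b <= 0xFE) and 2 or 1
    if i + size - 1 > width then break end   -- this character does not fit
    if b == 32 then last_space = i end       -- remember the latest space
    last_fit = i + size - 1                  -- end of the last whole character
    i = i + size
  end
  return last_space or last_fit
end

-- Three two-byte characters and no spaces, 5-byte limit: keep 4 bytes.
print (find_break_space_first ("\176\161\176\162\176\163", 5))   --> 4
```

For an all-Chinese line with no spaces, the break lands at the start of the double-byte character that would straddle the limit. When a line does contain a space, though, the space always wins, even if it is far from the limit.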

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #7 on Thu 22 Mar 2018 10:00 PM (UTC)
I think I may have fixed this problem. Don't use the plugin; instead, download the latest pre-release version of MUSHclient as described here:
http://www.gammon.com.au/forum/?id=13903
Replace your MUSHclient.exe file with the newer one. This makes a line wrap occur at the start of a Big5 or GB2312 character rather than in the middle of it.

Posted by: Zhenzh (China, 68 posts)
Date: Reply #8 on Fri 23 Mar 2018 04:40 AM (UTC)
So cool. I have tried the pre-release, and it does work well for my GB2312 characters.
One comment: spaces seem to have higher priority when splitting a line, so a line mixing English and non-English text may be wrapped earlier than necessary.
I recommend giving spaces and double-byte characters the same priority, so that a mixed line is wrapped at the position closest to the line limit in a Big5/GB2312 environment.

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #9 on Fri 23 Mar 2018 04:52 AM (UTC)
Is this a real problem? You say it "may" happen. Is it happening enough to be annoying?

Posted by: Zhenzh (China, 68 posts)
Date: Reply #10 on Fri 23 Mar 2018 05:15 AM (UTC), amended on Fri 23 Mar 2018 05:16 AM (UTC) by Zhenzh
Yes, it happened in my environment.

Posted by: Zhenzh (China, 68 posts)
Date: Reply #11 on Fri 23 Mar 2018 05:34 AM (UTC)
Suppose I have a mixed line:
Quote:
English1 English2 English3 Chinese1Chinese2Chinese3
The line limit falls at the column of "Chinese2".
Note: EnglishX stands for an English word, ChineseX for a Chinese character.
The expected result is:
Quote:
English1 English2 English3 Chinese1Chinese2
Chinese3
The actual current result is:
Quote:
English1 English2 English3
Chinese1Chinese2Chinese3

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #12 on Sat 24 Mar 2018 12:33 AM (UTC)
OK, I reworked the line splitting somewhat, so it should now split at the last Big5 character or space, whichever comes last.
Grab the latest pre-release version as described earlier.
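The revised rule described above can likewise be sketched in Lua. Again this is a hypothetical function (`find_break_last_boundary`), not the MUSHclient source; it treats the position just after a space and the position just before a two-byte character as candidate break points and keeps the last one that fits. It assumes bytes 0x81 to 0xFE start a two-byte character and a wrap width of at least 2 bytes.

```lua
-- Minimal sketch of "split at the last Big5 character or space, whichever
-- comes last" (not the actual MUSHclient source): a line may break just
-- after a space or just before a double-byte character, and the candidate
-- closest to the limit wins.
-- Assumptions: bytes 0x81-0xFE start a two-byte character; `width` >= 2.
-- Returns how many bytes of `line` go on the current output line.
local function find_break_last_boundary (line, width)
  if #line <= width then return #line end
  local i, break_at = 1, 0
  while true do
    local b = string.byte (line, i)
    local size = (b >= 0x81 and b <= 0xFE) and 2 or 1
    if size == 2 then break_at = i - 1 end   -- can break before a double-byte character
    if i + size - 1 > width then break end   -- this character does not fit
    if b == 32 then break_at = i end         -- can break after a space
    i = i + size
  end
  -- No space and no double-byte character past the first one: everything
  -- up to the limit is single-byte, so a hard break at `width` is safe.
  if break_at == 0 then break_at = width end
  return break_at
end

local mixed = "ab cd " .. "\176\161\176\162\176\163"   -- ASCII words, then 3 double-byte chars
print (find_break_last_boundary (mixed, 10))   --> 10
```

With the mixed line from Reply #11 modelled as "ab cd " followed by three two-byte characters and a 10-byte limit, the last space is at byte 6, but the sketch breaks at byte 10, keeping the two Chinese characters that fit on the first line. That matches the expected result given there.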

Posted by: Zhenzh (China, 68 posts)
Date: Reply #13 on Sat 24 Mar 2018 06:20 PM (UTC)
The new version works very well. Thank you for your support.

Posted by: Nick Gammon, Forum Administrator (Australia, 23,173 posts)
Date: Reply #14 on Sun 25 Mar 2018 05:59 AM (UTC)
Just out of curiosity, did you (or anyone) ever translate the menus and dialog boxes into Chinese? In particular, see this thread:
http://gammon.com.au/forum/?id=7953
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).
60,721 views.
This is page 1; the thread is 2 pages long.