
Problem about auto-wrap in non-English MUD


Posted by Zhenzh   China  (68 posts)  Bio
Date Tue 20 Mar 2018 06:29 AM (UTC)
Message
I'm playing a Chinese MUD in which Chinese and English may appear on the same line.

Each Chinese character occupies 2 columns (bytes), while each English character occupies only 1.

This means I cannot predict the exact position of the separator when a line exceeds the specified wrap width, because a fixed position may cut a Chinese character in half.

I have an idea for resolving this problem:
1. Capture all lines from the MUD server.
2. Directly print lines that fit within the wrap width.
3. For lines over the limit, use string.byte() to count all characters and find a separator position that avoids dividing a Chinese character.
4. Print the separated lines in a loop.

That should resolve the auto-wrap problem. But I'm worried about the performance of this method, as it may cost a lot of additional computing resources and may slow down the whole MUD.

Does anyone have similar experience, or a better method of resolving the problem?
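
The four steps listed above can be sketched in Lua roughly as follows. This is only an illustration under stated assumptions: split_line is a hypothetical helper name, and the code assumes a Big5/GBK-style encoding in which any byte from 0x81 to 0xFE starts a two-byte character. It also assumes a width of at least 2.

```lua
-- Hypothetical helper (not part of MUSHclient): split `line` into pieces of
-- at most `width` bytes without cutting a double-byte character in half.
-- Assumption: a byte in 0x81..0xFE is the lead byte of a two-byte character;
-- every other byte is a single-byte character.
local function split_line (line, width)
  local pieces = {}
  local start = 1   -- first byte of the piece currently being built
  local i = 1       -- byte position currently being examined
  while i <= #line do
    local b = string.byte (line, i)
    local size = (b >= 0x81 and b <= 0xFE and i < #line) and 2 or 1
    -- if this character would overflow the current piece, close the piece first
    if (i + size - 1) - start + 1 > width and i > start then
      pieces [#pieces + 1] = string.sub (line, start, i - 1)
      start = i
    end
    i = i + size
  end
  pieces [#pieces + 1] = string.sub (line, start)
  return pieces
end

-- "abc" followed by two double-byte characters, wrapped at 4 columns
local pieces = split_line ("abc\196\227\186\195", 4)
```

Each piece can then be printed in a loop, for example with `for _, piece in ipairs (pieces) do print (piece) end`. The function makes a single pass over the bytes of the line, so its cost grows linearly with line length.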

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #1 on Tue 20 Mar 2018 06:43 AM (UTC)
Message
What do you mean by a separator? The program currently wraps at spaces, so it shouldn't split a word into two.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #2 on Tue 20 Mar 2018 06:44 AM (UTC)
Message
Do you mean that it wraps too early?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Zhenzh   China  (68 posts)  Bio
Date Reply #3 on Tue 20 Mar 2018 07:51 AM (UTC)
Message
Yes, it may wrap too early, so that a Chinese character is divided into two halves.

For example, suppose the auto-wrap width is set to 100 columns, so the 101st column is printed on a new line. If a Chinese character occupies 2 columns (the 100th and 101st), auto-wrapping after the 100th column divides that character.

Nick Gammon said:

The program currently wraps at spaces


Unlike English, Chinese text doesn't use spaces to separate words. If a line doesn't contain any space character, what will happen?

Posted by Zhenzh   China  (68 posts)  Bio
Date Reply #4 on Thu 22 Mar 2018 05:37 AM (UTC)
Message
I guess the wrap will split at the specified column when there are no space characters in the line.

That is a problem for all non-alphabetic languages, as a space is not the separator between two words.

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #5 on Thu 22 Mar 2018 06:41 AM (UTC)

Amended on Thu 22 Mar 2018 06:45 AM (UTC) by Nick Gammon

Message
This plugin may help:

To save and install the Split_long_Big5_lines plugin do this:
  1. Copy between the lines below (to the Clipboard)
  2. Open a text editor (such as Notepad) and paste the plugin into it
  3. Save to disk on your PC, preferably in your plugins directory, as Split_long_Big5_lines.xml
  4. Go to the MUSHclient File menu -> Plugins
  5. Click "Add"
  6. Choose the file Split_long_Big5_lines.xml (which you just saved in step 3) as a plugin
  7. Click "Close"



<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE muclient>

<muclient>
<plugin
   name="Split_long_Big5_lines"
   author="Nick Gammon"
   id="19e492fbb1ac8429ad1f9b1b"
   language="Lua"
   purpose="Splits long Big5 lines"
   date_written="2018-03-22 17:11:23"
   requires="4.97"
   version="1.0"
   >
<description trim="y">
<![CDATA[
Breaks long Big5 Chinese lines between characters rather than in the middle of one.
]]>
</description>

</plugin>


<!--  Script  -->


<script>
<![CDATA[

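-- number of consecutive Big5 characters after which a space is inserted
local LINE_LIMIT = 40

-- one Big5 character: lead byte 0x81-0xFE, then trail byte 0x40-0x7E or 0xA1-0xFE
local big5 = "[\129-\254][\064-\126\161-\254]"

-- handle incoming packet: add a space after every run of LINE_LIMIT Big5
-- characters so that the client has somewhere safe to wrap
function OnPluginPacketReceived (sText)
  -- the extra parentheses discard gsub's second result (the match count)
  return (string.gsub (sText, "(" .. string.rep (big5, LINE_LIMIT) .. ")", "%1 "))
end -- function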
]]>
</script>


</muclient>


What that does is look for 40 consecutive "Big5" characters (a byte from 0x81 to 0xFE followed by a byte from 0x40 to 0x7E or 0xA1 to 0xFE) and insert a space after them.

I'm not sure how well that will work for you: if you have non-Chinese characters in the middle then the count will be out. However, inserting a space should let MUSHclient wrap at the end of a character and not in the middle of it.

Change the "40" in the plugin to be some other number if you want to tweak the splitting point.
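
To see the substitution in isolation, here is a small stand-alone sketch that runs the same pattern outside the client with a limit of 2 instead of 40. The sample byte strings are arbitrary values chosen to fall inside the Big5 ranges given above; they are illustrative only.

```lua
local LINE_LIMIT = 2   -- small value so the effect is easy to see
local big5 = "[\129-\254][\064-\126\161-\254]"
local pattern = "(" .. string.rep (big5, LINE_LIMIT) .. ")"

-- four double-byte characters in a row: a space is added after each pair
local four = "\164\164\164\229\166\114\166\242"
local result, count = string.gsub (four, pattern, "%1 ")

-- an ASCII digit between two double-byte characters interrupts the run,
-- so the pattern never finds LINE_LIMIT characters in a row
local mixed = "\164\164" .. "1" .. "\164\229"
local mixed_result, mixed_count = string.gsub (mixed, pattern, "%1 ")
```

In the first case two spaces are inserted (one after each pair of characters); in the second case the text is returned unchanged, which is the "count will be out" situation mentioned above.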

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #6 on Thu 22 Mar 2018 10:14 AM (UTC)
Message
It might be possible to make the client handle this better, if the wrapping algorithm was improved. At present it wraps at the closest space to the end of the line (when the limit is reached). Conceivably, if no space was found, it could backtrack to the most recent double-byte character.

Would I be correct in thinking:


  • The lines in question have no spaces at all.
  • You are using Big5 or GB2312 encoding?
  • You are not using Unicode? (see the UTF-8 checkbox in the output window configuration)
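
A rough sketch of the idea described above (wrap at the closest space before the limit and, if there is none, back off to a character boundary) might look like this. It is not MUSHclient's actual code: find_break is a hypothetical name, a Big5/GB2312-style encoding with lead bytes from 0x81 to 0xFE is assumed, and the limit is assumed to be at least 2.

```lua
-- Hypothetical sketch: return how many bytes of `line` go on the first row.
local function find_break (line, limit)
  if #line <= limit then
    return #line   -- the whole line fits
  end
  -- 1. prefer the last space at or before the limit
  --    (0x20 is never a trail byte in these encodings, so this is safe)
  for i = limit, 1, -1 do
    if string.sub (line, i, i) == " " then
      return i
    end
  end
  -- 2. no space found: walk forward from the start, remembering the last
  --    position within the limit that ends a complete character
  local i, boundary = 1, 0
  while i <= limit do
    local b = string.byte (line, i)
    local size = (b >= 0x81 and b <= 0xFE) and 2 or 1
    if i + size - 1 > limit then
      break   -- this character would straddle the limit
    end
    boundary = i + size - 1
    i = i + size
  end
  return boundary > 0 and boundary or limit
end

-- three double-byte characters (6 bytes), limit of 5: break after byte 4
local n = find_break ("\196\227\186\195\196\227", 5)
```

The scan in step 2 starts from the beginning of the line because trail bytes can fall in the same range as lead bytes, so a character boundary cannot be recognised reliably by looking at a single byte near the limit.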

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #7 on Thu 22 Mar 2018 10:00 PM (UTC)
Message
I think I may have fixed this problem. Don't use the plugin; instead, download the latest pre-release version of MUSHclient as described here:

http://www.gammon.com.au/forum/?id=13903

Replace your MUSHclient.exe file with the newer one. This causes a line-wrap to occur at the start of a Big5 or GB2312 character rather than in the middle of it.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Zhenzh   China  (68 posts)  Bio
Date Reply #8 on Fri 23 Mar 2018 04:40 AM (UTC)
Message
So cool. I have tried the pre-release. It works well for my GB2312 characters.

One comment: it seems the space character has higher priority as a split point, which may make a mixed English/non-English line wrap earlier than necessary.

I recommend giving spaces and double-byte characters the same priority, so that a mixed line is wrapped at the column closest to the line limit in a Big5/GB2312 environment.

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #9 on Fri 23 Mar 2018 04:52 AM (UTC)
Message
Is this a real problem? You say it "may" happen. Is it happening enough to be annoying?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Zhenzh   China  (68 posts)  Bio
Date Reply #10 on Fri 23 Mar 2018 05:15 AM (UTC)

Amended on Fri 23 Mar 2018 05:16 AM (UTC) by Zhenzh

Message
Yes, it happened in my environment.

Posted by Zhenzh   China  (68 posts)  Bio
Date Reply #11 on Fri 23 Mar 2018 05:34 AM (UTC)
Message
Suppose I have a mixed line:
Quote:

English1 English2 English3 Chinese1Chinese2Chinese3

The line limit is specified at the column of "Chinese2".
Note: EnglishX stands for an English word, ChineseX for a Chinese word


The expected result will be:
Quote:

English1 English2 English3 Chinese1Chinese2
Chinese3


The actual current result is:
Quote:

English1 English2 English3
Chinese1Chinese2Chinese3

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #12 on Sat 24 Mar 2018 12:33 AM (UTC)
Message
OK, reworked the line splitting somewhat so it should split at the last Big5 character or a space, whichever comes last.

Grab the latest pre-release version as described earlier.
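
The "whichever comes last" rule can be illustrated with the sketch below. It is not the client's actual implementation: last_break is a hypothetical name, a Big5/GB2312-style encoding with lead bytes from 0x81 to 0xFE is assumed, and the limit is assumed to be at least 2.

```lua
-- Hypothetical sketch: return how many bytes of `line` go on the first row,
-- treating spaces and double-byte characters as equally good break points.
local function last_break (line, limit)
  if #line <= limit then
    return #line   -- the whole line fits
  end
  local i, best = 1, 0
  while i <= limit do
    local b = string.byte (line, i)
    if b >= 0x81 and b <= 0xFE then
      -- double-byte character: a break is allowed just before it ...
      if i > 1 then best = i - 1 end
      -- ... or just after it, if the whole character fits
      if i + 1 <= limit then best = i + 1 end
      i = i + 2
    else
      -- single-byte character: a break is allowed only after a space
      if b == 32 then best = i end
      i = i + 1
    end
  end
  return best > 0 and best or limit
end

-- two English words followed by three double-byte characters, with the
-- limit falling at the end of the second double-byte character
local line = "ab cd " .. "\196\227\186\195\196\227"
local n = last_break (line, 10)
```

With this rule the first row keeps both English words and the first two double-byte characters, which matches the expected result in the mixed-line example earlier in the thread.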

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Zhenzh   China  (68 posts)  Bio
Date Reply #13 on Sat 24 Mar 2018 06:20 PM (UTC)
Message
The new version works very well. Thank you for your support.

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #14 on Sun 25 Mar 2018 05:59 AM (UTC)
Message
Just out of curiosity, did you (or anyone) ever translate the menus and dialog boxes into Chinese? In particular see this thread:

http://gammon.com.au/forum/?id=7953

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.