
【Pitfall Series】Sorting Chinese strings gives incorrect results

2025-04-25 13:18:55

1. Hitting the pitfall

Suppose a business scenario requires sorting the order count of each city according to these rules:

First sort by order count in descending order; for equal counts, sort by city name in ascending order.

Sample code:

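// OrderStatisticsInfo.java (Lombok generates the getters, setters and toString)
import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

@Getter
@Setter
@ToString
public class OrderStatisticsInfo {
    private String cityName;
    private Integer orderCount;

    public OrderStatisticsInfo(String cityName, Integer orderCount) {
        this.cityName = cityName;
        this.orderCount = orderCount;
    }
}

// Main.java (the name of the enclosing class is arbitrary)
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<OrderStatisticsInfo> orderStatisticsInfoList = Arrays.asList(
                new OrderStatisticsInfo("上海", 1000),   // Shanghai
                new OrderStatisticsInfo("北京", 1000),   // Beijing
                new OrderStatisticsInfo("成都", 700),    // Chengdu
                new OrderStatisticsInfo("常州", 700),    // Changzhou
                new OrderStatisticsInfo("广州", 900),    // Guangzhou
                new OrderStatisticsInfo("深圳", 800)     // Shenzhen
        );

        // Descending by order count, then ascending by city name (natural String order)
        orderStatisticsInfoList.sort(Comparator.comparing(OrderStatisticsInfo::getOrderCount, Comparator.reverseOrder())
                .thenComparing(OrderStatisticsInfo::getCityName));
        orderStatisticsInfoList.forEach(System.out::println);
    }
}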

Expected results:

北京 (Beijing) 1000
上海 (Shanghai) 1000
广州 (Guangzhou) 900
深圳 (Shenzhen) 800
常州 (Changzhou) 700
成都 (Chengdu) 700

Actual results:

OrderStatisticsInfo(cityName=上海, orderCount=1000)
OrderStatisticsInfo(cityName=北京, orderCount=1000)
OrderStatisticsInfo(cityName=广州, orderCount=900)
OrderStatisticsInfo(cityName=深圳, orderCount=800)
OrderStatisticsInfo(cityName=常州, orderCount=700)
OrderStatisticsInfo(cityName=成都, orderCount=700)

The results show that the descending order by order count is correct. The ascending order by city name does not meet expectations:

Shanghai (上海) is placed before Beijing (北京), while Changzhou (常州) and Chengdu (成都) come out in the correct order.

2. Cause analysis

When strings are sorted, the default is their natural ordering, which is String's compareTo method. That method compares the Unicode code values (UTF-16 char values) of the characters and ignores language-specific ordering such as Chinese pinyin.
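The following minimal sketch shows the effect in isolation. It sorts the same cities twice with natural ordering, once with romanized names and once with Chinese characters. The class `NaturalOrderDemo` and its `naturalSort` helper are hypothetical names used only for illustration. With romanized names, code-value order happens to match alphabetical order. With the Chinese names it does not, because 上 (U+4E0A) has a smaller code value than 北 (U+5317):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaturalOrderDemo {
    // Sorts a copy of the input using natural ordering (String.compareTo).
    static List<String> naturalSort(String... names) {
        List<String> copy = new ArrayList<>(Arrays.asList(names));
        copy.sort(null); // a null comparator means natural ordering
        return copy;
    }

    public static void main(String[] args) {
        // Romanized names: code-value order coincides with alphabetical order
        System.out.println(naturalSort("Shanghai", "Beijing", "Chengdu", "Changzhou"));
        // Chinese names: ordered by the UTF-16 values of the characters, not by pinyin
        System.out.println(naturalSort("上海", "北京", "成都", "常州"));
    }
}
```

When run, `main` should print `[Beijing, Changzhou, Chengdu, Shanghai]` and then `[上海, 北京, 常州, 成都]`. This assumes the source is compiled as UTF-8, which has been the default since JDK 18.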

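To see why, look at the source code of String's compareTo method (this is the JDK 8 version, where a String stores its characters in a char[] field named value):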

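public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    // Compare character by character up to the length of the shorter string
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            // First differing position: return the difference of the char values
            return c1 - c2;
        }
        k++;
    }
    // One string is a prefix of the other: the shorter one sorts first
    return len1 - len2;
}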
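From JDK 9 onward, String stores its contents differently ("compact strings"), so the loop above is JDK 8 specific. The return value itself is fixed by the compareTo Javadoc: it is the difference of the first differing char values, or the length difference if one string is a prefix of the other. The minimal sketch below reproduces that contract using only the public `charAt` and `length` methods, so you can check it against `String.compareTo` on any JDK. `LexCompareSketch` and `lexCompare` are hypothetical names:

```java
public class LexCompareSketch {
    // Same contract as String.compareTo: compare UTF-16 code units pairwise,
    // return the first difference, otherwise the difference in length.
    static int lexCompare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // difference of the code-unit values
            }
        }
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        String[][] pairs = {{"上海", "北京"}, {"常州", "成都"}, {"abc", "ab"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1]
                    + ": sketch=" + lexCompare(p[0], p[1])
                    + ", String.compareTo=" + p[0].compareTo(p[1]));
        }
    }
}
```

For the three pairs, both values should agree: -1293, -984 and 1.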

Take Shanghai (上海) and Beijing (北京) as an example. The first characters compared are 上 and 北. The Unicode value of 上 is 19978, so c1 = 19978. The Unicode value of 北 is 21271, so c2 = 21271. Because c1 != c2, the method returns 19978 - 21271 = -1293.

In other words, Shanghai is considered smaller than Beijing, so it is placed before Beijing. This is not what we expected.

Now take Changzhou (常州) and Chengdu (成都). The first characters compared are 常 and 成. The Unicode value of 常 is 24120, so c1 = 24120. The Unicode value of 成 is 25104, so c2 = 25104. Because c1 != c2, the method returns 24120 - 25104 = -984.

In other words, Changzhou is considered smaller than Chengdu, so it is placed before Chengdu. This happens to match the expected pinyin order.

You can get the Unicode value of a character with the Character.codePointAt method:

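// Output: 19978 (上)
System.out.println(Character.codePointAt("上海", 0));
// Output: 21271 (北)
System.out.println(Character.codePointAt("北京", 0));
// Output: 24120 (常)
System.out.println(Character.codePointAt("常州", 0));
// Output: 25104 (成)
System.out.println(Character.codePointAt("成都", 0));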

3. Solution

Java provides locale-sensitive sorting through java.text.Collator, which orders strings by the rules of a specific language (such as Chinese pinyin). The code is as follows:

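// Additionally requires: import java.text.Collator; import java.util.Locale;
orderStatisticsInfoList.sort(Comparator.comparing(OrderStatisticsInfo::getOrderCount, Comparator.reverseOrder())
        .thenComparing(OrderStatisticsInfo::getCityName, Collator.getInstance(Locale.CHINA)));
orderStatisticsInfoList.forEach(System.out::println);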

The output is now:

OrderStatisticsInfo(cityName=北京, orderCount=1000)
OrderStatisticsInfo(cityName=上海, orderCount=1000)
OrderStatisticsInfo(cityName=广州, orderCount=900)
OrderStatisticsInfo(cityName=深圳, orderCount=800)
OrderStatisticsInfo(cityName=常州, orderCount=700)
OrderStatisticsInfo(cityName=成都, orderCount=700)

Beijing now comes before Shanghai, as expected.
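The snippets above depend on Lombok and on an enclosing class. The self-contained sketch below applies the same fix without them. It swaps in a hypothetical plain class `CityOrders` for OrderStatisticsInfo and adds a hypothetical helper `sortByCountThenName`. It builds the descending part with `Comparator.comparingInt(...).reversed()` rather than `Comparator.reverseOrder()`, which produces the same ordering. It assumes the JDK's Chinese collation data is available, as it is in a standard full JDK:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CitySortDemo {
    // Plain stand-in for OrderStatisticsInfo so the sketch compiles without Lombok.
    static final class CityOrders {
        final String cityName;
        final int orderCount;

        CityOrders(String cityName, int orderCount) {
            this.cityName = cityName;
            this.orderCount = orderCount;
        }

        String getCityName() { return cityName; }

        int getOrderCount() { return orderCount; }

        @Override
        public String toString() { return cityName + " " + orderCount; }
    }

    // Descending by order count, then ascending by city name using Chinese collation rules.
    static List<CityOrders> sortByCountThenName(List<CityOrders> input) {
        List<CityOrders> result = new ArrayList<>(input);
        Collator chinaCollator = Collator.getInstance(Locale.CHINA);
        result.sort(Comparator.comparingInt(CityOrders::getOrderCount).reversed()
                .thenComparing(CityOrders::getCityName, chinaCollator));
        return result;
    }

    public static void main(String[] args) {
        List<CityOrders> data = Arrays.asList(
                new CityOrders("上海", 1000), new CityOrders("北京", 1000),
                new CityOrders("成都", 700), new CityOrders("常州", 700),
                new CityOrders("广州", 900), new CityOrders("深圳", 800));
        sortByCountThenName(data).forEach(System.out::println);
    }
}
```

On a standard JDK, `main` should list the cities in the same order as the output above: Beijing, Shanghai, Guangzhou, Shenzhen, Changzhou, Chengdu.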

Because the code above passes Collator.getInstance(Locale.CHINA) as the comparator for the city name, sorting no longer calls String's compareTo method. It calls Collator's compare method instead, which here is actually RuleBasedCollator's compare method.
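You can check both points on your own JDK with the small sketch below. The helper names `chinaCollator` and `collate` are hypothetical. It tests whether the returned Collator is a RuleBasedCollator, which is what the standard JDK locale provider returns; the Javadoc does not guarantee this. Collator implements Comparator<Object>, so the same instance can also sort a plain List<String> directly. This is also why it can be passed straight to thenComparing:

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorTypeDemo {
    // Collator for Simplified Chinese as used in China (Locale.CHINA is zh_CN).
    static Collator chinaCollator() {
        return Collator.getInstance(Locale.CHINA);
    }

    // Sorts a copy of the names; Collator implements Comparator<Object>,
    // so it is accepted where a Comparator<? super String> is expected.
    static List<String> collate(String... names) {
        List<String> copy = new ArrayList<>(Arrays.asList(names));
        copy.sort(chinaCollator());
        return copy;
    }

    public static void main(String[] args) {
        // Expected to be true with the standard JDK locale provider
        System.out.println(chinaCollator() instanceof RuleBasedCollator);
        System.out.println(collate("上海", "北京", "成都", "常州"));
    }
}
```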

Run the following code to see the Collator's result for comparing Shanghai and Beijing on their own:

Collator collator = Collator.getInstance(Locale.CHINA);
// Output: 1, meaning 上海 (Shanghai) is greater than 北京 (Beijing), so it is placed after Beijing
System.out.println(collator.compare("上海", "北京"));

This series is updated continuously. Follow the WeChat official account "Shencheng Strange People" to read new posts as soon as they are published!