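String handling is an everyday skill, but Python has so many built-in string methods that they are easy to forget. As a quick reference, I wrote an example for each built-in method against Python 3.5.1 and grouped the methods by category so they are easy to look up.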
String formatting
str.center(width[, fillchar])
str.ljust(width[, fillchar]); str.rjust(width[, fillchar])
str.format(*args, **kwargs)
Searching, locating and replacing
str.count(sub[, start[, end]])
str.find(sub[, start[, end]]); str.rfind(sub[, start[, end]])
str.index(sub[, start[, end]]); str.rindex(sub[, start[, end]])
str.replace(old, new[, count])
str.lstrip([chars]); str.rstrip([chars]); str.strip([chars])
static str.maketrans(x[, y[, z]]); str.translate(table)
str.partition(sep); str.rpartition(sep)
str.split(sep=None, maxsplit=-1); str.rsplit(sep=None, maxsplit=-1)
String predicates
str.endswith(suffix[, start[, end]]); str.startswith(prefix[, start[, end]])
str.isdecimal(); str.isdigit(); str.isnumeric()
String encoding
str.encode(encoding="utf-8", errors="strict")
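Case conversion
str.capitalize()
Uppercases the first character and lowercases the rest. Note that if the first character has no uppercase form, it is left unchanged.
"adi dog".capitalize() # "Adi dog"
"abcd 徐".capitalize() # "Abcd 徐"
"徐 abcd".capitalize() # "徐 abcd"
"ß".capitalize() # "SS" (Python 3.8 and later title-case the first character and return "Ss")
str.lower()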
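Converts all cased characters in the string to lowercase. This applies to every character that has a lowercase mapping, not only to ASCII letters.
"DOBI".lower() # "dobi"
"ß".lower() # "ß" is already a lowercase German letter; it has an alternative lowercase spelling "ss", but lower() does not convert to it
# "ß"
"徐 ABCD".lower() # "徐 abcd"
str.casefold()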
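Returns a case-folded copy of the string. Case folding is like lowercasing but more aggressive: it is intended for caseless matching, so every character that has a case-folding mapping in Unicode is converted.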
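The practical difference from lower() shows up in caseless comparison. The following is a minimal sketch; the helper name caseless_equal is hypothetical and not part of the standard library.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings while ignoring case."""
    return a.casefold() == b.casefold()


# lower() leaves "ß" untouched, so the two spellings do not match.
lower_match = "STRASSE".lower() == "straße".lower()
# casefold() maps "ß" to "ss", so the two spellings compare equal.
fold_match = caseless_equal("STRASSE", "straße")

print(lower_match)  # False
print(fold_match)   # True
```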
"DOBI".casefold() # "dobi" "ß".casefold() #德语中小写字母 ß 等同于小写字母 ss, 其大写为 SS # "ss"str.swapcase()
"徐Dobi a123 ß".swapcase() #: "徐dOBI A123 SS" 这里的 ß 被转成 SS 是一种大写
Note, however, that s.swapcase().swapcase() == s is not necessarily true:
"\xb5" # "µ"
"\xb5".swapcase() # "Μ"
"\xb5".swapcase().swapcase() # "μ"
hex(ord("\xb5".swapcase().swapcase())) # "0x3bc"
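Here the lowercase of "Μ" (Greek capital mu, not the Latin M) is "μ" (U+03BC), which is written the same way as the original "µ" (U+00B5) but is a different character.
str.title()
Uppercases the first letter of each word and lowercases the remaining letters.
"Hello world".title() # "Hello World"
"中文abc def 12gh".title() # "中文Abc Def 12Gh"
# But this method is not perfect:
"they're bill's friends from the UK".title() # "They'Re Bill'S Friends From The Uk"
str.upper()
"中文abc def 12gh".upper() # "中文ABC DEF 12GH"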
Note that s.upper().isupper() is not necessarily True, for example when s contains no cased characters at all.
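A short sketch of when this happens: isupper() requires at least one cased character, so strings made only of digits, CJK characters, or nothing at all stay False even after upper(). The sample strings are chosen here purely for illustration.

```python
samples = ["abc", "Dobi 123", "123", "徐", ""]

# Map each sample to the result of upper() followed by isupper().
results = {s: s.upper().isupper() for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
# 'abc' True
# 'Dobi 123' True
# '123' False
# '徐' False
# '' False
```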
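String formatting
str.center(width[, fillchar])
Centers the string in a field of the given width. A fill character can be given to pad the extra width (an ASCII space by default). If width is less than or equal to the length of the string, the original string is returned.
"12345".center(10, "*") # "**12345***"
"12345".center(10) # "  12345   "
str.ljust(width[, fillchar]); str.rjust(width[, fillchar])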
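Returns a string of the given width with the content left-justified (ljust) or right-justified (rjust). If width is less than or equal to the length of the string, the original string is returned. Padding uses an ASCII space by default; a different single fill character can be specified.
"dobi".ljust(10) # "dobi      "
"dobi".ljust(10, "~") # "dobi~~~~~~"
"dobi".ljust(3, "~") # "dobi"
"dobi".ljust(3) # "dobi"
str.zfill(width)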
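Pads the string on the left with "0" and returns a string of the given width. A leading sign character ("+" or "-") stays in front of the padding.
"42".zfill(5) # "00042"
"-42".zfill(5) # "-0042"
"dd".zfill(5) # "000dd"
"--".zfill(5) # "-000-"
" ".zfill(5) # "0000 "
"".zfill(5) # "00000"
"ffffdffffddd".zfill(5) # "ffffdffffddd"
str.expandtabs(tabsize=8)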
Replaces each tab character with spaces up to the next tab stop (every tabsize columns).
tab = "1\t23\t456\t7890\t1112131415\t161718192021"
tab.expandtabs()
# "1       23      456     7890    1112131415      161718192021"
# "123456781234567812345678123456781234567812345678" note how the ruler lines up with the output positions above
tab.expandtabs(4)
# "1   23  456 7890    1112131415  161718192021"
# "12341234123412341234123412341234"
str.format(*args, **kwargs)
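The format-string syntax is extensive, and the official documentation already provides fairly detailed examples, so they are not repeated here; readers who want the details can go straight to "Format examples" in the official documentation.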
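For quick orientation only, the sketch below covers a handful of common forms: positional and named fields, alignment, and number formatting. The values are made up for illustration, and the official examples remain the fuller reference.

```python
# Positional and keyword replacement fields
positional = "{0} + {1} = {2}".format(1, 2, 1 + 2)
named = "{name} is {age}".format(name="john", age=56)

# Alignment inside a 6-character field: left, centered, right
aligned = "[{:<6}] [{:^6}] [{:>6}]".format("a", "b", "c")

# Precision, zero padding, and thousands separator
numbers = "{:.2f} {:08.3f} {:,}".format(3.14159, 3.14159, 1234567)

# Hexadecimal, prefixed binary, and percentage
bases = "{:x} {:#b} {:.1%}".format(255, 5, 0.256)

print(positional)  # 1 + 2 = 3
print(named)       # john is 56
print(aligned)     # [a     ] [  b   ] [     c]
print(numbers)     # 3.14 0003.142 1,234,567
print(bases)       # ff 0b101 25.6%
```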
str.format_map(mapping)
Similar to str.format(**mapping), except that mapping is a dictionary-like object that is used directly instead of being copied into a new dict.
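Using the mapping directly matters when it is a dict subclass with custom behaviour. The sketch below follows the pattern shown in the official documentation; the class name Default and the sample fields are illustrative only.

```python
class Default(dict):
    """dict subclass that returns the key itself for missing entries."""

    def __missing__(self, key):
        return key


template = "My name is {name}, from {country}"
person = Default(name="john")

# format_map() looks keys up on the mapping itself, so __missing__ is used.
filled = template.format_map(person)
print(filled)  # My name is john, from country

# format(**person) unpacks into a plain dict, so __missing__ is never consulted.
missing_key = None
try:
    template.format(**person)
except KeyError as exc:
    missing_key = exc.args[0]
print(missing_key)  # country
```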
People = {"name": "john", "age": 56}
"My name is {name},i am {age} old".format_map(People) # "My name is john,i am 56 old"
Searching, locating and replacing
str.count(sub[, start[, end]])
text = "outer protective covering"
text.count("e") # 4
text.count("e", 5, 11) # 1
text.count("e", 5, 10) # 0
str.find(sub[, start[, end]]); str.rfind(sub[, start[, end]])
text = "outer protective covering"
text.find("er") # 3
text.find("to") # -1
text.find("er", 3) # 3
text.find("er", 4) # 20
text.find("er", 4, 21) # -1
text.find("er", 4, 22) # 20
text.rfind("er") # 20
text.rfind("er", 20) # 20
text.rfind("er", 20, 21) # -1
str.index(sub[, start[, end]]); str.rindex(sub[, start[, end]])
These behave like find() and rfind(), except that a ValueError is raised when the substring is not found.
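The difference matters because the -1 returned by find() is itself a valid index. A minimal sketch follows; the helper find_or_none is hypothetical and only wraps index() for illustration.

```python
text = "outer protective covering"


def find_or_none(haystack: str, needle: str):
    """Hypothetical helper: index of needle, or None when it is absent."""
    try:
        return haystack.index(needle)
    except ValueError:
        return None


# find() signals "not found" with -1, which silently indexes the last character.
silent_bug = text[text.find("to")]
print(silent_bug)                # g

# index() raises instead, so the failure cannot be mistaken for a position.
print(find_or_none(text, "to"))  # None
print(find_or_none(text, "er"))  # 3
```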
str.replace(old, new[, count])
"dog wow wow jiao".replace("wow", "wang") # "dog wang wang jiao"
"dog wow wow jiao".replace("wow", "wang", 1) # "dog wang wow jiao"
"dog wow wow jiao".replace("wow", "wang", 0) # "dog wow wow jiao"
"dog wow wow jiao".replace("wow", "wang", 2) # "dog wang wang jiao"
"dog wow wow jiao".replace("wow", "wang", 3) # "dog wang wang jiao"
str.lstrip([chars]); str.rstrip([chars]); str.strip([chars])
"   dobi".lstrip() # "dobi"
"db.kun.ac.cn".lstrip("dbk") # ".kun.ac.cn"
" dobi   ".rstrip() # " dobi"
"db.kun.ac.cn".rstrip("acn") # "db.kun.ac."
" dobi   ".strip() # "dobi"
"db.kun.ac.cn".strip("db.c") # "kun.ac.cn"
"db.kun.ac.cn".strip("cbd.un") # "kun.a"
static str.maketrans(x[, y[, z]]); str.translate(table)
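maketrans is a static method that builds a translation table for translate to use.
If maketrans is given only one argument, that argument must be a dictionary whose keys are either Unicode ordinals (integers) or strings of length 1, and whose values can be strings of any length, None, or Unicode ordinals.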
a = "dobi"
ord("o") # 111
ord("a") # 97
hex(ord("狗")) # "0x72d7"
b = {"d": "dobi", 111: " is ", "b": 97, "i": "\u72d7\u72d7"}
table = str.maketrans(b)
a.translate(table) # "dobi is a狗狗"
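If maketrans is given two arguments, both must be strings of equal length, and each character of the first is mapped to the character at the same position in the second. If a third argument is given, it must also be a string, and each of its characters is mapped to None, that is, deleted from the result.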
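It can help to look at the table itself: it is an ordinary dict keyed by Unicode ordinals. A small sketch with made-up arguments:

```python
# "a" -> "x", "b" -> "y", and "c" is deleted
table = str.maketrans("ab", "xy", "c")
print(table)  # {97: 120, 98: 121, 99: None}

translated = "aabbcc".translate(table)
print(translated)  # xxyy
```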
a = "dobi is a dog"
table = str.maketrans("dobi", "alph")
a.translate(table) # "alph hs a alg"
table = str.maketrans("dobi", "alph", "o")
a.translate(table) # "aph hs a ag"
Joining and splitting
str.join(iterable)
"-".join(["2012", "3", "12"]) # "2012-3-12"
"-".join([2012, 3, 12]) # TypeError: sequence item 0: expected str instance, int found
"-".join(["2012", "3", b"12"]) # bytes is not str
# TypeError: sequence item 2: expected str instance, bytes found
"-".join(["2012"]) # "2012"
"-".join([]) # ""
"-".join([None]) # TypeError: sequence item 0: expected str instance, NoneType found
"-".join([""]) # ""
",".join({"dobi":"dog", "polly":"bird"}) # "dobi,polly" (iterating a dict yields its keys)
",".join({"dobi":"dog", "polly":"bird"}.values()) # "dog,bird"
str.partition(sep); str.rpartition(sep)
"dog wow wow jiao".partition("wow") # ("dog ", "wow", " wow jiao")
"dog wow wow jiao".partition("dog") # ("", "dog", " wow wow jiao")
"dog wow wow jiao".partition("jiao") # ("dog wow wow ", "jiao", "")
"dog wow wow jiao".partition("ww") # ("dog wow wow jiao", "", "")
"dog wow wow jiao".rpartition("wow") # ("dog wow ", "wow", " jiao")
"dog wow wow jiao".rpartition("dog") # ("", "dog", " wow wow jiao")
"dog wow wow jiao".rpartition("jiao") # ("dog wow wow ", "jiao", "")
"dog wow wow jiao".rpartition("ww") # ("", "", "dog wow wow jiao")
str.split(sep=None, maxsplit=-1); str.rsplit(sep=None, maxsplit=-1)
"1,2,3".split(","), "1, 2, 3".rsplit() # (["1", "2", "3"], ["1,", "2,", "3"])
"1,2,3".split(",", maxsplit=1), "1,2,3".rsplit(",", maxsplit=1) # (["1", "2,3"], ["1,2", "3"])
"1 2 3".split(), "1 2 3".rsplit() # (["1", "2", "3"], ["1", "2", "3"])
"1 2 3".split(maxsplit=1), "1 2 3".rsplit(maxsplit=1) # (["1", "2 3"], ["1 2", "3"])
"   1   2   3   ".split() # ["1", "2", "3"]
"1,2,,3,".split(","), "1,2,,3,".rsplit(",") # (["1", "2", "", "3", ""], ["1", "2", "", "3", ""])
"".split() # []
"".split("a") # [""]
"bcd".split("a") # ["bcd"]
"bcd".split(None) # ["bcd"]
str.splitlines([keepends])
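Splits the string into a list at line boundaries. When keepends is True, the line breaks are kept in the resulting items. The recognized line boundaries are listed in the official documentation.
"ab c\n\nde fg\rkl\r\n".splitlines() # ["ab c", "", "de fg", "kl"]
"ab c\n\nde fg\rkl\r\n".splitlines(keepends=True) # ["ab c\n", "\n", "de fg\r", "kl\r\n"]
"".splitlines(), "".split("\n") # note the difference between the two
# ([], [""])
"One line\n".splitlines(), "Two lines\n".split("\n")
# (["One line"], ["Two lines", ""])
String predicates
str.endswith(suffix[, start[, end]]); str.startswith(prefix[, start[, end]])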
text = "outer protective covering"
text.endswith("ing") # True
text.endswith(("gin", "ing")) # True
text.endswith("ter", 2, 5) # True
text.endswith("ter", 2, 4) # False
str.isalnum()
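A character c is alphanumeric, that is, c.isalnum() is true, if any one of c.isalpha(), c.isdecimal(), c.isdigit(), or c.isnumeric() is true. A string is alphanumeric when it is non-empty and every character in it is.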
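The definition above can be checked directly. The following sketch re-implements it character by character and compares the result with the built-in method; the function isalnum_per_char and the sample strings are hypothetical, written only for this comparison.

```python
def isalnum_per_char(s: str) -> bool:
    """Hypothetical re-implementation of str.isalnum() from its definition."""
    return bool(s) and all(
        c.isalpha() or c.isdecimal() or c.isdigit() or c.isnumeric() for c in s
    )


samples = ["dobi123", "徐", "²", "Ⅶ", "dobi_123", "dobi 123", ""]

# Pair the built-in answer with the manual one for every sample.
agreement = {s: (s.isalnum(), isalnum_per_char(s)) for s in samples}

for s, (builtin, manual) in agreement.items():
    print(repr(s), builtin, manual)
```

On every sample the two columns agree, including the empty string, for which both are False.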
"dobi".isalnum() # True "dobi123".isalnum() # True "123".isalnum() # True "徐".isalnum() # True "dobi_123".isalnum() # False "dobi 123".isalnum() # False "%".isalnum() # Falsestr.isalpha()
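True for characters defined as "Letter" in the Unicode character database, that is, those whose general category is one of "Lm", "Lt", "Lu", "Ll", or "Lo". Note that this differs from the "Alphabetic" property defined in the Unicode Standard.
"dobi".isalpha() # True
"do bi".isalpha() # False
"dobi123".isalpha() # False
"徐".isalpha() # True
str.isdecimal(); str.isdigit(); str.isnumeric()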
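The three methods differ in the range of characters for which they return true. Roughly, in terms of Unicode general categories:
isdecimal: Nd
isdigit: No, Nd
isnumeric: No, Nd, Nl
Strictly speaking, the methods test the Unicode Numeric_Type property (Decimal, Digit, Numeric), so the categories above are only an approximation. The difference between digit and decimal is that some numeric characters are digits but not decimals, for example the superscript "²".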
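To see how the categories and the three predicates relate for individual characters, the sketch below prints them side by side using the standard unicodedata module. The sample characters are chosen for illustration; the last part shows that isdigit() being true does not imply that int() accepts the character.

```python
import unicodedata

rows = []
for ch in ["1", "²", "⅕", "Ⅶ", "十"]:
    rows.append(
        (ch, unicodedata.category(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    )

for row in rows:
    print(*row)
# 1 Nd True True True
# ² No False True True
# ⅕ No False False True
# Ⅶ Nl False False True
# 十 Lo False False True

# "²" passes isdigit(), yet int() only accepts decimal digits.
try:
    int("²")
    int_accepts_superscript = True
except ValueError:
    int_accepts_superscript = False
print(int_accepts_superscript)  # False
```

Note that "⅕" and "²" share the category No but differ under isdigit(), and "十" is numeric despite being in category Lo, which is why the category lists are best read as a rule of thumb.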
num = "\u2155"
print(num) # ⅕
num.isdecimal(), num.isdigit(), num.isnumeric() # (False, False, True)
num = "\u00B2"
print(num) # ²
num.isdecimal(), num.isdigit(), num.isnumeric() # (False, True, True)
num = "1" # unicode
num.isdecimal(), num.isdigit(), num.isnumeric() # (True, True, True)
num = "Ⅶ"
num.isdecimal(), num.isdigit(), num.isnumeric() # (False, False, True)
num = "十"
num.isdecimal(), num.isdigit(), num.isnumeric() # (False, False, True)
num = b"1" # bytes
num.isdigit() # True
num.isdecimal() # AttributeError: 'bytes' object has no attribute 'isdecimal'
num.isnumeric() # AttributeError: 'bytes' object has no attribute 'isnumeric'
str.isidentifier()
"def".isidentifier() # True (keywords also count; keyword.iskeyword() can be used to exclude them)
"with".isidentifier() # True
"false".isidentifier() # True
"dobi_123".isidentifier() # True
"dobi 123".isidentifier() # False
"123".isidentifier() # False
str.islower()
"徐".islower() # False
"ẞ".islower() # the German capital letter (U+1E9E)
# False
"a徐".islower() # True
"ss".islower() # True
"23".islower() # False
"Ab".islower() # False
str.isprintable()
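Returns True if every character in the string is printable or if the string is empty. Characters in the Unicode "Other" and "Separator" categories are considered non-printable, with the exception of the ASCII space (0x20).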
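The rule can be observed per character with the standard unicodedata module. In this sketch the sample characters are picked for illustration: two control characters, the ASCII space, an ideographic space (also a separator, but not exempt), and a zero-width space.

```python
import unicodedata

samples = ["a", " ", "\n", "\t", "\u3000", "\u200b"]

# (escaped form, general category, isprintable()) for each sample
report = [(repr(ch), unicodedata.category(ch), ch.isprintable()) for ch in samples]

for escaped, category, printable in report:
    print(escaped, category, printable)
# 'a' Ll True
# ' ' Zs True
# '\n' Cc False
# '\t' Cc False
# '\u3000' Zs False
# '\u200b' Cf False
```

The escaped forms in the first column come from repr(), which escapes exactly the characters that isprintable() reports as non-printable.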
"dobi123".isprintable() # True "dobi123 ".isprintable() Out[24]: False "dobi 123".isprintable() # True "dobi.123".isprintable() # True "".isprintable() # Truestr.isspace()
" ".isspace() # True
"".isspace() # False
str.istitle()
"How Python Works".istitle() # True
"How Python WORKS".istitle() # False
"how python works".istitle() # False
" ".istitle() # False
"".istitle() # False
"A".istitle() # True
"a".istitle() # False
"甩甩Abc Def 123".istitle() # True
str.isupper()
"徐".isupper() # False
"DOBI".isupper() # True
"Dobi".isupper() # False
"DOBI123".isupper() # True
"DOBI 123".isupper() # True
"DOBI_123".isupper() # True
"_123".isupper() # False
String encoding
str.encode(encoding="utf-8", errors="strict")
fname = "徐"
fname.encode("ascii") # UnicodeEncodeError: 'ascii' codec can't encode character '\u5f90'...
fname.encode("ascii", "replace") # b"?"
fname.encode("ascii", "ignore") # b""
fname.encode("ascii", "xmlcharrefreplace") # b"&#24464;"
fname.encode("ascii", "backslashreplace") # b"\\u5f90"
