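This article shows how to split a paragraph into sentences with regular expressions.
The patterns below treat Chinese (full-width) and English (half-width) punctuation the same way; adjust the character class to fit your needs.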
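As a rough sketch of how the delimiter set could be made adjustable, the helper below builds the character class from a plain string. The names `make_delimiter_class` and `split_on` are hypothetical and not part of the original snippets, and the sketch assumes the delimiter string is not empty.

```python
import re

def make_delimiter_class(delimiters):
    """Build a regex character class from a string of delimiter characters."""
    # re.escape keeps characters such as '.', '-' or ']' literal inside the class.
    return "[" + re.escape(delimiters) + "]"

def split_on(text, delimiters):
    """Split text on any of the given delimiters and drop empty pieces."""
    pieces = re.split(make_delimiter_class(delimiters), text)
    return [piece for piece in pieces if piece]

# Split on sentence-ending marks only, leaving commas inside each sentence.
print(split_on("你好,世界!这是个测试。", "。!?.!?"))
```

Passing a shorter or longer delimiter string changes where the text is cut without touching the regular expression itself.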
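Keep each punctuation mark attached to the end of its sentence

import re

def split_sentences_keep_delimiter(text):
    # A run of non-delimiter characters followed by exactly one delimiter.
    pattern = r'[^。!!??::;;,,]+[。!!??::;;,,]'
    sentences = re.findall(pattern, text)
    # Text after the last delimiter has no closing punctuation, so findall misses it.
    last_sentence = re.split(r'[。!!??::;;,,]', text)[-1].strip()
    if last_sentence:
        sentences.append(last_sentence)
    return sentences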
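Another way to keep the punctuation with its sentence is to split on a zero-width lookbehind. This is only a sketch: the function name `split_after_delimiter` is hypothetical, and it relies on `re.split` accepting patterns that match an empty string, which Python supports from version 3.7 onward.

```python
import re

def split_after_delimiter(text):
    # (?<=...) is a zero-width lookbehind: the split happens at the position
    # right after a delimiter, so the delimiter stays with the preceding text.
    parts = re.split(r'(?<=[。!!??::;;,,])', text)
    # A delimiter at the very end leaves an empty trailing piece; drop it.
    return [part for part in parts if part]

print(split_after_delimiter("你好,世界!这是个测试"))
```

Unlike a `findall` pattern that requires at least one non-delimiter character, this version returns consecutive punctuation marks as separate items.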
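Drop the punctuation and keep only the text

import re

def split_text_with_punctuation(text):
    # re.split discards the delimiters; a delimiter at the end of the text
    # leaves an empty string behind, so filter empty pieces out.
    split_sentences = re.split(r'[。.!!??::;;,,]', text)
    return [s for s in split_sentences if s]

text = "你好,世界!这是个测试。看看是否有效?当然,它会的。"
print(split_text_with_punctuation(text))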
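Return the punctuation and the text as separate items

import re

def split_text_with_punctuation(text):
    # The capturing group makes re.split keep each delimiter as its own item.
    split_sentences = re.split(r'([。.!!??::;;,,])', text)
    # Drop the empty string left behind when the text ends with a delimiter.
    return [s for s in split_sentences if s]

text = "你好,世界!这是个测试。看看是否有效?当然,它会的。"
print(split_text_with_punctuation(text))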
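When text and punctuation come back as separate items, it can be handy to pair each piece of text with the mark that followed it. The sketch below is one possible way to do that. The name `split_into_pairs` is hypothetical, and it relies on `re.split` with a capturing group returning text and delimiters in alternating positions.

```python
import re
from itertools import zip_longest

def split_into_pairs(text):
    parts = re.split(r'([。.!!??::;;,,])', text)
    # Even positions hold text, odd positions hold the captured delimiters.
    texts, marks = parts[0::2], parts[1::2]
    # There is always one more text item than delimiters, so pad with ''.
    pairs = zip_longest(texts, marks, fillvalue='')
    # Skip the empty pair produced when the text ends with a delimiter.
    return [(t, m) for t, m in pairs if t or m]

print(split_into_pairs("你好,世界!结尾"))
```

A trailing fragment without closing punctuation is returned with an empty string as its mark.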