
Segmentation

An implementation of Unicode Text Segmentation (UAX #29, also referred to as tr29). Text is split with a fast deterministic finite automaton (DFA).

See nim-graphemes for grapheme cluster segmentation.

Install

nimble install segmentation

Compatibility

Nim 0.19, 0.20, and 1.0.4 or later

Usage

import sequtils
import segmentation

assert toSeq("The (“brown”) fox cant jump 32.3 feet, right?".words) ==
  @["The", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
    "cant", " ", "jump", " ", "32.3", " ", "feet", ",", " ",
    "right", "?"]

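Because `words` yields every segment, including whitespace and punctuation, a common follow-up is to keep only the word-like segments. The following is a minimal sketch of that filtering step, not part of this library: `isWordLike` and `wordTokens` are hypothetical helper names, and the "contains a letter or an ASCII digit" rule is an assumption chosen for illustration. It relies only on the `words` iterator shown above and on `runes` and `isAlpha` from Nim's standard `unicode` module.

```nim
import unicode
import segmentation

proc isWordLike(seg: string): bool =
  ## Hypothetical helper: true when the segment contains an ASCII digit
  ## or at least one alphabetic rune.
  for c in seg:
    if c in {'0'..'9'}:
      return true
  for r in seg.runes:
    if r.isAlpha:
      return true

proc wordTokens(s: string): seq[string] =
  ## Hypothetical helper: collects the segments produced by `words`,
  ## dropping whitespace and punctuation segments.
  for w in s.words:
    if w.isWordLike:
      result.add w

assert wordTokens("The (“brown”) fox cant jump 32.3 feet, right?") ==
  @["The", "brown", "fox", "cant", "jump", "32.3", "feet", "right"]
```

The filter runs after segmentation, so a token such as `32.3` stays in one piece instead of being split at the decimal point.
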
Docs

Read the docs

Tests

nimble test

LICENSE

MIT