Text Segmentation API
Count and split text the way people actually read it, using Unicode-correct segmentation. The count endpoint returns the number of grapheme clusters — the real, user-perceived characters, so a family emoji counts as 1 (not 7) and an accented letter as 1 — alongside words, sentences, code points, UTF-16 code units (the naive string length that over-counts) and UTF-8 byte length. This is exactly what character-limit fields, tweet/SMS counters and validation need so the count agrees with what the user sees. The segment endpoint splits text into grapheme, word or sentence segments (word segments are flagged word-like versus punctuation and spaces) and is locale-aware, so Japanese, Chinese and Thai word boundaries come out right. Everything is computed locally with no network calls. A Unicode text segmenter — distinct from the Unicode codepoint database (unicode), the case/text-utilities toolkit (text) and string similarity (similarity). No upstream key, no cache.
api.oanor.com/segmenter-api