quicktok: a faster tokenizer (exact and byte-identical with tiktoken) [P]
Been working on this a while! Should be useful for anyone trying to speed up their tokenization workflows.
quicktok is a fast, exact BPE tokenizer written in C++. Its token ids are byte-identical to tiktoken's, and encoding runs 2–3.6× faster than bpe-openai (the fastest alternative I know of) and 4–11× faster than tiktoken itself. It ships with cl100k, o200k, GPT-OSS, Llama-3, and Qwen2.5/3.
Approach. It uses the same algorithm as bpe-openai (exact backtracking BPE), with a lot of data-structure engineering on top to cut memory accesses:
- A 2-byte trie is used for the longest-match walk
- Dense exactly-keyed caches are used for merge-validity checks
- A hand-compiled pretokenizer is used instead of a general regex engine
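To make the first bullet concrete, here is a rough Python sketch of a longest-match walk over a trie that consumes two bytes per step. This is only my reading of "2-byte trie", not quicktok's actual C++ layout, and the names (TrieNode, build_trie, longest_match, VOCAB) are made up for illustration. It also covers only the longest-match lookup, not the backtracking or merge-validity parts of the encoder.

```python
class TrieNode:
    __slots__ = ("children", "tail", "token_id")

    def __init__(self):
        self.children = {}    # 2-byte chunk -> TrieNode
        self.tail = {}        # final single byte -> token id (odd-length tokens)
        self.token_id = None  # id of the even-length token ending here, if any


def build_trie(vocab):
    """Build the trie from a {token_bytes: token_id} mapping (non-empty tokens)."""
    root = TrieNode()
    for token, token_id in vocab.items():
        node = root
        even_len = len(token) - (len(token) % 2)
        # Walk (or create) one node per 2-byte chunk.
        for i in range(0, even_len, 2):
            node = node.children.setdefault(token[i:i + 2], TrieNode())
        if len(token) % 2:
            node.tail[token[-1]] = token_id  # leftover single byte
        else:
            node.token_id = token_id
    return root


def longest_match(root, text, start):
    """Return (token_id, length) of the longest token at text[start:], or None."""
    node, pos, best = root, start, None
    while pos < len(text):
        # Odd-length candidate: one more byte past the current node.
        if text[pos] in node.tail:
            best = (node.tail[text[pos]], pos + 1 - start)
        # Even-length candidate: step two bytes at once.
        child = node.children.get(text[pos:pos + 2])
        if child is None:
            break
        node, pos = child, pos + 2
        if node.token_id is not None:
            best = (node.token_id, pos - start)
    return best


# Toy vocabulary, purely illustrative.
VOCAB = {b"a": 0, b"ab": 1, b"abc": 2, b"abcd": 3}
ROOT = build_trie(VOCAB)
print(longest_match(ROOT, b"abcde", 0))  # (3, 4)
```

Stepping two bytes at a time roughly halves the number of node visits per match compared with a byte-at-a-time trie, which fits the stated goal of cutting memory accesses.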
Benchmarks (Apple M1, single thread, cl100k_base, throughput in MB/s; every output verified token-for-token before timing):
| encoder | The Pile | Code | Common Crawl |
|---|---|---|---|
| quicktok (native) | 121.7 | 139.2 | 71.3 |
| quicktok (Python) | 77.9 | 83.6 | 49.7 |
| bpe-openai | 36.6 | 38.7 | 28.9 |
| rs-bpe | 30.9 | 34.7 | 23.5 |
| tiktoken-rs | 15.4 | 13.8 | 13.3 |
| tiktoken (Python) | 13.6 | 12.8 | 12.3 |
| TokenDagger | 11.1 | 11.9 | 10.7 |
Ratios on o200k_base are similar. Each encoder is called through its own raw API, and the benchmarks can be reproduced with make bench-compare in the repo.
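For anyone who wants to sanity-check numbers like these on their own data, a verify-then-time harness can be sketched in a few lines of Python. This is not the repo's bench-compare target, just a minimal illustration of the same idea. The function names are hypothetical, each encoder is assumed to be a plain callable from str to a list of token ids, and MB is taken to mean 10^6 bytes.

```python
import time


def throughput_mb_s(encode, text, repeats=5):
    """Best-of-N throughput of encode(text) in MB/s (1 MB = 10**6 bytes, assumed)."""
    n_bytes = len(text.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        encode(text)
        best = min(best, time.perf_counter() - t0)
    # Guard against a zero reading from a coarse timer on tiny inputs.
    return n_bytes / max(best, 1e-9) / 1e6


def compare(encoders, reference, text, repeats=5):
    """Check every encoder against the reference output, then time each one."""
    expected = list(reference(text))
    for name, encode in encoders.items():
        if list(encode(text)) != expected:
            raise ValueError(f"{name}: output differs from reference")
    # Timing only starts once every encoder has been verified.
    return {name: throughput_mb_s(encode, text, repeats)
            for name, encode in encoders.items()}


if __name__ == "__main__":
    # Stand-in "encoder" so the sketch runs without any tokenizer installed.
    def toy(s):
        return list(s.encode("utf-8"))

    print(compare({"toy": toy}, toy, "hello world " * 10000))
```

To compare real tokenizers, pass each library's encode function in the encoders dict and use tiktoken's output as the reference.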
pip install quicktok-v1