Discussion about this post

madf0x

Super rad. Intuitively I would not have expected a meaningful difference between token efficiency and entropy.

I wonder if other tokenizers would be more or less accurate for calculating token efficiency. You'd probably have to adjust the cutoff to 'calibrate' for different tokenizers, but it'd be interesting if accuracy could be pushed even higher.

Henry

hey great writeup!

just because it's also written in the post:

"A quick note on passwords. Token Efficiency does not do well with classifying bad passwords like “password123” or “chibearsfan123”. These passwords are basically natural language which means a high token efficiency value. Pass phrases also don’t do well because those are usually just straight up words."

What do you think is the best way to find these, then? Or is it something to drop in a secret scanner because "anyone who uses such a weak password should be pwned anyway"?
