
Sorry for the layman's question, but is this language-independent or only for English?


Text is just one of the feature types Ludwig supports. For text features, no, it is not English-only: you can train models on any language, with a couple of small caveats. You can train both character-level and word-level models. Character-level models need no language-specific preprocessing, so you can also train on languages without explicit word separation, such as Chinese. Word-level models need a function that splits your text into words. By default a regular expression is used, which works reasonably well for most languages that separate words, but the tokenizer from the spaCy library can also be used. spaCy provides models for several languages; at the moment we wrap only the English ones, but it would be very easy to add the other languages spaCy supports.
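To make the character vs. word distinction concrete, here is a rough sketch using only Python's standard library. The pattern and the function names (`WORD_PATTERN`, `char_tokenize`, `word_tokenize`) are made up for illustration; this is not Ludwig's actual default regular expression or its API, just the general idea.

```python
import re

# Hypothetical pattern for illustration only, NOT Ludwig's real default regex:
# a run of word characters, or a single punctuation character.
WORD_PATTERN = re.compile(r"\w+|[^\w\s]")


def char_tokenize(text):
    """Character-level tokens: needs no knowledge of the language."""
    return list(text)


def word_tokenize(text):
    """Word-level tokens via a regex: relies on whitespace/punctuation between words."""
    return WORD_PATTERN.findall(text)


if __name__ == "__main__":
    print(word_tokenize("Hola, mundo!"))   # words and punctuation split apart
    print(char_tokenize("你好世界"))         # one token per character
    print(word_tokenize("你好世界"))         # no separators, so the regex cannot split it
```

With a pattern like this, text in a language that separates words splits fine, while the Chinese string comes back as a single token at the word level. That is why the character-level route is the easy one for such languages.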



