A collection of basic text processing modules for Indian languages
indicNLP is a collection of common tools for text-based natural language processing of Indian languages. Many Indian languages are similar in nature, with some differences, and most of them share common or similar solutions to NLP and IRE (Information Retrieval and Extraction) tasks. Hence a single framework for all of them.
It includes
Requirements: python-crfsuite
Keywords: indicNLP, IRE, NLP, Indian Languages, Tokenizer, stopwords, POS tagger, Stemmer, NER, Document Classification, Categorization, Spelling Variation Identification, Writing Variation Identification, text processing.
Languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sindhi, Tamil, Telugu, Tibetan.
Course: Information Retrieval and Extraction Course, Major Project, IIIT-H.
GitHub repository: https://github.com/nisargjhaveri/indicNLP
Project homepage: http://nisargjhaveri.github.io/indicNLP
Project report: http://nisargjhaveri.github.io/indicNLP/report.pdf
YouTube (Presentation and Demo): https://youtu.be/Pwh1NYAF5Gw
SlideShare (Presentation): http://www.slideshare.net/NisargJhaveri/indicnlp-a-text-processing-framework-for-indian-languages
Dropbox: Dropbox shared folder