IndicNLP

A collection of basic text processing modules for Indian languages


indicNLP is a collection of common tools for text-based natural language processing (NLP) in Indian languages. Many Indian languages are similar in nature, differing only in some respects, and most of them share common or similar solutions to NLP and information retrieval and extraction (IRE) tasks. Hence, indicNLP provides a single framework for all of them.

It includes

Code quality and quality assurance


Dependencies

Tags

indicNLP, IRE, NLP, Indian Languages, Tokenizer, Stopwords, POS Tagger, Stemmer, NER, Document Classification, Categorization, Spelling Variation Identification, Writing Variation Identification, Text Processing.

Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sindhi, Tamil, Telugu, Tibetan.

Information Retrieval and Extraction Course, Major Project, IIIT-H.

Links

GitHub repository: https://github.com/nisargjhaveri/indicNLP
Project homepage: http://nisargjhaveri.github.io/indicNLP

Project report: http://nisargjhaveri.github.io/indicNLP/report.pdf
YouTube (Presentation and Demo): https://youtu.be/Pwh1NYAF5Gw
SlideShare (Presentation): http://www.slideshare.net/NisargJhaveri/indicnlp-a-text-processing-framework-for-indian-languages
Dropbox: Dropbox shared folder