AI / ML · 2026 · Completed

Sinhala Character Recognition

A machine learning application that recognizes handwritten Sinhala characters from image input. The system uses a K-Nearest Neighbors (KNN) classifier trained on preprocessed character samples, paired with a graphical interface so users can draw or upload characters and see predictions in real time. The project explores classical ML for script-specific recognition without deep learning.

Python · Machine Learning · KNN · OpenCV · scikit-learn


Year: 2018

Problem

· Limited availability of labeled Sinhala character datasets
· Complexity of the Sinhala script, including its diacritical marks
· Choosing effective features and a value of k that can separate similar-looking characters
· Making the tool usable through a clear graphical interface

Solution

· Collected and labeled a custom Sinhala character dataset
· Applied preprocessing tuned for handwritten Sinhala glyphs
· Tuned KNN hyperparameters (k, distance metric) on validation data
· Wrapped inference in a simple GUI for interactive testing
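A minimal sketch of how k and the distance metric might be chosen on validation data, written in plain NumPy so the mechanics of KNN stay visible. The function names, the candidate k values, and the two metrics are assumptions for illustration; the page does not describe the project's actual search space or split. In scikit-learn, `KNeighborsClassifier(n_neighbors=..., metric=...)` covers the same ground.

```python
import numpy as np
from collections import Counter


def pairwise_distances(queries, train, metric="euclidean"):
    """Distance from every query row to every training row (hypothetical helper)."""
    diff = queries[:, None, :] - train[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unsupported metric: {metric}")


def knn_predict(train_x, train_y, queries, k=3, metric="euclidean"):
    """Label each query by majority vote among its k nearest training samples."""
    dists = pairwise_distances(queries, train_x, metric)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices ordered nearest first
    preds = []
    for idx in nearest:
        votes = Counter(train_y[i] for i in idx)
        # On a tie, most_common keeps the label counted first, i.e. the closer neighbour's.
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


def tune_knn(train_x, train_y, val_x, val_y,
             ks=(1, 3, 5, 7), metrics=("euclidean", "manhattan")):
    """Grid-search k and metric; return (accuracy, k, metric) of the best setting."""
    best = None
    for metric in metrics:
        for k in ks:
            acc = float(np.mean(knn_predict(train_x, train_y, val_x, k, metric) == val_y))
            if best is None or acc > best[0]:
                best = (acc, k, metric)
    return best
```

`tune_knn` returns the best `(accuracy, k, metric)` triple; with real data, `train_x` and `val_x` would be the flattened, preprocessed glyph vectors and `train_y`/`val_y` their character labels.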

Key Features

✓ K-Nearest Neighbors classifier for Sinhala character recognition
✓ Handwritten character input via drawing canvas or image upload
✓ Image preprocessing and feature extraction before classification
✓ User-friendly graphical interface for live predictions
✓ Configurable k parameter and model evaluation workflow
✓ Support for Sinhala script-specific character classes
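One plausible preprocessing and feature-extraction pipeline for a drawn or uploaded glyph, sketched in NumPy: binarize, crop to the ink's bounding box, pad to a square, and resize to a fixed grid so every sample becomes a vector of the same length. The threshold, the 20x20 grid, and the function name are hypothetical, since the page does not spell out the real steps; with OpenCV, the thresholding and resizing would usually be done with `cv2.threshold` and `cv2.resize`.

```python
import numpy as np


def preprocess_glyph(gray, size=20, threshold=128):
    """Turn a grayscale drawing into a fixed-length feature vector (hypothetical pipeline)."""
    # Binarize: assume dark ink on a light background, so ink pixels fall below the threshold.
    ink = (np.asarray(gray) < threshold).astype(np.float32)
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        return np.zeros(size * size, dtype=np.float32)  # blank canvas
    # Crop to the ink's bounding box so the glyph's position on the canvas does not matter.
    glyph = ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Centre the crop in a square so the aspect ratio survives the resize.
    h, w = glyph.shape
    side = max(h, w)
    square = np.zeros((side, side), dtype=np.float32)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = glyph
    # Nearest-neighbour resize to size x size by sampling evenly spaced indices.
    idx = (np.arange(size) * side / size).astype(int)
    return square[np.ix_(idx, idx)].ravel()
```

Cropping before resizing means the same character drawn in different corners of the canvas yields the same feature vector, which matters for a distance-based classifier like KNN.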

Technologies

Python · scikit-learn · OpenCV · NumPy · Tkinter

Learnings

→ Implemented KNN classification for image-based character recognition
→ Built image preprocessing pipelines for handwritten input
→ Learned how the distance metric and the value of k affect classifier performance
→ Designed an accessible GUI so non-technical users can test the model
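A small worked example of why the distance metric matters: with a query at the origin, Euclidean and Manhattan distance disagree about which of two candidate points is nearest, so a KNN vote can change with the metric alone. The points are made up for illustration.

```python
import numpy as np


def euclidean(a, b):
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))


query = np.array([0.0, 0.0])
candidates = {
    "A": np.array([3.0, 0.0]),  # Euclidean 3.0, Manhattan 3.0
    "B": np.array([2.0, 2.0]),  # Euclidean ~2.83, Manhattan 4.0
}

nearest_euclidean = min(candidates, key=lambda name: euclidean(query, candidates[name]))
nearest_manhattan = min(candidates, key=lambda name: manhattan(query, candidates[name]))
print(nearest_euclidean, nearest_manhattan)  # prints: B A
```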

Highlights

KNN Classifier

Handwritten Recognition

GUI Application
