Suddip Paul Arnab
Analysis of Bangla News Headlines Using LSH and Clustering Techniques

Project Detail

This project analyzes Bangla news headlines using scalable text processing, similarity search, and clustering. Headlines are preprocessed, split into n-grams, and encoded with HashingTF as fixed-size sparse vectors. MinHashLSH then finds similar headlines quickly through approximate nearest-neighbor search, and KMeans clustering groups the headlines into thematic patterns across categories. Evaluation metrics and visualizations support these results, demonstrating an effective approach for large-scale Bangla text analysis.
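The idea behind MinHash-based similarity detection can be illustrated with a minimal plain-Python sketch. This is not the project's Spark code: the helper names (`word_ngrams`, `stable_hash`, `minhash_signature`, `estimated_jaccard`), the use of word bigrams, the choice of 128 hash functions, and the two example headlines are all assumptions made for illustration. The sketch shows how the fraction of matching signature positions approximates the Jaccard similarity between two headlines' n-gram sets.

```python
import hashlib

def word_ngrams(text, n=2):
    """Return the set of word n-grams in a whitespace-tokenized headline."""
    tokens = text.split()
    if len(tokens) < n:
        # Fall back to the whole headline so short inputs are not empty.
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def stable_hash(item, seed):
    """Deterministic 64-bit hash of a string; the seed selects the hash function."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(items, num_hashes=128):
    """Keep the minimum hash value per seeded hash function (items must be non-empty)."""
    return [min(stable_hash(x, seed) for x in items) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minimums agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def exact_jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)

# Hypothetical example headlines that differ only in the last word.
headline_a = "ঢাকায় আজ ভারী বৃষ্টির সম্ভাবনা"
headline_b = "ঢাকায় আজ ভারী বৃষ্টির পূর্বাভাস"

grams_a = word_ngrams(headline_a)
grams_b = word_ngrams(headline_b)
sig_a = minhash_signature(grams_a)
sig_b = minhash_signature(grams_b)

print("exact Jaccard:    ", exact_jaccard(grams_a, grams_b))
print("estimated Jaccard:", estimated_jaccard(sig_a, sig_b))
```

An LSH index builds on such signatures by bucketing them so that only headlines sharing a bucket are compared, which avoids an all-pairs comparison on large collections.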

Apache Spark, Big Data, Python, ANN Search, Clustering

August 2025 - September 2025

Project Highlights

  • Scalable Bangla Text Processing Pipeline
  • Efficient Similarity Detection using MinHashLSH
  • Meaningful Feature Engineering with N-grams and HashingTF
  • High-Quality Clustering with Strong Analytical Insights
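The n-gram and hashing feature step can be sketched in plain Python as well. This illustrates the general hashing trick rather than Spark's HashingTF implementation; the function names, the CRC32 hash, and the bucket count of 1024 are assumptions chosen for the example.

```python
import zlib
from collections import Counter

def ngrams(tokens, n=2):
    """Return the list of word n-grams (as strings) for a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def hashed_term_frequencies(terms, num_features=1024):
    """Map each term to a bucket index in [0, num_features) and count per bucket.

    Returns a sparse vector as {bucket_index: count}. Distinct terms may
    collide in one bucket; that is the accepted cost of the hashing trick.
    """
    counts = Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in terms)
    return dict(counts)

# Hypothetical tokenized headline with a repeated bigram.
tokens = ["ভারী", "বৃষ্টি", "ভারী", "বৃষ্টি"]
bigrams = ngrams(tokens)            # 3 bigrams, one of them appearing twice
vector = hashed_term_frequencies(bigrams)
print(vector)
```

Because the vector length is fixed in advance, no vocabulary has to be built or shared across workers, which is what makes this representation convenient for large collections.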

Related Projects

IoT-Based Weather Monitoring Station Using ESP8266

Real-time IoT weather monitoring system using ESP8266 and cloud dashboard integration for temperature, humidity, …

Campus Event Management System

Web-based campus event management platform built in Oracle APEX with role-based access, reporting, visualization, …

Railway Reservation System

Console-based railway ticket reservation application using core Java OOP principles with user and admin …
