Making Hive Squawk like a Real Database
Hive is great for large-scale data warehousing applications. In one of my recent projects I was handed the interesting and challenging task of making Hive behave like an OLTP system, i.e., support...
Big Web Checkout Abandonment
The topic of this post is of interest to any online retailer. Shopping cart abandonment is dreaded by online stores. It’s more common in online stores than in brick-and-mortar stores. In this post I...
From Item Correlation to Rating Prediction
The ultimate goal of any recommendation engine is to predict ratings for items that a user has not engaged with so far. The set of items to recommend is then based on the predicted ratings. There are...
Bring some Spark into your life
Hadoop is a great cluster computing framework. But sometimes it may not be a great fit for the particular problem at hand. Or you may be having Hadoop fatigue and want to explore other options. There...
Relative Density and Outliers
Recently I did some work on my open source fraud analytics project, beymani. I implemented one of the proximity-based algorithms, using the relative density of a data point as described in my earlier post....
Semantic Matching with Hadoop
Recently, I had a request to support semantic matching in sifarish, my open source matching and recommendation engine. By semantic matching, I mean any algorithm that does not rely on explicit keyword...
Get Social with Pearson Correlation
In one of my earlier posts, I discussed using Pearson correlation for making social recommendations. In this post we will delve deeper into it, including the Hadoop MapReduce implementation. There...
Explore Customer Churn with Cramer Index
Classification problems involve predicting a response variable based on a set of feature variables for some entity. But there is another problem whose solution is a prerequisite for solving...
Stop the Customer Separation Pain with Bayesian Classifier
In my last post, we did some exploratory analytics for customer churn. We identified the parameters that have the most influence on whether a customer account gets closed or not. We performed correlation...
Analytic is your Doctor’s Friend
In this post, I will venture into the medical domain and show how big data analytics can play a crucial role in the complex and daunting world of health care. There is a kind of cancer that...
Smarter Email Marketing with Markov Model
What does email marketing have to do with a Markov model? Let’s explore and find out. Any consumer of products and services has a natural rhythm to his or her purchase history. Regular customers tend to...
Customer Segmentation with Fisher Discriminant Analysis
In this post, I will focus on a time-honored machine learning technique called Fisher Discriminant Analysis and will use it for customer segmentation for an online music store. The store...
Business Goal Infused Recommendation
The output of a recommendation engine, whether based on collaborative filtering or some other technique, reflects consumers’ interest in products or services. However, a business may have some goals...
A Learning but Greedy Gambler
In the multi-armed bandit (MAB) problem, a gambler must decide which arm of K slot machines to pull in a sequence of N rounds of pulls to maximize the overall return. Many real life optimization and...
Storing Nested Objects in Cassandra with Composite Columns
One of the popular features of MongoDB is the ability to store arbitrarily nested objects and be able to index on any nested field. In this post I will show how to store nested objects in Cassandra...
Bandits Know the Best Product Price
In an earlier post, I did a survey of a class of reinforcement learning algorithms known as Multi Arm Bandit (MAB). Essentially, these algorithms make decisions and learn from rewards received from...
Identifying Duplicate Records with Fuzzy Matching
I was prompted to write this post in response to a recent discussion thread in the LinkedIn Hadoop Users Group regarding fuzzy string matching for duplicate record identification with Hadoop. As part of...
Big Road Map for Big Data
The number of choices for big data solutions can be overwhelming and confusing. The purpose of this post is to lay out a road map for big data solutions. I will be categorizing the products...
Predicting Customer Loyalty Trajectory
Customer loyalty is the strength of the relationship a customer has with a business, as manifested by the customer purchasing more and at a high frequency. There are various signals or events related to a...
Real Time Fraud Detection with Sequence Mining
Real time fraud detection is one of the use cases where multiple components of the Big Data ecosystem come into play in a significant way: Hadoop batch processing for building the predictive model...