Channel: Mawazo
Browsing all 168 articles

Making Hive Squawk like a Real Database

Hive is great for large-scale data warehousing applications. In one of my recent projects I was handed the interesting and challenging task of making Hive behave like an OLTP system, i.e., support...

Big Web Checkout Abandonment

The topic of this post is of interest to any online retailer. Shopping cart abandonment is dreaded by online stores. It’s more common in online stores than in brick-and-mortar stores. In this post I...

From Item Correlation to Rating Prediction

The ultimate goal of any recommendation engine is to predict ratings for items that a user has not engaged with so far. The set of items to recommend is then based on the predicted ratings. There are...

Bring some Spark into your life

Hadoop is a great cluster computing framework. But sometimes it may not be a great fit for the particular problem at hand. Or you may have Hadoop fatigue and want to explore other options. There...

Relative Density and Outliers

Recently I did some work on my open source fraud analytics project, beymani. I implemented one of the proximity-based algorithms, using the relative density of a data point as described in my earlier post....

Semantic Matching with Hadoop

Recently, I had a request to support semantic matching in sifarish, my open source matching and recommendation engine. By semantic matching, I mean any algorithm that does not rely on explicit keyword...

Get Social with Pearson Correlation

In one of my earlier posts, I discussed using Pearson correlation for making social recommendations. In this post we will delve deeper into it, including the Hadoop MapReduce implementation. There...
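
The teaser is cut off before any detail, so the following is only a minimal plain-Python sketch of the Pearson correlation coefficient itself, not the post's Hadoop implementation. The function name `pearson` and the choice to return 0.0 when either rating list has no variance are assumptions made for illustration; the inputs stand for two users' ratings of the same co-rated items.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating lists.

    Hypothetical helper for illustration; x[i] and y[i] are assumed to be
    two users' ratings of the same item i.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 ratings")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: treat a user with constant ratings as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    alice = [5, 3, 4, 4]
    bob = [3, 1, 2, 3]
    print(pearson(alice, bob))
```

A value near 1 means the two users rate items in a similar pattern, and a value near -1 means opposite tastes, which is what makes the coefficient usable as a user-to-user similarity score.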

Explore Customer Churn with Cramer Index

Classification problems involve predicting a response variable based on a set of feature variables for some entity. But there is another problem whose solution is a prerequisite for solving...

Stop the Customer Separation Pain with Bayesian Classifier

In my last post, we did some exploratory analytics for customer churn. We identified the parameters that have the most influence on whether a customer account gets closed or not. We performed correlation...

Analytic is your Doctor’s Friend

In this post, I will venture into the medical domain and show how big data analytics can play a crucial role in the complex and daunting world of health care. There is a kind of cancer that...

Smarter Email Marketing with Markov Model

What does email marketing have to do with a Markov model? Let’s explore and find out. Any consumer of products and services has a natural rhythm to his or her purchase history. Regular customers tend to...

Customer Segmentation with Fisher Discriminant Analysis

In this post, I will focus on a time-honored machine learning technique called Fisher Discriminant Analysis and will use it for customer segmentation of an online music store's customers. The store...

Business Goal Infused Recommendation

The output of a recommendation engine, whether based on collaborative filtering or some other technique, reflects consumers’ interest in products or services. However, a business may have some goals...

A Learning but Greedy Gambler

In the multi-armed bandit (MAB) problem, a gambler must decide which of K slot machine arms to pull in a sequence of N rounds to maximize the overall return. Many real life optimization and...
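
The problem statement above can be sketched with an epsilon-greedy strategy, one common MAB approach. Whether the post itself covers this exact strategy is an assumption, since the teaser is truncated. The function name `epsilon_greedy`, the parameter names, and the simulated Bernoulli arms are all hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy(reward_fns, n_rounds, epsilon=0.1, seed=0):
    """Play n_rounds over K arms with an epsilon-greedy policy.

    reward_fns: list of K callables; each takes a random.Random and
    returns a numeric reward (hypothetical stand-ins for slot machines).
    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the best estimated mean so far.
            arm = max(range(k), key=lambda i: means[i])
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return counts, means, total


def bernoulli_arm(p):
    """Simulated arm paying 1.0 with probability p, else 0.0 (an assumption)."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    arms = [bernoulli_arm(0.2), bernoulli_arm(0.5), bernoulli_arm(0.7)]
    counts, means, total = epsilon_greedy(arms, n_rounds=5000)
    print(counts, means, total)
```

The `epsilon` parameter sets the trade-off: a larger value spends more rounds exploring arms that currently look worse, while a smaller value commits sooner to the arm with the best estimate.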

Storing Nested Objects in Cassandra with Composite Columns

One of the popular features of MongoDB is the ability to store arbitrarily nested objects and be able to index on any nested field. In this post I will show how to store nested objects in Cassandra...

Bandits Know the Best Product Price

In an earlier post, I did a survey of a class of reinforcement learning algorithms known as Multi-Armed Bandit (MAB). Essentially, these algorithms make decisions and learn from rewards received from...

Identifying Duplicate Records with Fuzzy Matching

I was prompted to write this post in response to a recent discussion thread in the LinkedIn Hadoop Users Group regarding fuzzy string matching for duplicate record identification with Hadoop. As part of...

Big Road Map for Big Data

The number of choices for big data solutions can be overwhelming and confusing. The purpose of this post is to lay out a road map for big data solutions. I will be categorizing the products...

Predicting Customer Loyalty Trajectory

Customer loyalty is the strength of the relationship a customer has with a business, as manifested by the customer purchasing more and at a higher frequency. There are various signals or events related to a...

Real Time Fraud Detection with Sequence Mining

Real-time fraud detection is one of the use cases where multiple components of the Big Data ecosystem come into play in a significant way: Hadoop batch processing for building the predictive model...
