Basic R queries

While learning R from scratch, I came across Kaggle. It hosts competitions in which users try to build the best predictive model.

It's my first Big Data experience, so I am a little bit lost. First, I have to admit that I am following a "How To".

I will post here all my generic, basic R data manipulations.



Create a vector of values

unusual_title<-c('Dona', 'Lady', 'the Countess','Capt', 'Col', 'Don', 
                 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')

Set title to 'Unusual Title' when the current title is in the vector created above

titanic$title[titanic$title %in% unusual_title]<-'Unusual Title'
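
To see what %in% does in the line above, here is a minimal sketch on a small made-up data frame. The `toy` data frame and its values are invented for illustration and are not taken from the Titanic data.

```r
# Toy data frame, invented for illustration only
toy <- data.frame(title = c('Mr', 'Dr', 'Miss', 'Col', 'Mrs'),
                  stringsAsFactors = FALSE)

unusual_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                   'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')

# %in% returns TRUE for each title that appears in unusual_title
is_unusual <- toy$title %in% unusual_title

# Overwrite only the rows where the test is TRUE
toy$title[is_unusual] <- 'Unusual Title'

print(toy$title)
```

If the column is stored as a factor, assigning a label that is not already one of its levels produces NA with a warning, so it is safer to convert it with as.character() first.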

sapply applies a function to each element of a vector or list.
Here, each name in the data set is split on commas and periods (the pattern '[,.]') and only the first piece is kept.
That piece goes into the surname column.

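# as.character() guards against Name being a factor, which strsplit() rejects
titanic$surname <- sapply(as.character(titanic$Name),
                          function(x) strsplit(x, split = '[,.]')[[1]][1],
                          USE.NAMES = FALSE)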
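
The split can be checked step by step on a couple of names. This is only a sketch: the names below are made up, and I assume they follow a "Surname, Title. First name" layout.

```r
# Made-up names in an assumed "Surname, Title. First name" layout
names_example <- c('Smith, Mr. John', 'Doe, Mrs. Jane (Mary)')

# strsplit() returns a list with one character vector per input string
parts <- strsplit(names_example, split = '[,.]')

# Keep the first piece of each vector: the surname
surname <- sapply(parts, function(p) p[1])

print(parts[[1]])
print(surname)
```

The other pieces keep their leading space (for example ' Mr'), so trimws() is useful if you want to reuse them.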

Set fsizeD to 'single' where famsize == 1, and to 'small' where famsize is between 2 and 4

titanic$fsizeD[titanic$famsize == 1] <- 'single'

titanic$fsizeD[titanic$famsize > 1 & titanic$famsize < 5] <- 'small'
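
Here is a small sketch of the same two assignments on invented family sizes. The `toy` data frame is hypothetical, and I start from an all-NA column (my own addition) so that the rows matched by neither condition are easy to spot.

```r
# Invented family sizes, for illustration only
toy <- data.frame(famsize = c(1, 2, 4, 5, 8))

# Start from an all-NA character column (an assumption, not in the original)
toy$fsizeD <- NA_character_

# Same two conditional assignments as above
toy$fsizeD[toy$famsize == 1] <- 'single'
toy$fsizeD[toy$famsize > 1 & toy$famsize < 5] <- 'small'

print(toy)
```

Rows with a famsize of 5 or more stay NA here, so they would need their own assignment if you want a label for them too.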
