Basic R Queries
While learning R (from scratch) I came across Kaggle. It hosts competitions in which users try to build the best predictive model.
It's my first Big Data experience, so I am a little bit lost. First, I have to admit that I am following a "How To".
I will post here all my generic, basic R data manipulations.
Create a vector of values:
unusual_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                   'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')
Replace the value of title with 'Unusual Title' if the title is in the vector created previously:
titanic$title[titanic$title %in% unusual_title]<-'Unusual Title'
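To see what these two steps do together, here is a minimal sketch on a made-up data frame. The toy `titanic` data frame and the helper variable `is_unusual` are assumptions for illustration only; the real Kaggle data has many more rows and columns.

```r
# Toy data frame standing in for the Kaggle Titanic data (values are made up)
titanic <- data.frame(
  title = c("Mr", "Dr", "Miss", "Col", "Mrs"),
  stringsAsFactors = FALSE
)

unusual_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                   'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')

# %in% returns one TRUE/FALSE per row: TRUE when the title is in unusual_title
is_unusual <- titanic$title %in% unusual_title

# Overwrite only the rows where is_unusual is TRUE
titanic$title[is_unusual] <- 'Unusual Title'

# Count how many rows ended up with each title
print(table(titanic$title))
```

Here "Dr" and "Col" are replaced, while "Mr", "Miss" and "Mrs" are left untouched.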
sapply is a function that applies another function to each item of a vector.
Here, each name in the data set is split at every comma or period, and we keep only the first piece (the text before the first comma or period).
This piece goes into the surname column:
titanic$surname <- sapply(titanic$Name, function(x) strsplit(x, split = '[,.]')[[1]][1])
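A small sketch of the same idea, step by step. The three names are only illustrative samples in the "Surname, Title. First names" style, and `parts` is a hypothetical helper variable. I also assume that Name is a character column; if it was read as a factor, it would need `as.character()` first, because strsplit only accepts character input.

```r
# Toy data frame with names in the "Surname, Title. First names" style
titanic <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley",
           "Heikkinen, Miss. Laina"),
  stringsAsFactors = FALSE
)

# strsplit returns a list; [[1]] takes the vector of pieces for the first string
parts <- strsplit(titanic$Name[1], split = '[,.]')[[1]]
print(parts)  # the pieces are "Braund", " Mr" and " Owen Harris"

# Apply the same split to every name and keep only the first piece.
# USE.NAMES = FALSE stops sapply from labelling the result with the full names.
titanic$surname <- sapply(titanic$Name,
                          function(x) strsplit(x, split = '[,.]')[[1]][1],
                          USE.NAMES = FALSE)
print(titanic$surname)
```

The pattern '[,.]' is a regular expression character class: it matches either a comma or a period, so the surname is whatever comes before the first of those.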
Set fsizeD to 'single' when famsize == 1, and to 'small' when famsize is between 2 and 4:
titanic$fsizeD[titanic$famsize == 1] <- 'single'
titanic$fsizeD[titanic$famsize < 5 & titanic$famsize > 1] <- 'small'
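Here is a minimal sketch of this conditional assignment on made-up family sizes. The toy values are an assumption, and so is the explicit initialisation of fsizeD with NA, which I added only to make visible what happens to rows that match neither condition.

```r
# Toy data frame: made-up family sizes
titanic <- data.frame(famsize = c(1, 2, 4, 5, 7))

# Start with an empty (NA) character column so unmatched rows are easy to spot
titanic$fsizeD <- NA_character_

# Each line only touches the rows where the condition inside [ ] is TRUE
titanic$fsizeD[titanic$famsize == 1] <- 'single'
titanic$fsizeD[titanic$famsize < 5 & titanic$famsize > 1] <- 'small'

print(titanic)
```

Rows with a famsize of 5 or more keep NA, since neither condition covers them.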