R Simplified: Playing with Dataframes (1)

R is well established language in the community and it has a huge fan following.  Big Data being the buzzword of today, more and more people are learning this to add predictive flavor to their data analysis. I believe, learning becomes easier if one is able to relate new things with something which he/she already knows. So assuming that my hypothesis is correct, in this blog, I have discussed R statements/commands corresponding to SQL- Data Manipulation Language (DML).

We may find these if we search internet. However in this blog we will find relevant R commands consolidated at one place along with corresponding SQL statements so that we can relate. As R has a long list of packages so in case of a particular package, syntax may be different.

These commands are from base R package. I am using Rx64 3.0.2 in windows environment. To test the R command discussed is this blog ensure that you have created an R Data frame and SQL Table “Employee” with following records:

ID Name Skill dept salary

1

John Writer

1

100

2

Smith Programmer

1

120

3

Rakesh Designer

1

130

4

Baker Manager

2

150

5

Samantha Programmer

2

120

Following R commands are can be used to create “Employee” Data frame :

> ID = c(1,2,3,4,5)

> Name = c(“John”,”Smith”,”Rakesh”,”Baker”,”Samantha”)

> Skill = c(“Writer”,”Programmer”,”Designer”,”Manager”,”Programmer”)

> dept = c(1,1,1,2,2)

> salary = c(100,120,130,150,120)

>employee = data.frame(ID,Name,Skill,dept,salary)

SELECT STATEMENTS AND CORRESPONDING R COMMANDS:

  • Select all record from SQL table/ R dataframe “employee”
    • SQL> Select * from employee
    • R> employee
  • Select specific records from table/dataframe employee
    • SQL> Select * from employee where id >2
    • R> employee [which(employee$ID>2),] or subset(employee,ID>2)
  • Select specific columns from table/dataframe employee
    • SQL> Select ID, Name from employee
    • R> subset(employee,,select=c(ID,Name))
  • Select specific column and specific rows from table/dataframe employee
    • SQL>Select ID, Name from employee where ID>2
    • R> subset(employee,ID>2,select=c(ID,Name))
  • Select all record from table/dataframe employee in a specific order
    • SQL> Select * from employee order by Name
    • R> employee[order(employee$Name),]
  • Select specific record from table.dataframe employee in specific order
    • SQL> Select ID, Name from employee order by Name
    • R>subset(employee[order(employee$Name),],ID>2,select=c(ID,Name))
  • Aggregation (Count)
    • SQL> Select dept, count(salary) from employee group by dept
    • R> aggregate(employee$salary~employee$dept, data=employee, FUN=”length”)
    • R> aggregate(employee$salary, by=list(employee$dept), FUN=”length”)
  • Aggregation (Average)
    • SQL> Select dept, avg(salary) from employee group by dept
    • R> aggregate(employee$salary~employee$dept, data=employee, FUN=”mean”)
    • R> aggregate(employee$salary, by=list(employee$dept), FUN=”mean”)
  • Aggregation (Sum)
    • SQL> Select dept, sum(salary) from employee group by dept
    • R> aggregate(employee$salary~employee$dept, data=employee, FUN=”sum”)
    • R>aggregate(employee$salary, by=list(employee$dept), FUN=”sum”)
  • Multiple Aggregation (sum, mean, count)
    • SQL> Select dept, sum(salary) , avg(salary), count(salary) from employee group by dept
    • R> aggregate(employee$salary~employee$dept, data=employee, FUN=function(x)c(sum=sum(x), mean=mean(x), count=length(x)))
  • Aggregation of subset of dataset and multiple groupping
    • SQL> Select dept, sum(salary) from employee where ID > 2 group by dept
    • R> aggregate(subset(employee$salary,ID>2), by=list(subset(employee$dept, ID>2)), FUN=”sum”)
    • SQL> Select dept, Skill, sum(salary) from employee where ID>1 group by dept, Skill
    • R> aggregate(subset(employee$salary,ID>1), by=list(subset(employee$dept, ID>1), subset(employee$Skill, ID>1)), FUN=”sum”)

Share your feedback on this.

I plan to write next blog on R commands for more DML statements.

Advertisements

About VASAPTEX

VASAPTEX is a technology service provider committed to simplify the way its customers do business. Our focus area is "Information Management and Analytics" and we are recognized by our customers for unparalleled capability of providing solutions that enable our clients to take full advantage of their data assets. We are passionate about our work and committed to deliver on time.
This entry was posted in R and tagged , , . Bookmark the permalink.

One Response to R Simplified: Playing with Dataframes (1)

  1. Reblogged this on Coran Corbett and commented:
    Fantastic blog post on using R for people who are experience with SQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s