Tags

, ,

Working with Data Frames in R

Joining two data frames in R

Here in this tutorial we will learn different ways of joining two (or more dataframe) in R.

We will first create two data frame which will consist the following:

  Data frame 1 (DataFrame1) will have fields:
  1. Student Id
  2. Subject (in which they scored highest marks out of 100 max marks each)
  3. CGPA (upto last semester)
  4. Race (of student)
  
  Data frame 2 (DataFrame2) will have fields:
  1. Student Id
  2. State (native)
  3. Age
  4. Gender

Dataframe 1

DataFrame1 <-  data.frame(StudentId = c(1:20), Subject = c(rep("Physics", 4), rep("Chemistry", 4), rep("Maths", 4), rep("Social Science", 4), rep("Arts", 4)), CGPA = c(random::randomNumbers(20, 70, 100, 1)), Race = c(rep("White", 4), rep("Black", 4), rep("Asian", 4), rep("Latin", 4), rep("Indian", 4)))

DataFrame1
##    StudentId        Subject CGPA   Race
## 1          1        Physics   84  White
## 2          2        Physics   98  White
## 3          3        Physics   95  White
## 4          4        Physics   98  White
## 5          5      Chemistry   87  Black
## 6          6      Chemistry   81  Black
## 7          7      Chemistry   92  Black
## 8          8      Chemistry   75  Black
## 9          9          Maths   92  Asian
## 10        10          Maths   85  Asian
## 11        11          Maths   82  Asian
## 12        12          Maths   95  Asian
## 13        13 Social Science   87  Latin
## 14        14 Social Science   90  Latin
## 15        15 Social Science   72  Latin
## 16        16 Social Science   79  Latin
## 17        17           Arts   92 Indian
## 18        18           Arts   76 Indian
## 19        19           Arts   78 Indian
## 20        20           Arts   78 Indian

Dataframe 2

DataFrame2 <-  data.frame(StudentId = c(1:10), State = c(rep("Arizona", 2), rep("New York", 2), rep("Florida", 2), rep("California", 2), rep("Texas", 2)), Age = c(random::randomNumbers(10, 16, 20, 1)), Gender = c(rep("Male", 3), rep("Female", 3), rep("Male", 2), rep("Female", 2)))

DataFrame2
##    StudentId      State Age Gender
## 1          1    Arizona  20   Male
## 2          2    Arizona  17   Male
## 3          3   New York  17   Male
## 4          4   New York  20 Female
## 5          5    Florida  16 Female
## 6          6    Florida  16 Female
## 7          7 California  18   Male
## 8          8 California  19   Male
## 9          9      Texas  20 Female
## 10        10      Texas  16 Female
So we have two datasets to work with. Dataframe 1 has 20 records while Dataframe 2 has 10 records.

Example 1 : Merge Using Inner Join

First example in which we will merge the just created dataframes using 'Inner Join' method.

This method joins two tables as long as all the columns has same numbers of rows.

Here we have 20 rows in DF1 and 10 rows in DF2, so inner join will return only 10 rows.
i.e. it retains the rows equals to the lowest number of rows in any of the tables (DFs).

Inner Join ....
merge(DataFrame1, DataFrame2)
##    StudentId   Subject CGPA  Race      State Age Gender
## 1          1   Physics   84 White    Arizona  20   Male
## 2          2   Physics   98 White    Arizona  17   Male
## 3          3   Physics   95 White   New York  17   Male
## 4          4   Physics   98 White   New York  20 Female
## 5          5 Chemistry   87 Black    Florida  16 Female
## 6          6 Chemistry   81 Black    Florida  16 Female
## 7          7 Chemistry   92 Black California  18   Male
## 8          8 Chemistry   75 Black California  19   Male
## 9          9     Maths   92 Asian      Texas  20 Female
## 10        10     Maths   85 Asian      Texas  16 Female

merge(DataFrame2, DataFrame1)

Example 2 : Merge Using Outer Join

In this example in which we will merge the dataframes using 'Outer Join' method.

Unlike inner join, this method joins all the rows and if there is a match in no. of rows, the rows are created and filled with NAs.

Outer Join ....
merge(x = DataFrame1, y = DataFrame2, by = "StudentId", all = TRUE)
##    StudentId        Subject CGPA   Race      State Age Gender
## 1          1        Physics   84  White    Arizona  20   Male
## 2          2        Physics   98  White    Arizona  17   Male
## 3          3        Physics   95  White   New York  17   Male
## 4          4        Physics   98  White   New York  20 Female
## 5          5      Chemistry   87  Black    Florida  16 Female
## 6          6      Chemistry   81  Black    Florida  16 Female
## 7          7      Chemistry   92  Black California  18   Male
## 8          8      Chemistry   75  Black California  19   Male
## 9          9          Maths   92  Asian      Texas  20 Female
## 10        10          Maths   85  Asian      Texas  16 Female
## 11        11          Maths   82  Asian       <NA>  NA   <NA>
## 12        12          Maths   95  Asian       <NA>  NA   <NA>
## 13        13 Social Science   87  Latin       <NA>  NA   <NA>
## 14        14 Social Science   90  Latin       <NA>  NA   <NA>
## 15        15 Social Science   72  Latin       <NA>  NA   <NA>
## 16        16 Social Science   79  Latin       <NA>  NA   <NA>
## 17        17           Arts   92 Indian       <NA>  NA   <NA>
## 18        18           Arts   76 Indian       <NA>  NA   <NA>
## 19        19           Arts   78 Indian       <NA>  NA   <NA>
## 20        20           Arts   78 Indian       <NA>  NA   <NA>

Example 3 : Merge Using Left Outer Join

In this example in which we will merge the dataframes using 'Left Outer Join' method.

This method is very much like outer join method above.

A left outer join retains all of the rows of the left table, regardless of whether there is a row that matches on the right table.
All empty rows, if any, on the right table will be filled with NA's.

Simply....
merge(x = DataFrame1, y = DataFrame2, by = "StudentId", all.x = TRUE)
##    StudentId        Subject CGPA   Race      State Age Gender
## 1          1        Physics   84  White    Arizona  20   Male
## 2          2        Physics   98  White    Arizona  17   Male
## 3          3        Physics   95  White   New York  17   Male
## 4          4        Physics   98  White   New York  20 Female
## 5          5      Chemistry   87  Black    Florida  16 Female
## 6          6      Chemistry   81  Black    Florida  16 Female
## 7          7      Chemistry   92  Black California  18   Male
## 8          8      Chemistry   75  Black California  19   Male
## 9          9          Maths   92  Asian      Texas  20 Female
## 10        10          Maths   85  Asian      Texas  16 Female
## 11        11          Maths   82  Asian       <NA>  NA   <NA>
## 12        12          Maths   95  Asian       <NA>  NA   <NA>
## 13        13 Social Science   87  Latin       <NA>  NA   <NA>
## 14        14 Social Science   90  Latin       <NA>  NA   <NA>
## 15        15 Social Science   72  Latin       <NA>  NA   <NA>
## 16        16 Social Science   79  Latin       <NA>  NA   <NA>
## 17        17           Arts   92 Indian       <NA>  NA   <NA>
## 18        18           Arts   76 Indian       <NA>  NA   <NA>
## 19        19           Arts   78 Indian       <NA>  NA   <NA>
## 20        20           Arts   78 Indian       <NA>  NA   <NA>
Let's flip the order of two tables and see what happens when we run 'Left Outer Join'.. 
merge(y = DataFrame1, x = DataFrame2, by = "StudentId", all.x = TRUE)
##    StudentId      State Age Gender   Subject CGPA  Race
## 1          1    Arizona  20   Male   Physics   84 White
## 2          2    Arizona  17   Male   Physics   98 White
## 3          3   New York  17   Male   Physics   95 White
## 4          4   New York  20 Female   Physics   98 White
## 5          5    Florida  16 Female Chemistry   87 Black
## 6          6    Florida  16 Female Chemistry   81 Black
## 7          7 California  18   Male Chemistry   92 Black
## 8          8 California  19   Male Chemistry   75 Black
## 9          9      Texas  20 Female     Maths   92 Asian
## 10        10      Texas  16 Female     Maths   85 Asian
Now, because left table has only 10 rows, the join result comes with 10 rows.

Example 4 : Merge Using Right Outer Join

A 'Right Outer Join' is pretty much similar to a left outer join, except that the rows that are retained in the merged result are from the right table.

This is how it comes out to be ....
merge(x = DataFrame1, y = DataFrame2, by = "StudentId", all.y = TRUE)
##    StudentId   Subject CGPA  Race      State Age Gender
## 1          1   Physics   84 White    Arizona  20   Male
## 2          2   Physics   98 White    Arizona  17   Male
## 3          3   Physics   95 White   New York  17   Male
## 4          4   Physics   98 White   New York  20 Female
## 5          5 Chemistry   87 Black    Florida  16 Female
## 6          6 Chemistry   81 Black    Florida  16 Female
## 7          7 Chemistry   92 Black California  18   Male
## 8          8 Chemistry   75 Black California  19   Male
## 9          9     Maths   92 Asian      Texas  20 Female
## 10        10     Maths   85 Asian      Texas  16 Female
Let's flip the order of two tables and see what happens when we run 'Right Outer Join'.. 
merge(y = DataFrame1, x = DataFrame2, by = "StudentId", all.y = TRUE)
##    StudentId      State Age Gender        Subject CGPA   Race
## 1          1    Arizona  20   Male        Physics   84  White
## 2          2    Arizona  17   Male        Physics   98  White
## 3          3   New York  17   Male        Physics   95  White
## 4          4   New York  20 Female        Physics   98  White
## 5          5    Florida  16 Female      Chemistry   87  Black
## 6          6    Florida  16 Female      Chemistry   81  Black
## 7          7 California  18   Male      Chemistry   92  Black
## 8          8 California  19   Male      Chemistry   75  Black
## 9          9      Texas  20 Female          Maths   92  Asian
## 10        10      Texas  16 Female          Maths   85  Asian
## 11        11       <NA>  NA   <NA>          Maths   82  Asian
## 12        12       <NA>  NA   <NA>          Maths   95  Asian
## 13        13       <NA>  NA   <NA> Social Science   87  Latin
## 14        14       <NA>  NA   <NA> Social Science   90  Latin
## 15        15       <NA>  NA   <NA> Social Science   72  Latin
## 16        16       <NA>  NA   <NA> Social Science   79  Latin
## 17        17       <NA>  NA   <NA>           Arts   92 Indian
## 18        18       <NA>  NA   <NA>           Arts   76 Indian
## 19        19       <NA>  NA   <NA>           Arts   78 Indian
## 20        20       <NA>  NA   <NA>           Arts   78 Indian
So we got the results exact mirror image of Left Outer Join. Great!

Now let's learn another method of joining two tables...

Example 5 : Cross Join

In this method of joining (or merging), each row from Table 1 (DF 1) joins with all the rows in Table 2 (DF 2).

So, if DF1 contains x rows and DF2 contains y rows then this type of merged (cross joined) result will contain x * y rows.

Here it is how....
knitr::kable(merge(x = DataFrame1, y = DataFrame2, by = NULL))
StudentId.x Subject CGPA Race StudentId.y State Age Gender
1 Physics 84 White 1 Arizona 20 Male
2 Physics 98 White 1 Arizona 20 Male
3 Physics 95 White 1 Arizona 20 Male
4 Physics 98 White 1 Arizona 20 Male
5 Chemistry 87 Black 1 Arizona 20 Male
6 Chemistry 81 Black 1 Arizona 20 Male
7 Chemistry 92 Black 1 Arizona 20 Male
8 Chemistry 75 Black 1 Arizona 20 Male
9 Maths 92 Asian 1 Arizona 20 Male
10 Maths 85 Asian 1 Arizona 20 Male
11 Maths 82 Asian 1 Arizona 20 Male
12 Maths 95 Asian 1 Arizona 20 Male
13 Social Science 87 Latin 1 Arizona 20 Male
14 Social Science 90 Latin 1 Arizona 20 Male
15 Social Science 72 Latin 1 Arizona 20 Male
16 Social Science 79 Latin 1 Arizona 20 Male
17 Arts 92 Indian 1 Arizona 20 Male
18 Arts 76 Indian 1 Arizona 20 Male
19 Arts 78 Indian 1 Arizona 20 Male
20 Arts 78 Indian 1 Arizona 20 Male
1 Physics 84 White 2 Arizona 17 Male
2 Physics 98 White 2 Arizona 17 Male
3 Physics 95 White 2 Arizona 17 Male
4 Physics 98 White 2 Arizona 17 Male
5 Chemistry 87 Black 2 Arizona 17 Male
6 Chemistry 81 Black 2 Arizona 17 Male
7 Chemistry 92 Black 2 Arizona 17 Male
8 Chemistry 75 Black 2 Arizona 17 Male
9 Maths 92 Asian 2 Arizona 17 Male
10 Maths 85 Asian 2 Arizona 17 Male
11 Maths 82 Asian 2 Arizona 17 Male
12 Maths 95 Asian 2 Arizona 17 Male
13 Social Science 87 Latin 2 Arizona 17 Male
14 Social Science 90 Latin 2 Arizona 17 Male
15 Social Science 72 Latin 2 Arizona 17 Male
16 Social Science 79 Latin 2 Arizona 17 Male
17 Arts 92 Indian 2 Arizona 17 Male
18 Arts 76 Indian 2 Arizona 17 Male
19 Arts 78 Indian 2 Arizona 17 Male
20 Arts 78 Indian 2 Arizona 17 Male
1 Physics 84 White 3 New York 17 Male
2 Physics 98 White 3 New York 17 Male
3 Physics 95 White 3 New York 17 Male
4 Physics 98 White 3 New York 17 Male
5 Chemistry 87 Black 3 New York 17 Male
6 Chemistry 81 Black 3 New York 17 Male
7 Chemistry 92 Black 3 New York 17 Male
8 Chemistry 75 Black 3 New York 17 Male
9 Maths 92 Asian 3 New York 17 Male
10 Maths 85 Asian 3 New York 17 Male
11 Maths 82 Asian 3 New York 17 Male
12 Maths 95 Asian 3 New York 17 Male
13 Social Science 87 Latin 3 New York 17 Male
14 Social Science 90 Latin 3 New York 17 Male
15 Social Science 72 Latin 3 New York 17 Male
16 Social Science 79 Latin 3 New York 17 Male
17 Arts 92 Indian 3 New York 17 Male
18 Arts 76 Indian 3 New York 17 Male
19 Arts 78 Indian 3 New York 17 Male
20 Arts 78 Indian 3 New York 17 Male
1 Physics 84 White 4 New York 20 Female
2 Physics 98 White 4 New York 20 Female
3 Physics 95 White 4 New York 20 Female
4 Physics 98 White 4 New York 20 Female
5 Chemistry 87 Black 4 New York 20 Female
6 Chemistry 81 Black 4 New York 20 Female
7 Chemistry 92 Black 4 New York 20 Female
8 Chemistry 75 Black 4 New York 20 Female
9 Maths 92 Asian 4 New York 20 Female
10 Maths 85 Asian 4 New York 20 Female
11 Maths 82 Asian 4 New York 20 Female
12 Maths 95 Asian 4 New York 20 Female
13 Social Science 87 Latin 4 New York 20 Female
14 Social Science 90 Latin 4 New York 20 Female
15 Social Science 72 Latin 4 New York 20 Female
16 Social Science 79 Latin 4 New York 20 Female
17 Arts 92 Indian 4 New York 20 Female
18 Arts 76 Indian 4 New York 20 Female
19 Arts 78 Indian 4 New York 20 Female
20 Arts 78 Indian 4 New York 20 Female
1 Physics 84 White 5 Florida 16 Female
2 Physics 98 White 5 Florida 16 Female
3 Physics 95 White 5 Florida 16 Female
4 Physics 98 White 5 Florida 16 Female
5 Chemistry 87 Black 5 Florida 16 Female
6 Chemistry 81 Black 5 Florida 16 Female
7 Chemistry 92 Black 5 Florida 16 Female
8 Chemistry 75 Black 5 Florida 16 Female
9 Maths 92 Asian 5 Florida 16 Female
10 Maths 85 Asian 5 Florida 16 Female
11 Maths 82 Asian 5 Florida 16 Female
12 Maths 95 Asian 5 Florida 16 Female
13 Social Science 87 Latin 5 Florida 16 Female
14 Social Science 90 Latin 5 Florida 16 Female
15 Social Science 72 Latin 5 Florida 16 Female
16 Social Science 79 Latin 5 Florida 16 Female
17 Arts 92 Indian 5 Florida 16 Female
18 Arts 76 Indian 5 Florida 16 Female
19 Arts 78 Indian 5 Florida 16 Female
20 Arts 78 Indian 5 Florida 16 Female
1 Physics 84 White 6 Florida 16 Female
2 Physics 98 White 6 Florida 16 Female
3 Physics 95 White 6 Florida 16 Female
4 Physics 98 White 6 Florida 16 Female
5 Chemistry 87 Black 6 Florida 16 Female
6 Chemistry 81 Black 6 Florida 16 Female
7 Chemistry 92 Black 6 Florida 16 Female
8 Chemistry 75 Black 6 Florida 16 Female
9 Maths 92 Asian 6 Florida 16 Female
10 Maths 85 Asian 6 Florida 16 Female
11 Maths 82 Asian 6 Florida 16 Female
12 Maths 95 Asian 6 Florida 16 Female
13 Social Science 87 Latin 6 Florida 16 Female
14 Social Science 90 Latin 6 Florida 16 Female
15 Social Science 72 Latin 6 Florida 16 Female
16 Social Science 79 Latin 6 Florida 16 Female
17 Arts 92 Indian 6 Florida 16 Female
18 Arts 76 Indian 6 Florida 16 Female
19 Arts 78 Indian 6 Florida 16 Female
20 Arts 78 Indian 6 Florida 16 Female
1 Physics 84 White 7 California 18 Male
2 Physics 98 White 7 California 18 Male
3 Physics 95 White 7 California 18 Male
4 Physics 98 White 7 California 18 Male
5 Chemistry 87 Black 7 California 18 Male
6 Chemistry 81 Black 7 California 18 Male
7 Chemistry 92 Black 7 California 18 Male
8 Chemistry 75 Black 7 California 18 Male
9 Maths 92 Asian 7 California 18 Male
10 Maths 85 Asian 7 California 18 Male
11 Maths 82 Asian 7 California 18 Male
12 Maths 95 Asian 7 California 18 Male
13 Social Science 87 Latin 7 California 18 Male
14 Social Science 90 Latin 7 California 18 Male
15 Social Science 72 Latin 7 California 18 Male
16 Social Science 79 Latin 7 California 18 Male
17 Arts 92 Indian 7 California 18 Male
18 Arts 76 Indian 7 California 18 Male
19 Arts 78 Indian 7 California 18 Male
20 Arts 78 Indian 7 California 18 Male
1 Physics 84 White 8 California 19 Male
2 Physics 98 White 8 California 19 Male
3 Physics 95 White 8 California 19 Male
4 Physics 98 White 8 California 19 Male
5 Chemistry 87 Black 8 California 19 Male
6 Chemistry 81 Black 8 California 19 Male
7 Chemistry 92 Black 8 California 19 Male
8 Chemistry 75 Black 8 California 19 Male
9 Maths 92 Asian 8 California 19 Male
10 Maths 85 Asian 8 California 19 Male
11 Maths 82 Asian 8 California 19 Male
12 Maths 95 Asian 8 California 19 Male
13 Social Science 87 Latin 8 California 19 Male
14 Social Science 90 Latin 8 California 19 Male
15 Social Science 72 Latin 8 California 19 Male
16 Social Science 79 Latin 8 California 19 Male
17 Arts 92 Indian 8 California 19 Male
18 Arts 76 Indian 8 California 19 Male
19 Arts 78 Indian 8 California 19 Male
20 Arts 78 Indian 8 California 19 Male
1 Physics 84 White 9 Texas 20 Female
2 Physics 98 White 9 Texas 20 Female
3 Physics 95 White 9 Texas 20 Female
4 Physics 98 White 9 Texas 20 Female
5 Chemistry 87 Black 9 Texas 20 Female
6 Chemistry 81 Black 9 Texas 20 Female
7 Chemistry 92 Black 9 Texas 20 Female
8 Chemistry 75 Black 9 Texas 20 Female
9 Maths 92 Asian 9 Texas 20 Female
10 Maths 85 Asian 9 Texas 20 Female
11 Maths 82 Asian 9 Texas 20 Female
12 Maths 95 Asian 9 Texas 20 Female
13 Social Science 87 Latin 9 Texas 20 Female
14 Social Science 90 Latin 9 Texas 20 Female
15 Social Science 72 Latin 9 Texas 20 Female
16 Social Science 79 Latin 9 Texas 20 Female
17 Arts 92 Indian 9 Texas 20 Female
18 Arts 76 Indian 9 Texas 20 Female
19 Arts 78 Indian 9 Texas 20 Female
20 Arts 78 Indian 9 Texas 20 Female
1 Physics 84 White 10 Texas 16 Female
2 Physics 98 White 10 Texas 16 Female
3 Physics 95 White 10 Texas 16 Female
4 Physics 98 White 10 Texas 16 Female
5 Chemistry 87 Black 10 Texas 16 Female
6 Chemistry 81 Black 10 Texas 16 Female
7 Chemistry 92 Black 10 Texas 16 Female
8 Chemistry 75 Black 10 Texas 16 Female
9 Maths 92 Asian 10 Texas 16 Female
10 Maths 85 Asian 10 Texas 16 Female
11 Maths 82 Asian 10 Texas 16 Female
12 Maths 95 Asian 10 Texas 16 Female
13 Social Science 87 Latin 10 Texas 16 Female
14 Social Science 90 Latin 10 Texas 16 Female
15 Social Science 72 Latin 10 Texas 16 Female
16 Social Science 79 Latin 10 Texas 16 Female
17 Arts 92 Indian 10 Texas 16 Female
18 Arts 76 Indian 10 Texas 16 Female
19 Arts 78 Indian 10 Texas 16 Female
20 Arts 78 Indian 10 Texas 16 Female
Thanks all folks!! 

Please post your questions and doubts if you have any.

 

 

Advertisements