<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://training-course-material.com/index.php?action=history&amp;feed=atom&amp;title=R_-_Grouping_Data</id>
	<title>R - Grouping Data - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://training-course-material.com/index.php?action=history&amp;feed=atom&amp;title=R_-_Grouping_Data"/>
	<link rel="alternate" type="text/html" href="https://training-course-material.com/index.php?title=R_-_Grouping_Data&amp;action=history"/>
	<updated>2026-05-14T01:15:28Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://training-course-material.com/index.php?title=R_-_Grouping_Data&amp;diff=52345&amp;oldid=prev</id>
		<title>Daniel Rodriguez at 05:24, 14 February 2017</title>
		<link rel="alternate" type="text/html" href="https://training-course-material.com/index.php?title=R_-_Grouping_Data&amp;diff=52345&amp;oldid=prev"/>
		<updated>2017-02-14T05:24:29Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;[[Category:Intro to R|030]]&lt;br /&gt;
&lt;br /&gt;
Find the average salary in the whole company.&lt;br /&gt;
 &amp;gt; mean(emp$SAL)&lt;br /&gt;
 [1] 2073.214&lt;br /&gt;
&lt;br /&gt;
How many employees are in the database?&lt;br /&gt;
 &amp;gt; length(emp$ID)&lt;br /&gt;
 [1] 14&lt;br /&gt;
&lt;br /&gt;
Other salary statistics (standard deviation, variance, minimum, maximum, sum):&lt;br /&gt;
 &amp;gt; sd(emp$SAL)&lt;br /&gt;
 [1] 1182.503&lt;br /&gt;
 &amp;gt; var(emp$SAL)&lt;br /&gt;
 [1] 1398314&lt;br /&gt;
 &amp;gt; min(emp$SAL)&lt;br /&gt;
 [1] 800&lt;br /&gt;
 &amp;gt; max(emp$SAL)&lt;br /&gt;
 [1] 5000&lt;br /&gt;
 &amp;gt; sum(emp$SAL)&lt;br /&gt;
 [1] 29025&lt;br /&gt;
&lt;br /&gt;
 &amp;gt; summary(emp)&lt;br /&gt;
       ID          ENAME                  JOB         MGR          HIREDATE               SAL            COMM          DEPTNO     &lt;br /&gt;
 Min.   :7369   Length:14          ANALYST  :2   Min.   :7566   Min.   :1980-12-17   Min.   : 800   Min.   :   0   Min.   :10.00  &lt;br /&gt;
 1st Qu.:7588   Class :character   CLERK    :4   1st Qu.:7698   1st Qu.:1981-04-09   1st Qu.:1250   1st Qu.: 225   1st Qu.:20.00  &lt;br /&gt;
 Median :7785   Mode  :character   MANAGER  :3   Median :7698   Median :1981-09-18   Median :1550   Median : 400   Median :20.00  &lt;br /&gt;
 Mean   :7727                      PRESIDENT:1   Mean   :7739   Mean   :1981-09-29   Mean   :2073   Mean   : 550   Mean   :22.14  &lt;br /&gt;
 3rd Qu.:7868                      SALESMAN :4   3rd Qu.:7839   3rd Qu.:1981-12-03   3rd Qu.:2944   3rd Qu.: 725   3rd Qu.:30.00  &lt;br /&gt;
 Max.   :7934                                    Max.   :7902   Max.   :1983-01-12   Max.   :5000   Max.   :1400   Max.   :30.00  &lt;br /&gt;
                                                 NA&amp;#039;s   :1                                          NA&amp;#039;s   :10&lt;br /&gt;
The median of a department number is meaningless, because DEPTNO is a label rather than a quantity. It is better to treat DEPTNO as a factor:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 &amp;gt; emp$DEPTNO &amp;lt;- as.factor(emp$DEPTNO)&lt;br /&gt;
&lt;br /&gt;
 &amp;gt; summary(emp)&lt;br /&gt;
       ID          ENAME                  JOB         MGR          HIREDATE               SAL            COMM      DEPTNO&lt;br /&gt;
 Min.   :7369   Length:14          ANALYST  :2   Min.   :7566   Min.   :1980-12-17   Min.   : 800   Min.   :   0   10:3  &lt;br /&gt;
 1st Qu.:7588   Class :character   CLERK    :4   1st Qu.:7698   1st Qu.:1981-04-09   1st Qu.:1250   1st Qu.: 225   20:5  &lt;br /&gt;
 Median :7785   Mode  :character   MANAGER  :3   Median :7698   Median :1981-09-18   Median :1550   Median : 400   30:6  &lt;br /&gt;
 Mean   :7727                      PRESIDENT:1   Mean   :7739   Mean   :1981-09-29   Mean   :2073   Mean   : 550         &lt;br /&gt;
 3rd Qu.:7868                      SALESMAN :4   3rd Qu.:7839   3rd Qu.:1981-12-03   3rd Qu.:2944   3rd Qu.: 725         &lt;br /&gt;
 Max.   :7934                                    Max.   :7902   Max.   :1983-01-12   Max.   :5000   Max.   :1400         &lt;br /&gt;
                                                 NA&amp;#039;s   :1                                          NA&amp;#039;s   :10&lt;br /&gt;
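&lt;br /&gt;
The effect of the conversion can be sketched on a small made-up vector. The values and the names dept and dept_f are hypothetical and are not taken from the emp table.&lt;br /&gt;

```r
# Hypothetical department codes, made up for illustration
dept = c(10, 20, 20, 30, 30, 30)

# A numeric vector gets quantile statistics, which say little about departments
summary(dept)

# As a factor, summary() counts the rows in each level instead
dept_f = as.factor(dept)
summary(dept_f)   # counts: 10 -> 1, 20 -> 2, 30 -> 3
levels(dept_f)    # "10" "20" "30"
```

Note that the levels of the factor are character labels, so they can no longer be averaged by accident.&lt;br /&gt;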
&lt;br /&gt;
== Grouping ==&lt;br /&gt;
A group consists of all rows that share the same value in one or more columns. R calls such grouping columns &amp;#039;&amp;#039;factors&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
Find the average salary for each job:&lt;br /&gt;
 &amp;gt; tapply(emp$SAL, emp$JOB, mean)&lt;br /&gt;
  ANALYST     CLERK   MANAGER PRESIDENT  SALESMAN &lt;br /&gt;
 3000.000  1037.500  2758.333  5000.000  1400.000&lt;br /&gt;
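&lt;br /&gt;
As a minimal sketch of how tapply splits values by a grouping vector, the example below uses a made-up data frame. The name staff and all of its values are hypothetical and are not the emp table.&lt;br /&gt;

```r
# Hypothetical toy data frame (made-up values)
staff = data.frame(
  JOB = c("CLERK", "CLERK", "MANAGER", "ANALYST", "MANAGER"),
  SAL = c(1000, 1200, 2500, 3000, 2700)
)

# tapply(values, grouping vector, function) applies the function to each group
avg_by_job = tapply(staff$SAL, staff$JOB, mean)
avg_by_job   # ANALYST 3000, CLERK 1100, MANAGER 2600

# Any summary function can be used, for example a head count per job
n_by_job = tapply(staff$SAL, staff$JOB, length)
n_by_job     # ANALYST 1, CLERK 2, MANAGER 2
```

The result is named by group, so a single value can be read with avg_by_job[&amp;quot;CLERK&amp;quot;].&lt;br /&gt;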
&lt;br /&gt;
You can group by more than one factor:&lt;br /&gt;
 &amp;gt; tapply(emp$SAL, list(emp$JOB, emp$DEPTNO), mean)&lt;br /&gt;
             10   20   30&lt;br /&gt;
 ANALYST     NA 3000   NA&lt;br /&gt;
 CLERK     1300  950  950&lt;br /&gt;
 MANAGER   2450 2975 2850&lt;br /&gt;
 PRESIDENT 5000   NA   NA&lt;br /&gt;
 SALESMAN    NA   NA 1400&lt;br /&gt;
&lt;br /&gt;
Example with three factors (job, department and hire year):&lt;br /&gt;
 &amp;gt; tapply(emp$SAL, list(emp$JOB, emp$DEPTNO, format(emp$HIREDATE, &amp;quot;%Y&amp;quot;)), mean)&lt;br /&gt;
 , , 1980&lt;br /&gt;
 &lt;br /&gt;
           10  20 30&lt;br /&gt;
 ANALYST   NA  NA NA&lt;br /&gt;
 CLERK     NA 800 NA&lt;br /&gt;
 MANAGER   NA  NA NA&lt;br /&gt;
 PRESIDENT NA  NA NA&lt;br /&gt;
 SALESMAN  NA  NA NA &lt;br /&gt;
 &lt;br /&gt;
 , , 1981&lt;br /&gt;
 &lt;br /&gt;
             10   20   30&lt;br /&gt;
 ANALYST     NA 3000   NA&lt;br /&gt;
 CLERK       NA   NA  950&lt;br /&gt;
 MANAGER   2450 2975 2850&lt;br /&gt;
 PRESIDENT 5000   NA   NA&lt;br /&gt;
 SALESMAN    NA   NA 1400 &lt;br /&gt;
 &lt;br /&gt;
 , , 1982&lt;br /&gt;
 &lt;br /&gt;
             10   20 30&lt;br /&gt;
 ANALYST     NA 3000 NA&lt;br /&gt;
 CLERK     1300   NA NA&lt;br /&gt;
 MANAGER     NA   NA NA&lt;br /&gt;
 PRESIDENT   NA   NA NA&lt;br /&gt;
 SALESMAN    NA   NA NA &lt;br /&gt;
 &lt;br /&gt;
 , , 1983&lt;br /&gt;
 &lt;br /&gt;
           10   20 30&lt;br /&gt;
 ANALYST   NA   NA NA&lt;br /&gt;
 CLERK     NA 1100 NA&lt;br /&gt;
 MANAGER   NA   NA NA&lt;br /&gt;
 PRESIDENT NA   NA NA&lt;br /&gt;
 SALESMAN  NA   NA NA&lt;br /&gt;
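&lt;br /&gt;
The NA cells in the tables above mark combinations of factor levels that have no rows. A minimal sketch on made-up data shows this. The name staff and all of its values are hypothetical and are not the emp table.&lt;br /&gt;

```r
# Hypothetical toy data frame (made-up values)
staff = data.frame(
  JOB    = c("CLERK", "CLERK", "MANAGER", "MANAGER"),
  DEPTNO = c(10, 20, 10, 10),
  SAL    = c(1000, 1200, 2500, 2700)
)

# One grouping vector per dimension: rows are jobs, columns are departments
avg = tapply(staff$SAL, list(staff$JOB, staff$DEPTNO), mean)
avg

# No MANAGER works in department 20, so that cell is NA
is.na(avg["MANAGER", "20"])   # TRUE
```

Each extra grouping vector in the list adds one dimension to the result, which is why the three-factor call prints one table per year.&lt;br /&gt;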
&lt;br /&gt;
== Grouping with data.table package ==&lt;br /&gt;
&amp;lt;source lang=&amp;quot;rsplus&amp;quot;&amp;gt;&lt;br /&gt;
 install.packages(&amp;quot;data.table&amp;quot;)  # needed only once&lt;br /&gt;
 library(&amp;quot;data.table&amp;quot;)&lt;br /&gt;
 # Convert the data frame to a data.table&lt;br /&gt;
 emp.dt = data.table(emp)&lt;br /&gt;
 # Sum the salaries within each job&lt;br /&gt;
 emp.dt[, sum(SAL), by = JOB]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
            JOB   V1&lt;br /&gt;
 [1,] PRESIDENT 5000&lt;br /&gt;
 [2,]   MANAGER 8275&lt;br /&gt;
 [3,]  SALESMAN 5600&lt;br /&gt;
 [4,]     CLERK 4150&lt;br /&gt;
 [5,]   ANALYST 6000&lt;br /&gt;
&lt;br /&gt;
==Exercises==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;23.  Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Find the minimum, maximum and average salary in the whole company.&lt;br /&gt;
      Minimal Maximal     Mean&lt;br /&gt;
 [1,]     800    5000 2073.214&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;24.  Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Find the difference between the maximum and minimum salary.&lt;br /&gt;
 [1] 4200&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;25.  Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Find the average salary for each job.&lt;br /&gt;
  ANALYST     CLERK   MANAGER PRESIDENT  SALESMAN &lt;br /&gt;
 3000.000  1037.500  2758.333  5000.000  1400.000 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;26.  Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
How many managers work for the company?&lt;br /&gt;
 MANAGER &lt;br /&gt;
       3 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;27.  Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Find the average annual salary (12 times SAL) in each department.&lt;br /&gt;
    10    20    30 &lt;br /&gt;
 35000 26100 18800 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;28.  Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Find departments with more than 3 workers.&lt;br /&gt;
 20 30 &lt;br /&gt;
  5  6 &lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;29.  *Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Check that every employee&amp;#039;s ID is unique.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;30.  *Exercise&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
For each manager, find the minimum salary among their subordinates.&lt;br /&gt;
&lt;br /&gt;
 +------+----------+&lt;br /&gt;
 | mgr  | min(sal) |&lt;br /&gt;
 +------+----------+&lt;br /&gt;
 | NULL | 5000.00  |&lt;br /&gt;
 | 7566 | 3000.00  |&lt;br /&gt;
 | 7698 | 950.00   |&lt;br /&gt;
 | 7782 | 1300.00  |&lt;br /&gt;
 | 7788 | 1100.00  |&lt;br /&gt;
 | 7839 | 2450.00  |&lt;br /&gt;
 | 7902 | 800.00   |&lt;br /&gt;
 +------+----------+&lt;/div&gt;</summary>
		<author><name>Daniel Rodriguez</name></author>
	</entry>
</feed>