R - Data Processing

From Training Material
Revision as of 07:37, 14 February 2017 by Bernard Szlachta (talk | contribs) (→‎Exercises)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search


Tables used in the exercises

EMP Table

+------+----------+-----------+------+------------+---------+---------+--------+
| ID   | ENAME    | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+------+----------+-----------+------+------------+---------+---------+--------+
| 7839 | BUSH     | PRESIDENT | NULL | 1981-11-17 | 5000.00 | NULL    | 10     |
| 7698 | BLAIR    | MANAGER   | 7839 | 1981-05-01 | 2850.00 | NULL    | 30     |
| 7782 | MERKEL   | MANAGER   | 7839 | 1981-06-09 | 2450.00 | NULL    | 10     |
| 7566 | PUTIN    | MANAGER   | 7839 | 1981-04-02 | 2975.00 | NULL    | 20     |
| 7654 | CHIRACK  | SALESMAN  | 7698 | 1981-09-28 | 1250.00 | 1400.00 | 30     |
| 7499 | BAROSSO  | SALESMAN  | 7698 | 1981-02-20 | 1600.00 | 300.00  | 30     |
| 7844 | GATES    | SALESMAN  | 7698 | 1981-09-08 | 1500.00 | 0.00    | 30     |
| 7900 | BUFFETT  | CLERK     | 7698 | 1981-12-03 | 950.00  | NULL    | 30     |
| 7521 | WALTON   | SALESMAN  | 7698 | 1981-02-22 | 1250.00 | 500.00  | 30     |
| 7902 | TOOSK    | ANALYST   | 7566 | 1981-12-03 | 3000.00 | NULL    | 20     |
| 7369 | THATCHER | CLERK     | 7902 | 1980-12-17 | 800.00  | NULL    | 20     |
| 7788 | CARNEGIE | ANALYST   | 7566 | 1982-12-09 | 3000.00 | NULL    | 20     |
| 7876 | FORD     | CLERK     | 7788 | 1983-01-12 | 1100.00 | NULL    | 20     |
| 7934 | ELISON   | CLERK     | 7782 | 1982-01-23 | 1300.00 | NULL    | 10     |
+------+----------+-----------+------+------------+---------+---------+--------+

DEPT Table

+--------+------------+----------+
| DEPTNO | DNAME      | LOC      |
+--------+------------+----------+
| 10     | ACCOUNTING | NEW YORK |
| 20     | RESEARCH   | LONDON   |
| 30     | SALES      | PARIS    |
| 40     | OPERATIONS | BERLIN   |
+--------+------------+----------+

SALGRADE Table

+-------+-------+-------+
| GRADE | LOSAL | HISAL |
+-------+-------+-------+
| 1     | 700   | 1200  |
| 2     | 1201  | 1400  |
| 3     | 1401  | 2000  |
| 4     | 2001  | 3000  |
| 5     | 3001  | 9999  |
+-------+-------+-------+

Loading data

In order to load data from a text file we use read.table function

 
emp <- read.table("https://training-course-material.com/images/9/97/Emp.txt",header=TRUE)

Show the data

 emp
    ID    ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
1  7839     BUSH PRESIDENT   NA 1981-11-17 5000   NA     10
2  7698    BLAIR   MANAGER 7839 1981-05-01 2850   NA     30
3  7782   MERKEL   MANAGER 7839 1981-06-09 2450   NA     10
4  7566    PUTIN   MANAGER 7839 1981-04-02 2975   NA     20
5  7654  CHIRACK  SALESMAN 7698 1981-09-28 1250 1400     30
6  7499  BAROSSO  SALESMAN 7698 1981-02-20 1600  300     30
7  7844    GATES  SALESMAN 7698 1981-09-08 1500    0     30
8  7900  BUFFETT     CLERK 7698 1981-12-03  950   NA     30
9  7521   WALTON  SALESMAN 7698 1981-02-22 1250  500     30
10 7902    TOOSK   ANALYST 7566 1981-12-03 3000   NA     20
11 7369 THATCHER     CLERK 7902 1980-12-17  800   NA     20
12 7788 CARNEGIE   ANALYST 7566 1982-12-09 3000   NA     20
13 7876     FORD     CLERK 7788 1983-01-12 1100   NA     20
14 7934   ELISON     CLERK 7782 1982-01-23 1300   NA     10

Show the structure of the emp data frame

> str(emp)
'data.frame':   14 obs. of  8 variables:
$ ID      : int  7839 7698 7782 7566 7654 7499 7844 7900 7521 7902 ...
$ ENAME   : Factor w/ 14 levels "BAROSSO","BLAIR",..: 4 2 10 11 6 1 9 3 14 13 ...
$ JOB     : Factor w/ 5 levels "ANALYST","CLERK",..: 4 3 3 3 5 5 5 2 5 1 ...
$ MGR     : int  NA 7839 7839 7839 7698 7698 7698 7698 7698 7566 ...
$ HIREDATE: Factor w/ 13 levels "01/05/1981","02/04/1981",..: 8 1 5 2 13 10 4 3 11 3 ...
$ SAL     : int  5000 2850 2450 2975 1250 1600 1500 950 1250 3000 ...
$ COMM    : int  NA NA NA NA 1400 300 0 NA 500 NA ...
$ DEPTNO  : int  10 30 10 20 30 30 30 30 30 20 ...

Converting Data

ENAME column is not really a factor, but rather a character column

Convert factor to a character

emp$ENAME <- as.character(emp$ENAME)
> str(emp)
'data.frame':   14 obs. of  8 variables:
$ ID      : int  7839 7698 7782 7566 7654 7499 7844 7900 7521 7902 ...
$ ENAME   : chr  "BUSH" "BLAIR" "MERKEL" "PUTIN" ...
... 
$ DEPTNO  : int  10 30 10 20 30 30 30 30 30 20 ...

Similar way convert HIREDATE column

emp$HIREDATE <- as.Date(emp$HIREDATE,format="%d/%m/%Y")


Loading data exercises

  1. Load dept and salgrade tabled into data frames using read.table function.

Call the variable dept and salgrade accordingly

Trivia

  • R is case sensitive
  • White charters are ignored
  • Commands end with semicolon ";" or Enter (new line) charter
  • You can comment the rest of the line with the number character (hash) "#"
emp #This is just a comment
  • Operators work the same way as in maths or other programming languages
2+2; 12-3; -(10+2); 4*3; 13/4; 2**4; 2^ 4; 2/0


Selecting Data

Display all columns and rows from emp data frame.

> emp
     ID    ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
1  7839     BUSH PRESIDENT   NA 17/11/1981 5000   NA     10
2  7698    BLAIR   MANAGER 7839 01/05/1981 2850   NA     30
3  7782   MERKEL   MANAGER 7839 09/06/1981 2450   NA     10
...

Select three columns: ename, job and sal and all rows from emp table.

> emp[,c("ENAME","JOB","SAL")]
      ENAME       JOB  SAL
1      BUSH PRESIDENT 5000
2     BLAIR   MANAGER 2850
3    MERKEL   MANAGER 2450
...
14   ELISON     CLERK 1300

Select the one column on its own and return a vector

> emp$SAL
[1] 5000 2850 2450 2975 1250 1600 1500  950 1250 3000  800 3000 1100 1300

Combining Data

Select two columns and combine them

> data.frame(SAL,ENAME)
    SAL    ENAME
1  5000     BUSH
2  2850    BLAIR
3  2450   MERKEL
4  2975    PUTIN
... 

You can add labels as well

> data.frame(Salary=emp$SAL,'Employee Name' = emp$ENAME)
   Salary Employee.Name
1    5000          BUSH
2    2850         BLAIR
3    2450        MERKEL
4    2975         PUTIN
5    1250       CHIRACK
6    1600       BAROSSO

Calculate the annual salary

> data.frame(emp$ENAME,AnnualSal=emp$SAL*12)
   emp.ENAME AnnualSal
1       BUSH     60000
2      BLAIR     34200
3     MERKEL     29400
4      PUTIN     35700
5    CHIRACK     15000
...

Attaching dataframe

We can attach the table so we do not have to write the emp$ prefix

> attach(emp)
> data.frame(ENAME,AnnualSal=SAL*12)
   ENAME    AnnualSal
1 "BUSH"     60000
2 "BLAIR"    34200
...

Concatenation

Concatenate two columns

> paste(ENAME,"is",JOB)
[1] "BUSH is PRESIDENT"   "BLAIR is MANAGER"    "MERKEL is MANAGER"   "PUTIN is MANAGER"   
[5] "CHIRACK is SALESMAN" "BAROSSO is SALESMAN" "GATES is SALESMAN"   "BUFFETT is CLERK"   
[9] "WALTON is SALESMAN"  "TOOSK is ANALYST"    "THATCHER is CLERK"   "CARNEGIE is ANALYST"
[13] "FORD is CLERK"       "ELISON is CLERK"

Sorting

Sort by salary in ascending order

 emp[order(SAL),]

Sort by salary in descending order

 emp[order(SAL,decreasing=TRUE),]

Sort by multiple columns

 emp[order(JOB,SAL),]
           ID    ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
TOOSK    7902    TOOSK   ANALYST 7566 03/12/1981 3000   NA     20
CARNEGIE 7788 CARNEGIE   ANALYST 7566 09/12/1982 3000   NA     20
THATCHER 7369 THATCHER     CLERK 7902 17/12/1980  800   NA     20
BUFFETT  7900  BUFFETT     CLERK 7698 03/12/1981  950   NA     30
FORD     7876     FORD     CLERK 7788 12/01/1983 1100   NA     20
ELISON   7934   ELISON     CLERK 7782 23/01/1982 1300   NA     10
MERKEL   7782   MERKEL   MANAGER 7839 09/06/1981 2450   NA     10
BLAIR    7698    BLAIR   MANAGER 7839 01/05/1981 2850   NA     30
PUTIN    7566    PUTIN   MANAGER 7839 02/04/1981 2975   NA     20
BUSH     7839     BUSH PRESIDENT   NA 17/11/1981 5000   NA     10
CHIRACK  7654  CHIRACK  SALESMAN 7698 28/09/1981 1250 1400     30
WALTON   7521   WALTON  SALESMAN 7698 22/02/1981 1250  500     30
GATES    7844    GATES  SALESMAN 7698 08/09/1981 1500    0     30
BAROSSO  7499  BAROSSO  SALESMAN 7698 20/02/1981 1600  300     30

Sort by HIREDATE, but do not show it

> emp[order(HIREDATE),c("ENAME","SAL")]
      ENAME  SAL
11 THATCHER  800
6   BAROSSO 1600
9    WALTON 1250
4     PUTIN 2975
2     BLAIR 2850
3    MERKEL 2450
7     GATES 1500
5   CHIRACK 1250
1      BUSH 5000
8   BUFFETT  950
10    TOOSK 3000
14   ELISON 1300
12 CARNEGIE 3000
13     FORD 1100

Sorting one column in reverse order

emp[order(DEPTNO,-SAL, decreasing=TRUE),]
    ID    ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
8  7900  BUFFETT     CLERK 7698 03/12/1981  950   NA     30
5  7654  CHIRACK  SALESMAN 7698 28/09/1981 1250 1400     30
9  7521   WALTON  SALESMAN 7698 22/02/1981 1250  500     30
7  7844    GATES  SALESMAN 7698 08/09/1981 1500    0     30
6  7499  BAROSSO  SALESMAN 7698 20/02/1981 1600  300     30
2  7698    BLAIR   MANAGER 7839 01/05/1981 2850   NA     30
11 7369 THATCHER     CLERK 7902 17/12/1980  800   NA     20
13 7876     FORD     CLERK 7788 12/01/1983 1100   NA     20
4  7566    PUTIN   MANAGER 7839 02/04/1981 2975   NA     20
10 7902    TOOSK   ANALYST 7566 03/12/1981 3000   NA     20
12 7788 CARNEGIE   ANALYST 7566 09/12/1982 3000   NA     20
14 7934   ELISON     CLERK 7782 23/01/1982 1300   NA     10
3  7782   MERKEL   MANAGER 7839 09/06/1981 2450   NA     10
1  7839     BUSH PRESIDENT   NA 17/11/1981 5000   NA     10

Filtering Results

Select employees earning more than 2000

> emp[SAL>2000,]
     ID    ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
1  7839     BUSH PRESIDENT   NA 1981-11-17 5000   NA     10
2  7698    BLAIR   MANAGER 7839 1981-05-01 2850   NA     30
3  7782   MERKEL   MANAGER 7839 1981-06-09 2450   NA     10
4  7566    PUTIN   MANAGER 7839 1981-04-02 2975   NA     20
10 7902    TOOSK   ANALYST 7566 1981-12-03 3000   NA     20
12 7788 CARNEGIE   ANALYST 7566 1982-12-09 3000   NA     20

How does it work?


The SAL>2000 expression returns a vector of boolean values.

> SAL>2000
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE

If the salary is greater than 2000 for a row, the corresponding value will be TRUE, otherwise, FALSE.

Than, the data frame selects only those rows which have true value in the corresponding rows.


Select only managers.

> emp[JOB == "MANAGER",]
   ID  ENAME     JOB  MGR   HIREDATE  SAL COMM DEPTNO
2 7698  BLAIR MANAGER 7839 1981-05-01 2850   NA     30
3 7782 MERKEL MANAGER 7839 1981-06-09 2450   NA     10
4 7566  PUTIN MANAGER 7839 1981-04-02 2975   NA     20

Select people hired before or on 1st of April 1981.

> emp[HIREDATE <= '1981-04-01',]
    ID    ENAME      JOB  MGR   HIREDATE  SAL COMM DEPTNO
6  7499  BAROSSO SALESMAN 7698 1981-02-20 1600  300     30
9  7521   WALTON SALESMAN 7698 1981-02-22 1250  500     30
11 7369 THATCHER    CLERK 7902 1980-12-17  800   NA     20

Select all except clerks.

> emp[JOB != 'CLERK',]
    ID    ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
1  7839     BUSH PRESIDENT   NA 1981-11-17 5000   NA     10
2  7698    BLAIR   MANAGER 7839 1981-05-01 2850   NA     30
3  7782   MERKEL   MANAGER 7839 1981-06-09 2450   NA     10
4  7566    PUTIN   MANAGER 7839 1981-04-02 2975   NA     20
5  7654  CHIRACK  SALESMAN 7698 1981-09-28 1250 1400     30
6  7499  BAROSSO  SALESMAN 7698 1981-02-20 1600  300     30
7  7844    GATES  SALESMAN 7698 1981-09-08 1500    0     30
9  7521   WALTON  SALESMAN 7698 1981-02-22 1250  500     30
10 7902    TOOSK   ANALYST 7566 1981-12-03 3000   NA     20
12 7788 CARNEGIE   ANALYST 7566 1982-12-09 3000   NA     20

Select only name and salary of salesmen

> emp[JOB == 'SALESMAN',c("ENAME","SAL")]
    ENAME  SAL
5 CHIRACK 1250
6 BAROSSO 1600
7   GATES 1500
9  WALTON 1250


Complex Expressions

To select people earning from 1000 to 2000 (inclusive) you can type:

 > emp[SAL >= 1000 & SAL <= 2000,]
    ID   ENAME      JOB  MGR   HIREDATE  SAL COMM DEPTNO
5  7654 CHIRACK SALESMAN 7698 1981-09-28 1250 1400     30
6  7499 BAROSSO SALESMAN 7698 1981-02-20 1600  300     30
7  7844   GATES SALESMAN 7698 1981-09-08 1500    0     30
9  7521  WALTON SALESMAN 7698 1981-02-22 1250  500     30
13 7876    FORD    CLERK 7788 1983-01-12 1100   NA     20
14 7934  ELISON    CLERK 7782 1982-01-23 1300   NA     10

Select people working in department no 10 or 20.

> emp[DEPTNO == 10 | DEPTNO == 20,c("ENAME","DEPTNO")]
      ENAME DEPTNO
1      BUSH     10
3    MERKEL     10
4     PUTIN     20
10    TOOSK     20
11 THATCHER     20
12 CARNEGIE     20
13     FORD     20
14   ELISON     10

Select managers, clerks and presidents

> emp[JOB == 'PRESIDENT' | JOB == 'CLERK' | JOB == 'MANAGER',c("ENAME","JOB")]
      ENAME       JOB
1      BUSH PRESIDENT
2     BLAIR   MANAGER
3    MERKEL   MANAGER
4     PUTIN   MANAGER
8   BUFFETT     CLERK
11 THATCHER     CLERK
13     FORD     CLERK
14   ELISON     CLERK

The above can be simplified to:

> emp[JOB %in% c('PRESIDENT','CLERK','MANAGER'),c("ENAME","JOB")]
      ENAME       JOB
1      BUSH PRESIDENT
2     BLAIR   MANAGER
3    MERKEL   MANAGER
4     PUTIN   MANAGER
8   BUFFETT     CLERK
11 THATCHER     CLERK
13     FORD     CLERK
14   ELISON     CLERK

Finding matches

Select people with names starting with “B”.

> emp[grep("B.*",ENAME),]
    ID   ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
1 7839    BUSH PRESIDENT   NA 17/11/1981 5000   NA     10
2 7698   BLAIR   MANAGER 7839 01/05/1981 2850   NA     30
6 7499 BAROSSO  SALESMAN 7698 20/02/1981 1600  300     30
8 7900 BUFFETT     CLERK 7698 03/12/1981  950   NA     30

Grep function uses regular expression. A couple of other examples:

Select people with second letter ‘a’.

> emp[grep("^.A",ENAME),]
    ID    ENAME      JOB  MGR   HIREDATE  SAL COMM DEPTNO
6  7499  BAROSSO SALESMAN 7698 20/02/1981 1600  300     30
7  7844    GATES SALESMAN 7698 08/09/1981 1500    0     30
9  7521   WALTON SALESMAN 7698 22/02/1981 1250  500     30
12 7788 CARNEGIE  ANALYST 7566 09/12/1982 3000   NA     20

Saving Results in a variable

Find people working in department 20 and 30, order results by their job.

> empo <- emp[order(JOB),]

The empo variable contains the ordered elements from emp table by JOB.

> empo[empo$DEPTNO %in% c(10,30),]
     ID   ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
8  7900 BUFFETT     CLERK 7698 03/12/1981  950   NA     30
14 7934  ELISON     CLERK 7782 23/01/1982 1300   NA     10
2  7698   BLAIR   MANAGER 7839 01/05/1981 2850   NA     30
3  7782  MERKEL   MANAGER 7839 09/06/1981 2450   NA     10
1  7839    BUSH PRESIDENT   NA 17/11/1981 5000   NA     10
5  7654 CHIRACK  SALESMAN 7698 28/09/1981 1250 1400     30
6  7499 BAROSSO  SALESMAN 7698 20/02/1981 1600  300     30
7  7844   GATES  SALESMAN 7698 08/09/1981 1500    0     30
9  7521  WALTON  SALESMAN 7698 22/02/1981 1250  500     30

Filtering by date ranges

emp[
     HIREDATE >= as.Date("1981-01-01") & HIREDATE <= as.Date("1981-12-31")
     ,c("ENAME","HIREDATE")
   ]
     ENAME   HIREDATE
1     BUSH 1981-11-17
2    BLAIR 1981-05-01
3   MERKEL 1981-06-09
4    PUTIN 1981-04-02
5  CHIRACK 1981-09-28
6  BAROSSO 1981-02-20
7    GATES 1981-09-08
8  BUFFETT 1981-12-03
9   WALTON 1981-02-22
10   TOOSK 1981-12-03

Dealing with NA value

The NA isn’t just zero (0) or an empty string “”. It’s something like a void. Almost every operation with NA returns NA value.

NA*2; NA+2; NA/NA; 12/NA; NA>2; NA==NA;
[1] NA
[1] NA
[1] NA
[1] NA
[1] NA
[1] NA
> data.frame(ENAME,SAL,COMM,SAL+COMM)
      ENAME  SAL COMM SAL...COMM
1      BUSH 5000   NA         NA
2     BLAIR 2850   NA         NA
3    MERKEL 2450   NA         NA
4     PUTIN 2975   NA         NA
5   CHIRACK 1250 1400       2650
6   BAROSSO 1600  300       1900
7     GATES 1500    0       1500
...

Convert the NA values to 0 with ifelse(is.na(COMM),0,COMM) expression

 
 data.frame(
     ENAME,SAL,COMM,
     AnnualSal=
         SAL+ifelse(is.na(COMM),0,COMM)
 )
      ENAME  SAL COMM AnnualSal
1      BUSH 5000   NA      5000
2     BLAIR 2850   NA      2850
3    MERKEL 2450   NA      2450
4     PUTIN 2975   NA      2975
5   CHIRACK 1250 1400      2650
6   BAROSSO 1600  300      1900
7     GATES 1500    0      1500
8   BUFFETT  950   NA       950
...

Select employees with commission more than 300

> emp[COMM > 300,]
       ID   ENAME      JOB  MGR   HIREDATE  SAL COMM DEPTNO
NA     NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
NA.1   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
NA.2   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
NA.3   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
5    7654 CHIRACK SALESMAN 7698 1981-09-28 1250 1400     30
NA.4   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
9    7521  WALTON SALESMAN 7698 1981-02-22 1250  500     30
NA.5   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
NA.6   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
NA.7   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
NA.8   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA
NA.9   NA    <NA>     <NA>   NA       <NA>   NA   NA     NA

Remove all rows with NA values in any column

> na.omit(emp[COMM > 300,])
    ID   ENAME      JOB  MGR   HIREDATE  SAL COMM DEPTNO
5 7654 CHIRACK SALESMAN 7698 1981-09-28 1250 1400     30
9 7521  WALTON SALESMAN 7698 1981-02-22 1250  500     30

Showing Unique Values

levels(emp$JOB) unique(emp$JOB)

Exercises

1. Exercise

  1. Load dept and salgrade tabled into data frames using read.table function from locations below:


2. Exercise

Select all rows from SALGRADE dataframe.

  GRADE LOSAL HISAL
1     1   700  1200
2     2  1201  1400
3     3  1401  2000
4     4  2001  3000
5     5  3001  9999

3. Exercise

Select shown data about employees who earn between 1000 and 2000.

     ENAME DEPTNO  SAL
5  CHIRACK     30 1250
6  BAROSSO     30 1600
7    GATES     30 1500
9   WALTON     30 1250
13    FORD     20 1100
14  ELISON     10 1300

4. Exercise

Select number and name of departments. Order them by department name.

  DEPTNO      DNAME
1     10 ACCOUNTING
4     40 OPERATIONS
2     20   RESEARCH
3     30      SALES

5. Exercise

Select all distinct jobs.

[1] PRESIDENT MANAGER   SALESMAN  CLERK     ANALYST  

6. Exercise

Select the data shown below about employees working in departments 10 and 20. Sort the results by their names in descending order.

      ENAME       JOB DEPTNO
10    TOOSK   ANALYST     20
11 THATCHER     CLERK     20
4     PUTIN   MANAGER     20
3    MERKEL   MANAGER     10
13     FORD     CLERK     20
14   ELISON     CLERK     10
12 CARNEGIE   ANALYST     20
1      BUSH PRESIDENT     10

7. Exercise

Select clerks working in department no 20.

      ENAME   JOB DEPTNO
11 THATCHER CLERK     20
13     FORD CLERK     20

8. Exercise

Select employees who have a boss.

      ENAME  MGR
2     BLAIR 7839
3    MERKEL 7839
4     PUTIN 7839
5   CHIRACK 7698
6   BAROSSO 7698
7     GATES 7698
8   BUFFETT 7698
9    WALTON 7698
10    TOOSK 7566
11 THATCHER 7902
12 CARNEGIE 7566
13     FORD 7788
14   ELISON 7782


9. Exercise

Select manager’s annual remuneration.

        Ename AnnaulSal
1    1  BLAIR     34200
2    2 MERKEL     29400
3    3  PUTIN     35700  

10. Exercise

Select monthly remuneration (salary and commission).

            Ename Remuneration
     1     BUSH         5000
     2    BLAIR         2850
     3   MERKEL         2450
     4    PUTIN         2975
     5  CHIRACK         2650
     6  BAROSSO         1900
     7    GATES         1500
     8  BUFFETT          950
     9   WALTON         1750
    10    TOOSK         3000
    11 THATCHER          800
    12 CARNEGIE         3000
    13     FORD         1100
    14   ELISON         1300


11. Exercise*

Select workers hired in 1982.

      ENAME   HIREDATE
12 CARNEGIE 1982-12-09
14   ELISON 1982-01-23

12. Exercise

Select employees whose commission is greater than their salary.

    ENAME  SAL COMM
5 CHIRACK 1250 1400

13. Exercise

Write a select statement which returns output below (use concatenation).

 [1] "BUSH is PRESIDENT and work in department no 10"  
 [2] "BLAIR is MANAGER and work in department no 30"   
 [3] "MERKEL is MANAGER and work in department no 10"  
 [4] "PUTIN is MANAGER and work in department no 20"   
 [5] "CHIRACK is SALESMAN and work in department no 30"
 [6] "BAROSSO is SALESMAN and work in department no 30"
 [7] "GATES is SALESMAN and work in department no 30"  
 [8] "BUFFETT is CLERK and work in department no 30"   
 [9] "WALTON is SALESMAN and work in department no 30" 
[10] "TOOSK is ANALYST and work in department no 20"   
[11] "THATCHER is CLERK and work in department no 20"  
[12] "CARNEGIE is ANALYST and work in department no 20"
[13] "FORD is CLERK and work in department no 20"      
[14] "ELISON is CLERK and work in department no 10"   

14. Exercise

Find employees with second letter “L” and third “A”.

    ID ENAME     JOB  MGR   HIREDATE  SAL COMM DEPTNO
2 7698 BLAIR MANAGER 7839 1981-05-01 2850   NA     30

15. Exercise

Find employees whose name contains T and ends with N.

[1] "PUTIN"  "WALTON"

16. Exercise

Find employees who either work as managers or work in department no 10, but not both. Hint: use xor() operator

     ID  ENAME       JOB  MGR   HIREDATE  SAL COMM DEPTNO
1  7839   BUSH PRESIDENT   NA 1981-11-17 5000   NA     10
2  7698  BLAIR   MANAGER 7839 1981-05-01 2850   NA     30
4  7566  PUTIN   MANAGER 7839 1981-04-02 2975   NA     20
14 7934 ELISON     CLERK 7782 1982-01-23 1300   NA     10