Lecture 3: Data Analysis

Warning

This page is still building. Please be patient for the coming lectures.

In this lecture, we will learn data analysis as well as presentation in Python including follwoing four packages:

_images/turtles.jpeg

Fig. 15 The “Four Musketeers” for Python data analysis: numPy, matplotlib, pandas and seaborn.

Why you need Python data analysis

Basically, modern office suite like Microsoft Excel provides powerful features to process and present your data. However, they are lacking of flexibility and the features are hidden in their complex UI (user interface). On the other hand, Python gives us more efficient and flexible way to operate data. For example, Microsoft Excel has a maximum row number \(1048576=2^{20}\) in Office 2007 or later while pandas can easily exceed this limitation.

_images/pandas-rows.png

Fig. 16 A table has 100,000,000 rows x 1 columns.

It is also possible to custom a pipeline with help of these packages so you can analyze data automatically. Once you build one, you can just run it every time when you have new data rather than operating data again and again.

Preparation

Warning

This is an advanced lecture so it is recommended to learn lecture 1 and lecture 2 before it if you have no background in Python programming.

Pandas is a data analysis and manipulation tools based on Python. To learn Pandas, the best material is the official cookbook which can be found at the official website. The documentation for pandas is also helpful for your learning.

_images/pandas-docs.png

Fig. 17 A sample entry of pandas docs. DataFrame is the basic data container in pandas.

Before start everything, we need to install Pandas as well as some other packages via pip.

  • numPy: pandas is based on numPy, an efficient numerical computation package written in C++.

  • openpyxl: pandas requires it to read data from excel file.

  • matplotlib: a graphical package to draw figures.

!pip install -q pandas numpy openpyxl matplotlib

Note

  • Here -q means quiet installation which can be ignored if you don not understand.

  • In Jupyter Notebook, command starting with mark ! is NOT native Python code, but a wrapped shell command, i.e. you can run pip install numpy to install numPy in your Anaconda Prompt Shell.

Let us import these two packages with short names:

import numpy as np
import pandas as pd

Data container

Pandas provides two types of container Series and DataFrame which are used to store sequential and tabular data respectively.

_images/series.png

Fig. 18 Series storing sequential data.

_images/dataframe.png

Fig. 19 DataFrame storing tabular data.

Run following code to create two data containers.

s = pd.Series([1,2,3,4,5,6])
d = pd.DataFrame(np.arange(1, 19).reshape(3, -1))

print(s)
print(d)
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
    0   1   2   3   4   5
0   1   2   3   4   5   6
1   7   8   9  10  11  12
2  13  14  15  16  17  18

Note

Here np.arange(1, 19) will generate a arrary from \(1\) to \(18\), and reshape(3, -1) will change the shape of arrary with first dimension as \(3\). Parameter -1 means it will be derivated internally and here will be replaced with \(6\) because \(6=18\div 3\).

shape is a property of container and it is a tuple of dimensions.

print(s.shape)
print(d.shape)
(6,)
(3, 6)

For very large DataFrame, Jupyter notebook will only show a part of the content:

d = pd.DataFrame(np.random.randn(100, 30))
d
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
0 -0.343684 0.298432 -0.346330 -1.360610 -0.780171 -0.174479 -2.073209 0.741540 0.862682 1.790920 ... 1.427133 -0.330105 -0.833102 -1.314724 0.896119 -0.738813 1.374135 0.440371 1.041022 -0.121798
1 -0.193280 1.257865 0.658521 -0.556523 0.867679 1.392618 1.358895 1.235232 0.598424 0.407585 ... 0.480264 0.024192 0.416871 0.503437 -1.003764 -0.444086 -1.318141 -0.002651 0.091065 0.267475
2 -0.091231 1.268993 0.825904 -0.386833 0.215122 0.575687 -1.589176 1.526072 -1.083706 0.360741 ... -0.404119 0.178404 -0.065295 0.571753 -1.697193 1.665987 -0.487148 0.983719 1.474843 1.010711
3 -1.040651 0.876857 0.090111 -0.774266 0.412296 0.903602 0.168427 -0.690537 0.620906 0.483258 ... -0.225915 0.139047 0.568339 -0.726861 -0.061449 1.187309 1.819904 0.746932 0.246057 -0.622705
4 -0.890623 -1.924294 -1.607911 -0.919474 -0.756320 -0.130709 0.181522 0.337458 0.750356 -0.552621 ... -1.348002 0.687834 -0.597506 -0.185708 0.681879 -1.264852 -1.432979 -0.262975 0.340943 1.115939
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 -0.790644 0.618316 -0.239128 0.223507 -1.972358 1.344895 -0.921793 1.239987 -0.727588 2.143194 ... 0.173822 -0.209904 0.167663 1.023742 -1.487167 -0.950625 -0.448279 -0.726560 1.566739 -2.048781
96 -0.897666 -1.011664 -1.305373 0.292486 0.033395 1.184037 -0.224618 -0.603053 0.583313 1.238475 ... -1.162605 1.927186 0.189962 -0.048620 -0.499449 -1.396445 -0.187596 -0.866826 0.488415 -1.043417
97 -0.386214 -0.737502 -0.724941 2.207187 1.191978 -0.849805 1.267080 0.092453 0.524919 -0.903533 ... -1.127624 0.242240 0.553355 -1.119842 0.132852 -1.981549 1.617266 -1.088464 -1.790250 0.687706
98 0.256366 -1.496072 -0.901937 -0.612122 -0.097003 -0.011705 0.593026 -0.146939 -0.992527 -0.885829 ... 1.609611 1.205030 -0.915260 0.181878 0.419530 2.393224 0.259172 0.533195 0.432964 0.874746
99 -0.000901 -1.145946 0.295250 -0.390829 1.592131 0.339830 1.496009 -0.121397 0.382855 -0.491859 ... -1.915573 -0.667241 1.049157 0.382023 -0.689314 0.989662 0.171298 -0.197036 -0.235648 0.647312

100 rows × 30 columns

Note

Here np.random.randn(n, m) will generate a random matrix with \(n\) rows and \(m\) columns. Each element in the matrix follows the standard normal distribution \( X\sim \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\)

Try head(n) and it will give you the first \(n\) rows of the DataFrame while tail(n) will give the last \(n\) rows.

d.head(5)
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
0 -0.343684 0.298432 -0.346330 -1.360610 -0.780171 -0.174479 -2.073209 0.741540 0.862682 1.790920 ... 1.427133 -0.330105 -0.833102 -1.314724 0.896119 -0.738813 1.374135 0.440371 1.041022 -0.121798
1 -0.193280 1.257865 0.658521 -0.556523 0.867679 1.392618 1.358895 1.235232 0.598424 0.407585 ... 0.480264 0.024192 0.416871 0.503437 -1.003764 -0.444086 -1.318141 -0.002651 0.091065 0.267475
2 -0.091231 1.268993 0.825904 -0.386833 0.215122 0.575687 -1.589176 1.526072 -1.083706 0.360741 ... -0.404119 0.178404 -0.065295 0.571753 -1.697193 1.665987 -0.487148 0.983719 1.474843 1.010711
3 -1.040651 0.876857 0.090111 -0.774266 0.412296 0.903602 0.168427 -0.690537 0.620906 0.483258 ... -0.225915 0.139047 0.568339 -0.726861 -0.061449 1.187309 1.819904 0.746932 0.246057 -0.622705
4 -0.890623 -1.924294 -1.607911 -0.919474 -0.756320 -0.130709 0.181522 0.337458 0.750356 -0.552621 ... -1.348002 0.687834 -0.597506 -0.185708 0.681879 -1.264852 -1.432979 -0.262975 0.340943 1.115939

5 rows × 30 columns

In order to make our examples meaningful, we employ a real-world dataset as an example to show how pandas works. World Health Organization provides latest data about the COVID-19 such as vaccination and confirmed cases, which is keeped in CSV-format files. Here we refer to the vaccination data: https://covid19.who.int/who-data/vaccination-data.csv.

vac = pd.read_csv('https://covid19.who.int/who-data/vaccination-data.csv')
vac
COUNTRY ISO3 WHO_REGION DATA_SOURCE DATE_UPDATED TOTAL_VACCINATIONS PERSONS_VACCINATED_1PLUS_DOSE TOTAL_VACCINATIONS_PER100 PERSONS_VACCINATED_1PLUS_DOSE_PER100 PERSONS_FULLY_VACCINATED PERSONS_FULLY_VACCINATED_PER100 VACCINES_USED FIRST_VACCINE_DATE NUMBER_VACCINES_TYPES_USED
0 Afghanistan AFG EMRO REPORTING 2022-03-06 5597130 4952744.0 14.378 12.723 4281934.0 11.000 Beijing CNBG - BBIBP-CorV,Janssen - Ad26.COV 2... 2021-02-22 4.0
1 Albania ALB EURO REPORTING 2022-02-20 2707658 1284034.0 94.100 45.118 1192155.0 41.889 AstraZeneca - Vaxzevria,Gamaleya - Gam-Covid-V... 2021-01-13 5.0
2 Algeria DZA AFRO REPORTING 2022-02-20 13631683 7456361.0 31.086 17.004 6076272.0 13.857 Beijing CNBG - BBIBP-CorV,Gamaleya - Gam-Covid... 2021-01-30 4.0
3 American Samoa ASM WPRO REPORTING 2022-02-16 85050 42212.0 154.084 76.475 37805.0 68.491 Janssen - Ad26.COV 2-S,Moderna - Spikevax,Pfiz... 2020-12-21 3.0
4 Andorra AND EURO REPORTING 2022-02-13 142420 57797.0 184.300 75.872 53250.0 69.903 AstraZeneca - Vaxzevria,Moderna - Spikevax,Pfi... 2021-01-20 3.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
223 Viet Nam VNM WPRO REPORTING 2022-02-24 192865986 79469390.0 198.139 81.642 75322542.0 77.382 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-03-08 6.0
224 Wallis and Futuna WLF WPRO REPORTING 2022-02-18 15633 6450.0 139.009 57.354 6399.0 56.900 Moderna - Spikevax 2021-03-19 1.0
225 Yemen YEM EMRO REPORTING 2022-03-06 784792 624837.0 2.631 2.095 384655.0 1.290 Janssen - Ad26.COV 2-S,SII - Covishield,Sinova... 2021-04-20 3.0
226 Zambia ZMB AFRO REPORTING 2022-03-03 2858338 2510296.0 15.548 13.655 1883799.0 10.247 Beijing CNBG - BBIBP-CorV,Janssen - Ad26.COV 2... 2021-04-14 3.0
227 Zimbabwe ZWE AFRO REPORTING 2022-03-04 7904719 4368633.0 53.184 29.393 3403501.0 22.899 Beijing CNBG - BBIBP-CorV,Bharat - Covaxin,Gam... 2021-02-18 4.0

228 rows × 14 columns

Note

read_csv(path) reads a CSV-format file from URL (Web link or local file) and returns a DataFrame. For example, you can use pd.read_csv('./data.csv') to load csv file from your current directory.

Properties index and columns will give the row/column labels:

vac.columns
Index(['COUNTRY', 'ISO3', 'WHO_REGION', 'DATA_SOURCE', 'DATE_UPDATED',
       'TOTAL_VACCINATIONS', 'PERSONS_VACCINATED_1PLUS_DOSE',
       'TOTAL_VACCINATIONS_PER100', 'PERSONS_VACCINATED_1PLUS_DOSE_PER100',
       'PERSONS_FULLY_VACCINATED', 'PERSONS_FULLY_VACCINATED_PER100',
       'VACCINES_USED', 'FIRST_VACCINE_DATE', 'NUMBER_VACCINES_TYPES_USED'],
      dtype='object')

In addition, you may want to sort by values, just like what we usually do in Excel.

vac.sort_values(by='TOTAL_VACCINATIONS', ascending=False)
COUNTRY ISO3 WHO_REGION DATA_SOURCE DATE_UPDATED TOTAL_VACCINATIONS PERSONS_VACCINATED_1PLUS_DOSE TOTAL_VACCINATIONS_PER100 PERSONS_VACCINATED_1PLUS_DOSE_PER100 PERSONS_FULLY_VACCINATED PERSONS_FULLY_VACCINATED_PER100 VACCINES_USED FIRST_VACCINE_DATE NUMBER_VACCINES_TYPES_USED
43 China CHN WPRO REPORTING 2022-02-27 3138003103 1.276556e+09 213.283 86.765 1.235633e+09 83.983 Anhui ZL - Recombinant,Beijing CNBG - BBIBP-Co... 2020-07-22 8.0
94 India IND SEARO REPORTING 2022-02-28 1776718549 9.642345e+08 128.747 69.872 7.923996e+08 57.420 Bharat - Covaxin,Gamaleya - Gam-Covid-Vac,Jans... 2021-01-16 6.0
218 United States of America USA AMRO REPORTING 2022-02-25 537567013 2.531571e+08 162.406 76.482 2.101726e+08 63.496 Janssen - Ad26.COV 2-S,Moderna - Spikevax,Pfiz... 2020-12-14 3.0
29 Brazil BRA AMRO REPORTING 2022-02-25 369527744 1.723341e+08 173.847 81.076 1.487775e+08 69.993 AstraZeneca - Vaxzevria,Janssen - Ad26.COV 2-S... 2021-01-17 5.0
95 Indonesia IDN SEARO REPORTING 2022-03-01 345697245 1.909768e+08 126.387 69.821 1.445058e+08 52.831 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-01-13 7.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
172 Saba XCA AMRO REPORTING 2022-02-25 3148 1.582000e+03 162.856 81.842 1.566000e+03 81.014 NaN NaN NaN
186 Sint Eustatius XBA AMRO REPORTING 2022-02-25 3110 1.588000e+03 99.076 50.589 1.522000e+03 48.487 NaN NaN NaN
149 Niue NIU WPRO REPORTING 2021-11-24 2628 1.340000e+03 162.423 82.818 1.288000e+03 79.604 Pfizer BioNTech - Comirnaty 2021-06-08 1.0
206 Tokelau TKL WPRO REPORTING 2021-09-28 1936 9.680000e+02 143.407 71.704 9.680000e+02 71.704 Pfizer BioNTech - Comirnaty 2021-07-20 1.0
162 Pitcairn Islands PCN WPRO REPORTING 2021-11-11 74 3.700000e+01 148.000 74.000 3.700000e+01 74.000 AstraZeneca - Vaxzevria 2021-05-17 1.0

228 rows × 14 columns

T property will return a transpose of DataFrame.

vac.T
0 1 2 3 4 5 6 7 8 9 ... 218 219 220 221 222 223 224 225 226 227
COUNTRY Afghanistan Albania Algeria American Samoa Andorra Angola Anguilla Antigua and Barbuda Argentina Armenia ... United States of America Uruguay Uzbekistan Vanuatu Venezuela (Bolivarian Republic of) Viet Nam Wallis and Futuna Yemen Zambia Zimbabwe
ISO3 AFG ALB DZA ASM AND AGO AIA ATG ARG ARM ... USA URY UZB VUT VEN VNM WLF YEM ZMB ZWE
WHO_REGION EMRO EURO AFRO WPRO EURO AFRO AMRO AMRO AMRO EURO ... AMRO AMRO EURO WPRO AMRO WPRO WPRO EMRO AFRO AFRO
DATA_SOURCE REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING ... REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING REPORTING
DATE_UPDATED 2022-03-06 2022-02-20 2022-02-20 2022-02-16 2022-02-13 2022-03-02 2022-02-25 2022-02-25 2022-02-25 2022-02-13 ... 2022-02-25 2022-02-25 2022-02-20 2022-02-20 2022-02-25 2022-02-24 2022-02-18 2022-03-06 2022-03-03 2022-03-04
TOTAL_VACCINATIONS 5597130 2707658 13631683 85050 142420 16633167 22165 124726 93008081 1971565 ... 537567013 7612199 42121870 197335 37860994 192865986 15633 784792 2858338 7904719
PERSONS_VACCINATED_1PLUS_DOSE 4952744.0 1284034.0 7456361.0 42212.0 57797.0 10780927.0 10370.0 63582.0 40572052.0 1080404.0 ... 253157102.0 2964083.0 18446210.0 114284.0 22157232.0 79469390.0 6450.0 624837.0 2510296.0 4368633.0
TOTAL_VACCINATIONS_PER100 14.378 94.1 31.086 154.084 184.3 50.609 147.747 127.364 205.789 66.5 ... 162.406 219.136 125.9 64.248 133.145 198.139 139.009 2.631 15.548 53.184
PERSONS_VACCINATED_1PLUS_DOSE_PER100 12.723 45.118 17.004 76.475 75.872 32.802 69.124 64.927 89.77 36.46 ... 76.482 85.329 55.114 37.208 77.92 81.642 57.354 2.095 13.655 29.393
PERSONS_FULLY_VACCINATED 4281934.0 1192155.0 6076272.0 37805.0 53250.0 5609044.0 9531.0 61144.0 36237826.0 872336.0 ... 210172612.0 2728709.0 13160671.0 85180.0 14287370.0 75322542.0 6399.0 384655.0 1883799.0 3403501.0
PERSONS_FULLY_VACCINATED_PER100 11.0 41.889 13.857 68.491 69.903 17.066 63.532 62.437 80.18 29.439 ... 63.496 78.553 39.322 27.733 50.244 77.382 56.9 1.29 10.247 22.899
VACCINES_USED Beijing CNBG - BBIBP-CorV,Janssen - Ad26.COV 2... AstraZeneca - Vaxzevria,Gamaleya - Gam-Covid-V... Beijing CNBG - BBIBP-CorV,Gamaleya - Gam-Covid... Janssen - Ad26.COV 2-S,Moderna - Spikevax,Pfiz... AstraZeneca - Vaxzevria,Moderna - Spikevax,Pfi... SII - Covishield AstraZeneca - Vaxzevria,Pfizer BioNTech - Comi... AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... ... Janssen - Ad26.COV 2-S,Moderna - Spikevax,Pfiz... AstraZeneca - Vaxzevria,Pfizer BioNTech - Comi... Anhui ZL - Recombinant,Gamaleya - Gam-Covid-Va... AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... Beijing CNBG - BBIBP-CorV,CIGB - CIGB-66,Finla... AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... Moderna - Spikevax Janssen - Ad26.COV 2-S,SII - Covishield,Sinova... Beijing CNBG - BBIBP-CorV,Janssen - Ad26.COV 2... Beijing CNBG - BBIBP-CorV,Bharat - Covaxin,Gam...
FIRST_VACCINE_DATE 2021-02-22 2021-01-13 2021-01-30 2020-12-21 2021-01-20 2021-03-10 2021-02-05 2021-03-01 2020-12-29 2021-06-16 ... 2020-12-14 2021-02-27 2021-04-28 2021-06-02 2021-02-22 2021-03-08 2021-03-19 2021-04-20 2021-04-14 2021-02-18
NUMBER_VACCINES_TYPES_USED 4.0 5.0 4.0 3.0 3.0 1.0 2.0 6.0 7.0 5.0 ... 3.0 3.0 7.0 3.0 6.0 6.0 1.0 3.0 3.0 4.0

14 rows × 228 columns

You can use describe() function to have a quick look on DataFrame:

vac.describe()
TOTAL_VACCINATIONS PERSONS_VACCINATED_1PLUS_DOSE TOTAL_VACCINATIONS_PER100 PERSONS_VACCINATED_1PLUS_DOSE_PER100 PERSONS_FULLY_VACCINATED PERSONS_FULLY_VACCINATED_PER100 NUMBER_VACCINES_TYPES_USED
count 2.280000e+02 2.270000e+02 228.000000 227.000000 2.270000e+02 227.000000 225.000000
mean 4.694756e+07 2.187060e+07 122.156500 55.765784 1.928435e+07 50.121868 3.928889
std 2.442809e+08 1.090964e+08 71.896713 27.044030 9.958222e+07 26.417687 1.928247
min 7.400000e+01 3.700000e+01 0.097000 0.082000 3.700000e+01 0.077000 1.000000
25% 3.534380e+05 1.733685e+05 62.841000 34.350500 1.458095e+05 29.097500 2.000000
50% 2.721102e+06 1.536112e+06 127.515000 63.069000 1.192155e+06 53.974000 4.000000
75% 1.670141e+07 7.843871e+06 178.022250 77.008000 6.861035e+06 72.275500 5.000000
max 3.138003e+09 1.276556e+09 335.811000 124.567000 1.235633e+09 121.448000 11.000000

A better way to present data is using plot() function to draw all columns:

import matplotlib.pyplot as plt

vac = vac.sort_values(by='TOTAL_VACCINATIONS', ascending=False)
top20 = vac[['TOTAL_VACCINATIONS']][:20]
countries = vac['COUNTRY'][:20]
top20.index = countries

top20.plot.pie(y='TOTAL_VACCINATIONS', figsize=(15,15))
<AxesSubplot:ylabel='TOTAL_VACCINATIONS'>
_images/lec-3_23_1.png

Tip

plot() is a convinient way to visualize data in pandas, but it is not perfect. We will learn more about the data visualization in following lectures.

Select

Pandas provides different ways to select elements in the DataFrame.

iloc

iloc[n] will give the \(n-1\) rows of the DataFrame, i.e., a Series.

vac.iloc[0]
COUNTRY                                                                             China
ISO3                                                                                  CHN
WHO_REGION                                                                           WPRO
DATA_SOURCE                                                                     REPORTING
DATE_UPDATED                                                                   2022-02-27
TOTAL_VACCINATIONS                                                             3138003103
PERSONS_VACCINATED_1PLUS_DOSE                                                1276556393.0
TOTAL_VACCINATIONS_PER100                                                         213.283
PERSONS_VACCINATED_1PLUS_DOSE_PER100                                               86.765
PERSONS_FULLY_VACCINATED                                                     1235633241.0
PERSONS_FULLY_VACCINATED_PER100                                                    83.983
VACCINES_USED                           Anhui ZL - Recombinant,Beijing CNBG - BBIBP-Co...
FIRST_VACCINE_DATE                                                             2020-07-22
NUMBER_VACCINES_TYPES_USED                                                            8.0
Name: 43, dtype: object

Tip

iloc index starts from \(0\), instead of \(1\). So the \(n\)-rows has a index \(n-1\).

In order to select multiple rows, just feed a list containing indices to the iloc:

vac.iloc[[0, 1, 2]]
COUNTRY ISO3 WHO_REGION DATA_SOURCE DATE_UPDATED TOTAL_VACCINATIONS PERSONS_VACCINATED_1PLUS_DOSE TOTAL_VACCINATIONS_PER100 PERSONS_VACCINATED_1PLUS_DOSE_PER100 PERSONS_FULLY_VACCINATED PERSONS_FULLY_VACCINATED_PER100 VACCINES_USED FIRST_VACCINE_DATE NUMBER_VACCINES_TYPES_USED
43 China CHN WPRO REPORTING 2022-02-27 3138003103 1.276556e+09 213.283 86.765 1.235633e+09 83.983 Anhui ZL - Recombinant,Beijing CNBG - BBIBP-Co... 2020-07-22 8.0
94 India IND SEARO REPORTING 2022-02-28 1776718549 9.642345e+08 128.747 69.872 7.923996e+08 57.420 Bharat - Covaxin,Gamaleya - Gam-Covid-Vac,Jans... 2021-01-16 6.0
218 United States of America USA AMRO REPORTING 2022-02-25 537567013 2.531571e+08 162.406 76.482 2.101726e+08 63.496 Janssen - Ad26.COV 2-S,Moderna - Spikevax,Pfiz... 2020-12-14 3.0

Also, you can select both rows and columns at the same time:

vac.iloc[[0, 1, 2], [0, 4, 5]]
COUNTRY DATE_UPDATED TOTAL_VACCINATIONS
43 China 2022-02-27 3138003103
94 India 2022-02-28 1776718549
218 United States of America 2022-02-25 537567013

Or just access single element:

vac.iloc[0, 4]
'2022-02-27'

See also

You can also use slice operations to access elements, try vac.iloc[0:2, 0:3]

loc

As you can see, iloc may be confused to read because it uses numbers as index. However, loc provides another way to access by index and columns in DataFrame.

vac.loc[0:3, ["COUNTRY", "DATE_UPDATED", "TOTAL_VACCINATIONS"]]
COUNTRY DATE_UPDATED TOTAL_VACCINATIONS
0 Afghanistan 2022-03-06 5597130
134 Mongolia 2022-02-25 5568712
114 Lebanon 2022-02-28 5452304
50 Croatia 2022-02-20 5179921
87 Guinea 2022-03-03 5159979
... ... ... ...
151 Northern Mariana Islands (Commonwealth of the) 2022-02-04 105295
68 Faroe Islands 2022-02-18 103894
132 Micronesia (Federated States of) 2022-02-16 97185
82 Grenada 2022-02-25 85891
3 American Samoa 2022-02-16 85050

103 rows × 3 columns

Important

Becareful with the index names in iloc and loc. They are different.

Filter

Both loc and iloc accept boolean parameters, which can be used to filter particular elements in the DataFrame.

vac[vac['TOTAL_VACCINATIONS'] > 100000000]
COUNTRY ISO3 WHO_REGION DATA_SOURCE DATE_UPDATED TOTAL_VACCINATIONS PERSONS_VACCINATED_1PLUS_DOSE TOTAL_VACCINATIONS_PER100 PERSONS_VACCINATED_1PLUS_DOSE_PER100 PERSONS_FULLY_VACCINATED PERSONS_FULLY_VACCINATED_PER100 VACCINES_USED FIRST_VACCINE_DATE NUMBER_VACCINES_TYPES_USED
43 China CHN WPRO REPORTING 2022-02-27 3138003103 1.276556e+09 213.283 86.765 1.235633e+09 83.983 Anhui ZL - Recombinant,Beijing CNBG - BBIBP-Co... 2020-07-22 8.0
94 India IND SEARO REPORTING 2022-02-28 1776718549 9.642345e+08 128.747 69.872 7.923996e+08 57.420 Bharat - Covaxin,Gamaleya - Gam-Covid-Vac,Jans... 2021-01-16 6.0
218 United States of America USA AMRO REPORTING 2022-02-25 537567013 2.531571e+08 162.406 76.482 2.101726e+08 63.496 Janssen - Ad26.COV 2-S,Moderna - Spikevax,Pfiz... 2020-12-14 3.0
29 Brazil BRA AMRO REPORTING 2022-02-25 369527744 1.723341e+08 173.847 81.076 1.487775e+08 69.993 AstraZeneca - Vaxzevria,Janssen - Ad26.COV 2-S... 2021-01-17 5.0
95 Indonesia IDN SEARO REPORTING 2022-03-01 345697245 1.909768e+08 126.387 69.821 1.445058e+08 52.831 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-01-13 7.0
103 Japan JPN WPRO REPORTING 2022-02-24 223820819 1.016722e+08 176.966 80.388 1.001786e+08 79.207 AstraZeneca - Vaxzevria,Moderna - Spikevax,Nov... 2021-02-17 4.0
155 Pakistan PAK EMRO REPORTING 2022-02-28 213532343 1.243097e+08 96.668 56.276 1.000965e+08 45.315 Beijing CNBG - BBIBP-CorV,Bharat - Covaxin,Can... 2021-02-03 8.0
16 Bangladesh BGD SEARO REPORTING 2022-03-01 213176935 1.246919e+08 129.442 75.713 8.461175e+07 51.377 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-01-27 8.0
223 Viet Nam VNM WPRO REPORTING 2022-02-24 192865986 7.946939e+07 198.139 81.642 7.532254e+07 77.382 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-03-08 6.0
131 Mexico MEX AMRO REPORTING 2022-02-25 179274307 8.508365e+07 139.045 65.991 7.875005e+07 61.078 AstraZeneca - Vaxzevria,CanSino - Convidecia,G... 2020-12-24 7.0
77 Germany DEU EURO REPORTING 2022-02-20 169071638 6.338051e+07 203.300 76.209 6.250088e+07 75.151 AstraZeneca - Vaxzevria,Janssen - Ad26.COV 2-S... 2020-12-23 4.0
170 Russian Federation RUS EURO OWID 2022-02-17 160816996 7.828428e+07 110.200 53.643 7.160191e+07 49.064 SRCVB - EpiVacCorona, Gamaleya - Sputnik V NaN 2.0
71 France FRA EURO REPORTING 2022-02-20 153013512 5.470059e+07 227.300 81.254 5.368728e+07 79.749 AstraZeneca - Vaxzevria,Janssen - Ad26.COV 2-S... 2020-12-30 4.0
210 Turkey TUR EURO OWID 2022-02-23 144850157 5.762345e+07 171.700 69.296 5.271558e+07 63.394 Pfizer BioNTech - Comirnaty, Sinovac - CoronaV... NaN 3.0
96 Iran (Islamic Republic of) IRN EMRO REPORTING 2022-03-06 142195819 6.311054e+07 169.295 75.138 5.571737e+07 66.336 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-02-09 6.0
203 The United Kingdom GBR EURO REPORTING 2022-02-22 139482283 5.257359e+07 205.500 77.246 4.888589e+07 71.828 AstraZeneca - Vaxzevria,Moderna - Spikevax,Pfi... 2020-12-21 3.0
161 Philippines PHL WPRO REPORTING 2022-02-24 135251295 6.866160e+07 123.426 62.658 6.301683e+07 57.507 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-03-01 11.0
101 Italy ITA EURO REPORTING 2022-02-20 131126289 5.019838e+07 219.900 84.171 4.679558e+07 78.461 AstraZeneca - Vaxzevria,Janssen - Ad26.COV 2-S... 2020-12-23 4.0
202 Thailand THA SEARO REPORTING 2022-02-28 123738218 5.358496e+07 177.275 76.769 4.967325e+07 71.165 AstraZeneca - Vaxzevria,Beijing CNBG - BBIBP-C... 2021-02-28 7.0
167 Republic of Korea KOR WPRO REPORTING 2022-02-27 118895241 4.477671e+07 231.904 87.336 4.428542e+07 86.378 AstraZeneca - Vaxzevria,Janssen - Ad26.COV 2-S... 2021-02-26 5.0

Modify