Introducing il.cbs.muni - An R Package for Handling Israeli CBS Municipal Data

I’m excited to announce that il.cbs.muni 0.1.0 is now available on CRAN! This package provides analyst-oriented utility functions to work with Israeli Central Bureau of Statistics (CBS) municipal data, handling the different quirks and inconsistencies across years and data sources.

You can install it from CRAN:

install.packages("il.cbs.muni")

What Does il.cbs.muni Do?

The Israeli CBS publishes extensive municipal data covering cities, local councils, and regional councils. However, this data comes with challenges:

Inconsistent formats across different years
Different Excel structures for various data domains
Multiple ID systems used by different government agencies
Separate sheets for different municipality types (pre-2016)
Complex headers that need careful processing

il.cbs.muni solves these problems by providing a consistent interface to read, process, and combine Israeli municipal data from different sources and years.

Key Features

1. Read Municipal Data Files

Israeli CBS provides municipal data in Excel files with different header structures between the years. Trying to combine and load multiple years requires identifying the quirks for each year making it harder to read them.

Let’s take a look at the plain old municipal Excel data file, specifically from 2021:

We can observe multiple problems:

The headers start from the 3rd row, not the first;
The headers span across multiple rows, making it harder to read them;

If we toggle a bit along the columns, we can see another problem:

Some column headers need to be filled horizontally, and merged with the cell below them.

The read_cbs_muni() function reads municipal data files with pre-configured parameters for each year’s specific quirks:

library(il.cbs.muni)
library(dplyr)

# Read physical and demographic data for 2021
df1 <- read_cbs_muni(
  path = "p_libud_2021.xlsx",
  year = 2021,
  data_domain = "physical"
)

df1

Supported data domains include:

"physical" - Physical and population data
"budget" - Municipal budget data
"summary" - Summary statistics
"labor_force_survey" - Labor force data
"social_survey" - Social survey data

Now, let’s say you want budget data from the same year. Perhaps you would like to know the municipal expenditure on education. This is on a different data domain (i.e., worksheet). We could access it directly, and select the relevant columns with the framework:

df2 <- read_cbs_muni(
  "p_libud_2021.xlsx",
  2021,
  data_domain = "budget",
  cols = c(2, 50)
)

df2

Now, if we want to join the two data frames, we can simply join by municipal ID, either by using its Hebrew variable name or by possibly renaming it:

df <- df1 |> 
  left_join(df2, join_by(`כללי_סמל הרשות` == `סמל הרשות`)) |> 
  relocate(contains("שירותים ממלכתיים: חינוך"))

df

Be aware that in order to avoid wrong assumptions, all data types are configured to be character, and need to be parsed to your liking.

2. Combine Data from Different Municipality Types

For years 2003-2015, CBS published data on separate sheets for cities/local councils and regional councils.

Use combine_cbs_muni() to merge them:

# Combine data from both sheets
df_combined <- combine_cbs_muni(
  path = "2009.xls",
  year = 2009,
  cols_city = c(1:7, 11),
  cols_rc = c(1:7, 25),
  data_domain = "physical"
)

df_combined

Please notice that you have to select the specific columns and match them by yourself. Unfortunately, harmonizing the entire dataset is too big of a task to handle, and would hopefully be achieved by widening the scope of the CBS API.

How does il.cbs.muni know how many rows to skip, which columns to fill row-wise, and what are the sheet numbers for each data domain within each year? I have taken the time to parameterize all of this data, so il.cbs.muni becomes a one-stop-shop for your data reading tasks. If you want to look under the hood, you can check out il.cbs.muni:::df_cbs_muni_params. Additionally, I wrote row_to_names_fill() as a helper function to fill row-wise some of the header rows, but it can be used also outside the scope of municipal data.

Going back to our data, you can see that the 10th row municipal name is littered with an asterisk (“*“). We can clean all names with clean_name():

df_combined[[10,1]]

[1] "באקה-ג'ת*"

df_combined <- df_combined |> 
  mutate(
    `שם הרשות` = clean_name(`שם הרשות`)
  )

df_combined[[10,1]]

[1] "באקה-ג'ת"

3. Read Yishuv (Settlement) Data

A yishuv is a geographically defined place where people live. Use read_cbs_yishuv() to read settlement-level data:

# Read yishuv data with selected columns
df_yishuv <- read_cbs_yishuv(
  path = "bycode2021.xlsx",
  cols = c(1, 2, 5, 13)
)

df_yishuv

As you can see, the yishuv ID isn’t following the convention of a 4-digit ID. This might cause collisions with regional councils when we will use the yishuv ID for municipal level analysis, so we should change it with pad_yishuv_id():

df_yishuv <- df_yishuv |> 
  mutate(
    `סמל יישוב` = pad_yishuv_id(`סמל יישוב`)
  )

  df_yishuv

4. Read CBS Index Data

The package handles both socio-economic status (SES) and peripherality indices at different geographic levels:

# Read socio-economic index at municipality level
df_ses <- read_cbs_index(
  path = "T02.xlsx",
  year = 2019,
  index_type = "ses",
  unit_type = "muni"
)

df_ses

Available options:

Index types: "ses" (socio-economic status) or "peri" (peripherality)
Unit types: "muni" (municipality), "yishuv" (settlement), or "sa" (statistical area)

As in read_cbs_muni(), here too we have parameters stored in il.cbs.muni:::df_cbs_index_params.

Now, we have two problems with this data set:

The municipal column is for “status” rather than for ID;
The yishuv ID column does not follow the 4-digit convention.

We can fix these issues with modify_muni_id() and pad_yishuv_id():

df_ses <- df_ses |> 
  mutate(
    muni_id = modify_muni_id(`מעמד מוניציפלי_MUNICIPAL STATUS`, `סמל יישוב_CODE OF LOCALITY`),
    yishuv_id = pad_yishuv_id(`סמל יישוב_CODE OF LOCALITY`),
    .before = 1
  )

  df_ses

5. Harmonize Municipal IDs

Israeli municipalities have different IDs across government agencies. The read_muni_id() function provides a lookup table:

# Get IDs and names from multiple agencies
muni_ids <- read_muni_id(
  id_types = c("muni", "edu", "tax"),
  include_names = TRUE
)

muni_ids

This returns:

"muni" - CBS municipal IDs and names
"edu" - Ministry of Education symbols
"tax" - Israel Tax Authority IDs (H.P. numbers)

You can use this table when you are working with data from The Ministry of Education or non-profits data from Guidestar.

Resources

Documentation: https://matanhakim.github.io/il.cbs.muni/
Source code: https://github.com/matanhakim/il.cbs.muni
Bug reports: https://github.com/matanhakim/il.cbs.muni/issues
CRAN page: https://cran.r-project.org/package=il.cbs.muni

Future Plans

I’m hoping to add:

More helper functions for common municipal data transformations
Support for additional CBS data sources
Vignettes with detailed workflows
A hex sticker!
Anything that the Israeli open source community might be interested in

Acknowledgments

This package builds on the excellent work of the Israeli CBS in publishing municipal data. Special thanks to the tidyverse team for creating the tools that make this package possible. Specifically, this package was able to get submitted to CRAN with full tests and input validation thanks to Positron Assistant, which has been a huge blessing.

Happy analyzing! 📊