1. Introduction
Humanitarian needs always grossly outweigh available funding; however, it remains an industry-wide challenge to respond adequately to gaps in coverage and reallocate resources accordingly. Too often, once committed to a course of action, clusters and their humanitarian partners do not re-examine or re-evaluate their interventions. This results in responses with glaring gaps that are either not resolved in a timely manner or go completely unaddressed.
This automated report is intended to serve as a template for a coverage and gaps analysis. It provides examples of the analyses necessary to identify populations in need who are not served by humanitarian action and provides recommendations on how partners may best reach them. This document assumes that basic reporting has already occurred; a more generic 5W report template may be found here.
Coverage and gaps analyses are key documents, but are also rarely taken into account during operational planning or referenced during revisions of major strategic documents, such as Humanitarian Response Plans (HRPs). Neither are they mentioned in OCHA’s HRP guidance, and their usage remains quite uncommon. This document intends to show that coverage and gaps analyses are not complex, impenetrable tools; rather, they contain, and are principally concerned with, practical, actionable information.
A note on the data
- Most of the data originates from the Education, Health, Nutrition, Protection and WASH Clusters, from May to October 2019; any conclusions or analysis are bounded by this time period and are illustrative of the response as it was in November 2019. Partner data has been anonymised. Other data originate from the census dataset of Venezuela that was maintained by UNICEF. Unlike the 5W reporting and cleaning document, we will not be exploring the cleaning process here, but the source code of each chunk will be displayed when the Code button is clicked. The two chunks on the right contain code for data cleaning and preparation.
# reading and cleaning -- you really should break it into parts
ven1 <- read_csv("consolidation 191209 1636.csv") %>%
clean_names() %>%
# removing unused columns
select(-c(codigodeestablecimientoocentro, loc_id, hrp_sitre_p_indicator,
tipoderespuesta, comentarios, coordeadas_gps_x, coordeadas_gps_y,
fechade_inicio, fecha_previstade_finalizacion)) %>%
# renaming unwieldy columns
rename(ubicacion = comunidadonombredelestablecimiento_centro,
sector = sector_areade_responsabiliad,
beneficiarios_meta = beneficiarios_meta_numerodepersonas,
estatus = estatusdeprogramacion) %>%
# mutating the date to the right format
mutate(month = as.factor(recode(month,
`4` = "30/04/2019",
`5` = "31/05/2019",
`6` = "30/06/2019",
`7` = "31/07/2019",
`8` = "31/08/2019",
`9` = "30/09/2019",
`10` = "31/10/2019"))) %>%
mutate(month = as.Date(month %>% strptime(., format = "%d/%m/%Y"))) %>%
mutate(org_lider = coalesce(org_lider, org_implementadora)) %>%
# correcting sector names
mutate(sector = str_replace_all(sector, c(
"Agua_saneamiento_higiene" = "WASH",
"educacion" = "Educacion",
"Nutricion" = "Nutricion",
"protección_Niños_Niñas_Adolescentes" = "Proteccion_NNA",
"Protección_Niños_Niñas_Adolescentes" = "Proteccion_NNA",
"Protección_Violencia_Género" = "Proteccion_GBV"))) %>%
# renaming beneficiary disaggregation columns
rename(f_0_18 = f_18,
m_0_18 = m_18,
f_18plus = f_18_2,
m_18plus = m_18_2) %>%
mutate(estado = rm_accent(str_to_upper(estado)),
municipio = rm_accent(str_to_upper(municipio)),
parroquia = rm_accent(str_to_upper(parroquia)),
ubicacion = rm_accent(str_to_upper(ubicacion)),
actividad = rm_accent(str_to_upper(actividad)),
categoria = rm_accent(str_to_upper(categoriadeactividad))) %>%
# recoding the estatus column
mutate(estatus = str_replace_all(estatus,
c("En ejecucion" = "ejecucion",
"en ejecución" = "ejecucion",
"en Ejecución" = "ejecucion",
"En ejecución" = "ejecucion",
"En Ejecución" = "ejecucion",
"Enejecución" = "ejecucion",
"43741" = "ejecucion",
"finalizada" = "finalizada",
"Finalizada" = "finalizada",
"Planeada" = "planeada",
"planeada con financiamiento" = "planeada",
"planeada sin financiamiento" = "planeada"))) %>%
replace_na(list(estatus = "ejecucion")) %>%
# removing all planned activities
filter(estatus != "planeada") %>%
filter(str_detect(pcode3, "^VE")) %>% # decide if you want to do this here or later
select(-c(23:92))
# I'm kinda doubting the use of u_ben, ya I think take it out? since you're only using it once
# Am I just making these out of habit? I could make them inside the
# code chunk for parr, but maybe I can find some justification for their existence,
# maybe the disaggregations?
u_ben <- ven1 %>%
pivot_longer(f_0_18:m_18plus, names_to = "desagregacion", values_to = "beneficiarios") %>%
filter(categoria != "VACUNACION") %>% # Vaccination activities filtered out
filter(beneficiarios != 0) %>%
# for some reason, there are entries with the same ubicacion but different pcodes?
# maybe it's things like "centro municipal" or something
group_by(pcode3, ubicacion, desagregacion) %>%
slice(which.max(beneficiarios)) %>%
ungroup()
act_ben <- ven1 %>%
pivot_longer(f_0_18:m_18plus, names_to = "desagregacion", values_to = "beneficiarios") %>%
filter(categoria != "VACUNACION") %>% # Vaccination activities filtered out
filter(beneficiarios != 0) %>%
group_by(pcode3, ubicacion, desagregacion, actividad) %>%
slice(which.max(beneficiarios)) %>%
ungroup() %>%
mutate(sector = ifelse(str_detect(sector, "Proteccion_GBV|Proteccion_General|Proteccion_NNA"),
"Proteccion", sector))
# I think this is a gigantic chunk -- cannot decide if I would rather have less things in the
# environment or if I want more readable chunks. The benefit here I guess is that if I want to change something, I just have to come to this chunk
parr <- u_ben %>%
group_by(pcode3) %>%
summarise(beneficiarios = sum(beneficiarios)) %>%
ungroup() %>%
# count of organisations per pcode3
right_join(act_ben %>%
group_by(pcode3) %>%
summarise(org_count = n_distinct(org_implementadora))) %>%
# getting beneficiary frequencies, sector count and maximum multi-sector beneficiaries
right_join(
act_ben %>%
group_by(pcode3, desagregacion, sector) %>%
pivot_wider(names_from = sector, values_from = beneficiarios) %>%
rename(nutricion_ben = Nutricion, proteccion_ben = Proteccion, wash_ben = WASH,
salud_ben = Salud, educacion_ben = Educacion, sa_ben = Seguridad_Alimentaria) %>%
replace_na(list(nutricion_ben = 0, educacion_ben = 0, wash_ben = 0, salud_ben = 0,
sa_ben = 0, proteccion_ben = 0)) %>%
summarise(across(c(nutricion_ben, proteccion_ben, wash_ben,
educacion_ben, salud_ben, sa_ben), sum), .groups = "drop") %>%
mutate(ben_freq = nutricion_ben + proteccion_ben + wash_ben + salud_ben + educacion_ben + sa_ben) %>%
group_by(pcode3) %>%
summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben,
salud_ben, sa_ben, ben_freq), sum)) %>%
mutate(sec_ben_max = pmax(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben, sa_ben),
ms_ben_max = ifelse(sec_ben_max >= ben_freq - sec_ben_max,
ben_freq - sec_ben_max,
sec_ben_max),
sector_count = rowSums(select(., ends_with("_ben")) != 0))) %>%
# right_join to the census data
right_join(read_excel("census data 20191122.xlsx", sheet = "data") %>%
clean_names() %>%
# selecting variables and renaming them with select
select(estado, pcode1, municipio, pcode2, parroquia, pcode3,
fo = field_office,
poblacion_2019 = x_2019_poblacion_parroquial_total,
hogares_2011 = numero_de_hogares,
ham_2019_ambitos_ge,
percent_pobre = ham_2019_xx_pobreza_env_por_parroquia,
pob_pobre = ham_2019_xx_poblacion_pobre_por_parroquia,
poblacion_total_2011,
poblacion_infantil_menor_de_12_anos, poblacion_adolescentes_de_12_a_17_anos,
poblacion_de_18_anos_y_mas,
percent_urbana = poblacion_urbana_percent,
area_km2,
densidad_ppl_km2 = densidad_poblacional_ppl_km2,
matricula_2017_educacion_inicial, matricula_2017_educacion_primaria,
matricula_2017_educacion_media, razon_de_dependencia_total,
razon_de_dependencia_de_menores_de_15_anos,
percent_sin_agua_segura = x_abast_agua2_percent_sin_agua_segura,
percent_sin_saneamiento_mejorado =
x_saneamiento_percent_sin_saneamiento_mejorado,
percent_analfabeto = percent_poblacion_10_anos_y_mas_analfabeta,
promedio_de_personas_por_vivienda,
percent_hogares_jefatura_femenina = percent_de_hogares_con_jefatura_femenina,
percent_sin_servicio_electrico =
servicio_electrico_percent_no_tiene_servicio_electrico,
ham_2019_x_violencia_envelope, ham_2019_x_mortalidad_y_salud_envelope,
ham_2019_x_pobreza_envelope, promedio_de_edad,
relacion_de_masculinidad) %>%
mutate(estado = rm_accent(str_to_upper(estado)), # just to make sure
municipio = rm_accent(str_to_upper(municipio)),
parroquia = rm_accent(str_to_upper(parroquia))) %>%
# creating new disaggregation variables
mutate(pob_menor_de_18 = (poblacion_infantil_menor_de_12_anos +
poblacion_adolescentes_de_12_a_17_anos) /poblacion_total_2011 *
poblacion_2019,
pob_18_y_mas = poblacion_de_18_anos_y_mas / poblacion_total_2011 * poblacion_2019,
hogares_2019 = hogares_2011 * poblacion_2019 / poblacion_total_2011,
matricula_total = matricula_2017_educacion_inicial +
matricula_2017_educacion_primaria +
matricula_2017_educacion_media) %>%
# dividing columns by 100 so that they're between 0 and 1
mutate_at(vars(percent_analfabeto, percent_sin_servicio_electrico,
percent_sin_agua_segura,
percent_sin_saneamiento_mejorado,
percent_hogares_jefatura_femenina, percent_urbana,
razon_de_dependencia_total), ~(. / 100)) %>%
# mutating new columns with populations
mutate(pob_analfabeto = percent_analfabeto * poblacion_2019,
pob_sin_agua_segura = percent_sin_agua_segura * poblacion_2019,
pob_sin_servicio_electrico = percent_sin_servicio_electrico * poblacion_2019,
pob_sin_saneamiento_mejorado = percent_sin_saneamiento_mejorado * poblacion_2019,
pob_urbana = percent_urbana * poblacion_2019) %>%
select(-c(matricula_2017_educacion_inicial, matricula_2017_educacion_primaria,
matricula_2017_educacion_media, poblacion_total_2011, hogares_2011,
poblacion_infantil_menor_de_12_anos, poblacion_adolescentes_de_12_a_17_anos,
poblacion_de_18_anos_y_mas)),
by = "pcode3") %>%
# mutating new variables and making sure NAs become 0s
mutate(beneficiarios = ifelse(is.na(beneficiarios), 0, beneficiarios),
org_count = ifelse(is.na(org_count), 0, org_count),
sector_count = ifelse(is.na(sector_count), 0, sector_count),
educacion_ben = ifelse(is.na(educacion_ben), 0, educacion_ben),
nutricion_ben = ifelse(is.na(nutricion_ben), 0, nutricion_ben),
proteccion_ben = ifelse(is.na(proteccion_ben), 0, proteccion_ben),
salud_ben = ifelse(is.na(salud_ben), 0, salud_ben),
sa_ben = ifelse(is.na(sa_ben), 0, sa_ben),
wash_ben = ifelse(is.na(wash_ben), 0, wash_ben),
ms_ben_max = ifelse(is.na(ms_ben_max), 0, ms_ben_max),
ben_freq = ifelse(is.na(ben_freq), 0, ben_freq),
not_reached = pob_pobre - beneficiarios,
coverage_percent = beneficiarios / pob_pobre,
percent_total_ben_u = beneficiarios / sum(beneficiarios),
multisector_percent = ms_ben_max / ben_freq,
org_present = ifelse(beneficiarios > 0, TRUE, FALSE),
pob_pobre_score = rescale(pob_pobre, to = c(0,1)),
percent_pobre_score = rescale(percent_pobre, to = c(0,1)),
poverty_score = (pob_pobre_score + percent_pobre_score) / 2)
# taking a subset of parr to only get parrishes where the
# number of beneficiaries does not exceed the number of poor persons
parr0 <- parr %>%
filter(not_reached >= 1) %>%
mutate(gap_score = (rescale(not_reached, to = c(0,1)) + percent_pobre_score) / 2)
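The derived coverage and gap columns at the end of the chunk above can be hard to follow inside such a long pipeline. The sketch below reproduces the same arithmetic on three invented parishes, using base R only. The helper `rescale01` is a hypothetical stand-in for `scales::rescale(x, to = c(0, 1))`, and all values are made up for illustration.

```r
# Toy data: three hypothetical parishes (values invented, not report data)
toy <- data.frame(
  pcode3        = c("VE0001", "VE0002", "VE0003"),
  pob_pobre     = c(1000, 5000, 20000),  # poor persons
  percent_pobre = c(0.20, 0.45, 0.60),   # poverty incidence
  beneficiarios = c(1200, 500, 0)        # unique beneficiaries
)

# Min-max rescaling to [0, 1]; hypothetical stand-in for scales::rescale()
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Gap and coverage columns, as in `parr`
toy$not_reached         <- toy$pob_pobre - toy$beneficiarios
toy$coverage_percent    <- toy$beneficiarios / toy$pob_pobre
toy$pob_pobre_score     <- rescale01(toy$pob_pobre)
toy$percent_pobre_score <- rescale01(toy$percent_pobre)
toy$poverty_score       <- (toy$pob_pobre_score + toy$percent_pobre_score) / 2

# As in `parr0`: keep parishes with a remaining gap, then score the gap.
# The incidence score is carried over from the full set, not recomputed.
toy0 <- toy[toy$not_reached >= 1, ]
toy0$gap_score <- (rescale01(toy0$not_reached) + toy0$percent_pobre_score) / 2

print(toy0[, c("pcode3", "not_reached", "coverage_percent", "gap_score")])
```

The first toy parish has more beneficiaries than poor persons, so its `not_reached` is negative and it drops out of `toy0`, mirroring how over-covered parishes are excluded from `parr0`.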
2. Summary of coverage and gaps
2a. Map of parishes by gaps
# parrishes with negative poor persons are recoded as "0" so they won't mess up the scale
# even though this means that their tooltips are dropped
gaps_map <- parr %>%
right_join(pcode3_shape, by = "pcode3") %>%
st_as_sf() %>%
mutate(not_reached = ifelse(not_reached < 0.1, 0, not_reached)) %>%
mutate(not_reached = round(not_reached, digits = 0)) %>%
mutate_at(vars(percent_pobre, percent_urbana), ~(round(., digits = 2))) %>%
ggplot() +
geom_sf(aes(fill = not_reached,
text = paste0(parroquia,",", "\n",
municipio, ",", "\n",
estado, "\n",
"not reached: ", not_reached, "\n",
"org count: ", org_count, "\n",
"poverty incidence: ", percent_pobre, "\n",
"percent urban: ", percent_urbana)),
size = 0.1) +
scale_fill_viridis_c(option = "turbo", trans = "log10") +
theme_void() +
theme(legend.title = element_text(size = 7),
legend.text = element_text(size = 7),
plot.title = element_text(size = 11)) +
labs(fill = "Poor persons \nnot reached") +
ggtitle("Map of parrishes by gaps in population reached")
# so are you saying that if I change the fill to viridis in the later plot, I can use hoveron = fill?
# no you can't.
ggplotly(gaps_map, tooltip = c("text")) %>%
layout(showlegend = TRUE, legend = list(font = list(size = 6))) %>%
plotly::style(hoveron = "fill") %>%
layout(title = list(text = paste0("Map of parrishes by number poor persons not reached",
"<br>",
"<sup>",
"mouse over for details; drag and click to select and zoom","</sup>")))
Nationwide, 11,614,406 poor persons have not been covered by response activities – this means that 94.5% of all poor persons in the country have yet to be reached. This population, its distribution and its characteristics are some of the main concerns of this analysis.
2b. Grouping parishes by coverage type
As a starting point, all 1,109 parishes (admin level 3) have been split into three groups: over, where the number of unique beneficiaries reached exceeds the number of poor persons in that parish; under, where the number reached is less than the number of poor persons; and no coverage, comprising a total of 508 parishes, where no activities have occurred.
parr %>%
mutate(coverage_type = case_when(not_reached <= 0 ~ "over",
not_reached > 0 & beneficiarios >= 1 ~ "under",
beneficiarios == 0 ~ "no_coverage")) %>%
group_by(coverage_type) %>%
summarise(parrishes = n(),
beneficiaries = sum(beneficiarios),
not_reached = sum(not_reached),
avg_org_count = mean(org_count),
percent_poor = round(sum(pob_pobre) / sum(poblacion_2019) * 100, digits = 1),
percent_urban = round(sum(pob_urbana) /sum(poblacion_2019) * 100, digits = 1),
percent_wo_safe_water = round(sum(pob_sin_agua_segura) / sum(poblacion_2019) * 100, digits = 1),
percent_wo_improved_sanitation = round(sum(pob_sin_saneamiento_mejorado)/
sum(poblacion_2019) * 100, digits = 1),
percent_illiterate = round(sum(pob_analfabeto) / sum(poblacion_2019) * 100, digits = 1),
avg_sector_count = mean(sector_count)) %>%
pivot_longer(cols = -coverage_type, names_to = "variable") %>%
pivot_wider(names_from = coverage_type, values_from = value) %>%
relocate(no_coverage, .after = under) %>%
pander(big.mark = ",", caption = "Parrish characteristics by coverage type", style = "rmarkdown")
Parrish characteristics by coverage type
| variable | over | under | no_coverage |
|---|---|---|---|
| parrishes | 11 | 590 | 508 |
| beneficiaries | 660,732 | 677,752 | 0 |
| not_reached | -465,811 | 8,941,939 | 2,672,468 |
| avg_org_count | 7.636 | 2.278 | 0 |
| percent_poor | 20.3 | 37.4 | 46.5 |
| percent_urban | 99.5 | 92.9 | 66.9 |
| percent_wo_safe_water | 2 | 13.2 | 23.5 |
| percent_wo_improved_sanitation | 1 | 7 | 18.6 |
| percent_illiterate | 2 | 4.5 | 8.7 |
| avg_sector_count | 4 | 1.742 | 0 |
A total of 2,672,468 poor persons reside in the 508 parishes that have not been reached; this is only 24% of the 11,148,595 poor persons not covered by response activities. This indicates that:
- there is much room to expand in the parishes where we are already present and that
- sparsely populated, remote and, consequently, poorer parishes have, so far, been left out of the response.
Additionally, the 11 parishes in the over category are much less poor and much more urban, despite having 49% of all beneficiaries. As can be seen from the row not_reached, the number of beneficiaries in the over category has greatly exceeded the number of poor persons. These parishes are shown in the table in the next section.
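The two shares quoted above can be checked directly against the coverage-type table. The sketch below copies the relevant figures from that table and from the text; only the variable names are introduced here.

```r
# Figures copied from the "characteristics by coverage type" table
beneficiaries <- c(over = 660732, under = 677752, no_coverage = 0)
not_reached   <- c(over = -465811, under = 8941939, no_coverage = 2672468)

# Total of poor persons not covered, as quoted in the text
total_not_covered <- 11148595

# Share of that total living in parishes with no coverage at all
share_no_coverage <- not_reached[["no_coverage"]] / total_not_covered

# Share of all beneficiaries reported in the 11 "over" parishes
share_over <- beneficiaries[["over"]] / sum(beneficiaries)

round(100 * c(no_coverage = share_no_coverage, over = share_over))
# no_coverage: 24, over: 49
```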
2c. Overallocation in the top parishes by coverage
The 11 parishes below (from the over category) will largely be excluded in the remainder of this report as it is clear that no further resources should be allocated to them:
parr %>%
mutate(coverage_type = case_when(not_reached <= 0 ~ "over",
not_reached > 0 & beneficiarios >= 1 ~ "under",
beneficiarios == 0 ~ "no_coverage")) %>%
filter(coverage_type == "over") %>%
select(state = estado, municipality = municipio, parrish = parroquia,
beneficiaries = beneficiarios, poor_persons = pob_pobre) %>%
mutate(coverage_percent = beneficiaries / poor_persons * 100) %>%
arrange(desc(beneficiaries)) %>%
pander(big.mark = ",", caption = "Top 11 parrishes by coverage", style = "rmarkdown")
Top 11 parrishes by coverage
| state | municipality | parrish | beneficiaries | poor_persons | coverage_percent |
|---|---|---|---|---|---|
| DISTRITO CAPITAL | LIBERTADOR | ALTAGRACIA | 321,597 | 6,244 | 5,151 |
| MIRANDA | SUCRE | PETARE | 179,716 | 82,058 | 219 |
| BOLIVAR | CARONI | ONCE DE ABRIL | 69,443 | 54,118 | 128.3 |
| BOLIVAR | HERES | CATEDRAL | 32,319 | 14,336 | 225.4 |
| BOLIVAR | HERES | VISTA HERMOSA | 18,477 | 13,919 | 132.7 |
| MIRANDA | CHACAO | CHACAO | 13,780 | 5,481 | 251.4 |
| TACHIRA | ANDRES BELLO | CAPITAL CORDERO | 8,603 | 7,766 | 110.8 |
| MIRANDA | SUCRE | LEONCIO MARTINEZ | 7,554 | 4,792 | 157.6 |
| CARABOBO | VALENCIA | URBANA CANDELARIA | 5,061 | 4,589 | 110.3 |
| TACHIRA | BOLIVAR | GENERAL JUAN VICENTE GOMEZ | 2,253 | 362.1 | 622.3 |
| DELTA AMACURO | TUCUPITA | SAN JOSE | 1,929 | 1,256 | 153.6 |
As a note, it is likely that partners have reported activities which occurred in other parts of the capital in Altagracia, as the total number of beneficiaries reached in the whole of Distrito Capital is only 416,275. It is necessary to check back with partners about this; nevertheless, this is the information we have on hand.
2d. Sex ratios by cluster
act_ben %>%
mutate(sex_ben = case_when(str_detect(desagregacion, "^m") ~ "male",
str_detect(desagregacion, "^f") ~ "female")) %>%
group_by(sector) %>%
rename(cluster = sector) %>%
summarise(ben_freq = sum(beneficiarios),
male = sum(beneficiarios[sex_ben == "male"]),
female = sum(beneficiarios[sex_ben == "female"]),
sex_ratio = round(male / female, digits = 2)) %>%
mutate(`%_ben_freq` = round((ben_freq / sum(ben_freq)) * 100, digits = 1)) %>%
relocate(`%_ben_freq`, .after = ben_freq) %>%
arrange(sex_ratio) %>%
pander(caption = "Sex ratio of beneficiary frequencies by cluster")
Sex ratio of beneficiary frequencies by cluster
| cluster | ben_freq | %_ben_freq | male | female | sex_ratio |
|---|---|---|---|---|---|
| Salud | 54,800 | 2.6 | 12,383 | 42,417 | 0.29 |
| Proteccion | 184,189 | 8.9 | 52,270 | 131,919 | 0.4 |
| Nutricion | 385,665 | 18.6 | 141,675 | 243,990 | 0.58 |
| Seguridad_Alimentaria | 9,338 | 0.5 | 3,865 | 5,473 | 0.71 |
| Educacion | 627,985 | 30.3 | 296,053 | 331,932 | 0.89 |
| WASH | 809,949 | 39.1 | 386,483 | 423,466 | 0.91 |
All of the clusters reached more women than men, with Health and Protection being particularly heavily skewed in this regard. For Health, the disproportionality is a bit more understandable as it has a focus on antenatal and obstetric care as well as preventing mother-to-child HIV transmission; Nutrition similarly has a focus on pregnant and lactating women. However, for Protection, some investigation is necessary:
act_ben %>%
mutate(sex_ben = case_when(str_detect(desagregacion, "^m") ~ "male",
str_detect(desagregacion, "^f") ~ "female"),
activity = str_to_sentence(actividad)) %>%
filter(sector == "Proteccion") %>%
group_by(activity) %>%
summarise(ben_freq = sum(beneficiarios),
male = sum(beneficiarios[sex_ben == "male"]),
female = sum(beneficiarios[sex_ben == "female"])) %>%
arrange(desc(ben_freq)) %>% select(-ben_freq) %>% head(5) %>%
pander(caption = "Top 5 Protection activities by beneficiaries",
justify = c("left", "centre", "centre"))
Top 5 Protection activities by beneficiaries
| activity | male | female |
|:---|:---:|:---:|
| P2.27 apoyar la emision de certificados de nacimiento a nna | 180 | 56,797 |
| P2.08 apoyar a nna en riesgo con asistencia legal especializada | 16,304 | 20,866 |
| P3.05 sensibilizar a nna, hombres y mujeres sobre prevencion y respuesta a la violencia, abuso, explotacion de nna | 11,660 | 19,208 |
| P3.04 sensibilizar a nna, hombres y mujeres sobre prevencion de separacion familiar | 3,990 | 8,813 |
| P2.06 nna participan en actividades grupales estructuradas de apoyo psicosocial | 5,222 | 5,356 |
None of the top 5 – the issuance of birth certificates; legal assistance; awareness raising on violence, exploitation, abuse and family separation; or psychosocial support – should be predisposed to reach females over males (and in the case of psychosocial support, it is not). The cluster needs to verify these figures and revisit its targeting and beneficiary selection strategies.
For reference, the sex ratio of the country as a whole was 0.99 in the last census – this figure should not have been affected too much by the migrants and refugees who have left the country, as IOM’s Displacement Tracking Matrix reports that population as being 49% female and 51% male.
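To make the comparison with the census baseline concrete, the sketch below recomputes each cluster's sex ratio (male divided by female) from the figures in the cluster table and sets it against the 0.99 national ratio. The ratio implied by the DTM split is derived here from the 51% and 49% quoted above; the column name `gap_vs_census` is introduced only for this illustration.

```r
# Male and female beneficiary frequencies copied from the cluster table
sex <- data.frame(
  cluster = c("Salud", "Proteccion", "Nutricion",
              "Seguridad_Alimentaria", "Educacion", "WASH"),
  male    = c(12383, 52270, 141675, 3865, 296053, 386483),
  female  = c(42417, 131919, 243990, 5473, 331932, 423466)
)

# Sex ratio as used in the report: males per female
sex$sex_ratio <- round(sex$male / sex$female, 2)

census_ratio <- 0.99     # national sex ratio quoted in the text
dtm_ratio    <- 51 / 49  # implied by the 51% male / 49% female DTM split

# Distance of each cluster from the census baseline (negative = skewed female)
sex$gap_vs_census <- round(sex$sex_ratio - census_ratio, 2)

print(sex)
round(dtm_ratio, 2)
```

Every cluster sits below the census ratio, with Salud and Proteccion furthest from it, which matches the ordering of the table.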
3. Geographical analysis of gaps
3a. Barplot of coverage and gaps by state
# ordering the states
state_ord <- parr %>%
group_by(estado) %>%
summarise(beneficiarios = sum(beneficiarios),
total = sum(pob_pobre)) %>%
mutate(percent_reached = beneficiarios / total) %>%
arrange(percent_reached) %>% select(estado) %>% pull()
stack_text <- parr %>%
group_by(estado) %>%
summarise(beneficiarios = sum(beneficiarios),
total = sum(pob_pobre),
not_reached = sum(not_reached)) %>%
mutate(percent_reached = round(beneficiarios / total * 100, digits = 1)) %>%
arrange(percent_reached) %>%
mutate(percent_reached = paste0(percent_reached,"%"))
state_stack <- parr %>%
select(estado, beneficiarios, not_reached) %>%
group_by(estado) %>%
summarise(beneficiaries = round(sum(beneficiarios), digits = 0),
not_reached = round(sum(not_reached), digits = 0)) %>%
pivot_longer(c(beneficiaries, not_reached),
names_to = "pob_type", values_to = "total") %>%
ggplot(aes(x = total, y = estado)) +
geom_col(aes(fill = pob_type), colour = "grey70", size = 0.03) +
scale_x_continuous(label = comma, breaks = seq(0, 1800000, 200000)) +
scale_y_discrete(limits = state_ord) +
scale_fill_manual(values = c("#DE7065FF", "#403891FF")) +
geom_text(data = stack_text, aes(label = percent_reached),
size = 2.5, colour = "white", fontface = "bold",
position = position_stack(vjust = 0.5)) +
ylab("") + xlab("Number of poor persons") +
labs(fill = "", colour = "",
title = "Barplot of poor persons by state by reached/not reached") +
theme(plot.title = element_text(size = 11),
axis.text.x = element_text(size = 7, angle = 30),
axis.title.x = element_text(size = 9))
ggplotly(state_stack) %>%
layout(legend = list(font = list(size = 7))) %>%
config(displayModeBar = FALSE) %>%
layout(title = list(text = paste0(
"Barplot of poor persons by state by not reached/reached",
"<br>",
"<sup>",
"mouse over for details; figures show percent reached","</sup>")))
86.5% of all beneficiaries are from the states of Distrito Capital, Miranda, Tachira, Bolivar and Zulia, largely corresponding to the locations of UNICEF offices. These states, with the addition of Delta Amacuro, are also where the highest percentages of poor persons have been reached. Barinas has the lowest percentage of its poor population covered.
On average, after the exclusion of the top 11 parishes, 10.7% of poor persons have been reached countrywide. However, at the state level, this average is 4.6%.
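The source does not say how these two averages were computed. One plausible reading, offered here only as an assumption, is that the countrywide figure pools all beneficiaries and poor persons, while the state-level figure is an unweighted mean of each state's percentage. The sketch below uses three invented states to show how far apart those two quantities can be when coverage is concentrated in the most populous state.

```r
# Hypothetical state totals (invented for illustration, not report data)
states <- data.frame(
  estado        = c("A", "B", "C"),
  pob_pobre     = c(1000000, 200000, 100000),
  beneficiarios = c(150000, 4000, 1000)
)
states$percent_reached <- states$beneficiarios / states$pob_pobre

# Pooled percentage: weighted by the size of each state's poor population
pooled <- sum(states$beneficiarios) / sum(states$pob_pobre)

# Unweighted mean of state percentages: every state counts equally
unweighted <- mean(states$percent_reached)

round(100 * c(pooled = pooled, state_mean = unweighted), 1)
```

In this toy case the pooled figure is roughly double the state mean, because the single large, well-covered state dominates the pooled calculation.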
Whilst there are many poor persons yet to be reached in states where we have relatively high coverage, there is a need to ensure that our operational footprint and the consequent resources allocated are equitable and that this type of overallocation is avoided. The crisis in Venezuela is nationwide and, unlike an earthquake or a typhoon, where there is an epicentre or a storm path, there is no programmatic rationale to focus on only a few areas.
Let us now move down to a lower level of granularity as state-level analysis is still too superficial. Parishes will be the main administrative unit of reference in this analysis. Unlike in the 5W cleaning and reporting document, where we focused on municipalities to display achievements for external audiences, greater precision is needed for a coverage and gaps analysis.
3b. Scatterplot of gaps by parish
From the scatterplot below – where each point is a parish – we see that there is great variation both in the number of poor persons not covered (size and x-axis) as well as how concentrated they are in a given parish (y-axis, poverty incidence); both these factors weigh heavily in programming strategies as well as in the ease of beneficiary selection.
The greatest numbers of not reached are found in parishes with between 10,000 and 100,000 poor persons not reached and a poverty incidence between 0.25 and 0.50 (marked by the yellow box); however, these parishes also have a much higher than average number of organisations present (more red). This means that operational barriers are much lower in accessing these populations than for the parishes in light blue found in the middle of the plot.
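The yellow box is simply a pair of range conditions, so the same group of parishes can be pulled out as a filter and compared with the rest. The sketch below applies the box limits used in the plot to four invented parishes; the data and the names `in_box`, `mean_org_in` and `mean_org_out` are hypothetical.

```r
# Hypothetical parishes (invented values, not report data)
toy <- data.frame(
  parroquia     = c("P1", "P2", "P3", "P4"),
  not_reached   = c(500, 15000, 80000, 150000),
  percent_pobre = c(0.70, 0.30, 0.45, 0.40),
  org_count     = c(0, 4, 6, 1)
)

# Same limits as the yellow box drawn with geom_rect()
in_box <- toy$not_reached >= 10000 & toy$not_reached <= 100000 &
  toy$percent_pobre >= 0.25 & toy$percent_pobre <= 0.50

# Average number of organisations inside and outside the box
mean_org_in  <- mean(toy$org_count[in_box])
mean_org_out <- mean(toy$org_count[!in_box])

toy$parroquia[in_box]
c(inside = mean_org_in, outside = mean_org_out)
```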
parrplot <- parr0 %>%
mutate_at(vars(pob_pobre, not_reached, org_count), ~(round(.))) %>%
mutate(percent_pobre = round(percent_pobre, digits = 2))%>%
ggplot(aes(x = not_reached, y = percent_pobre,
colour = org_count,
text = paste0(parroquia,",", "\n",
municipio, ",", "\n",
estado))) +
geom_rect(aes(xmin = 10000, xmax = 100000, ymin = 0.25, ymax = 0.50),
fill = "transparent", colour = "gold", size = 0.2) +
geom_jitter(aes(size = not_reached), alpha = 0.75) +
scale_colour_gradientn(
colours = c("cornflowerblue", "tomato", "firebrick")) +
scale_x_continuous(trans = "log10", labels = comma) +
scale_size_continuous(range = c(0.3, 5)) +
xlab("Not covered poor") + ylab("Poverty incidence") +
labs(colour = "Number of \norganisations",
title = "Scatterplot of parrishes by poor persons not covered and poverty incidence") +
theme(legend.title = element_text(size = 7),
legend.text = element_text(size = 7),
plot.title = element_text(size = 11),
axis.title = element_text(size = 8.5))
ggplotly(parrplot, tooltip = c("y", "x", "size", "text", "colour")) %>%
layout(showlegend = TRUE, legend = list(font = list(size = 7))) %>%
config(displayModeBar = FALSE) %>%
layout(title = list(text = paste0(
"Scatterplot of parrishes by number poor persons not reached and poverty incidence",
"<br>",
"<sup>",
"size: number of poor persons not reached; colour: number of organisations present; drag and click to select and zoom","</sup>")))
For agencies truly unable to expand outside their current footprints, there are still many beneficiaries who are not covered or – as we will discuss in the next chapter – have only been reached with the interventions of one sector.
4. Multi-sector programming
Humanitarian emergencies are multidimensional and the needs of affected persons are not delineated by cluster. Phenomena such as displacement or food insecurity result from the complex interplay between numerous underlying factors and the shocks and stresses of the hazard. Multi-sector or integrated programming consists of implementing layers of individual, household and community-level interventions to comprehensively meet the needs of a target population. It is often held up as a key marker of programme quality in strategy documents and humanitarian standards, but is rarely achieved in practice.
4a. Summary table of multi-sector coverage
However, just because two Clusters operate in the same area does not mean their beneficiaries coincide. As an estimate, we calculated a theoretical maximum number of multi-sector beneficiaries, expressed below as multi_sector_ben (explanation and calculation in the code chunk below). To elaborate with an example, we make the charitable assumption that females under 18 who are beneficiaries of Nutrition and females under 18 in that same parish who are beneficiaries of Protection are the same people; we then sum the various age and sex disaggregation subtotals by parish to determine the maximum number of persons who could have received multi-sector support. The actual number of multi-sector beneficiaries is likely much lower.
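A small worked example may help with the logic of the theoretical maximum. The sketch below applies the same rule used for `ms_ben_max` to three invented parishes with one, two and three active sectors. The figures are hypothetical, and the calculation is shown at parish level only, without the age and sex disaggregation.

```r
# Hypothetical sector subtotals for three parishes (invented values)
sectors <- rbind(
  one_sector    = c(nutricion = 100, wash = 0,  proteccion = 0),
  two_sectors   = c(nutricion = 100, wash = 60, proteccion = 0),
  three_sectors = c(nutricion = 50,  wash = 40, proteccion = 30)
)

ben_freq    <- rowSums(sectors)        # beneficiary frequencies
sec_ben_max <- apply(sectors, 1, max)  # largest single-sector subtotal
others      <- ben_freq - sec_ben_max  # sum of all other sector subtotals

# Theoretical maximum overlap: the smaller of the largest sector and the rest
ms_ben_max <- ifelse(sec_ben_max >= others, others, sec_ben_max)

cbind(ben_freq, sec_ben_max, ms_ben_max,
      multisector_percent = round(100 * ms_ben_max / ben_freq, 1))
```

With only one sector there is nothing to overlap with, so the maximum is zero. With two or more sectors the maximum can never exceed the largest single-sector subtotal, which is why the estimate is an upper bound and not a count.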
This data has been summarised below, according to the number of sectors present in a parish:
# Notes on calculation for multi-sector beneficiaries.
# Basically, beneficiaries per parrish are aggregated into
# sector subtotals and a beneficiary frequency total.
# The maximum value of the sector subtotals is compared against the beneficiary frequency total,
# if the maximum value is equal to the frequency total, then there is only one sector,
# if the maximum value is less than the frequency total,
# the difference between the two (or the sum of all other sector subtotals)
# becomes the theoretical maximum number of multisector beneficiaries.
# Performing this calculation at admin level 3 makes sense as a parrish is small enough
# that it is realistic to assume that overlaps in beneficiaries between sectors exist --
# i.e. that females under 18 in a parrish who are beneficiaries of nutrition
# and females under 18 in that same parrish who are beneficiaries of WASH are the same people.
# Although I do feel this calculation to be very charitable
# We can't really do much more unless there is a beneficiary register.
# The real number of multisector beneficiaries is likely MUCH LOWER
# but that can only be verified through sampled large-scale post-distribution/post-intervention monitoring,
# which is extremely rare.
# I actually could have raised this with the third-party monitors that UNICEF,
# so it's an oversight on my part as well.
# These operations were already performed in "A note on the data" and are part of `parr` and are
# therefore, commented out here
# act_ben %>%
# group_by(pcode3, desagregacion, sector) %>%
# pivot_wider(names_from = sector, values_from = beneficiarios) %>%
# rename(nutricion_ben = Nutricion, proteccion_ben = Proteccion, wash_ben = WASH,
# salud_ben = Salud, educacion_ben = Educacion, sa_ben = Seguridad_Alimentaria) %>%
# replace_na(list(nutricion_ben = 0, educacion_ben = 0, wash_ben = 0, salud_ben = 0,
# sa_ben = 0, proteccion_ben = 0)) %>%
# summarise(across(c(nutricion_ben, proteccion_ben, wash_ben,
# educacion_ben, salud_ben, sa_ben), sum)) %>%
# mutate(ben_freq = nutricion_ben + proteccion_ben + wash_ben + salud_ben + educacion_ben + sa_ben,
# sec_ben_max = pmax(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben, sa_ben),
# ms_ben_max = ifelse(sec_ben_max >= ben_freq - sec_ben_max,
# ben_freq - sec_ben_max,
# sec_ben_max)) %>%
# group_by(pcode3) %>%
# summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben,
# salud_ben, sa_ben, ben_freq, sec_ben_max, ms_ben_max), sum)) %>%
# mutate(sector_count = rowSums(select(., ends_with("_ben")) != 0))
# using parr means that all parrishes are included, but they are needed for the percentage calculations.
parr %>%
filter(ben_freq != 0) %>%
group_by(sector_count) %>%
summarise(parrishes = n(),
multi_sector_ben = sum(ms_ben_max),
one_sector_ben = sum(ben_freq) - sum(ms_ben_max),
`multisector_%` = round(sum(ms_ben_max) / sum(ben_freq) * 100, digits = 2)) %>%
pander(style = "rmarkdown", caption = "Summary of multisector coverage")
Summary of multisector coverage
| sector_count | parrishes | multi_sector_ben | one_sector_ben | multisector_% |
|---|---|---|---|---|
| 1 | 365 | 0 | 242,522 | 0 |
| 2 | 103 | 40,938 | 148,208 | 21.64 |
| 3 | 64 | 49,475 | 149,788 | 24.83 |
| 4 | 42 | 154,723 | 625,430 | 19.83 |
| 5 | 21 | 163,159 | 372,180 | 30.48 |
| 6 | 6 | 51,743 | 68,611 | 42.99 |
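The percentage column in this table is the multi-sector maximum divided by total beneficiary frequencies (multi-sector plus one-sector) for each row. The sketch below recomputes it from the two count columns as a quick consistency check; the vector names are introduced here for illustration.

```r
# Counts copied from the summary table, ordered by sector_count 1 to 6
multi_sector_ben <- c(0, 40938, 49475, 154723, 163159, 51743)
one_sector_ben   <- c(242522, 148208, 149788, 625430, 372180, 68611)

# Beneficiary frequencies per row are the sum of the two columns
ben_freq <- multi_sector_ben + one_sector_ben

# Recomputed multisector percentage per sector_count
multisector_pct <- round(100 * multi_sector_ben / ben_freq, 2)
names(multisector_pct) <- 1:6
multisector_pct
```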
Overall, the results are not encouraging – a maximum of 22.3% of beneficiaries (outside of the top 11 parishes) could potentially be covered by multi-sector support. When vaccinations are included, this percentage drops to 12.2%. But, as mentioned, vaccinations will be excluded from this analysis as government partners were not able to accurately provide records at the parish level, many times defaulting to municipal or state-level reporting; additionally, the footprint for vaccination activities is determined by federal government priorities (that may not align with the humanitarian imperative), which UNICEF is also unable to influence.
As the leader of the Education, Nutrition, WASH and Child Protection Clusters, UNICEF supported activities that reached 81.4% of all beneficiaries. This means that the low percentage of multi-sector support could largely be resolved by better internal coordination and better programmatic oversight within UNICEF – these issues will be even more apparent in sections 4c and 4d.
4b. Parish-level gaps in multi-sector programming
A total of 961,893 beneficiaries received support from only one sector – this is 77.7% of all beneficiary frequencies. The parishes below have been plotted according to their multi-sector coverage and their total number of beneficiary frequencies; larger points indicate parishes with higher numbers of beneficiaries benefitting from only one sector:
ms_scatter <- parr0 %>%
mutate(multi_sector_percent = round(ms_ben_max / ben_freq * 100, digits = 1),
one_sector_percent = round((ben_freq - ms_ben_max) / ben_freq * 100, digits = 1),
multi_sector_ben = round(ms_ben_max, digits = 0),
one_sector_ben = round(ben_freq - ms_ben_max, digits = 0),
ben_freq = round(ben_freq, digits = 0)) %>%
ggplot(aes(x = ben_freq,
y = multi_sector_percent,
text = paste0(parroquia,",", "\n",
municipio, ",", "\n",
estado, ",", "\n",
"sector count: ", sector_count),
colour = estado)) +
geom_point(aes(size = one_sector_ben),
alpha = 0.8) +
scale_x_continuous(trans = "log10", labels = label_comma(accuracy = 1)) +
scale_size_continuous(range = c(0.1, 5)) +
scale_colour_manual(values = c(rep("coral",24))) +
xlab("Beneficiary frequencies") + ylab("Percentage received multi-sector support") +
labs(title = "Scatterplot of parrishes by beneficiary frequencies and multi-sector coverage",
size = "", colour = "") +
theme(plot.title = element_text(size = 11),
axis.title = element_text(size = 8.5))
ggplotly(ms_scatter, tooltip = c("x", "y", "size", "text")) %>%
layout(legend = list(font = list(size = 6))) %>%
config(displayModeBar = FALSE) %>%
layout(title = list(text = paste0(
"Scatterplot of parrishes by beneficiary frequencies and multi-sector coverage",
"<br>",
"<sup>",
"size: number of beneficiaries supported by only one sector; double-click state to toggle view; mouse over for details","</sup>")))
We note only a very loose relationship (r-squared = 0.183) between the number of beneficiary frequencies and the percentage of those beneficiary frequencies reached by interventions from multiple sectors. Multi-sector coverage is correlated with little else and, as can be seen above, follows no discernible pattern. The 365 parishes at the bottom of the plot have only one sector present in each – this is 61.9% of all parishes in which we are responding.
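For readers who want to check a figure of this kind, the sketch below shows one way an r-squared between beneficiary frequencies and multi-sector coverage could be computed. The data are invented, and the use of a log10 transform (mirroring the log-scaled x axis of the plot) is an assumption; the text does not state how the reported 0.183 was obtained.

```r
# Toy data, invented for illustration: 8 parishes
ben_freq <- c(120, 450, 900, 2500, 6000, 15000, 40000, 90000)
multi_sector_percent <- c(0, 0, 12.5, 5.0, 30.1, 8.4, 41.0, 22.3)

# Regress multi-sector coverage on log10 beneficiary frequencies
fit <- lm(multi_sector_percent ~ log10(ben_freq))
r_squared <- summary(fit)$r.squared
r_squared

# With a single predictor, r-squared equals the squared Pearson correlation
all.equal(r_squared, cor(log10(ben_freq), multi_sector_percent)^2)
```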
parr0 %>%
filter(beneficiarios != 0) %>%
select(estado, sector_count) %>%
group_by(estado) %>%
summarise(avg_sector = round(mean(sector_count), digits = 2)) %>%
arrange(desc(avg_sector)) %>%
filter(avg_sector > 2 | avg_sector < 1.15) %>%
pivot_wider(names_from = estado, values_from = avg_sector) %>%
mutate(`|` = c("|")) %>% relocate(`|`, .after = ZULIA) %>%
pander(caption = "Top 5 and bottom 5 states, average number of sectors per parrish", missing = "")
Top 5 and bottom 5 states, average number of sectors per parrish
| 3.38 | 2.6 | 2.43 | 2.37 | 2.36 | | 1.14 | 1.12 | 1.08 | 1.05 | 1.02 |
However, we do note that in the states where UNICEF has offices (Bolivar, Tachira, Zulia and Distrito Capital), multi-sector coverage is much higher than in other states. This perhaps indicates that a more decentralised approach, where field offices have a greater say in prioritisation, might lead to more coordination in multi-sector programming. That said, the fact that this greater multi-sector coverage has not extended beyond these states to the other areas under the field offices' purview indicates the limits of their planning and coordination capacity.
4c. Cluster combinations
This section will examine the parishes that have multi-sector coverage and the types of inter-cluster combinations that can be found in them. Let us begin with an overview of the geographic reach of each cluster. We use frequencies here, as each individual might have benefitted from multiple combinations of clusters:
parr %>%
select(pcode3, educacion_ben, nutricion_ben, salud_ben, wash_ben, proteccion_ben) %>%
rename(Educacion = educacion_ben, Nutricion = nutricion_ben, Salud = salud_ben,
WASH = wash_ben, Proteccion = proteccion_ben) %>%
pivot_longer(cols = 2:6, names_to = "cluster", values_to = "beneficiary_frequencies") %>%
filter(beneficiary_frequencies != 0) %>%
group_by(cluster) %>%
summarise(parrishes = n(),
beneficiary_frequencies = sum(beneficiary_frequencies)) %>%
pander(caption = "Cluster coverage summary, excluding vaccination", big.mark = ",", style = "rmarkdown",
justify = c("left", "right", "right"))
Cluster coverage summary, excluding vaccination
| cluster | parrishes | beneficiary_frequencies |
|:---|---:|---:|
| Educacion | 187 | 623,052 |
| Nutricion | 513 | 385,614 |
| Proteccion | 151 | 184,084 |
| Salud | 61 | 54,739 |
| WASH | 126 | 809,949 |
Next, we will summarise the inter-cluster combinations according to:
- combination, referring to the various cluster-wise pairs that exist;
- parrishes, indicating the number of parishes each combination is present in;
- cluster1 and cluster2, which show the number of beneficiary frequencies reached by each of the two clusters in a pair, in the order that they appear in combination;
- pair_sum, which shows the total number of beneficiary frequencies in that pair, i.e. the pair_sum for edu_nut would be the sum of education and nutrition beneficiaries;
- %ms_max, which shows the maximum percentage of multisector beneficiaries of each pair, i.e. if the pair edu_nut has 10 education beneficiaries and 30 nutrition beneficiaries, the maximum number of beneficiaries who received support from both sectors is 10, resulting in a %ms_max of 25%. But, as mentioned in the notes for section 4a, this is just a theoretical maximum and the actual level of coincidence is likely much lower.
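The %ms_max arithmetic described in the list above can be checked by hand. This short sketch uses the worked figures from the text and then applies the same formula to the edu_wash totals reported in the combinations table further down; the variable name `pct_ms_max` is introduced here for illustration only.

```r
# Worked example from the text: 10 education and 30 nutrition beneficiaries
cluster1 <- 10  # education beneficiary frequencies
cluster2 <- 30  # nutrition beneficiary frequencies

pair_sum   <- cluster1 + cluster2      # total frequencies in the pair
ms_max     <- min(cluster1, cluster2)  # at most 10 people received both
pct_ms_max <- ms_max / pair_sum * 100

pct_ms_max  # 25

# The same formula applied to the edu_wash totals (cluster1, cluster2, pair_sum)
round(min(437925, 776361) / 1214286 * 100, 1)  # 36.1
```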
# creation of reference df for the cluster combinations
clust_com <- parr %>%
filter(ben_freq != 0) %>%
select(pcode3, ben_freq, educacion_ben, nutricion_ben, salud_ben, wash_ben, proteccion_ben) %>%
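# mutate a new column for each combination of sectors -- where both clusters are present in a parrish,
# the column holds the sum of their beneficiary frequencies; the *_only columns are filled
# only where a single cluster accounts for all beneficiary frequencies in the parrish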
mutate(edu_only = ifelse(educacion_ben == ben_freq, educacion_ben, 0),
edu_nut = ifelse(educacion_ben > 0 & nutricion_ben > 0, educacion_ben + nutricion_ben, 0),
edu_sal = ifelse(educacion_ben > 0 & salud_ben > 0, educacion_ben + salud_ben, 0),
edu_wash = ifelse(educacion_ben > 0 & wash_ben > 0, educacion_ben + wash_ben, 0),
edu_prot = ifelse(educacion_ben > 0 & proteccion_ben > 0, educacion_ben + proteccion_ben, 0),
nut_only = ifelse(nutricion_ben == ben_freq, nutricion_ben, 0),
nut_sal = ifelse(nutricion_ben > 0 & salud_ben > 0, nutricion_ben + salud_ben, 0),
nut_wash = ifelse(nutricion_ben > 0 & wash_ben > 0, nutricion_ben + wash_ben, 0),
nut_prot = ifelse(nutricion_ben > 0 & proteccion_ben > 0, nutricion_ben + proteccion_ben, 0),
sal_only = ifelse(salud_ben == ben_freq, salud_ben, 0),
sal_wash = ifelse(salud_ben > 0 & wash_ben > 0, salud_ben + wash_ben, 0),
sal_prot = ifelse(proteccion_ben > 0 & salud_ben > 0, salud_ben + proteccion_ben, 0),
wash_only = ifelse(wash_ben == ben_freq, wash_ben, 0),
prot_wash = ifelse(wash_ben > 0 & proteccion_ben > 0, wash_ben + proteccion_ben, 0),
prot_only = ifelse(proteccion_ben == ben_freq, proteccion_ben, 0)) %>%
# pivot_longer so that each combination becomes a row, with its value in the pair_sum column
pivot_longer(names_to = "combination", values_to = "pair_sum", 8:22) %>%
filter(pair_sum != 0) %>%
group_by(pcode3, combination) %>%
summarise(educacion_ben = mean(educacion_ben),
nutricion_ben = mean(nutricion_ben),
salud_ben = mean(salud_ben),
wash_ben = mean(wash_ben),
proteccion_ben = mean(proteccion_ben),
pair_sum = sum(pair_sum)) %>%
# beneficiary frequencies of the first (cluster1) and second (cluster2) cluster in each pair
mutate(cluster1 =
case_when(
str_detect(combination, "edu_nut") ~ educacion_ben,
str_detect(combination, "edu_sal") ~ educacion_ben,
str_detect(combination, "edu_wash") ~ educacion_ben,
str_detect(combination, "edu_prot") ~ educacion_ben,
str_detect(combination, "nut_sal") ~ nutricion_ben,
str_detect(combination, "nut_wash") ~ nutricion_ben,
str_detect(combination, "nut_prot") ~ nutricion_ben,
str_detect(combination, "sal_wash") ~ salud_ben,
str_detect(combination, "sal_prot") ~ salud_ben,
str_detect(combination, "prot_wash") ~ proteccion_ben,
str_detect(combination, "edu_only") ~ educacion_ben,
str_detect(combination, "nut_only") ~ nutricion_ben,
str_detect(combination, "sal_only") ~ salud_ben,
str_detect(combination, "wash_only") ~ wash_ben,
str_detect(combination, "prot_only") ~ proteccion_ben)) %>%
mutate(cluster2 =
case_when(
str_detect(combination, "edu_nut") ~ nutricion_ben,
str_detect(combination, "edu_sal") ~ salud_ben,
str_detect(combination, "edu_wash") ~ wash_ben,
str_detect(combination, "edu_prot") ~ proteccion_ben,
str_detect(combination, "nut_sal") ~ salud_ben,
str_detect(combination, "nut_wash") ~ wash_ben,
str_detect(combination, "nut_prot") ~ proteccion_ben,
str_detect(combination, "sal_wash") ~ wash_ben,
str_detect(combination, "sal_prot") ~ proteccion_ben,
str_detect(combination, "prot_wash") ~ wash_ben,
str_detect(combination, "only$") ~ 0)) %>%
select(pcode3, combination, cluster1, cluster2, pair_sum)
# pander table cluster combinations
rbind(
clust_com %>%
filter(str_detect(combination, "only$")) %>%
group_by(combination) %>%
summarise(parrishes = n(),
cluster1 = round(sum(cluster1), digits = 0),
cluster2 = round(sum(cluster2), digits = 0),
pair_sum = round(sum(pair_sum), digits = 0)) %>%
mutate(`%ms_max` = pmin(cluster1, cluster2) / pair_sum * 100,
`%ms_max` = round(ifelse(is.nan(`%ms_max`), 0, `%ms_max`), digits = 1)) %>%
arrange(desc(pair_sum)) %>%
rbind(NA),
clust_com %>%
filter(!str_detect(combination, "only$")) %>%
group_by(combination) %>%
summarise(parrishes = n(),
cluster1 = round(sum(cluster1), digits = 0),
cluster2 = round(sum(cluster2), digits = 0),
pair_sum = round(sum(pair_sum), digits = 0)) %>%
mutate(`%ms_max` = pmin(cluster1, cluster2) / pair_sum * 100,
`%ms_max` = round(ifelse(is.nan(`%ms_max`), 0, `%ms_max`), digits = 1)) %>%
arrange(desc(pair_sum))
) %>%
kable(caption = "Cluster combinations, sorted by pair_sum", format.args = list(big.mark = ",")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Cluster combinations, sorted by pair_sum
| combination | parrishes | cluster1 | cluster2 | pair_sum | %ms_max |
|:---|---:|---:|---:|---:|---:|
| nut_only | 301 | 156,945 | 0 | 156,945 | 0.0 |
| edu_only | 34 | 75,671 | 0 | 75,671 | 0.0 |
| wash_only | 12 | 6,912 | 0 | 6,912 | 0.0 |
| prot_only | 15 | 2,852 | 0 | 2,852 | 0.0 |
| sal_only | 3 | 142 | 0 | 142 | 0.0 |
| | | | | | |
| edu_wash | 83 | 437,925 | 776,361 | 1,214,286 | 36.1 |
| nut_wash | 98 | 138,115 | 775,836 | 913,951 | 15.1 |
| prot_wash | 68 | 128,857 | 756,684 | 885,541 | 14.6 |
| edu_nut | 135 | 514,997 | 171,906 | 686,903 | 25.0 |
| edu_prot | 82 | 391,334 | 134,992 | 526,326 | 25.6 |
| sal_wash | 32 | 41,986 | 317,023 | 359,009 | 11.7 |
| nut_prot | 125 | 152,291 | 168,652 | 320,943 | 47.5 |
| edu_sal | 38 | 163,499 | 44,714 | 208,213 | 21.5 |
| nut_sal | 49 | 64,471 | 48,515 | 112,986 | 42.9 |
| sal_prot | 34 | 44,916 | 60,055 | 104,971 | 42.8 |
The most common pairing was between the Education and Nutrition clusters, which coincide in 135 parishes, followed by Nutrition and Protection. In the next section, we will investigate whether the substantial overlap between Nutrition and Protection at the parish level was coordinated or, as is the case with Nutrition's overlap with Education, where there are no concrete programmatic links, due more to its wide operational presence.
Protection and WASH, however, both do have explicit programmatic links (in the logframe) with Education and co-occur with it in 82 and 83 parishes respectively. A fruitful avenue of investigation would be how many beneficiaries of Education also benefitted from Protection interventions and how close this is to the 25.6% theoretical maximum.
Nutrition operates alone in 301 parishes out of the 514 that it is present in; this is the most of any cluster. It is necessary to evaluate the extent to which other clusters can make use of the footholds established by Nutrition.
Whilst Nutrition and Health have excellent programmatic complementarity, especially with Health’s focus on obstetric, antenatal and neonatal care, this combination has the second-lowest number of beneficiary frequencies of all the combinations.
WASH overall has excellent programmatic overlap with all other clusters, and was the only cluster to programme specific multi-sector interventions: WASH in schools and WASH in health/nutrition centres. Almost none of WASH’s beneficiary frequencies occurred in parishes where no other clusters were present. Its great reach and blanket coverage (especially water supply and other community-level activities) mean that other clusters operating in the same areas as WASH are “guaranteed” to reach beneficiaries with multi-sector programming. The challenges of these combinations are:
- the intentionality of the multi-sector coverage and
- matching the scale of WASH activities.
Similar to WASH and Health, Protection has very limited beneficiary frequencies in parishes where it operates alone. Protection coincides the most with Nutrition. This should serve as an impulse for the creation of referral pathways between the two, since both carry out screening activities; this is made easier by the fact that both manage some form of beneficiary-level database. Protection has the most explicit programmatic links to Education in the logframe.
4d. Activity categories
To close chapter 4, the state of multi-sector programming is poor: as we have noted, there is little intentionality in deciding which areas have multi-sector coverage and which areas do not. Multi-sector links do exist at the activity level, but this is a poor approximation of integrated programming. We see the weakness of this approach in the table below, which lists the most common inter-cluster activity category combinations at the parish level.
act_ben %>%
select(pcode3, categoria) %>%
distinct() %>%
filter(!str_detect(categoria, "OTRO")) %>%
pairwise_count(categoria, pcode3, upper = FALSE) %>%
arrange(desc(n)) %>%
left_join(act_ben %>% select(sector1 = sector, categoria) %>% distinct(),
by = c("item1" = "categoria")) %>%
left_join(act_ben %>% select(sector2 = sector, categoria) %>% distinct(),
by = c("item2" = "categoria")) %>%
filter(sector1 != sector2) %>%
mutate_at(vars(item1, item2), ~str_to_title(.)) %>%
select(actvity_category1 = item1, activity_category2 = item2, count = n) %>%
head(15) %>%
kbl(caption = "Most common inter-cluster activity category combinations") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Most common inter-cluster activity category combinations
| actvity_category1 | activity_category2 | count |
|:---|:---|---:|
| Prevencion_desnutricion_aguda | Acceso_permanencia_escolar | 114 |
| Prevencion_desnutricion_aguda | Resilencia_educacion | 108 |
| Fortalecimiento_capacidad_educacion | Prevencion_desnutricion_aguda | 102 |
| Provision_de_servicios | Prevencion_desnutricion_aguda | 91 |
| Acceso_permanencia_escolar | Tratamiento_desnutricion_aguda | 79 |
| Resilencia_educacion | Tratamiento_desnutricion_aguda | 75 |
| Fortalecimiento_capacidad_educacion | Tratamiento_desnutricion_aguda | 74 |
| Prevencion_desnutricion_aguda | Capacitaciones_proteccion | 69 |
| Prevencion_desnutricion_aguda | Fortalecimiento_institucional | 56 |
| Provision_de_servicios | Tratamiento_desnutricion_aguda | 54 |
| Prevencion_desnutricion_aguda | Agua_en_comunidades | 53 |
| Provision_de_servicios | Acceso_permanencia_escolar | 52 |
| Prevencion_desnutricion_aguda | Wash_en_salud_nutricion | 50 |
| Provision_de_servicios | Resilencia_educacion | 49 |
| Fortalecimiento_capacidad_educacion | Provision_de_servicios | 47 |
Of the 15 most common activity category combinations, we see only four combinations which might actually convey multi-sector benefits:
- the 4th and 10th entries, Prevention of acute malnutrition / Provision of protection services and Provision of protection services / Treatment of acute malnutrition, where referrals between the Nutrition and Protection clusters might feasibly result in vulnerable populations receiving a combination of protection services, micronutrient supplementation, de-worming, treatment and counselling that would reduce their vulnerability in a multidimensional manner;
- the 11th entry, Water supply in communities / Prevention of acute malnutrition, as improved water supply is a core component of preventing malnutrition; and
- the 12th entry, Access to education and student retention / Provision of protection services, as children who have poor access to education and are in danger of dropping out would, presumably, be more in need of protection services (which include referrals to the formal social welfare system, legal aid, support for GBV survivors and UASC).
However, as mentioned previously, these multi-sector benefits are theoretical and it has not been established that there is communication between the implementing partners of these activities. Project monitoring is required to establish this.
Combinations like Access to education and student retention / Prevention of acute malnutrition (1st) deal with almost entirely separate populations – children of schoolgoing age are outside of the target population for Nutrition; though it is feasible that Education and Nutrition could both be dealing with the same young mother who has dropped out of school.
Furthermore, combinations like Prevention of acute malnutrition / Resilience in education (2nd) and Capacity building in education / Prevention of acute malnutrition (3rd) have no overlap as malnutrition prevention has no programmatic link to safe school strategies, DRR in schools or teacher training.
As it stands, cluster footprints seem to have much more to do with opportunistic expansion strategies dependent on partners’ preferred areas than a needs-based approach which targets the most vulnerable. The next chapter is an effort to examine and correct this.
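The pairwise counts behind the activity category table in this section can be illustrated on toy data. The following base R sketch is not the code used for the report (which relies on `pairwise_count`); it only shows the underlying idea of counting the parishes in which two categories co-occur. The parish codes and category labels are invented.

```r
# Toy data, invented for illustration: one row per parish-category pair
act <- data.frame(
  pcode3    = c("P1", "P1", "P1", "P2", "P2", "P3", "P3"),
  categoria = c("A",  "B",  "C",  "A",  "B",  "A",  "C"),
  stringsAsFactors = FALSE
)

# Parish-by-category incidence matrix: 1 if the category occurs in the parish
parishes   <- unique(act$pcode3)
categories <- unique(act$categoria)
incidence  <- matrix(0, nrow = length(parishes), ncol = length(categories),
                     dimnames = list(parishes, categories))
incidence[cbind(act$pcode3, act$categoria)] <- 1

# crossprod() gives, for every pair of categories, the number of parishes in
# which both occur; the diagonal holds each category's own parish count
co_occurrence <- crossprod(incidence)

co_occurrence["A", "B"]  # 2 (P1 and P2)
co_occurrence["A", "C"]  # 2 (P1 and P3)
co_occurrence["B", "C"]  # 1 (P1 only)
```

As the text stresses, a high count only shows that two categories share a parish; it says nothing about whether the same people were reached or whether partners coordinated.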
5. Decision trees
5a. Introduction to trees – organisational presence
To prioritise between the 1109 parishes in Venezuela – that is, to determine where we should be working – it is necessary to split them up into more easily digestible groups, and we will use decision trees to do this. A prioritisation or vulnerability score is another commonly used prioritisation tool but, as we will see, collapsing a number of variables down into one score is not always helpful.
To understand how a decision tree functions, let us construct one to predict whether or not there is a humanitarian agency present in a parish (org_present – this is our dependent variable). We have supplied our model with a basket of census indicators from which it will construct a tree to predict our dependent variable; unhide the code below to see the full model. The decision tree printed below is the result:
# just to show the decision tree of how partners seem to have chosen locations.
# using full parr dataset for the tree
# no doubt there are other factors, but this is the data I have --
# looking at specific partner characteristics would be interesting.
set.seed(3000)
tree2 <- parr %>%
rpart(org_present ~ percent_pobre + percent_urbana + densidad_ppl_km2 +
razon_de_dependencia_de_menores_de_15_anos + razon_de_dependencia_total +
percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
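percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina,
# NB: because of the comma above, promedio_de_personas_por_vivienda is passed positionally as
# rpart's weights argument rather than as a predictor; use `+` if a predictor is intended
promedio_de_personas_por_vivienda, data = ., minbucket = 100)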
fancyRpartPlot(tree2, digits = -3, sub = "", palettes = "Blues", type = 2)

To understand the plot above, all parishes have been split into four groups (the terminal nodes at the bottom marked [4], [5], [6] and [7]) based on the percentage of parishes in each node where humanitarian agencies are present. Each node has three figures. For instance, the root, at the top and marked [1], shows that 0.544, or 54.4%, of all parishes have humanitarian agencies present in them. The next figure, “n = 1109”, shows that 1109 parishes are in that group, and next to it is the percentage of all parishes that the node contains, which, since it is the root, is 100%.
We see that [7] (in dark blue) is the node with the highest concentration of parishes with agencies present (84.3%); it consists of parishes more than 79.4% urbanised and denser than 156 ppl/km2. And [4], the node with the lowest concentration of parishes with agencies (25.6%), is less than 79.4% urbanised and less than 21.8% urbanised.
This is, of course, not to imply that this actually depicts partners’ decision-making process, just that these are the factors towards which we, as a response, are predisposed. Perhaps it is understandable that the most heavily populated parishes are more likely to have organisations present, though population density and urban population are both negatively correlated with poverty incidence. The largest determinants of the number of beneficiaries reached per parish are population density and urbanisation, as beneficiary numbers tend to scale in line with larger populations.
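To make the mechanics of a single split more concrete, the sketch below searches for the threshold on one predictor that leaves the smallest within-group sum of squares, which is the criterion a regression tree applies when the response is numeric. It assumes org_present is coded 0/1 and treated as a number; the data are invented, the helpers `sse` and `best_split` are hypothetical names introduced here, and the sketch ignores controls such as `minbucket` and considers one variable only.

```r
# Toy data, invented for illustration: urbanisation (%) and a 0/1 presence flag
percent_urbana <- c(10, 15, 30, 45, 60, 82, 88, 93, 97, 99)
org_present    <- c( 0,  0,  0,  1,  0,  1,  1,  1,  0,  1)

# Sum of squared deviations from the group mean
sse <- function(y) sum((y - mean(y))^2)

# Try every midpoint between consecutive sorted values as a threshold and
# keep the one with the smallest total within-group sum of squares
best_split <- function(x, y) {
  xs <- sort(unique(x))
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  total_sse <- sapply(thresholds, function(t) sse(y[x < t]) + sse(y[x >= t]))
  best <- thresholds[which.min(total_sse)]
  list(threshold  = best,
       left_mean  = mean(y[x < best]),   # share with agencies below the threshold
       right_mean = mean(y[x >= best]))  # share with agencies at or above it
}

first_split <- best_split(percent_urbana, org_present)
first_split$threshold  # 37.5
```

The group means returned here play the same role as the first figure printed in each node of the plot: the share of parishes in that group where agencies are present.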
5b. Prioritisation tree
Now let us focus on future decisions:
Several trees were built and trialled to split parishes into targeting groups. As mentioned, the independent variables come from a pool of indicators from the census dataset, with some originating from the 2019 UNICEF Municipal Prioritisation Tool, which was a Principal Components Analysis of key variables related to poverty, health and mortality, and violence and insecurity. After numerous iterations, tree3 was chosen; it splits parishes into groups according to:
- the poverty score, which is a rescaled average of the number of poor persons and the poverty incidence of each parish.
To see the specific variables and formulae used for each of the major iterations, as well as additional notes on the development and application of decision trees, unhide the source code below.
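As an illustration of what a rescaled average of the number of poor persons and the poverty incidence could look like, here is a minimal sketch. The exact rescaling used for the report is not stated, so min-max rescaling of each component before averaging is an assumption, as are the data and the helper name `rescale01`.

```r
# Toy data, invented for illustration: four parishes
pob_pobre     <- c(500, 12000, 3000, 45000)  # number of poor persons
percent_pobre <- c(75,  38,    52,   22)     # poverty incidence (%)

# Min-max rescaling to the 0-1 range (an assumed choice of rescaling)
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Average of the two rescaled components
poverty_score <- (rescale01(pob_pobre) + rescale01(percent_pobre)) / 2
round(poverty_score, 3)
```

Averaging the two components means a parish scores highly either because it has many poor persons or because a large share of its population is poor, which is why a small, very poor parish and a large, moderately poor one can end up with similar scores.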
# As opposed to a prioritisation score -- typically the weighted average of several
# demographic and socioeconomic indicators --
# a tree is much better at accounting for the variations across geographic areas.
# A partner might not have the capacity to work outside of urban areas or
# might have specific geographic biases and decision trees are a good tool
# to make the best possible targeting decisions within one's constraints.
# With that in mind, tree3 was developed to aid future prioritisation.
# The dependent variable it strives to predict is the poverty score, which, as mentioned,
# is just the rescaled average of number of poor persons and poverty incidence.
# The performance of tree3 was considered superior to both tree1
# (whose dependent variable is just the absolute number of poor persons not reached)
# and tree4 (which considered parrish-level gaps) due to its ability to
# clearly distinguish its groups of parrishes and because it is not dependent on gaps data --
# meaning it will not shift when the 5Ws are updated.
set.seed(3000)
# number of not covered poor persons
tree1 <- parr0 %>%
rpart(not_reached ~ estado + percent_pobre + percent_urbana +
densidad_ppl_km2 + razon_de_dependencia_de_menores_de_15_anos +
razon_de_dependencia_total +
percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina,
promedio_de_personas_por_vivienda, data = ., cp = 0.038)
# tree based on poverty_score
tree3 <- parr0 %>%
rpart(poverty_score ~ estado + percent_urbana + densidad_ppl_km2 +
razon_de_dependencia_de_menores_de_15_anos + razon_de_dependencia_total +
percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
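percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina,
# NB: the comma above makes promedio_de_personas_por_vivienda the weights argument,
# not a predictor (the same applies to tree1); use `+` if a predictor is intended
promedio_de_personas_por_vivienda, data = ., cp = 0.044)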
# tree based on gap score -- let's not use this
# as tree3 is more stable and will not change based on new 5W data
# tree4 <- parr0 %>%
# rpart(gap_score ~ estado + percent_urbana + densidad_ppl_km2 +
# razon_de_dependencia_de_menores_de_15_anos + razon_de_dependencia_total +
# percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
# percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina,
# promedio_de_personas_por_vivienda, data = ., cp = 0.045)
# plotcp(tree3)
# printcp(tree3)
# adding tree1 and tree3 rules to the dataset
parr0 <- parr0 %>%
mutate(rule1 = row.names(tree1$frame)[tree1$where]) %>%
left_join(rpart.rules.table(tree1) %>%
filter(Leaf == TRUE) %>%
rename(rule1 = Rule) %>%
group_by(rule1) %>%
summarise(subrules1 = paste(Subrule, collapse = ","))) %>%
mutate(rule3 = row.names(tree3$frame)[tree3$where]) %>%
left_join(rpart.rules.table(tree3) %>%
filter(Leaf == TRUE) %>%
rename(rule3 = Rule) %>%
group_by(rule3) %>%
summarise(subrules3 = paste(Subrule, collapse = ",")))
5c. Sub-groups of decision tree3
Below is a plot of tree3. The 1098 parishes where the number of beneficiaries does not exceed the number of poor persons (corresponding to the under and no coverage categories) have been split into four terminal nodes: [4], [5], [6] and [7]. The manner in which they have been split is meaningful for targeting decisions and this section will compare the characteristics of each. Please note that this is a different tree from the one in section 5a; the two models have the same numbering because they have the same number of splits.
fancyRpartPlot(tree3, digits = -3, sub = "", palettes = "Blues", type = 2)

Summary and overview of the terminal nodes of tree3
# will they be confused that the terminal nodes have the same codes?
# Should I explain in the text that this is because they have the same number of splits
# or will that just make them more confused?
parr0 %>%
group_by(rule3) %>%
summarise(parr_no_ben = n_distinct(pcode3[beneficiarios == 0]),
beneficiaries = sum(beneficiarios),
avg_beneficiaries = sum(beneficiarios) / n(),
not_reached = sum(not_reached),
avg_not_reached = sum(not_reached) / n(),
avg_org_count = mean(org_count),
avg_population = mean(poblacion_2019),
percent_poor = round(sum(pob_pobre) / sum(poblacion_2019) * 100, digits = 1),
percent_urban = round(sum(pob_urbana) / sum(poblacion_2019) * 100, digits = 1),
density_ppl_km2 = sum(poblacion_2019) / sum(area_km2, na.rm = TRUE),
parrishes = n()) %>%
gather(key = variable, value = value, 2:ncol(.)) %>%
spread_(key = names(.)[1], value = 'value') %>%
# reordering the table instead of having it be alphabetical
arrange(factor(variable, levels = c("not_reached", "avg_not_reached", "avg_population",
"beneficiaries",
"avg_beneficiaries", "avg_org_count",
"percent_poor", "percent_urban", "density_ppl_km2",
"parrishes", "parr_no_ben"))) %>%
pander(big.mark = ",", caption = "Summary table of the terminal nodes of tree3")
Summary table of the terminal nodes of tree3
| variable | 4 | 5 | 6 | 7 |
|:---|---:|---:|---:|---:|
| not_reached | 1,087,505 | 6,361,254 | 3,644,252 | 521,395 |
| avg_not_reached | 8,631 | 12,877 | 9,641 | 5,214 |
| avg_population | 46,126 | 35,806 | 19,120 | 7,174 |
| beneficiaries | 179,904 | 363,258 | 116,270 | 18,320 |
| avg_beneficiaries | 1,428 | 735.3 | 307.6 | 183.2 |
| avg_org_count | 2.286 | 1.427 | 0.8386 | 0.34 |
| percent_poor | 21.8 | 38 | 52 | 75.2 |
| percent_urban | 99 | 92.5 | 75.4 | 23.4 |
| density_ppl_km2 | 996.1 | 147.4 | 18.07 | 1.855 |
| parrishes | 126 | 494 | 378 | 100 |
| parr_no_ben | 36 | 202 | 195 | 75 |
[4] consists of population centres which are easy to reach, but with only 21.8% of the population being poor, careful targeting and beneficiary selection is required; blanket coverage will only result in excessive inclusion errors. It also has the highest average number of organisations present per parish (avg_org_count). There are 126 parishes in this group. These parishes should not be prioritised; resources should be allocated elsewhere.
[5] is probably the best option for expansion for most partners: it has the highest concentration of poor persons not covered per parish (avg_not_reached) and is substantially poorer than [4], with a poverty incidence of 38%. Additionally, these parishes are still very urbanised (92.5%), meaning that access to these populations will not be challenging. The coverage of organisations is still fairly high and partners should consider expanding into parishes adjacent to the ones they currently cover. This is the largest group, with 494 parishes.
[6] is where access starts to get more challenging. Though these parishes have an average poverty incidence of 52%, the rate of urbanisation drops to 75.4% and the population density is only 18 ppl/km2. But there are still more poor persons not covered per parish in this group than in [4]. There are 378 parishes in this group.
[7] consists of the poorest, most vulnerable and most remote parishes. Working in these areas will incur significant operational and logistical costs. However, with an average poverty incidence of 75.2%, blanket coverage will be warranted in many cases, if the challenge of reaching all of the population can be met. These parishes also have the lowest average number of poor persons not covered, given their extremely low population density of 1.9 ppl/km2. Humanitarian agencies have the lowest presence in these parishes. It is advisable for donors to incentivise activities in these areas as they are very underserved. There are 100 parishes in this group.
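The point about inclusion errors can be made with simple arithmetic. The sketch below uses the poverty incidence of each node from the summary table and assumes, for illustration only, that blanket coverage reaches everyone in a parish and that only poor persons are the intended target.

```r
# Poverty incidence (%) per terminal node, taken from the summary table
percent_poor <- c(`4` = 21.8, `5` = 38.0, `6` = 52.0, `7` = 75.2)

# Under blanket coverage, the share of recipients who are not poor is the
# inclusion error (assuming only poor persons are the intended target)
inclusion_error <- 100 - percent_poor
inclusion_error
```

Under these assumptions, roughly three in four recipients in [4] would fall outside the target group, against about one in four in [7], which is why blanket coverage is only defensible at the poorer end of the tree.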
5d. Map of parishes by decision tree node
# just one note for this map -- I still can't figure out how to get the tooltip to appear when
# you're hovering over the centroid instead of at the border; hoveron fill doesn't work.
# I think you should just ask stackoverflow GIS
# hex for Set2 "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#FFFFFF"
# hex for Dark2 "#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#FFFFFF"
# hex for Accent "#7FC97F", "#BEAED4", "#FDC086", "#FFFF99", "#FFFFFF"
# scale_fill_viridis_d()
# scale_fill_manual(values = c())
parrmap_org <- parr %>%
left_join(parr0 %>%
select(pcode3, rule3), by = "pcode3") %>%
right_join(pcode3_shape, by = "pcode3") %>%
st_as_sf() %>%
mutate(not_reached = round(not_reached, digits = 0),
tree_node = rule3) %>%
mutate_at(vars(percent_pobre, percent_urbana), ~(round(., digits = 2))) %>%
ggplot() +
geom_sf(size = 0.1,
aes(fill = tree_node,
text = paste0(parroquia,",", "\n",
municipio, ",", "\n",
estado, "\n",
"not covered: ", not_reached, "\n",
"poverty incidence: ", percent_pobre, "\n",
"percent urban: ", percent_urbana, "\n",
"org present :", org_present),
alpha = org_present)) +
theme_void() +
scale_fill_manual(values = c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3")) +
scale_alpha_discrete(range = c(1, 0.7)) +
theme(legend.title = element_text(size = 7),
legend.text = element_text(size = 7),
plot.title = element_text(size = 11)) +
guides(alpha = "none") +
labs(fill = "Tree node",
alpha = "") +
ggtitle('Map of parrishes by decision tree node (colour) & if organisations present (alpha)')
ggplotly(parrmap_org, tooltip = c("text", "fill")) %>%
layout(title = list(text = paste0(
"Map of parrishes by decision tree node (colour) & if organisations present (alpha)",
"<br>",
"<sup>",
"mouse over for details; drag and click to select and zoom; double-click legend select/deselect","</sup>")))
Above is a map of parishes by their decision tree node (denoted by colour). We have also decreased the alpha for parishes where there are already organisations present, meaning that they appear more transparent. Looking at the areas with the greatest concentrations of [4] and [5], we can see that they conform to the Venezuelan Coastal Range and the Venezuelan Andes, where most of the country’s population is located; as a reminder, parishes in node [5] are excellent candidates for expansion.
We also see three large clusters of parishes from node [7] – the poorest and most sparsely populated areas – in Amazonas and Bolivar (at the bottom of the map), in Delta Amacuro (at the extreme right) and in Lara and Falcon (top-left). Double-click on each legend item to toggle the view.
As a final note for this chapter, the prioritisation tree was limited to four terminal nodes because they captured the vast majority of the variance amongst the parishes; any more splits would only have diminishing returns. Should a partner want to see a more complex tree with more nodes, it would be very easy to supply them with it. However, I feel these more complex trees would mostly serve as references rather than as actual prioritisation tools: I already question partners’ ability to deal with 4 separate groups, each requiring its own strategy, much less 8 or 10.
- Chapter 5 annex: Additional notes on tree1, which was not selected – unhide code to see
# the main problem I see is that each of the leaves has little variance in terms of poverty incidence
# but let me know if you want maps or products focused on this tree, it's pretty easy to do.
# [15] is very, very attractive. Maybe I can do something with it.
# 6 is dense, urban and highest operational presence,
# 2 is just too big. 800 parishes is just too many. The low end is distinguished much better in tree3
# 14 is just rich, urban and not a priority. It's also a really small leaf.
# 15 is actually a really good leaf -- really high nc_per_parr, few parishes,
# very dense, very urban and 42% poor and such an immensely low coverage percent.
# Good low-hanging fruit. I almost want to keep tree1 just because of this leaf.
# Maybe I will make one map just for this. 59,508 nc_per_parr is massive.
parr0 %>%
group_by(rule1) %>%
summarise(parr_no_ben = n_distinct(pcode3[beneficiarios == 0]),
beneficiarios = sum(beneficiarios),
ben_per_parr = sum(beneficiarios) / n(),
not_reached = sum(not_reached),
nr_per_parr = sum(not_reached) / n(),
nr_per_mun = sum(not_reached) / n_distinct(pcode2),
avg_org_count = mean(org_count),
coverage_percent = sum(beneficiarios) / sum(poblacion_2019),
percent_pobre = sum(pob_pobre) / sum(poblacion_2019),
percent_urbana = sum(pob_urbana) / sum(poblacion_2019),
densidad_ppl_km2 = sum(poblacion_2019) / sum(area_km2, na.rm = TRUE),
parroquias = n(),
municipios = n_distinct(pcode2),
parr_per_mun = n() / n_distinct(pcode2)) %>%
pivot_longer(cols = -rule1, names_to = "variable", values_to = "value") %>%
pivot_wider(names_from = rule1, values_from = value) %>%
arrange(factor(variable, levels = c("not_reached", "nr_per_parr", "nr_per_mun", "beneficiarios",
"ben_per_parr", "avg_org_count", "coverage_percent",
"percent_pobre", "percent_urbana", "densidad_ppl_km2",
"parroquias", "parr_no_ben", "municipios",
"parr_per_mun"))) %>% pander(big.mark = ",")
7. Reference table
type or use the sliders to filter by categories or values; use the arrows to sort; the table is initially sorted in descending order of poor persons not reached
parr0 %>%
mutate(educacion_only = ifelse(educacion_ben > 0, "educacion", ""),
nutricion_only = ifelse(nutricion_ben > 0, "nutricion", ""),
salud_only = ifelse(salud_ben > 0, "salud", ""),
wash_only = ifelse(wash_ben > 0, "wash", ""),
proteccion_only = ifelse(proteccion_ben > 0, "proteccion", ""),
sectors_present = str_trim(paste0(educacion_only, " ", nutricion_only, " ", proteccion_only, " ",
salud_only, " ", wash_only))) %>%
mutate_at(vars(pob_pobre, not_reached, org_count, beneficiarios), ~(round(., digits = 0))) %>%
mutate_at(vars(percent_pobre, percent_urbana, coverage_percent), ~(round(.*100, digits = 1))) %>%
mutate(rule3 = as.numeric(rule3),
sectors_present = ifelse(sectors_present == "", "none", sectors_present)) %>%
select(state = estado, municipality = municipio, parish = parroquia,
percent_poor = percent_pobre, percent_urban = percent_urbana, not_reached,
beneficiaries = beneficiarios, coverage_percent, tree_node = rule3,
org_count, sector_count, sectors_present, pcode3) %>%
arrange(desc(not_reached)) %>%
# the js is adjusting the font size for the whole container -- there doesn't seem to be another way
datatable(filter = "top", options = list(pageLength = 10, scrollX = TRUE,
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "8.5pt", "'});"),
"}")
)
)
---
title: "Coverage and gaps in the humanitarian response in Venezuela in 2019"
author: "Sean Ng"
date: "24/11/2021"
output: 
  html_document:
    code_download: true
    code_folding: hide
    theme: readable
    toc: true
    toc_depth: 4
    toc_float:
      collapsed: false
    number_sections: false
    
---

<style>
    body .main-container {
        max-width: 1280px;
    }
</style>

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width=9, message = FALSE, warning=FALSE)
library(tidyverse)
library(readxl)
library(lubridate)
library(janitor)
library(stringi)
library(stringr)
library(tidytext)
library(widyr)

library(pander)
library(DT)
library(knitr)
library(kableExtra)

library(ggplot2)
library(plotly)
library(scales)
library(ggforce)
library(ggpubr)
library(forcats)
library(patchwork)

library(rattle)
library(rpart)
library(rpart.plot)
library(rpart.utils)
library(partykit)
library(corrplot)
library(factoextra)
library(shiny)

library(ggmap)
library(sf)
library(rmapshaper)
library(viridis)

# disabling scientific notation
options(scipen = 100)

# pander tables all in one row
panderOptions('table.split.table', Inf)

# pander thousands separator
panderOptions("big.mark", ",")

`%out%` <- Negate(`%in%`)

# function to remove accents 
rm_accent <- function(colns){
  # transliterate accented characters to their closest ASCII equivalents
  stri_trans_general(colns, "Latin-ASCII")
}

# for kable to treat NA as blank
options(knitr.kable.NA = '')

```

## 1. Introduction

Humanitarian needs always grossly outweigh available funding; however, it remains an industry-wide challenge to respond adequately to gaps in coverage and reallocate resources accordingly. Too often, once committed to a course of action, clusters and their humanitarian partners do not re-examine or re-evaluate their interventions. This results in responses with glaring gaps that are either not resolved in a timely manner or go completely unaddressed.

This automated report is intended to serve as a template for a coverage and gaps analysis. It provides examples of the analyses necessary to identify populations in need who are not served by humanitarian action and to recommend how partners may best reach them. This document assumes that basic reporting has already occurred; a more generic 5W report template may be found [here](https://seanywng.github.io/5W/#B_Reporting_on_the_5W_data).  

Coverage and gaps analyses are key documents, but are also rarely taken into account during operational planning or referenced during revisions of major strategic documents, such as Humanitarian Response Plans (HRPs). Neither are they mentioned in OCHA's HRP guidance, and their usage remains quite uncommon. This document intends to show that coverage and gaps analyses are not complex, impenetrable tools; rather, they contain, and are principally concerned with, practical, actionable information. 


#### A note on the data

* Most of the data originate from the Education, Health, Nutrition, Protection and WASH Clusters and cover May to October 2019 -- any conclusions or analysis are bounded by this time period and illustrate the response as it was in November 2019. Partner data have been anonymised. Other data originate from the census dataset of Venezuela that was maintained by UNICEF. Unlike the 5W reporting and cleaning document, this report does not explore the cleaning process, but the source code of each chunk is displayed when the `Code` button is clicked. The two chunks on the right contain the code for data cleaning and preparation. 

```{r reading-and-cleaning-and-intermediate-outputs}

# reading and cleaning -- you really should break it into parts
ven1 <- read_csv("consolidation 191209 1636.csv") %>% 
  clean_names() %>% 
  # removing unused columns
  select(-c(codigodeestablecimientoocentro, loc_id, hrp_sitre_p_indicator, 
            tipoderespuesta, comentarios, coordeadas_gps_x, coordeadas_gps_y,
            fechade_inicio, fecha_previstade_finalizacion)) %>% 
  # renaming unwieldy columns 
  rename(ubicacion          = comunidadonombredelestablecimiento_centro, 
         sector             = sector_areade_responsabiliad,
         beneficiarios_meta = beneficiarios_meta_numerodepersonas,
         estatus            = estatusdeprogramacion) %>% 
      # mutating the date to the right format
  mutate(month = as.factor(recode(month,
                        `4` = "30/04/2019",
                        `5` = "31/05/2019",
                        `6` = "30/06/2019",
                        `7` = "31/07/2019",
                        `8` = "31/08/2019",
                        `9` = "30/09/2019",
                        `10` = "31/10/2019"))) %>% 
  mutate(month = as.Date(month %>% strptime(., format = "%d/%m/%Y"))) %>% 
  mutate(org_lider = coalesce(org_lider, org_implementadora)) %>% 
  # correcting sector names
  mutate(sector = str_replace_all(sector, c(
    "Agua_saneamiento_higiene"            = "WASH",
    "educacion"                           = "Educacion",
    "Nutricion"                           = "Nutricion",
    "protección_Niños_Niñas_Adolescentes" = "Proteccion_NNA",
    "Protección_Niños_Niñas_Adolescentes" = "Proteccion_NNA",
    "Protección_Violencia_Género"         = "Proteccion_GBV"))) %>% 
  # renaming beneficiary disaggregation columns 
  rename(f_0_18 = f_18,
         m_0_18 = m_18,
         f_18plus = f_18_2,
         m_18plus = m_18_2) %>% 
  mutate(estado    = rm_accent(str_to_upper(estado)), 
         municipio = rm_accent(str_to_upper(municipio)),
         parroquia = rm_accent(str_to_upper(parroquia)),
         ubicacion = rm_accent(str_to_upper(ubicacion)),
         actividad = rm_accent(str_to_upper(actividad)),
         categoria = rm_accent(str_to_upper(categoriadeactividad))) %>% 
  # recoding the estatus column 
  mutate(estatus = str_replace_all(estatus, 
                  c("En ejecucion" = "ejecucion", 
                    "en ejecución" = "ejecucion", 
                    "en Ejecución" = "ejecucion",
                    "En ejecución" = "ejecucion",
                    "En Ejecución" = "ejecucion",
                    "Enejecución"  = "ejecucion",
                    "43741"        = "ejecucion",
                    "finalizada" = "finalizada",
                    "Finalizada" = "finalizada",
                    "Planeada" = "planeada",
                    "planeada con financiamiento" = "planeada",
                    "planeada sin financiamiento" = "planeada"))) %>% 
  replace_na(list(estatus = "ejecucion")) %>% 
  # removing all planned activities 
  filter(estatus != "planeada") %>% 
  filter(str_detect(pcode3, "^VE")) %>% # decide if you want to do this here or later
  select(-c(23:92))

# I'm kinda doubting the use of u_ben, ya I think take it out? since you're only using it once
# Am I just making these out of habit? I could make them inside the 
# code chunk for parr, but maybe I can find some justification for their existence, 
# maybe the disaggregations? 

u_ben <- ven1 %>% 
  pivot_longer(f_0_18:m_18plus, names_to = "desagregacion", values_to = "beneficiarios") %>% 
  filter(categoria != "VACUNACION") %>% # Vaccination activities filtered out
  filter(beneficiarios != 0) %>% 
  # for some reason, there are entries with the same ubicacion but different pcodes? 
  # maybe it's things like "centro municipal" or something
  group_by(pcode3, ubicacion, desagregacion) %>% 
  slice(which.max(beneficiarios)) %>% 
  ungroup()

act_ben <- ven1 %>% 
  pivot_longer(f_0_18:m_18plus, names_to = "desagregacion", values_to = "beneficiarios") %>% 
  filter(categoria != "VACUNACION") %>% # Vaccination activities filtered out
  filter(beneficiarios != 0) %>% 
  group_by(pcode3, ubicacion, desagregacion, actividad) %>% 
  slice(which.max(beneficiarios)) %>% 
  ungroup() %>% 
  mutate(sector = ifelse(str_detect(sector, "Proteccion_GBV|Proteccion_General|Proteccion_NNA"), 
                         "Proteccion", sector))

```
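
The two objects built at the end of the chunk above differ only in their grouping. As we read the code, `u_ben` keeps the largest figure per location and disaggregation, as an estimate of unique beneficiaries, whereas `act_ben` keeps the largest figure per activity, so that summing it counts a person once for every activity received (beneficiary frequencies). A minimal sketch on made-up records, where the pcode, location and activity names are all hypothetical:

```r
library(dplyr)

# Made-up records: one location, one disaggregation, two activities reported twice
toy <- tibble(
  pcode3        = "VE000001",
  ubicacion     = "ESCUELA A",
  desagregacion = "f_0_18",
  actividad     = c("KITS", "KITS", "TALLERES", "TALLERES"),
  beneficiarios = c(100, 120, 80, 60)
)

# Unique beneficiaries: the largest single figure per location and disaggregation
toy_unique <- toy %>%
  group_by(pcode3, ubicacion, desagregacion) %>%
  slice(which.max(beneficiarios)) %>%
  ungroup()

# Beneficiary frequencies: the largest figure per activity, summed afterwards
toy_freq <- toy %>%
  group_by(pcode3, ubicacion, desagregacion, actividad) %>%
  slice(which.max(beneficiarios)) %>%
  ungroup()

sum(toy_unique$beneficiarios)  # 120
sum(toy_freq$beneficiarios)    # 200
```

The gap between the two totals is the quantity that the hidden chunk below prints for the real data.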

```{r printing-u-ben-act-ben-diff, include=FALSE}
rbind(sum(u_ben$beneficiarios), 
      sum(act_ben$beneficiarios), 
      sum(u_ben$beneficiarios) - sum(act_ben$beneficiarios))

```

```{r making-all-parr-and-parr0}
# I think this is a gigantic chunk -- cannot decide if I would rather have less things in the 
# environment or if I want more readable chunks. The benefit here I guess is that if I want to change something, I just have to come to this chunk

parr <- u_ben %>% 
  group_by(pcode3) %>% 
  summarise(beneficiarios = sum(beneficiarios)) %>% 
  ungroup() %>% 
  # count of organisations per pcode3
  right_join(act_ben %>%
             group_by(pcode3) %>% 
             summarise(org_count = n_distinct(org_implementadora))) %>% 
  # getting beneficiary frequencies, sector count and maximum multi-sector beneficiaries
  right_join(
    act_ben %>%
      group_by(pcode3, desagregacion, sector) %>% 
      pivot_wider(names_from = sector, values_from = beneficiarios) %>% 
      rename(nutricion_ben = Nutricion, proteccion_ben = Proteccion, wash_ben = WASH, 
             salud_ben = Salud, educacion_ben = Educacion, sa_ben = Seguridad_Alimentaria) %>% 
      replace_na(list(nutricion_ben = 0, educacion_ben = 0, wash_ben = 0, salud_ben = 0,
                      sa_ben = 0, proteccion_ben = 0)) %>%
      summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, 
                         educacion_ben, salud_ben, sa_ben), sum), .groups = "drop") %>%
      mutate(ben_freq = nutricion_ben + proteccion_ben + wash_ben + salud_ben + educacion_ben + sa_ben) %>% 
          group_by(pcode3) %>% 
          summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben,
                                sa_ben, ben_freq), sum)) %>% 
      mutate(sec_ben_max = pmax(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben, sa_ben),
             ms_ben_max  = ifelse(sec_ben_max >= ben_freq - sec_ben_max, 
                                  ben_freq - sec_ben_max, 
                                  sec_ben_max),
            sector_count = rowSums(select(., ends_with("_ben")) != 0))) %>% 
  # right_join to the census data
  right_join(read_excel("census data 20191122.xlsx", sheet = "data") %>% 
        clean_names() %>% 
        # selecting variables and renaming them with select
        select(estado, pcode1, municipio, pcode2, parroquia, pcode3, 
               fo = field_office,
               poblacion_2019 = x_2019_poblacion_parroquial_total,
               hogares_2011 = numero_de_hogares, 
               ham_2019_ambitos_ge, 
               percent_pobre = ham_2019_xx_pobreza_env_por_parroquia, 
               pob_pobre = ham_2019_xx_poblacion_pobre_por_parroquia, 
               poblacion_total_2011,
               poblacion_infantil_menor_de_12_anos, poblacion_adolescentes_de_12_a_17_anos,
               poblacion_de_18_anos_y_mas, 
               percent_urbana = poblacion_urbana_percent, 
               area_km2, 
               densidad_ppl_km2 = densidad_poblacional_ppl_km2,
               matricula_2017_educacion_inicial, matricula_2017_educacion_primaria, 
               matricula_2017_educacion_media, razon_de_dependencia_total,
               razon_de_dependencia_de_menores_de_15_anos, 
               percent_sin_agua_segura = x_abast_agua2_percent_sin_agua_segura,
               percent_sin_saneamiento_mejorado =
                 x_saneamiento_percent_sin_saneamiento_mejorado,
               percent_analfabeto = percent_poblacion_10_anos_y_mas_analfabeta,
               promedio_de_personas_por_vivienda,
               percent_hogares_jefatura_femenina = percent_de_hogares_con_jefatura_femenina,
               percent_sin_servicio_electrico =
                 servicio_electrico_percent_no_tiene_servicio_electrico,
               ham_2019_x_violencia_envelope, ham_2019_x_mortalidad_y_salud_envelope, 
               ham_2019_x_pobreza_envelope, promedio_de_edad, 
               relacion_de_masculinidad) %>% 
        mutate(estado     = rm_accent(str_to_upper(estado)), # just to make sure 
               municipio  = rm_accent(str_to_upper(municipio)),
               parroquia  = rm_accent(str_to_upper(parroquia))) %>% 
        # creating new disaggregation variables 
        mutate(pob_menor_de_18 = (poblacion_infantil_menor_de_12_anos +
                                 poblacion_adolescentes_de_12_a_17_anos) /poblacion_total_2011 *
                                 poblacion_2019, 
               pob_18_y_mas    = poblacion_de_18_anos_y_mas / poblacion_total_2011 * poblacion_2019, 
               hogares_2019    = hogares_2011 * poblacion_2019 / poblacion_total_2011, 
               matricula_total = matricula_2017_educacion_inicial + 
                                 matricula_2017_educacion_primaria + 
                                 matricula_2017_educacion_media) %>% 
        # dividing columns by 100 so that they're between 0 and 1
        mutate_at(vars(percent_analfabeto, percent_sin_servicio_electrico, 
                       percent_sin_agua_segura,
                       percent_sin_saneamiento_mejorado,
                       percent_hogares_jefatura_femenina, percent_urbana,
                       razon_de_dependencia_total), ~(. / 100)) %>% 
        # mutating new columns with populations
        mutate(pob_analfabeto               = percent_analfabeto * poblacion_2019,
               pob_sin_agua_segura          = percent_sin_agua_segura * poblacion_2019, 
               pob_sin_servicio_electrico   = percent_sin_servicio_electrico * poblacion_2019,
               pob_sin_saneamiento_mejorado = percent_sin_saneamiento_mejorado * poblacion_2019,
               pob_urbana                   = percent_urbana * poblacion_2019) %>% 
        select(-c(matricula_2017_educacion_inicial, matricula_2017_educacion_primaria, 
               matricula_2017_educacion_media, poblacion_total_2011, hogares_2011,
               poblacion_infantil_menor_de_12_anos, poblacion_adolescentes_de_12_a_17_anos, 
               poblacion_de_18_anos_y_mas)),
            by = "pcode3") %>% 
  # mutating new variables and making sure NAs become 0s 
  mutate(beneficiarios  = ifelse(is.na(beneficiarios), 0, beneficiarios),
         org_count      = ifelse(is.na(org_count), 0, org_count),
         sector_count   = ifelse(is.na(sector_count), 0, sector_count), 
         educacion_ben  = ifelse(is.na(educacion_ben), 0, educacion_ben),
         nutricion_ben  = ifelse(is.na(nutricion_ben), 0, nutricion_ben),
         proteccion_ben = ifelse(is.na(proteccion_ben), 0, proteccion_ben),
         salud_ben      = ifelse(is.na(salud_ben), 0, salud_ben),
         sa_ben         = ifelse(is.na(sa_ben), 0, sa_ben),
         wash_ben       = ifelse(is.na(wash_ben), 0, wash_ben),
         ms_ben_max     = ifelse(is.na(ms_ben_max), 0, ms_ben_max),
         ben_freq       = ifelse(is.na(ben_freq), 0, ben_freq),
         not_reached         = pob_pobre - beneficiarios,
         coverage_percent    = beneficiarios / pob_pobre,
         percent_total_ben_u = beneficiarios / sum(beneficiarios),
         multisector_percent = ms_ben_max / ben_freq, 
         org_present         = beneficiarios > 0,
         pob_pobre_score     = rescale(pob_pobre, to = c(0,1)), 
         percent_pobre_score = rescale(percent_pobre, to = c(0,1)), 
         poverty_score       = (pob_pobre_score + percent_pobre_score) / 2)

# taking a subset of parr to only get parishes where the 
# number of beneficiaries does not exceed the number of poor persons

parr0 <- parr %>% 
  filter(not_reached >= 1) %>% 
  mutate(gap_score = (rescale(not_reached, to = c(0,1)) + percent_pobre_score) / 2)

```
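
The `gap_score` created above is the simple average of two min-max rescaled quantities: the number of poor persons not reached and the poverty incidence. A minimal sketch with made-up values follows; for simplicity it rescales both inputs over the same three parishes, whereas the chunk above rescales `percent_pobre` over all of `parr` before filtering, so this is an illustration rather than a reproduction:

```r
library(scales)

# Made-up values for three hypothetical parishes
toy_not_reached   <- c(500, 2000, 10500)
toy_percent_pobre <- c(0.90, 0.50, 0.70)

# Min-max rescaling maps each variable onto [0, 1]
toy_nr_score  <- rescale(toy_not_reached, to = c(0, 1))    # 0, 0.15, 1
toy_pov_score <- rescale(toy_percent_pobre, to = c(0, 1))  # 1, 0, 0.5

# Equal-weight average of the two components
toy_gap_score <- (toy_nr_score + toy_pov_score) / 2         # 0.5, 0.075, 0.75
```

Because both components carry equal weight, a small but very poor parish (the first one) can outrank a larger, less poor one (the second).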

```{r write-csv-parr0, include = FALSE, eval = FALSE}
# evalled out -- change this if you want the csv for tableau or whatever
write_csv(parr0, "parr0.csv")
```

<br><br><br>  

## 2. Summary of coverage and gaps


### 2a. Map of parishes by gaps

```{r MAP-REF, include = FALSE}
# reading in the shapefile
pcode3_shape <- st_read("C:/Users/Sean Ng/Documents/R/coverage_gaps_venezuela/ven_admbnda_adm3_20180502/ven_admbnda_adm3_20180502.shp",
                        quiet = TRUE) %>% 
  rename(pcode1 = ADM1_PCODE,
         pcode2 = ADM2_PCODE,
         pcode3 = ADM3_PCODE) %>% 
  mutate(pcode3 = recode(pcode3,
                         "VE030102" = "VE030101", # only Anaco exists, according to the census
                         "VE031901" = "VE031900", # somehow the census and the shapefiles conflict
                         "VE070301" = "VE070300")) %>%  # here too
# simplifying to remove slivers -- let's see if this works 
# it's a bit abstract at 0.05, but the slivers ARE gone and it looks not bad 
# figuring this out was maddening, I can't believe you spent so long on cosmetic items 
  ms_simplify(keep = 0.05, keep_shapes = TRUE)

```

```{r coverage-map-not-covered-pobre}
# parishes with a negative number of poor persons not reached (beneficiaries exceed poor persons)
# are recoded as 0 so they won't mess up the scale, even though this means that their tooltips are dropped 

gaps_map <- parr %>% 
  right_join(pcode3_shape, by = "pcode3") %>% 
  st_as_sf() %>% 
  mutate(not_reached = ifelse(not_reached < 0.1, 0, not_reached)) %>% 
  mutate(not_reached = round(not_reached, digits = 0)) %>%
  mutate_at(vars(percent_pobre, percent_urbana), ~(round(., digits = 2))) %>% 
  ggplot() +
  geom_sf(aes(fill = not_reached,
              text = paste0(parroquia,",", "\n", 
                           municipio, ",", "\n",
                           estado, "\n", 
                           "not reached: ", not_reached, "\n",
                           "org count: ", org_count, "\n",
                           "poverty incidence: ", percent_pobre, "\n",
                           "percent urban: ", percent_urbana)),
          size = 0.1) +
  scale_fill_viridis_c(option = "turbo", trans = "log10") +
  theme_void() +
  theme(legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        plot.title = element_text(size = 11)) +
  labs(fill = "Poor persons \nnot reached") +
  ggtitle("Map of parishes by gaps in population reached")

# so are you saying that if I change the fill to viridis in the later plot, I can use hoveron = fill?
# no you can't. 
ggplotly(gaps_map, tooltip = c("text")) %>%
  layout(showlegend = TRUE, legend = list(font = list(size = 6))) %>% 
  plotly::style(hoveron = "fill") %>% 
  layout(title = list(text = paste0("Map of parishes by number of poor persons not reached",
                                    "<br>",
                                    "<sup>",
                                    "mouse over for details; drag and click to select and zoom","</sup>")))

```
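
The recoding in the chunk above interacts with the `log10` fill scale: the logarithm of zero is `-Inf`, which is presumably why the recoded parishes end up without a fill value or tooltip. A minimal sketch with made-up values (not taken from the dataset):

```r
# Made-up gaps: one over-covered parish, one near zero, two with real gaps
toy_gap <- c(-250, 0.05, 40, 12000)

# Same recoding as in the map chunk: anything below 0.1 becomes 0
toy_recoded <- ifelse(toy_gap < 0.1, 0, toy_gap)

# log10(0) is -Inf, so only the last two values land on the colour scale
toy_log <- log10(toy_recoded)
is.finite(toy_log)
```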

Nationwide, **`r format(round(sum(parr0$not_reached), digits = 0), big.mark = ",")`** poor persons have not been covered by response activities -- this means that **`r round(sum(parr0$not_reached) / sum(parr0$pob_pobre) * 100, digits = 1)`%** of all poor persons in the country have yet to be reached. This population, its distribution and its characteristics are some of the main concerns of this analysis. 

<br><br>

### 2b. Grouping parishes by coverage type

As a starting point, all `r nrow(parr)` parishes (admin level 3) have been split into three groups -- _over_, where the number of unique beneficiaries reached meets or exceeds the number of poor persons in that parish; _under_, where the number reached is lower than the number of poor persons; and _no coverage_, comprising a total of **`r nrow(filter(parr, beneficiarios == 0))`** parishes, where no activities have occurred. 



```{r}
parr %>% 
  mutate(coverage_type = case_when(not_reached <= 0 ~ "over",
                                   not_reached > 0 & beneficiarios >= 1 ~ "under", 
                                   beneficiarios == 0 ~ "no_coverage")) %>% 
  group_by(coverage_type) %>% 
  summarise(parishes = n(),
            beneficiaries = sum(beneficiarios),
            not_reached = sum(not_reached), 
            avg_org_count = mean(org_count),
            percent_poor = round(sum(pob_pobre) / sum(poblacion_2019) * 100, digits = 1),
            percent_urban = round(sum(pob_urbana) /sum(poblacion_2019) * 100, digits = 1),
            percent_wo_safe_water = round(sum(pob_sin_agua_segura) / sum(poblacion_2019) * 100, digits = 1),
            percent_wo_improved_sanitation = round(sum(pob_sin_saneamiento_mejorado)/
              sum(poblacion_2019)  * 100, digits = 1),
            percent_illiterate = round(sum(pob_analfabeto) / sum(poblacion_2019) * 100, digits = 1),
            avg_sector_count = mean(sector_count)) %>% 
  pivot_longer(cols = -coverage_type, names_to = "variable") %>% 
  pivot_wider(names_from = coverage_type, values_from = value) %>% 
  
  relocate(no_coverage, .after = under) %>% 
  pander(big.mark = ",", caption = "Parish characteristics by coverage type", style = "rmarkdown")

```

A total of **`r round(filter(parr, beneficiarios == 0) %>% {sum(.$pob_pobre)}, digits = 0) %>% format(big.mark = ",")`** poor persons reside in the **`r nrow(filter(parr, beneficiarios == 0))`** parishes that have not been reached; this is only **`r round(filter(parr, beneficiarios == 0) %>% {sum(.$pob_pobre)} / sum(parr$not_reached) *100, digits = 0)`%** of the **`r round(sum(parr$not_reached), digits = 0) %>% format(big.mark = ",")`** poor persons not covered by response activities. This indicates that:

1. there is much room to expand in the parishes where we are already present and that 
2. sparsely populated, remote and, consequently, poorer parishes have, so far, been left out of the response.
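
The share quoted above can be reproduced on a small scale. The sketch below uses made-up parishes and, as an assumption of the sketch only, drops over-covered parishes from the denominator; the inline figures above instead sum `not_reached` over all of `parr`, negative values included.

```r
library(dplyr)

# Made-up parishes: D is over-covered, B and C have no coverage at all
toy_parr <- tibble(
  parroquia     = c("A", "B", "C", "D"),
  pob_pobre     = c(50000, 6000, 4000, 1000),
  beneficiarios = c(5000, 0, 0, 2500)
) %>%
  mutate(not_reached = pob_pobre - beneficiarios)

# Keep parishes with a positive gap, then split the gap by operational presence
gap_share <- toy_parr %>%
  filter(not_reached > 0) %>%
  mutate(presence = ifelse(beneficiarios == 0, "no_coverage", "under")) %>%
  group_by(presence) %>%
  summarise(not_reached = sum(not_reached)) %>%
  mutate(share = not_reached / sum(not_reached))

gap_share
```

In this toy case most of the gap sits in a parish where partners are already present, which mirrors the first point in the list above.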

Additionally, the **`r nrow(parr[parr$not_reached <= 0,])`** parishes in the _over_ category are much less poor and much more urban, yet they account for **`r round(filter(parr, not_reached <= 0) %>% {sum(.$beneficiarios)} / sum(parr$beneficiarios) * 100, digits = 0)`%** of all beneficiaries. As can be seen from the row `not_reached`, the number of beneficiaries in the _over_ category greatly exceeds the number of poor persons. These parishes are shown in the table in the next section.  


<br><br>

### 2c. Overallocation in the top parishes by coverage

The **`r nrow(parr[parr$not_reached <= 0,])`** parishes below (from the _over_ category) will largely be excluded from the remainder of this report, as it is clear that no further resources should be allocated to them: 

```{r}
parr %>% 
  mutate(coverage_type = case_when(not_reached <= 0 ~ "over",
                                   not_reached > 0 & beneficiarios >= 1 ~ "under", 
                                   beneficiarios == 0 ~ "no_coverage")) %>%  
  filter(coverage_type == "over") %>% 
  select(state = estado, municipality = municipio, parish = parroquia, 
         beneficiaries = beneficiarios, poor_persons = pob_pobre) %>%
  mutate(coverage_percent = beneficiaries / poor_persons * 100) %>% 
  arrange(desc(beneficiaries)) %>% 
  pander(big.mark = ",", caption = "Top 11 parishes by coverage", style = "rmarkdown")

```

As a note, it is likely that partners have reported under Altagracia activities which occurred in other parts of the capital, as the total number of beneficiaries reached in the whole of Distrito Capital is only `r filter(parr, municipio == "LIBERTADOR") %>% {sum(.$beneficiarios)} %>% format(big.mark = ",")`. It is necessary to check back with partners about this; nevertheless, this is the information we have on hand. 
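
One way to screen for this kind of misattribution is to compare each parish's share of its municipality's beneficiaries with its coverage of poor persons. The sketch below uses made-up figures and hypothetical place names, and the 50% threshold is an arbitrary assumption rather than anything used in this report:

```r
library(dplyr)

# Made-up municipality with three hypothetical parishes
toy_mun <- tibble(
  municipio     = "MUNICIPIO X",
  parroquia     = c("PARISH A", "PARISH B", "PARISH C"),
  beneficiarios = c(90000, 6000, 4000),
  pob_pobre     = c(10000, 40000, 50000)
)

flagged <- toy_mun %>%
  group_by(municipio) %>%
  mutate(share_of_municipality = beneficiarios / sum(beneficiarios)) %>%
  ungroup() %>%
  mutate(coverage = beneficiarios / pob_pobre) %>%
  # flag parishes holding most of the municipal caseload AND exceeding 100% coverage
  filter(share_of_municipality > 0.5, coverage > 1)

flagged$parroquia
```

A parish flagged this way is a prompt to check back with partners, not proof of a reporting error.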

<br><br>

### 2d. Sex ratios by cluster

```{r}
act_ben %>% 
  mutate(sex_ben = case_when(str_detect(desagregacion, "^m") ~ "male",
                             str_detect(desagregacion, "^f") ~ "female")) %>% 
  group_by(sector) %>% 
  rename(cluster = sector) %>% 
  summarise(ben_freq = sum(beneficiarios),
            male = sum(beneficiarios[sex_ben == "male"]), 
            female = sum(beneficiarios[sex_ben == "female"]),
            sex_ratio = round(male / female, digits = 2)) %>% 
  mutate(`%_ben_freq` = round((ben_freq / sum(ben_freq)) * 100, digits = 1)) %>% 
  relocate(`%_ben_freq`, .after = ben_freq) %>% 
  arrange(sex_ratio) %>% 
  pander(caption = "Sex ratio of beneficiary frequencies by cluster")
```

All of the clusters reached more women than men, with Health and Protection being particularly heavily skewed in this regard. For Health, the disproportionality is a bit more understandable as it has a focus on antenatal and obstetric care as well as preventing mother-to-child HIV transmission; Nutrition similarly has a focus on pregnant and lactating women. However, for Protection, some investigation is necessary: 

```{r}
act_ben %>% 
  mutate(sex_ben = case_when(str_detect(desagregacion, "^m") ~ "male",
                             str_detect(desagregacion, "^f") ~ "female"),
         activity = str_to_sentence(actividad)) %>% 
  filter(sector == "Proteccion") %>% 
  group_by(activity) %>% 
  summarise(ben_freq = sum(beneficiarios), 
            male = sum(beneficiarios[sex_ben == "male"]), 
            female = sum(beneficiarios[sex_ben == "female"])) %>% 
  arrange(desc(ben_freq)) %>% select(-ben_freq) %>% head(5) %>% 
  pander(caption = "Top 5 Protection activities by beneficiaries", 
         justify = c("left", "centre", "centre"))
```

<br>

None of the top 5 -- the issuance of birth certificates; legal assistance; awareness raising on violence, exploitation, abuse and family separation; or psychosocial support -- should be predisposed to reach females over males (and in the case of psychosocial support, it is not). The cluster needs to verify these figures and revisit its targeting and beneficiary selection strategies. 

For reference, the sex ratio of the country as a whole was 0.99 in the last census -- this figure should not have been affected too much by the migrants and refugees who have left the country, as IOM's [Displacement Tracking Matrix](https://reliefweb.int/sites/reliefweb.int/files/resources/1-demographic%28V2%29ML.pdf) reports that population as being 49% female and 51% male.  
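
To make the comparison with the census explicit, each cluster's sex ratio can be set against the 0.99 baseline. The sketch below uses made-up cluster totals and assumes that the census figure, like the table above, is expressed as males per female:

```r
library(dplyr)

census_sex_ratio <- 0.99  # assumed to be males per female, as in the table above

# Made-up beneficiary totals for three hypothetical clusters
toy_clusters <- tibble(
  cluster = c("Cluster A", "Cluster B", "Cluster C"),
  male    = c(300, 450, 480),
  female  = c(600, 500, 500)
) %>%
  mutate(sex_ratio       = round(male / female, digits = 2),
         gap_from_census = round(sex_ratio - census_sex_ratio, digits = 2))

toy_clusters
```

The further `gap_from_census` falls below zero, the more the cluster's reach is skewed towards women relative to the general population.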


<br><br><br><br>

## 3. Geographical analysis of gaps

### 3a. Barplot of coverage and gaps by state


```{r parr0-state-PLOT}
# ordering the states 
state_ord <- parr %>% 
  group_by(estado) %>% 
  summarise(beneficiarios = sum(beneficiarios), 
            total = sum(pob_pobre)) %>% 
  mutate(percent_reached = beneficiarios / total) %>% 
  arrange(percent_reached) %>% select(estado) %>% pull()
  
stack_text <- parr %>% 
  group_by(estado) %>% 
  summarise(beneficiarios = sum(beneficiarios),
            total = sum(pob_pobre),
            not_reached = sum(not_reached)) %>% 
  mutate(percent_reached = round(beneficiarios / total * 100, digits = 1)) %>% 
  arrange(percent_reached) %>% 
  mutate(percent_reached = paste0(percent_reached,"%")) 

state_stack <- parr %>% 
  select(estado, beneficiarios, not_reached) %>% 
  group_by(estado) %>%
  summarise(beneficiaries = round(sum(beneficiarios), digits = 0), 
            not_reached = round(sum(not_reached), digits = 0)) %>% 
  pivot_longer(c(beneficiaries, not_reached),
               names_to = "pob_type", values_to = "total") %>% 
  
  ggplot(aes(x = total, y = estado)) +
  geom_col(aes(fill = pob_type), colour = "grey70", size = 0.03) +
  scale_x_continuous(labels = comma, breaks = seq(0, 1800000, 200000)) +
  scale_y_discrete(limits = state_ord) +
  scale_fill_manual(values = c("#DE7065FF", "#403891FF")) +
  geom_text(data = stack_text, aes(label = percent_reached), 
            size = 2.5, colour = "white", fontface = "bold",
            position = position_stack(vjust = 0.5)) +
  ylab("") + xlab("Number of poor persons") + 
  labs(fill = "", colour = "",
       title = "Barplot of poor persons by state by reached/not reached") +
  theme(plot.title   = element_text(size = 11), 
        axis.text.x = element_text(size = 7, angle = 30),
        axis.title.x = element_text(size = 9)) 

ggplotly(state_stack) %>% 
  layout(legend = list(font = list(size = 7))) %>% 
  config(displayModeBar = FALSE) %>% 
  layout(title = list(text = paste0(
    "Barplot of poor persons by state by not reached/reached",
                                    "<br>",
                                    "<sup>",
                                    "mouse over for details; figures show percent reached","</sup>")))

```

**`r round(filter(parr, estado %in% c("DISTRITO CAPITAL", "MIRANDA", "ZULIA", "TACHIRA", "BOLIVAR")) %>% {sum(.$beneficiarios)} / sum(parr$beneficiarios) * 100, digits = 1)`%** of all beneficiaries are from the states of Distrito Capital, Miranda, Tachira, Bolivar and Zulia, largely corresponding to the locations of UNICEF offices. These states, with the addition of Delta Amacuro, are also where the highest percentages of poor persons have been reached. Barinas has the lowest percentage of its poor population covered.

On average, after the exclusion of the top `r nrow(parr[parr$not_reached <= 0,])` parrishes, **`r round(sum(parr$beneficiarios) / sum(parr$pob_pobre) * 100, digits = 1)`%** of poor persons have been reached countrywide. However, when coverage is first calculated for each state and then averaged across states, the figure is **`r parr0 %>% group_by(estado) %>% summarise(percent_reached = sum(beneficiarios) / sum(pob_pobre), .groups = "drop") %>% summarise(avg = mean(percent_reached) * 100) %>% pull() %>% round(digits = 1)`%**. 
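
The gap between the two figures comes from how the average is taken: the countrywide figure pools all beneficiaries and all poor persons, whereas the state-level figure is an unweighted mean of each state's own ratio, so small states with low coverage pull it down. A minimal sketch with made-up numbers (the three states and their values are hypothetical and are not taken from `parr`; base R is used so that the sketch runs on its own):

```r
# Hypothetical toy data: three states of very different sizes
toy <- data.frame(
  estado        = c("A", "B", "C"),
  beneficiarios = c(50000, 1000, 500),
  pob_pobre     = c(100000, 10000, 10000)
)

# Pooled (countrywide) coverage: total reached / total poor
pooled <- sum(toy$beneficiarios) / sum(toy$pob_pobre) * 100

# Unweighted state-level average: mean of each state's own ratio
state_avg <- mean(toy$beneficiarios / toy$pob_pobre) * 100

round(c(pooled = pooled, state_avg = state_avg), digits = 1)
```

In this toy case the large, well-covered state dominates the pooled figure, while the two small, poorly covered states count equally in the state-level average.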

Whilst there are many poor persons yet to be reached in states where we have relatively high coverage, there is a need to ensure that our operational footprint and the consequent resources allocated are equitable and this type of overallocation is avoided -- the crisis in Venezuela is nationwide, and unlike an earthquake or a typhoon where there is an epicentre or a stormpath, there is no programmatic rationale to only focus on a few areas. 

Let us now move to a finer level of granularity, as state-level analysis is still too superficial. Parrishes will be the main administrative unit of reference in this analysis. Unlike in the [5W cleaning and reporting document](https://seanywng.github.io/5W/#B_Reporting_on_the_5W_data), where we focused on municipalities to display achievements for external audiences, greater precision is needed for a coverage and gaps analysis.

<br><br>

### 3b. Scatterplot of gaps by parrish 

From the scatterplot below -- where each point is a parrish -- we see that there is great variation both in the number of poor persons not covered (size and x-axis) as well as how concentrated they are in a given parrish (y-axis, poverty incidence); both these factors weigh heavily in programming strategies as well as in the ease of beneficiary selection.  

The greatest numbers of persons not reached are found in parrishes within the ranges of _poor persons not covered:_ 10,000-100,000 and _poverty incidence:_ 0.25-0.50 (marked by the yellow box); however, these parrishes also have a much higher than average number of organisations present (more red). This means that operational barriers to accessing these populations are much lower than in the light-blue parrishes found in the middle of the plot.
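
To list the parrishes inside the yellow box rather than reading them off the plot, the same bounds can be applied as a filter. This is only a rough sketch on invented rows: the column names mirror those used in this report, but the data frame `toy` and its values are hypothetical.

```r
# Hypothetical parrishes; column names follow `parr0`, values are invented
toy <- data.frame(
  parroquia     = c("P1", "P2", "P3", "P4"),
  not_reached   = c(5000, 25000, 80000, 150000),
  percent_pobre = c(0.30, 0.40, 0.60, 0.45),
  org_count     = c(0, 3, 1, 6),
  stringsAsFactors = FALSE
)

# Same bounds as the yellow box: 10,000-100,000 not reached, incidence 0.25-0.50
in_box <- subset(toy,
                 not_reached >= 10000 & not_reached <= 100000 &
                   percent_pobre >= 0.25 & percent_pobre <= 0.50)

# Largest gaps first
in_box[order(-in_box$not_reached), ]
```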

```{r parrplot-PLOTLY, fig.height=6}

parrplot <- parr0 %>% 
  mutate_at(vars(pob_pobre, not_reached, org_count), ~(round(.))) %>% 
  mutate(percent_pobre = round(percent_pobre, digits = 2))%>% 
  ggplot(aes(x = not_reached, y = percent_pobre, 
             colour = org_count, 
             text = paste0(parroquia,",", "\n", 
                           municipio, ",", "\n",
                           estado))) +
   geom_rect(aes(xmin = 10000, xmax = 100000, ymin = 0.25, ymax = 0.50),
            fill = "transparent", colour = "gold", size = 0.2) +
  geom_jitter(aes(size = not_reached), alpha = 0.75) +
  scale_colour_gradientn(
    colours = c("cornflowerblue", "tomato", "firebrick")) +
  scale_x_continuous(trans = "log10", labels = comma) + 
  scale_size_continuous(range = c(0.3, 5)) +
  xlab("Not covered poor") + ylab("Poverty incidence") +
  labs(colour = "Number of \norganisations", 
       title = "Scatterplot of parrishes by poor persons not covered and poverty incidence") +
  theme(legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        plot.title = element_text(size = 11),
        axis.title = element_text(size = 8.5)) 
  

ggplotly(parrplot, tooltip = c("y", "x", "size", "text", "colour")) %>% 
           layout(showlegend = TRUE, legend = list(font = list(size = 7))) %>%
           config(displayModeBar = FALSE) %>% 
  layout(title = list(text = paste0(
    "Scatterplot of parrishes by number poor persons not reached and poverty incidence",
                                    "<br>",
                                    "<sup>",
                                    "size: number of poor persons not reached; colour: number of organisations present; drag and click to select and zoom","</sup>")))

```

For agencies truly unable to expand outside their current footprints, there are still many beneficiaries who are not covered or -- as we will discuss in the next chapter -- have only been reached with the interventions of one sector.

<br><br><br><br>  

## 4. Multi-sector programming

Humanitarian emergencies are multidimensional and the needs of affected persons are not delineated by cluster. Phenomena such as displacement or food insecurity result from the complex interplay between numerous underlying factors and the shocks and stresses of the hazard. Multi-sector, or integrated, programming consists of implementing layers of individual, household and community-level interventions to comprehensively meet the needs of a target population. It is often held up as a key marker of programme quality in strategy documents and humanitarian standards, but is rarely achieved in practice. 


### 4a. Summary table of multi-sector coverage 

However, just because two Clusters operate in the same area does not mean their beneficiaries coincide. As an estimate, we calculated a theoretical maximum number of multi-sector beneficiaries, expressed below as `multi_sector_ben` (explanation and calculation in the code chunk below). To elaborate with an example, we make the charitable assumption that females under 18 who are beneficiaries of Nutrition and females under 18 in that same parrish who are beneficiaries of Protection are the same people -- we then sum the various age and sex disaggregation subtotals by parrish to determine the maximum number of persons who could have received multi-sector support. The actual number of multi-sector beneficiaries is likely much lower. 
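
The arithmetic behind this theoretical maximum can be sketched on a single hypothetical parrish. The figures below are invented, only three sectors are shown, and base R is used in place of the pipeline that built `parr`; `pmin()` here is equivalent to the `ifelse()` used for `ms_ben_max` in the report's own code.

```r
# Hypothetical sector subtotals for one parrish, by age/sex group
toy <- data.frame(
  desagregacion  = c("f_0_18", "m_0_18", "f_18plus"),
  nutricion_ben  = c(100, 80, 0),
  proteccion_ben = c(40, 0, 30),
  wash_ben       = c(20, 0, 50)
)
sectors <- c("nutricion_ben", "proteccion_ben", "wash_ben")

# All beneficiary frequencies in each age/sex group
toy$ben_freq <- rowSums(toy[sectors])

# Largest single-sector subtotal in each group
toy$sec_ben_max <- do.call(pmax, toy[sectors])

# Everyone outside the largest sector is assumed to overlap with it,
# and the overlap can never exceed the largest sector itself
toy$ms_ben_max <- pmin(toy$sec_ben_max, toy$ben_freq - toy$sec_ben_max)

toy
sum(toy$ms_ben_max)  # theoretical maximum for the parrish
```

The second row shows the single-sector case: only Nutrition is present, so no multi-sector beneficiaries are possible for that group.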

This data has been summarised below, according to the number of sectors present in a parrish: 

```{r multi-sector-ben-TABLE-and-REF}
# Notes on the calculation of multi-sector beneficiaries. 

# Beneficiaries per parrish are aggregated into sector subtotals 
# and a beneficiary frequency total (the sum of all sector subtotals).  
# The largest sector subtotal is compared against the beneficiary frequency total: 
# if the largest subtotal is equal to the frequency total, then there is only one sector;
# if the largest subtotal is less than the frequency total, 
# the difference between the two (i.e. the sum of all other sector subtotals), 
# capped at the largest subtotal, becomes the theoretical maximum number of multisector beneficiaries.

# Performing this calculation at admin level 3 makes sense as a parrish is small enough
# that it is realistic to assume that overlaps in beneficiaries between sectors exist --
# i.e. that females under 18 in a parrish who are beneficiaries of nutrition 
# and females under 18 in that same parrish who are beneficiaries of WASH are the same people. 

# This calculation is very charitable, but
# we can't really do much more unless there is a beneficiary register. 
# The real number of multisector beneficiaries is likely MUCH LOWER 
# but that can only be verified through sampled large-scale post-distribution/post-intervention monitoring,
# which is extremely rare. 
# I could have raised this with UNICEF's third-party monitors, 
# so it's an oversight on my part as well.

# These operations were already performed in "A note on the data" and are part of `parr`;
# they are therefore commented out here. 

# act_ben %>%
#       group_by(pcode3, desagregacion, sector) %>% 
#       pivot_wider(names_from = sector, values_from = beneficiarios) %>% 
#       rename(nutricion_ben = Nutricion, proteccion_ben = Proteccion, wash_ben = WASH, 
#              salud_ben = Salud, educacion_ben = Educacion, sa_ben = Seguridad_Alimentaria) %>% 
#       replace_na(list(nutricion_ben = 0, educacion_ben = 0, wash_ben = 0, salud_ben = 0,
#                       sa_ben = 0, proteccion_ben = 0)) %>%
#       summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, 
#                          educacion_ben, salud_ben, sa_ben), sum)) %>%
#       mutate(ben_freq    = nutricion_ben + proteccion_ben + wash_ben + salud_ben + educacion_ben + sa_ben,
#              sec_ben_max = pmax(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben, sa_ben),
#              ms_ben_max  = ifelse(sec_ben_max >= ben_freq - sec_ben_max, 
#                                   ben_freq - sec_ben_max, 
#                                   sec_ben_max)) %>%
#           group_by(pcode3) %>% 
#           summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben,
#                                 salud_ben, sa_ben, ben_freq, sec_ben_max, ms_ben_max), sum)) %>% 
#           mutate(sector_count = rowSums(select(., ends_with("_ben")) != 0))

# using parr means that all parrishes are included, but they are needed for the percentage calculations. 

parr %>%
  filter(ben_freq != 0) %>% 
  group_by(sector_count) %>% 
  summarise(parrishes = n(),
            multi_sector_ben = sum(ms_ben_max),
            one_sector_ben = sum(ben_freq) - sum(ms_ben_max),
            `multisector_%` = round(sum(ms_ben_max) / sum(ben_freq) * 100, digits = 2)) %>% 
  pander(style = "rmarkdown", caption = "Summary of multisector coverage")

```


```{r REF-just-for-vaccination-percentage, include=FALSE}
percent_w_vac <- ven1 %>% 
  pivot_longer(f_0_18:m_18plus, names_to = "desagregacion", values_to = "beneficiarios") %>% 
 #filter(categoria != "VACUNACION") %>% # Vaccination activities filtered out
  filter(beneficiarios != 0) %>% 
  group_by(pcode3, ubicacion, desagregacion, actividad) %>% 
  slice(which.max(beneficiarios)) %>% 
  ungroup() %>% 
  mutate(sector = ifelse(str_detect(sector, "Proteccion_GBV|Proteccion_General|Proteccion_NNA"), 
                         "Proteccion", sector)) %>% 
  group_by(pcode3, desagregacion, sector) %>% 
      pivot_wider(names_from = sector, values_from = beneficiarios) %>% 
      rename(nutricion_ben = Nutricion, proteccion_ben = Proteccion, wash_ben = WASH, 
             salud_ben = Salud, educacion_ben = Educacion, sa_ben = Seguridad_Alimentaria) %>% 
      replace_na(list(nutricion_ben = 0, educacion_ben = 0, wash_ben = 0, salud_ben = 0,
                      sa_ben = 0, proteccion_ben = 0)) %>%
      summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, 
                         educacion_ben, salud_ben, sa_ben), sum)) %>%
      mutate(ben_freq    = nutricion_ben + proteccion_ben + wash_ben + salud_ben + educacion_ben + sa_ben,
             sec_ben_max = pmax(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben, sa_ben),
             ms_ben_max  = ifelse(sec_ben_max >= ben_freq - sec_ben_max, 
                                  ben_freq - sec_ben_max, 
                                  sec_ben_max)) %>%
          group_by(pcode3) %>% 
          summarise(across(c(nutricion_ben, proteccion_ben, wash_ben, salud_ben, educacion_ben,
                                salud_ben, sa_ben, ben_freq, sec_ben_max, ms_ben_max), sum)) %>% 
          mutate(sector_count = rowSums(select(., ends_with("_ben")) != 0)) %>% 
  summarise(ms_percent_w_vac = sum(ms_ben_max) / sum(ben_freq))

round(percent_w_vac$ms_percent_w_vac[1] * 100, digits = 1)
  
```


Overall, the results are not encouraging -- a maximum of **`r round(sum(parr$ms_ben_max) / sum(parr$ben_freq) * 100, digits = 1)`%** of beneficiaries (outside of the top 11 parrishes) could potentially be covered by multi-sector support. When vaccinations are included, this percentage drops to **`r round(percent_w_vac$ms_percent_w_vac[1] * 100, digits = 1)`%**. But, as mentioned, vaccinations will be excluded from this analysis as government partners were not able to accurately provide records at the parrish level, often defaulting to municipal or state-level reporting; additionally, the footprint for vaccination activities is determined by federal government priorities (which may not align with the humanitarian imperative) and which UNICEF is unable to influence. 

As the leader of the Education, Nutrition, WASH and Child Protection Clusters, UNICEF supported activities that reached **`r round((filter(act_ben, org_lider == "UNICEF") %>% {sum(.$beneficiarios)}) / sum(act_ben$beneficiarios) * 100, digits = 1)`%** of all beneficiaries. This means that the low percentage of multi-sector support could largely be resolved by better internal coordination and better programmatic oversight within UNICEF -- these issues will be even more apparent in sections 4c and 4d. 

<br><br>

### 4b. Parrish-level gaps in multi-sector programming

A total of **`r round(sum(parr0$ben_freq) - sum(parr0$ms_ben_max), digits = 0) %>% format(big.mark = ",")`** beneficiaries received support from only one sector -- this is **`r round((sum(parr$ben_freq) - sum(parr$ms_ben_max)) / sum(parr$ben_freq) * 100, digits = 1)`%** of all beneficiary frequencies. Parrishes below have been plotted according to their multi-sector coverage and their total number of beneficiary frequencies; larger sizes indicate parrishes where there are higher numbers of beneficiaries benefitting from only one sector: 

```{r}
ms_scatter <- parr0 %>% 
  mutate(multi_sector_percent = round(ms_ben_max / ben_freq * 100, digits = 1),
         one_sector_percent = round((ben_freq - ms_ben_max) / ben_freq * 100, digits = 1),
         multi_sector_ben = round(ms_ben_max, digits = 0),
         one_sector_ben = round(ben_freq - ms_ben_max, digits = 0),
         ben_freq = round(ben_freq, digits = 0)) %>% 
  ggplot(aes(x = ben_freq, 
             y = multi_sector_percent, 
             text = paste0(parroquia,",", "\n", 
                           municipio, ",", "\n",
                           estado, ",", "\n",
                           "sector count: ", sector_count),
             colour = estado)) +
  geom_point(aes(size = one_sector_ben), 
             alpha = 0.8) +
  scale_x_continuous(trans = "log10", labels = label_comma(accuracy = 1)) +
  scale_size_continuous(range = c(0.1, 5)) +
  scale_colour_manual(values = c(rep("coral",24))) +
  xlab("Beneficiary frequencies") + ylab("Percentage received multi-sector support") +
  labs(title = "Scatterplot of parrishes by beneficiary frequencies and multi-sector coverage", 
       size = "", colour = "") +
  theme(plot.title = element_text(size = 11),
        axis.title = element_text(size = 8.5)) 

ggplotly(ms_scatter, tooltip = c("x", "y", "size", "text")) %>% 
  layout(legend = list(font = list(size = 6))) %>% 
  config(displayModeBar = FALSE) %>% 
  layout(title = list(text = paste0(
    "Scatterplot of parrishes by beneficiary frequencies and multi-sector coverage",
                                    "<br>",
                                    "<sup>",
                                    "size: number of beneficiaries supported by only one sector; double-click state to toggle view; mouse over for details","</sup>")))

```

We note only a very loose relationship (r-squared = `r cor(parr0$multisector_percent, parr0$ben_freq, use = "complete.obs")^2 %>% round(digits = 3)`) between the number of beneficiary frequencies and the percentage of those beneficiary frequencies reached by interventions from multiple sectors. Multi-sector coverage is correlated with little else and, as can be seen above, follows no discernible pattern. There are **`r parr0 %>% filter(ben_freq != 0 & ms_ben_max == 0) %>% nrow()`** parrishes at the bottom of the plot with only one sector present in each -- this is **`r round((parr0 %>% filter(ben_freq != 0 & ms_ben_max == 0) %>% nrow()) / (parr0 %>% filter(ben_freq != 0) %>% nrow()) * 100, digits = 1)`%** of all parrishes we are responding in. 
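
The r-squared quoted above is the squared Pearson correlation which, for two variables, is the same quantity as the R-squared of a simple linear regression. A quick check on simulated data (the vectors below are hypothetical and are not drawn from `parr0`):

```r
set.seed(1)  # simulated values, for illustration only

# Hypothetical beneficiary frequencies and multi-sector percentages
ben_freq <- round(10^runif(50, min = 1, max = 5))
ms_pct   <- runif(50, min = 0, max = 50)

# Squared Pearson correlation, as in the inline calculation
r2_cor <- cor(ms_pct, ben_freq, use = "complete.obs")^2

# R-squared of the corresponding simple linear regression
r2_lm <- summary(lm(ms_pct ~ ben_freq))$r.squared

c(r2_cor = r2_cor, r2_lm = r2_lm)
```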

```{r avg-sec-state-TABLE}

parr0 %>% 
  filter(beneficiarios != 0) %>% 
  select(estado, sector_count) %>% 
  group_by(estado) %>% 
  summarise(avg_sector = round(mean(sector_count), digits = 2)) %>% 
  arrange(desc(avg_sector)) %>% 
  filter(avg_sector > 2 | avg_sector < 1.15) %>% 
  pivot_wider(names_from = estado, values_from = avg_sector) %>% 
  mutate(`|` = c("|")) %>% relocate(`|`, .after = ZULIA) %>% 
  pander(caption = "Top 5 and bottom 5 states, average number of sectors per parrish", missing = "")

```

However, we do note that in the states where UNICEF has offices (Bolivar, Tachira, Zulia and Distrito Capital), there is much higher multi-sector coverage than in other states, perhaps indicating that a more decentralised approach, where field offices have greater say in prioritisation, might lead to more coordination in multi-sector programming. That said, the fact that this greater multi-sector coverage has not extended beyond these states to the other areas under their purview indicates the limits of field offices' planning and coordination capacity. 

<br><br>

### 4c. Cluster combinations

This section will examine the parrishes that have multi-sector coverage and the types of inter-cluster combinations that can be found in them. Let us begin with an overview of the geographic reach of each cluster -- we use frequencies here, as each individual might have benefitted from multiple combinations of clusters: 

```{r clust-TABLE}
parr %>% 
  select(pcode3, educacion_ben, nutricion_ben, salud_ben, wash_ben, proteccion_ben) %>%
  rename(Educacion = educacion_ben, Nutricion = nutricion_ben, Salud = salud_ben,
         WASH = wash_ben, Proteccion = proteccion_ben) %>% 
  pivot_longer(cols = 2:6, names_to = "cluster", values_to = "beneficiary_frequencies") %>% 
  filter(beneficiary_frequencies != 0) %>% 
  group_by(cluster) %>% 
  summarise(parrishes = n(),
            beneficiary_frequencies = sum(beneficiary_frequencies)) %>% 
  pander(caption = "Cluster coverage summary, excluding vaccination", big.mark = ",", style = "rmarkdown", 
         justify = c("left", "right", "right"))
  
```

<br> 

Next, we will summarise the inter-cluster combinations according to: 

* **combination**, referring to the various cluster-wise pairs that exist; 
* **parrishes**, indicating the number of parrishes each combination is present in; 
* **cluster1** and **cluster2**, which show the number of beneficiary frequencies reached by each cluster in the pair, in the order that they appear in **combination**; 
* **pair_sum**, which shows the total number of beneficiary frequencies in that pair, e.g. the pair_sum for edu_nut would be the sum of education and nutrition beneficiaries; and 
* **%ms_max**, which shows the maximum percentage of multisector beneficiaries of each pair, e.g. if the pair edu_nut has 10 education beneficiaries and 30 nutrition beneficiaries, the maximum number of beneficiaries which received support from both sectors is 10, resulting in a **%ms_max** of 25%. But, as mentioned in the notes for section 4a, this is just a theoretical maximum and the actual level of coincidence is likely much lower.
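
The **%ms_max** example above can be reproduced in a few lines. The figures are the illustrative ones from the list (10 education and 30 nutrition beneficiaries); the variable names simply follow the column names of the table.

```r
# Illustrative figures from the list above
cluster1 <- 10  # education beneficiary frequencies
cluster2 <- 30  # nutrition beneficiary frequencies

# Total beneficiary frequencies in the pair
pair_sum <- cluster1 + cluster2

# At most, every beneficiary of the smaller cluster is also in the larger one
ms_max <- pmin(cluster1, cluster2)

# Maximum share of the pair's frequencies that could be multi-sector
pct_ms_max <- round(ms_max / pair_sum * 100, digits = 1)
pct_ms_max
```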

```{r clust-com-REF-and-TABLE}
# creation of reference df for the cluster combinations 
clust_com <- parr %>% 
  filter(ben_freq != 0) %>% 
  select(pcode3, ben_freq, educacion_ben, nutricion_ben, salud_ben, wash_ben, proteccion_ben) %>%
  # mutate a new column for each combination of sectors -- where both clusters in a pair are present, 
  # the column holds the sum of their beneficiary frequencies; the `_only` columns are filled 
  # where a single cluster accounts for all beneficiary frequencies in the parrish 
  mutate(edu_only = ifelse(educacion_ben == ben_freq, educacion_ben, 0),
         edu_nut = ifelse(educacion_ben > 0 & nutricion_ben > 0, educacion_ben + nutricion_ben, 0),
         edu_sal = ifelse(educacion_ben > 0 & salud_ben > 0, educacion_ben + salud_ben, 0),
         edu_wash = ifelse(educacion_ben > 0 & wash_ben > 0, educacion_ben + wash_ben, 0),
         edu_prot = ifelse(educacion_ben > 0 & proteccion_ben > 0, educacion_ben + proteccion_ben, 0),
         nut_only = ifelse(nutricion_ben == ben_freq, nutricion_ben, 0),
         nut_sal = ifelse(nutricion_ben > 0 & salud_ben > 0, nutricion_ben + salud_ben, 0), 
         nut_wash = ifelse(nutricion_ben > 0 & wash_ben > 0, nutricion_ben + wash_ben, 0), 
         nut_prot = ifelse(nutricion_ben > 0 & proteccion_ben > 0, nutricion_ben + proteccion_ben, 0),
         sal_only = ifelse(salud_ben == ben_freq, salud_ben, 0),
         sal_wash = ifelse(salud_ben > 0 & wash_ben > 0, salud_ben + wash_ben, 0),
         sal_prot = ifelse(proteccion_ben > 0 & salud_ben > 0, salud_ben + proteccion_ben, 0),
         wash_only = ifelse(wash_ben == ben_freq, wash_ben, 0),
         prot_wash = ifelse(wash_ben > 0 & proteccion_ben > 0, wash_ben + proteccion_ben, 0),
         prot_only = ifelse(proteccion_ben == ben_freq, proteccion_ben, 0)) %>% 
  # pivot_longer so that each parrish/combination has one row, with values in the pair_sum column 
  pivot_longer(names_to = "combination", values_to = "pair_sum", 8:22) %>%
  filter(pair_sum != 0) %>% 
  group_by(pcode3, combination) %>% 
  summarise(educacion_ben = mean(educacion_ben),
            nutricion_ben = mean(nutricion_ben),
            salud_ben = mean(salud_ben),
            wash_ben = mean(wash_ben),
            proteccion_ben = mean(proteccion_ben),
            pair_sum = sum(pair_sum)) %>% 
  # extracting the beneficiary frequencies of the first and second cluster in each pair
  mutate(cluster1 = 
           case_when(
             str_detect(combination, "edu_nut") ~ educacion_ben,
             str_detect(combination, "edu_sal") ~ educacion_ben,
             str_detect(combination, "edu_wash") ~ educacion_ben,
             str_detect(combination, "edu_prot") ~ educacion_ben,
             str_detect(combination, "nut_sal") ~ nutricion_ben,
             str_detect(combination, "nut_wash") ~ nutricion_ben,
             str_detect(combination, "nut_prot") ~ nutricion_ben,
             str_detect(combination, "sal_wash") ~ salud_ben,
             str_detect(combination, "sal_prot") ~ salud_ben,
             str_detect(combination, "prot_wash") ~ proteccion_ben,
             str_detect(combination, "edu_only") ~ educacion_ben,
             str_detect(combination, "nut_only") ~ nutricion_ben,
             str_detect(combination, "sal_only") ~ salud_ben,
             str_detect(combination, "wash_only") ~ wash_ben,
             str_detect(combination, "prot_only") ~ proteccion_ben)) %>%
  mutate(cluster2 = 
           case_when(
             str_detect(combination, "edu_nut") ~  nutricion_ben,
             str_detect(combination, "edu_sal") ~  salud_ben,
             str_detect(combination, "edu_wash") ~ wash_ben,
             str_detect(combination, "edu_prot") ~ proteccion_ben,
             str_detect(combination, "nut_sal") ~  salud_ben,
             str_detect(combination, "nut_wash") ~ wash_ben,
             str_detect(combination, "nut_prot") ~ proteccion_ben,
             str_detect(combination, "sal_wash") ~ wash_ben,
             str_detect(combination, "sal_prot") ~ proteccion_ben,
             str_detect(combination, "prot_wash") ~ wash_ben,
             str_detect(combination, "only$") ~ 0)) %>%
  select(pcode3, combination, cluster1, cluster2, pair_sum)

# pander table cluster combinations 
rbind(
clust_com %>%  
  filter(str_detect(combination, "only$")) %>% 
  group_by(combination) %>% 
  summarise(parrishes = n(),
            cluster1 = round(sum(cluster1), digits = 0),
            cluster2 = round(sum(cluster2), digits = 0),
            pair_sum = round(sum(pair_sum), digits = 0)) %>%
  mutate(`%ms_max` = pmin(cluster1, cluster2) / pair_sum * 100,
         `%ms_max` = round(ifelse(is.nan(`%ms_max`), 0, `%ms_max`), digits = 1)) %>%
  arrange(desc(pair_sum)) %>% 
  rbind(NA),

clust_com %>%  
  filter(!str_detect(combination, "only$")) %>% 
  group_by(combination) %>% 
  summarise(parrishes = n(),
            cluster1 = round(sum(cluster1), digits = 0),
            cluster2 = round(sum(cluster2), digits = 0),
            pair_sum = round(sum(pair_sum), digits = 0)) %>% 
  mutate(`%ms_max` = pmin(cluster1, cluster2) / pair_sum * 100,
         `%ms_max` = round(ifelse(is.nan(`%ms_max`), 0, `%ms_max`), digits = 1)) %>%
  arrange(desc(pair_sum))

) %>% 

  kable(caption = "Cluster combinations, sorted by pair_sum", format.args = list(big.mark = ",")) %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

  
```

<br>

The most common pairing was between the **Education** and **Nutrition** clusters -- they coincide in `r clust_com %>% filter(combination == "edu_nut") %>% nrow()` parrishes, followed by **Nutrition** and **Protection**. In the next section, we will investigate whether the substantial overlap between Nutrition and Protection at the parrish level was coordinated or -- as in the case with its overlap with Education, where there are no concrete programmatic links -- due more to its wide operational presence.  

**Protection** and **WASH**, however, both do have explicit programmatic links (in the logframe) with **Education** and co-occur with it in `r clust_com %>% filter(combination == "edu_prot") %>% nrow()` and `r clust_com %>% filter(combination == "edu_wash") %>% nrow()` parrishes respectively. A fruitful avenue of investigation would be how many beneficiaries of Education also benefitted from Protection interventions and how close this is to the `r round(clust_com %>% filter(combination == "edu_prot") %>% {min(sum(.$cluster1), sum(.$cluster2)) / sum(.$pair_sum)} * 100, digits = 1)`% theoretical maximum.

**Nutrition** operates alone in `r clust_com %>% filter(combination == "nut_only") %>% nrow()` parrishes out of the `r act_ben %>% filter(sector == "Nutricion") %>% distinct(pcode3) %>% nrow()` that it is present in; this is the most of any cluster -- it is necessary to evaluate the extent to which other clusters can make use of the footholds established by Nutrition. 

Whilst **Nutrition** and **Health** have excellent programmatic complementarity, especially with Health's focus on obstetric, antenatal and neonatal care, this combination has the second-lowest number of beneficiary frequencies of all the combinations. 

**WASH** overall has excellent programmatic overlap with all other clusters and was the only cluster to programme specific multi-sector interventions -- WASH in schools and WASH in health/nutrition centres. Almost none of WASH's beneficiary frequencies occurred in parrishes where no other clusters were present. Its great reach and blanket coverage (especially water supply and other community-level activities) mean that other clusters operating in the same areas as WASH are "guaranteed" to reach beneficiaries with multi-sector programming -- the challenges of these combinations being

1. the intentionality of the multi-sector coverage and 
2. matching the scale of WASH activities. 

Like WASH and Health, **Protection** has very limited beneficiary frequencies in parrishes where it operates alone. Protection coincides the most with Nutrition -- this should serve as an impulse for the creation of referral pathways between the two since both carry out screening activities, made easier by the fact that both manage some form of beneficiary-level database. Protection has the most explicit programmatic links to Education in the logframe.

<br><br>

### 4d. Activity categories 

To close chapter 4, the state of multi-sector programming is poor: as we have noted, there is little intentionality in deciding which areas have multi-sector coverage and which areas do not -- multi-sector links do exist at the activity level, but this is a poor approximation of integrated programming. And we see the weakness of this approach in the table below, which lists the most common inter-cluster activity category combinations at the parrish level.  
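
The counting itself is simple: for every pair of activity categories, count the parrishes in which both occur. The sketch below does this in base R with a cross-product of a parrish-by-category incidence matrix, rather than with the `pairwise_count()` call used in this report; the parrish codes and category labels are hypothetical.

```r
# Hypothetical parrish/category pairs (one row per distinct pair)
toy <- data.frame(
  pcode3    = c("p1", "p1", "p1", "p2", "p2", "p3"),
  categoria = c("EDU", "NUT", "WASH", "EDU", "NUT", "NUT")
)

# Parrish x category incidence matrix (1 = category present in the parrish)
inc <- 1 * (unclass(table(toy$pcode3, toy$categoria)) > 0)

# t(inc) %*% inc: for each pair of categories, the number of parrishes they share
co <- crossprod(inc)

co["EDU", "NUT"]   # parrishes where both EDU and NUT occur
co["NUT", "WASH"]  # parrishes where both NUT and WASH occur
```

The diagonal of `co` holds the number of parrishes in which each category is present at all, which is a useful denominator when judging how common a combination really is.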

```{r cc-categoria-TABLE}

act_ben %>% 
  select(pcode3, categoria) %>% 
  distinct() %>% 
  filter(!str_detect(categoria, "OTRO")) %>% 
  pairwise_count(categoria, pcode3, upper = FALSE) %>% 
  arrange(desc(n)) %>% 
  left_join(act_ben %>% select(sector1 = sector, categoria) %>% distinct(), 
            by = c("item1" = "categoria")) %>% 
  left_join(act_ben %>% select(sector2 = sector, categoria) %>% distinct(), 
            by = c("item2" = "categoria")) %>% 
  filter(sector1 != sector2) %>% 
  mutate_at(vars(item1, item2), ~str_to_title(.)) %>% 
  select(activity_category1 = item1, activity_category2 = item2, count = n) %>% 
  head(15) %>% 
  kbl(caption = "Most common inter-cluster activity category combinations") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
  
```
<br> 

Of the 15 most common activity category combinations, we see only four combinations which might actually convey multi-sector benefits: 

* The 4th and 11th entries, **Prevention of acute malnutrition / Provision of protection services** and **Provision of protection services / Treatment of acute malnutrition** where referrals between the Nutrition and Protection clusters might feasibly result in vulnerable populations receiving a combination of protection services, micronutrient supplementation, de-worming, treatment and counselling that would reduce their vulnerability in a multidimensional manner; the

* 9th entry, **Water supply in communities / Prevention of acute malnutrition**, as improved water supply is a core component of preventing malnutrition; and the

* 11th, **Access to education and student retention / Provision of protection services**, as children who have poor access to education and are in danger of dropping out would, presumably, be more in need of protection services (which include referrals to the formal social welfare system, legal aid, support for GBV survivors and UASC). 

However, as mentioned previously, these multi-sector benefits are theoretical and it has not been established that there is communication between the implementing partners of these activities. Project monitoring is required to establish this. 

Combinations like **Access to education and student retention / Prevention of acute malnutrition** (1st) deal with almost entirely separate populations -- children of schoolgoing age are outside of the target population for Nutrition; though it is feasible that Education and Nutrition could both be dealing with the same young mother who has dropped out of school. 

Furthermore, combinations like **Prevention of acute malnutrition / Resilience in education** (2nd) and **Capacity building in education / Prevention of acute malnutrition** (3rd) have no overlap as malnutrition prevention has no programmatic link to safe school strategies, DRR in schools or teacher training.

As it stands, cluster footprints seem to have much more to do with opportunistic expansion strategies dependent on partners' preferred areas than a needs-based approach which targets the most vulnerable. The next chapter is an effort to examine and correct this. 

<br><br><br><br>  



## 5. Decision trees

### 5a. Introduction to trees -- organisational presence

To prioritise between the `r nrow(parr)` parrishes in Venezuela -- that is, to determine where we should be working -- it is necessary to split them up into more easily digestible groups, and we will use decision trees to do this. A prioritisation or vulnerability score is another commonly-used prioritisation tool, but, as we will see, collapsing a number of variables down into one score is not always helpful. 

To understand how a decision tree functions, let us construct one to predict whether or not there is a humanitarian agency present in a parrish (`org_present` -- this is our dependent variable). We have supplied our model with a basket of census indicators from which it will construct a tree to predict our dependent variable -- unhide the code below to see the full model. The decision tree printed below is the result: 

```{r tree2}
# just to show the decision tree of how partners seem to have chosen locations. 
# using full parr dataset for the tree
# no doubt there are other factors, but this is the data I have -- 
# looking at specific partner characteristics would be interesting. 
set.seed(3000)

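# NOTE: promedio_de_personas_por_vivienda follows a comma rather than a `+`, 
# so rpart matches it positionally to its `weights` argument (case weights) and 
# it does not enter the formula as a predictor. Replace the comma with `+` if a 
# predictor was intended; the node values quoted in the text reflect the call as written. 
tree2 <- parr %>% 
  rpart(org_present ~ percent_pobre + percent_urbana + densidad_ppl_km2 + 
        razon_de_dependencia_de_menores_de_15_anos + razon_de_dependencia_total +  
        percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
        percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina, 
        promedio_de_personas_por_vivienda, data = ., minbucket = 100)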

fancyRpartPlot(tree2, digits = -3, sub = "", palettes = "Blues", type = 2)
```

To read the plot above: all parrishes have been split into four groups (the terminal nodes at the bottom marked **[4]**, **[5]**, **[6]** and **[7]**), which differ in the percentage of their parrishes where humanitarian agencies are present. Each node shows three figures. For instance, the root, at the top and marked **[1]**, shows that 0.544, or 54.4%, of all parrishes have humanitarian agencies present in them. The next figure, "n = 1109", shows that `r nrow(parr)` parrishes are in that group, and next to it is the percentage of all parrishes that the node contains, which, since it is the root, is 100%. 
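
As a hedged illustration of how these node values arise, the sketch below fits a small tree to simulated data. The `toy` data frame and all of its values are invented for this example; only the column names are borrowed from the model above. It assumes the dependent variable is stored as a numeric 0/1, in which case rpart fits a regression tree and each node value is simply the share of units in that node where the outcome equals 1.

```r
library(rpart)

set.seed(1)
n <- 400

# Simulated stand-in for the parrish data (hypothetical values, not the census)
toy <- data.frame(
  percent_urbana   = runif(n, min = 0, max = 100),
  densidad_ppl_km2 = rexp(n, rate = 1 / 200)
)
# Presence is made more likely in the more urbanised units
toy$org_present <- rbinom(n, size = 1,
                          prob = ifelse(toy$percent_urbana > 60, 0.8, 0.3))

toy_tree <- rpart(org_present ~ percent_urbana + densidad_ppl_km2,
                  data = toy, minbucket = 50)

# The root is always the first row of tree$frame; its value is the overall share
root_share <- toy_tree$frame$yval[1]

# tree$where gives, for each observation, the row of tree$frame holding its leaf,
# so averaging the outcome by `where` reproduces the value shown in each leaf
leaf_share <- tapply(toy$org_present, toy_tree$where, mean)

print(root_share)
print(leaf_share)
```

The same reading applies to every node in the plot: the top figure is a proportion, and the splits are chosen so that these proportions differ as much as possible between the resulting groups.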

We see that **[7]** (in dark blue) is the node with the highest concentration of parrishes with agencies present (84.3%); it consists of parrishes more than 79.4% urbanised and denser than 156 ppl/km2. And **[4]**, the node with the lowest concentration of parrishes with agencies (25.6%), falls below both urbanisation splits: its parrishes are less than 79.4% urbanised _and_ less than 21.8% urbanised. 

This is, of course, not to imply that this actually depicts partners' decision-making process, just that these are the factors towards which we, as a response, are predisposed. Perhaps it is understandable that the most heavily populated parrishes are more likely to have organisations present, though population density and urban population are both negatively correlated with poverty incidence. The largest determinants of the _number_ of beneficiaries reached per parrish are population density and urbanisation, as beneficiary numbers tend to scale in line with larger populations. 

<br><br>

### 5b. Prioritisation tree

Now let us focus on future decisions: 

Several trees were built and trialled to split parrishes into targeting groups. As mentioned, the independent variables come from a pool of indicators from the census dataset, with some originating from the 2019 UNICEF Municipal Prioritisation Tool, which was a Principal Components Analysis of key variables related to poverty, health and mortality, and violence and insecurity. After numerous iterations, **`tree3`** was chosen; it splits parrishes into groups according to:  

* The poverty score, which is a rescaled average of the number of poor persons and the poverty incidence of each parrish. 
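
The exact rescaling behind the poverty score is not shown in this section, so the following is only one plausible reading of "rescaled average": both inputs are min-max rescaled to the 0-1 range, averaged, and the average is rescaled again. The helper `rescale01()` and the four example rows are hypothetical.

```r
# Min-max rescaling to the 0-1 range (an assumed choice of rescaling)
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Four invented parrishes: a large town, two mid-sized areas and a small village
toy <- data.frame(
  parroquia     = c("A", "B", "C", "D"),
  pob_pobre     = c(50000, 4000, 12000, 300),
  percent_pobre = c(0.25, 0.80, 0.45, 0.95)
)

# Average of the two rescaled inputs, rescaled once more
toy$poverty_score <- rescale01(
  (rescale01(toy$pob_pobre) + rescale01(toy$percent_pobre)) / 2
)

print(toy)
```

Under this reading, a parrish scores highly either by holding many poor persons or by having a very high poverty incidence, which is why the large town "A" and the small, very poor village "D" end up with the same score.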
  
To see the specific variables and formulae used for each of the major iterations, as well as additional notes on the development and application of decision trees, unhide the source code below. 

```{r trees-REF-write-into-parr0}
 
# As opposed to a prioritisation score -- typically the weighted average of several 
# demographic and socioeconomic indicators -- 
# a tree is much better at accounting for the variations across geographic areas.
# A partner might not have the capacity to work outside of urban areas or 
# might have specific geographic biases and decision trees are a good tool 
# to make the best possible targeting decisions within one's constraints.  

# With that in mind, tree3 was developed to aid future prioritisation. 
# The dependent variable it strives to predict is the poverty score, which, as mentioned, 
# is just the rescaled average of number of poor persons and poverty incidence. 
# The performance of tree3 was considered superior to both tree1 
# (whose dependent variable is just the absolute number of poor persons not reached) 
# and tree4 (which considered parrish-level gaps) due to its ability to 
# clearly distinguish its groups of parrishes and because it is not dependent on gaps data --
# meaning it will not shift when the 5Ws are updated.

set.seed(3000)

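# NOTE (applies to tree1, tree3 and tree4): promedio_de_personas_por_vivienda comes 
# after a comma rather than a `+`, so rpart receives it positionally as `weights` 
# (case weights) and not as a predictor in the formula. 

# number of poor persons not covered 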
tree1 <- parr0 %>%
  rpart(not_reached ~ estado + percent_pobre + percent_urbana + 
        densidad_ppl_km2 + razon_de_dependencia_de_menores_de_15_anos + 
        razon_de_dependencia_total +  
        percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
        percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina, 
        promedio_de_personas_por_vivienda, data = ., cp = 0.038)

# tree based on poverty_score
tree3 <- parr0 %>%
  rpart(poverty_score ~ estado + percent_urbana + densidad_ppl_km2 +
        razon_de_dependencia_de_menores_de_15_anos + razon_de_dependencia_total +  
        percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
        percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina,
        promedio_de_personas_por_vivienda, data = ., cp = 0.044)

# tree based on gap score -- let's not use this 
# as tree3 is more stable and will not change based on new 5W data 
# tree4 <- parr0 %>%
#   rpart(gap_score ~ estado + percent_urbana + densidad_ppl_km2 +
#         razon_de_dependencia_de_menores_de_15_anos + razon_de_dependencia_total +  
#         percent_sin_agua_segura + percent_sin_saneamiento_mejorado +
#         percent_sin_servicio_electrico + percent_analfabeto + percent_hogares_jefatura_femenina, 
#         promedio_de_personas_por_vivienda, data = ., cp = 0.045)

# plotcp(tree3)
# printcp(tree3)

# adding tree1 and tree3 rules to the dataset 
parr0 <- parr0 %>% 
  mutate(rule1 = row.names(tree1$frame)[tree1$where]) %>%
      left_join(rpart.rules.table(tree1) %>% 
      filter(Leaf == TRUE) %>% 
      rename(rule1 = Rule) %>% 
      group_by(rule1) %>% 
      summarise(subrules1 = paste(Subrule, collapse = ",")))  %>% 
  mutate(rule3 = row.names(tree3$frame)[tree3$where]) %>%
      left_join(rpart.rules.table(tree3) %>% 
      filter(Leaf == TRUE) %>% 
      rename(rule3 = Rule) %>% 
      group_by(rule3) %>% 
      summarise(subrules3 = paste(Subrule, collapse = ",")))

```
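
The step that writes node labels back into the dataset relies on two components of a fitted rpart object, `where` and `frame`. The sketch below reproduces only that step on simulated data; the `toy` data frame and its values are invented for illustration.

```r
library(rpart)

set.seed(2)
n <- 300

# Simulated data: the score falls as urbanisation rises (hypothetical relationship)
toy <- data.frame(percent_urbana = runif(n, min = 0, max = 100))
toy$poverty_score <- 1 - toy$percent_urbana / 100 + rnorm(n, sd = 0.05)

toy_tree <- rpart(poverty_score ~ percent_urbana, data = toy, cp = 0.05)

# tree$where holds, for each row of the data, the row of tree$frame of its leaf;
# the row names of tree$frame are the node numbers drawn on the plot
toy$rule <- row.names(toy_tree$frame)[toy_tree$where]

# Number of observations per terminal node
print(table(toy$rule))
```

Because the labels come from the fitted object itself, they stay in step with the plotted tree even if the model is refitted with a different `cp`.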

<br><br>

### 5c. Sub-groups of decision tree3

Below is a plot of **`tree3`** -- the `r nrow(parr0)` parrishes where the number of beneficiaries does not exceed the number of poor persons (corresponding to the _under_ and _no coverage_ categories) have been split into four terminal nodes: **[4]**, **[5]**, **[6]** and **[7]**. The manner in which they have been split is meaningful for targeting decisions and this section will compare the characteristics of each. Please note that this is a different tree from the one in section 5a -- the two models have the same node numbering because they have the same number of splits.

```{r tree3-rpartPlot}

fancyRpartPlot(tree3, digits = -3, sub = "", palettes = "Blues", type = 2)
```
 
<br>
 
#### Summary and overview of the terminal nodes of tree3 

```{r tree3-rules-TABLE}
# One row per summary variable, one column per terminal node of tree3. 
# The node codes match those in section 5a only because both trees have 
# the same number of splits -- this is noted in the text above the plot. 

parr0 %>% 
  group_by(rule3) %>% 
  summarise(parr_no_ben = n_distinct(pcode3[beneficiarios == 0]),
            beneficiaries = sum(beneficiarios),
            avg_beneficiaries = sum(beneficiarios) / n(), 
            not_reached = sum(not_reached),
            avg_not_reached = sum(not_reached) / n(),
            avg_org_count = mean(org_count),
            avg_population = mean(poblacion_2019),
            percent_poor = round(sum(pob_pobre) / sum(poblacion_2019) * 100, digits = 1),
            percent_urban = round(sum(pob_urbana) / sum(poblacion_2019) * 100, digits = 1),
            density_ppl_km2 = sum(poblacion_2019) / sum(area_km2, na.rm = TRUE),
            parrishes = n()) %>% 
  gather(key = variable, value = value, 2:ncol(.)) %>% 
  spread_(key = names(.)[1], value = 'value') %>% 
  # reordering the table instead of having it be alphabetical
  arrange(factor(variable, levels = c("not_reached", "avg_not_reached", "avg_population", 
                                      "beneficiaries",
                                      "avg_beneficiaries",  "avg_org_count",
                                      "percent_poor", "percent_urban", "density_ppl_km2", 
                                      "parrishes", "parr_no_ben"))) %>%  
  
  pander(big.mark = ",", caption = "Summary table of the terminal nodes of tree3")

```

* **[4]** consists of population centres which are easy to reach, but with only `r round(filter(parr0, rule3 == "4") %>% {sum(.$pob_pobre) / sum(.$poblacion_2019)} * 100, digits=1)`% of the population being poor, careful targeting and beneficiary selection are required -- blanket coverage will only result in excessive inclusion errors. It also has the highest average number of organisations present per parrish (avg_org_count). There are `r filter(parr0, rule3 == "4") %>% nrow()` parrishes in this group. These parrishes should not be prioritised -- resources should be allocated elsewhere. 

* **[5]** is probably the best option for expansion for most partners -- it has the highest concentration of poor persons not covered per parrish (avg_not_reached) and is substantially poorer than **[4]**, with a poverty incidence of `r round(filter(parr0, rule3 == "5") %>% {sum(.$pob_pobre) / sum(.$poblacion_2019)} * 100, digits=1)`%. Additionally, these parrishes are still very urbanised (`r round(filter(parr0, rule3 == "5") %>% {sum(.$pob_urbana) / sum(.$poblacion_2019)} * 100, digits=1)`%), meaning that access to these populations will not be challenging. The coverage of organisations is still fairly high and partners should consider expanding into parrishes adjacent to the ones they currently cover. This is the largest group, with `r filter(parr0, rule3 == "5") %>% nrow()` parrishes. 

* **[6]** is where access starts to get more challenging -- though these parrishes have an average poverty incidence of `r round(filter(parr0, rule3 == "6") %>% {sum(.$pob_pobre) / sum(.$poblacion_2019)} * 100, digits=1)`%, the rate of urbanisation drops to `r round(filter(parr0, rule3 == "6") %>% {sum(.$pob_urbana) / sum(.$poblacion_2019)} * 100, digits=1)`% and the population density is only 18 ppl/km2. But there are still more poor persons not covered per parrish in this group than in **[4]**. There are `r filter(parr0, rule3 == "6") %>% nrow()` parrishes in this group. 

* **[7]** consists of the poorest, most vulnerable and most remote parrishes. Working in these areas will incur significant operational and logistical costs. However, with an average poverty incidence of `r round(filter(parr0, rule3 == "7") %>% {sum(.$pob_pobre) / sum(.$poblacion_2019)} * 100, digits=1)`%, blanket coverage will be warranted in many cases -- if the challenge of reaching all of the population can be met. These parrishes also have the lowest average number of poor persons not covered, given their extremely low population density of 1.8 ppl/km2. Humanitarian agencies have the lowest presence in these parrishes. It is advisable for donors to incentivise activities in these areas as they are very underserved. There are `r filter(parr0, rule3 == "7") %>% nrow()` parrishes in this group.
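
The node-level percentages quoted above are computed as a ratio of sums (total poor persons divided by total population in the node), not as an average of parrish-level percentages. The two can differ substantially when parrishes vary in size, as this small example with two invented parrishes shows.

```r
# Two hypothetical parrishes in the same node
node <- data.frame(
  parroquia      = c("large town", "small village"),
  poblacion_2019 = c(90000, 10000),
  pob_pobre      = c(27000, 9000)
)
node$percent_pobre <- node$pob_pobre / node$poblacion_2019  # 0.30 and 0.90

# Ratio of sums: the share of all people in the node who are poor
pooled_incidence <- sum(node$pob_pobre) / sum(node$poblacion_2019) * 100

# Simple mean of parrish-level percentages: treats both parrishes as equal in size
simple_mean <- mean(node$percent_pobre) * 100

print(pooled_incidence)  # 36
print(simple_mean)       # 60
```

The pooled figure answers "what share of the people in this node are poor", which is the more relevant quantity when deciding between blanket coverage and careful beneficiary selection.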

<br><br>

### 5d. Map of parrishes by decision tree node

```{r tree-3-MAP-org-present}
# just one note for this map -- I still can't figure out how to get the tooltip to appear when
# you're hovering over the centroid instead of at the border; hoveron fill doesn't work. 
# I think you should just ask stackoverflow GIS

# hex for Set2 "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#FFFFFF"
# hex for Dark2 "#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#FFFFFF"
# hex for Accent "#7FC97F", "#BEAED4", "#FDC086", "#FFFF99", "#FFFFFF"
# scale_fill_viridis_d()
# scale_fill_manual(values = c())

parrmap_org <- parr %>% 
  left_join(parr0 %>% 
              select(pcode3, rule3), by = "pcode3") %>% 
  right_join(pcode3_shape, by = "pcode3") %>% 
  st_as_sf() %>% 
  mutate(not_reached = round(not_reached, digits = 0),
         tree_node = rule3) %>% 
  mutate_at(vars(percent_pobre, percent_urbana), ~(round(., digits = 2))) %>% 
  ggplot() +
  geom_sf(size = 0.1, 
          aes(fill = tree_node,
              text = paste0(parroquia,",", "\n", 
                           municipio, ",", "\n",
                           estado, "\n", 
                           "not covered: ", not_reached, "\n",
                           "poverty incidence: ", percent_pobre, "\n",
                           "percent urban: ", percent_urbana, "\n",
                           "org present: ", org_present),
             alpha = org_present)) +
  theme_void() +
  scale_fill_manual(values = c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3")) +
  scale_alpha_discrete(range = c(1, 0.7)) +
  theme(legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        plot.title = element_text(size = 11)) +
  guides(alpha = "none") +
  labs(fill = "Tree node",
       alpha = "") +
  ggtitle('Map of parrishes by decision tree node (colour) & if organisations present (alpha)')
  
ggplotly(parrmap_org, tooltip = c("text", "fill")) %>%
  layout(title = list(text = paste0(
    "Map of parrishes by decision tree node (colour) & if organisations present (alpha)",
                                    "<br>",
                                    "<sup>",
                                     "mouse over for details; drag and click to select and zoom; double-click legend to select/deselect","</sup>")))


```

Above is a map of parrishes by their decision tree node (denoted by colour); we have also decreased the alpha for parrishes where there are already organisations present, meaning that they appear more transparent. Looking at areas with the greatest concentrations of **[4]** and **[5]**, we can see that they conform to the Venezuelan Coastal Range and the Venezuelan Andes, where most of the country's population is located; as a reminder, parrishes in node **[5]** are excellent candidates for expansion. 

We also see three large clusters of parrishes from node **[7]** -- the poorest and most sparsely populated areas -- in Amazonas and Bolivar (at the bottom of the map), in Delta Amacuro (at the extreme right) and in Lara and Falcon (top-left). Double-click on each legend item to toggle the view.  

As a final note for this chapter, the prioritisation tree was limited to four terminal nodes because they captured the vast majority of the variance amongst the parrishes -- any further splits would only have yielded diminishing returns. Should a partner want to see a more complex tree with more nodes, it would be very easy to supply one. However, I feel these more complex trees would mostly serve as references rather than actual prioritisation tools -- I already question partners' ability to deal with 4 separate groups, each requiring its own strategy, much less 8 or 10.
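
The diminishing returns from extra splits can be inspected through the complexity table of a fitted tree, which is what the commented-out `printcp()` and `plotcp()` calls report. The sketch below uses simulated data with invented values, and the pruning threshold of 0.04 is chosen for illustration only.

```r
library(rpart)

set.seed(3)
n <- 500

# Simulated data with one strong and one weaker driver (hypothetical)
toy <- data.frame(
  percent_urbana   = runif(n, min = 0, max = 100),
  percent_sin_agua = runif(n, min = 0, max = 100)
)
toy$poverty_score <- 0.6 * (toy$percent_urbana < 40) +
  0.3 * (toy$percent_sin_agua > 70) + rnorm(n, sd = 0.1)

# Grow a deliberately large tree by setting a very low complexity parameter
full_tree <- rpart(poverty_score ~ percent_urbana + percent_sin_agua,
                   data = toy, cp = 0.001)

# Each row of cptable: complexity parameter, number of splits, relative error
rel_error <- full_tree$cptable[, "rel error"]
# Reduction in relative error from one row of the table to the next
gain <- -diff(rel_error)

# Prune back to a small tree by raising the complexity parameter
small_tree <- prune(full_tree, cp = 0.04)

n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
print(full_tree$cptable)
print(c(full = n_leaves(full_tree), pruned = n_leaves(small_tree)))
```

In a table like this, the first few rows typically account for most of the reduction in relative error, and a tree with more nodes can be supplied on request simply by lowering `cp`.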

<br>

* Chapter 5 annex: Additional notes on tree1, which was not selected -- unhide code to see

```{r tree1-notes, results=FALSE}
# the main problem I see is that each of the leaves has little variance in terms of poverty incidence
# but let me know if you want maps or products focused on this tree, it's pretty easy to do. 
# [15] is very, very attractive. Maybe I can do something with it.  

# 6 is dense, urban and highest operational presence,
# 2 is just too big. 800 parrishes is just too many. The low end is distinguished much better in tree3
# 14 is just rich, urban and not a priority. It's also a really small leaf.  
# 15 is actually a really good leaf -- really high nr_per_parr, few parrishes, 
# very dense, very urban and 42% poor and such an immensely low coverage percent. 
# Good low-hanging fruit. I almost want to keep tree1 just because of this leaf. 
# Maybe I will make one map just for this. 59,508 nr_per_parr is massive. 

parr0 %>% 
  group_by(rule1) %>% 
  summarise(parr_no_ben = n_distinct(pcode3[beneficiarios == 0]), 
            beneficiarios = sum(beneficiarios),
            ben_per_parr = sum(beneficiarios) / n(), 
            not_reached = sum(not_reached),
            nr_per_parr = sum(not_reached) / n(),
            nr_per_mun = sum(not_reached) / n_distinct(pcode2), 
            avg_org_count = mean(org_count),
            coverage_percent = sum(beneficiarios) / sum(poblacion_2019),
            percent_pobre = sum(pob_pobre) / sum(poblacion_2019),
            percent_urbana = sum(pob_urbana) / sum(poblacion_2019),
            densidad_ppl_km2 = sum(poblacion_2019) / sum(area_km2, na.rm = TRUE),
            parroquias = n(),
            municipios = n_distinct(pcode2),
            parr_per_mun = n() / n_distinct(pcode2)) %>% 
  gather(key = variable, value = value, 2:ncol(.)) %>% 
  spread_(key = names(.)[1], value = 'value') %>% 
  arrange(factor(variable, levels = c("not_reached", "nr_per_parr", "nr_per_mun", "beneficiarios",
                                      "ben_per_parr",  "avg_org_count", "coverage_percent",
                                      "percent_pobre", "percent_urbana", "densidad_ppl_km2", 
                                      "parroquias", "parr_no_ben", "municipios", 
                                      "parr_per_mun"))) %>%  pander(big.mark = ",")

```


<br><br><br><br>  

## 6. Recommendations 

To summarise some important next steps:

1. **Review these findings** with all humanitarian partners; partners should adapt from them a set of recommendations specific to their organisations and develop their own action plans.

2. Clusters must **jointly establish target populations and areas** and define a complementary suite of activities meant to address their specific needs. For instance, vulnerable households in target areas with children under 5 should be assessed for protection risks and referred if necessary; receive nutrition screening and preventive care services and counselling; be made aware of available ECCD options and, if applicable, be provided with education kits; receive hygiene kits and benefit from, at the community level, interventions to improve the supply of safe water. The community around them should also be targeted with school rehabilitation and awareness-raising activities, and any health facilities should be stocked with a hospital kit and key medicines. In future analyses, it is hoped that we can track specific activity combinations that have been intentionally constructed and implemented in a coordinated manner. 

3. In particular, **Nutrition and Health** should endeavour to overlap in more parrishes, given the complementarity of their activities; additionally, many agencies have field officers who double-hat as health and nutrition officers, so this low level of coordination is viewed doubly unfavourably.

4. Agencies should use the **Reference table** in the next chapter to sort through and define a preliminary list of parrishes that will be prioritised for coverage. They should also request agency-specific maps and other products so that each implementing partner may examine how best they may cover poor persons not reached. In particular, agencies should focus on parrishes which have very low coverage (`coverage_percent`), one or fewer sectors (`sector_count`), high poverty incidence (`percent_poor`) and high numbers of persons `not_reached`; `percent_urban`, in addition to a parrish's location and the tree node, can help determine the relative ease of reaching and working in that area. 

5. **Develop operations plans** that are area-focused and multi-sector; decentralise more decision-making power to field offices and increase personnel and capacity in the field offices to implement those plans. Request all datasets and products necessary to develop these plans. Country office-level planning also needs to be more coordinated and greater programmatic oversight must be exercised over each of the sections. 

6. **Split humanitarian partners** into groups based on their capacity to work in the various groups established by the decision tree -- there is much low-hanging fruit in node [5], but the most vulnerable will be found in node [7]. This will help clusters understand if their current pool of partners is sufficient to cover the gaps we have mentioned. A sample partner analysis has been included in the appendix of this document.   

7. Clusters should **coordinate the selection of new parrishes** and ensure that there are enough agencies present in each to ensure that needs are met in a comprehensive and integrated manner. Partners should share their intended actions openly and communicate their targets at the parrish level. Review coverage and multi-sector programming in inter-cluster meetings. 

8. **Halt and reallocate** all uncommitted resources intended for interventions in node [4]; explain to communities the reasons behind the reallocation; and present this document to donors as evidence for the reallocation. Push donors to encourage expansion into uncovered areas.

9. Ensure that all **M&E activities** survey multi-sector coverage. 

10. Develop and maintain a **consolidated beneficiary register** (at least within UNICEF), so that multi-sector coverage may be accurately tracked and the longitudinal monitoring of target populations facilitated. 

11. Commission a **new coverage and gaps analysis** every 6 months to review progress and identify persistent issues. 
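
The shortlisting described in recommendation 4 can be sketched as a simple filter on the reference table. The column names below match those of the reference table; the four rows and all of the thresholds are hypothetical, and each agency would need to set its own cut-offs.

```r
# Invented extract of the reference table
ref <- data.frame(
  parrish          = c("A", "B", "C", "D"),
  coverage_percent = c(2.0, 45.0, 0.0, 5.0),
  sector_count     = c(1, 4, 0, 1),
  percent_poor     = c(62.0, 30.0, 85.0, 55.0),
  percent_urban    = c(88.0, 95.0, 12.0, 70.0),
  not_reached      = c(40000, 5000, 1500, 22000),
  tree_node        = c(5, 4, 7, 5)
)

# Hypothetical thresholds: very low coverage, at most one sector present,
# high poverty incidence and a large number of poor persons not reached
shortlist <- subset(ref,
                    coverage_percent < 10 & sector_count <= 1 &
                      percent_poor >= 50 & not_reached >= 10000)

# Largest gaps first; percent_urban and tree_node remain as a guide to access
shortlist <- shortlist[order(-shortlist$not_reached), ]
print(shortlist)
```

Note that a filter on absolute numbers not reached will tend to exclude the remote parrishes of node [7], such as "C" here, so agencies able to work in those areas may prefer to shortlist within each tree node separately.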

<br><br><br><br>

## 7. Reference table 
_type or use slider to filter by categories or values;_ 
_use arrows to sort; table preliminarily arranged in descending order of poor persons not reached_

```{r}
parr0 %>% 
  mutate(educacion_only = ifelse(educacion_ben > 0, "educacion", ""),
         nutricion_only = ifelse(nutricion_ben > 0, "nutricion", ""),
         salud_only = ifelse(salud_ben > 0, "salud",  ""),
         wash_only = ifelse(wash_ben > 0, "wash", ""),
         proteccion_only = ifelse(proteccion_ben > 0, "proteccion", ""),
         sectors_present = str_trim(paste0(educacion_only, " ", nutricion_only, " ", proteccion_only, " ", 
                                  salud_only, " ", wash_only))) %>% 
  mutate_at(vars(pob_pobre, not_reached, org_count, beneficiarios), ~(round(., digits = 0))) %>% 
  mutate_at(vars(percent_pobre, percent_urbana, coverage_percent), ~(round(.*100, digits = 1))) %>% 
  mutate(rule3 = as.numeric(rule3),
         sectors_present = ifelse(sectors_present == "", "none", sectors_present)) %>% 
  select(state = estado, municipality = municipio, parrish = parroquia,  
         percent_poor = percent_pobre, percent_urban = percent_urbana, not_reached, 
         beneficiaries = beneficiarios, coverage_percent, tree_node = rule3,
         org_count, sector_count, sectors_present, pcode3) %>% 
  arrange(desc(not_reached)) %>% 
  # the js is adjusting the font size for the whole container -- there doesn't seem to be another way
  datatable(filter = "top", options = list(pageLength = 10, scrollX = TRUE,
                                           initComplete = htmlwidgets::JS(
          "function(settings, json) {",
          paste0("$(this.api().table().container()).css({'font-size': '", "8.5pt", "'});"),
          "}")
       ) 
     ) 


```

