Conflict Data Additions #11
Remember, you're dealing with an old school Correlates of War-era peace scientist here who is more knowledgeable about inter-state stuff than intra-state stuff, so I might ask some dumb questions. Importantly, is there a type of "seminal" analysis on this front? Think of articles like Fearon and Laitin (2003) [state-year] and Bremer (1992) [dyad-year] as illustrative. If you do state-level analyses of civil conflict, or dyadic analyses of dispute onset, you've seen these 100 times before and the basic information in there is mimicked in every similar analysis that adjusts the set of covariates or the sampling frame. When it comes to different levels of analysis, I'm less knowledgeable and could best see the value if I know there is an exemplar analysis that serves as a template to copy. I downloaded the GED and, 50 MBs into the download as I type, I already know that ain't happening. 😛 I could see some workarounds, as you mentioned. Perhaps the data can be stored remotely, and loaded in by the user. I'm trying to keep those remote data sets to a bare minimum for a variety of CRAN-related reasons, but that's an option. That said, here's where I think this would be optimal. For one, the UCDP/PRIO people have such a killer suite of data sets for researchers doing civil conflict that I really think they need their own API for these things, and perhaps their own R package around it. Perhaps I'm being too vain or misremembering things, but I seem to recall some UCDP/PRIO people poking around this package late last year (when it first went on CRAN) and talking about how they really need something like that with their own data. I think they do, and that I think Second, and this is invariably going to happen soon (and perhaps really soon if it's urgent), but In a case like this, I can see this working as follows. You have subnational units, but those units are nested in states. 
Perhaps there are covariates at the subnational level that I don't know about, and perhaps I just need to be shown why these are important as a different level of analysis (beyond state-year, dyad-year, and soon: leader-levels). There are also state-level covariates that will allow you to make more cross-sectional comparisons at a higher order than the subnational-level (e.g. state-level GDP per capita). The "declare" function would allow you to load the GED data, "declare" them to be 1) state-level and 2) using the Gleditsch-Ward system of states and that would allow you to use a function like What do you think of that? |
Canonical isn't really my style so I hope you'll settle for interesting instead. Regarding UCDP, the file is definitely large and my thought was a function that downloads it from the internet because, as far as I know, the direct link generally doesn't change. As far as structure, I have a pretty detailed script that turns the UCDP event data into duration format somewhere in my Dropbox that may provide a good place to start in terms of building a backend. I've always found it absolutely insane that, given how massive their database is, UCDP doesn't have its own API. The structure of ACLED is basically identical, with the main difference being that UCDP consists of "conflict episodes" while ACLED strictly codes events. Unfortunately, this requires two slightly different approaches to expanding the data, with UCDP being a bit more tricky than ACLED. As far as subnational covariates are concerned, the College of William and Mary's AidData project is the single best source for those and has them at multiple levels, but honestly it's probably way too much work to integrate that. I think as long as the standard GADM identifiers are included, which I believe are in both UCDP GED and ACLED, it's best to leave the decision of subnational covariates to the end user because selection is hard to generalize and requires specific knowledge of the question at hand. If there was a pre-compiled global data source that had the night time lights data at various administrative levels, I think that would be worth including, but my main thought here was focused on the addition of the conflict data itself. Honestly, I think the The downside to downloading the data from the internet, of course, is that you'd have to add something like |
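For illustration, the "download on demand" idea in the comment above could look something like the following. This is only a minimal sketch: `fetch_remote_data()` and its arguments are hypothetical names, not part of the package, and the URL is whatever direct link the user supplies. The one design point it shows is caching, so the large file is downloaded once and reused on later calls.

```r
# Hypothetical helper (not part of any package): download a remote file once
# and reuse the cached copy afterwards.
fetch_remote_data <- function(.url, .cache_dir = tempdir()) {
  # The local file name is taken from the last component of the URL
  .dest <- file.path(.cache_dir, basename(.url))
  # Only hit the network if no cached copy exists yet
  if (!file.exists(.dest)) {
    utils::download.file(.url, destfile = .dest, mode = "wb", quiet = TRUE)
  }
  # Return the local path so the caller can pick an appropriate reader
  .dest
}
```

The function returns a path rather than a data frame so that the choice of reader (CSV, RDS, and so on) stays with the caller.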
I had a chance to look at the way you've got things structured and I think the best place to start is going to be a generalization of the As far as geographic level is concerned, Weidmann's (2015) analysis suggests 95% of conflict events, and the GED in particular, fall within 50km of the reported coordinates so the function should probably support the first and second administrative levels (the equivalents of states/provinces and counties respectively; probably via an additional user-specified character argument) but not the third because that introduces complications related to spatial error (and apparently some countries just don't have a third admin level at all). This also helps resolve the problem of file sizes since only dealing with a semi-aggregated version of the data drastically reduces the amount of space required and eliminates the potential headache that introducing non-api based web dependencies would be. Any thoughts on this? |
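As a rough sketch of the semi-aggregated approach described above, event records can be collapsed to counts per administrative unit and year. Everything here is an assumption for illustration: `aggregate_events()` is a hypothetical name, the toy rows are made up, and the column names (`country`, `adm_1`, `year`, `best`) are only meant to resemble GED-style fields.

```r
# Made-up event records; real event data carries many more fields
toy_events <- data.frame(
  country = c("A", "A", "A", "B"),
  adm_1   = c("North", "North", "South", "East"),
  year    = c(2000, 2000, 2001, 2000),
  best    = c(5, 10, 3, 7),   # assumed "best estimate" of deaths per event
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse events to one row per unit-year
aggregate_events <- function(.data, .level = "adm_1") {
  # Only the first and second administrative levels are supported
  stopifnot(.level %in% c("adm_1", "adm_2"), .level %in% names(.data))
  # One row per event: a count of 1 plus the death estimate
  .values <- data.frame(n_events = rep(1L, nrow(.data)), deaths = .data$best)
  # Sum both columns within each country / unit / year group
  aggregate(.values, by = .data[c("country", .level, "year")], FUN = sum)
}

unit_years <- aggregate_events(toy_events, .level = "adm_1")
```

Taking the level as a character argument mirrors the "additional user-specified character argument" suggested above, and refusing anything below the second level keeps the spatial-error concern out of scope.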
I hope you don't mind me saying this is something I might have to return to. Don't misinterpret silence on my end as lack of interest. It's just that this R package/manuscript was R&Red at an IR journal and the major addition the reviewers/editor wanted was leader stuff. It's what I'm knee-deep doing. Re: time. Time is no issue. {lubridate} and the {tidyverse} make dates a cinch. Re: districts, though. Do we know and know well the full universe of districts/subnational units? Do they change/have they changed? In the American context, for example, we still stupidly have the 50 states and I don't think we have new counties that have formed within states at any point in the past forever. I can see, however, that possibly being an issue in other countries, certainly war-torn countries, that are imposing wholesale changes in their borders (or parts of it) during or after a conflict. Btw, I see you forked the GitHub repository. If you're wanting to propose an addition via GitHub, feel free. It might make more sense to drop whatever code additions you have in the data-raw directory. Writing documentation is a chore but I'm happy to do that. |
I absolutely understand. The GED stuff, at least in terms of the functions that would need to be written, happens to be closely related to some stuff I have to do for a manuscript I'm revising for ISA Midwest in late November so I figured I may as well get them started in the process. I probably should have led by saying that I already have most of the code written to do what I've described above for the GED data, it's mainly a matter of getting it in the proper format and verifying compatibility with the rest of the package. As far as the geography stuff goes, there are three major systems--PRIO Grid, GADM, and GeoBoundaries--and it's worth considering which would be the most useful. I'll talk with some of the faculty in my department who do this sort of thing and see if there are any preferences for one system over the other. The GED supports two of the three but I need to double check which ones those are. This is definitely a longer term thing but I'll hopefully have something ready and working by January or so. |
Dumping these links here so I don't lose track of them. https://twitter.com/adamjnafa/status/1472678781307564044 |
Here's the code for the function in the tweet.
###------------------------------------------------------------------------------------###
###------------------Functions for Pulling UCDP Data Via the REST API------------------###
###------------------------------------------------------------------------------------###
#
# This is a function designed to import data from the Uppsala Conflict Data Program's
# API directly into an R session. The function takes one required argument, `.resource`,
# and several optional ones, and uses them to construct an API call.
#
# Arguments:
# .resource The UCDP Dataset to be imported. Currently supported options include
# - `gedevents` for the UCDP Georeferenced Event Dataset (UCDP GED)
# - `ucdpprioconflict` for the UCDP/PRIO Armed Conflict Dataset
# - `dyadic` for the UCDP Dyadic Dataset
# - `nonstate` for the UCDP Non-State Conflict Dataset
# - `onesided` for the UCDP One-Sided Violence Dataset
# - `battledeaths` for the UCDP Battle-Related Deaths Dataset
#
# .version The version of the UCDP resource to be downloaded. If not specified,
# the argument defaults to the most recent release, which at this time
# is version 21.1
#
# .date_range An optional vector of length two containing dates for the beginning and
# end of the time range to pull observations for, respectively. For example,
# `.date_range = c("1991-01-01", "2000-12-31")` would retrieve all events
# that occurred between January 1, 1991 and December 31, 2000. This argument
# is only evaluated if `.resource = "gedevents"`
#
# .filters An optional string of additional conditions to use for filtering in the
# API call. For more details see https://ucdp.uu.se/apidocs/
#
# ... Additional arguments. Currently supported options include `.pagesize`
# which defaults to the maximum of 1000
#
# Usage:
#
# # Make an API call for the UCDP GED
# ged_data <- ucdp_api_data(
# .resource = "gedevents",
# .version = "21.1",
# .date_range = c("1999-01-01", "2000-12-31")
# )
#
# # Print the first few rows of the retrieved data
# head(ged_data[[3]])
#
# Packages whose functions are used unqualified below (str_c, tibble, map_dfr);
# jsonlite and lubridate are called via `::`, and map_dfr also needs dplyr installed
library(stringr)
library(tibble)
library(purrr)

# API Call Constructor Function
ucdp_api_data <- function(.resource,
                          .version = NULL,
                          .date_range = NULL,
                          .filters = NULL,
                          ...
                          ){
  # Pull .pagesize out of the additional arguments (NULL if not supplied)
  .pagesize <- list(...)[[".pagesize"]]
  # Base URL for the UCDP API
  .base_url <- "https://ucdpapi.pcr.uu.se/api"
  # Check for a user-specified argument of .version
  if(!is.null(.version)){
    .base_api_string <- str_c(.base_url, .resource, .version, sep = "/")
  }
  # If .version is not specified, use the most recent version
  else {
    .base_api_string <- str_c(.base_url, .resource, "21.1", sep = "/")
  }
  # Check for a user-specified argument of .pagesize
  if(!is.null(.pagesize)){
    .base_api_string <- str_c(.base_api_string, "?pagesize=", .pagesize, sep = "")
  }
  # If .pagesize is not specified, set it to 1000
  else {
    .base_api_string <- str_c(.base_api_string, "?pagesize=1000", sep = "")
  }
  # Check for a user-specified date range of length 2
  if(length(.date_range) == 2 && .resource == "gedevents"){
    # Construct a string with the specified time range
    .time_string <- str_c(
      "&StartDate=",
      lubridate::ymd(.date_range[1]),
      "&EndDate=",
      lubridate::ymd(.date_range[2]),
      sep = ""
    )
    # Update the base API call
    .base_api_string <- str_c(.base_api_string, .time_string, sep = "")
  }
  # Additional user-specified filtering conditions
  if(!is.null(.filters)){
    # Append the user-specified conditions to the API call
    .base_api_string <- str_c(.base_api_string, .filters, sep = "&")
  }
  # Make the initial API call
  .result <- jsonlite::fromJSON(.base_api_string)
  # If the number of pages is > 1, recover each page
  if(.result$TotalPages > 1){
    .df <- tibble(
      # Generate a sequence of page numbers, starting at 0
      .pages = seq(0, (.result$TotalPages - 1), 1),
      # Generate the API call for each page
      .api_calls = str_c(.base_api_string, "&page=", .pages)
    )
    # Recover each page and append the result
    .ucdp_data <- map_dfr(
      .x = .df$.api_calls,
      ~ jsonlite::fromJSON(.x)$Result
    )
    return(list(.result, .df, .ucdp_data))
  }
  # Otherwise, just recover the data from the original object
  else {
    return(.result)
  }
}
Let me start by saying this is an awesome package that has been extremely useful for constructing duration data. Over the past decade peace science (comparative politics/civil conflict) has moved in a subnational direction (there's only so much we can learn from the country-year) and I'm wondering how much work it would be to add support for the major event-level data projects? Namely, UCDP's Georeferenced Event Dataset (GED) Global version 21.1 and the Armed Conflict Location & Event Data Project. Of course, due to size limitations it may not be feasible to include them by default but this could be handled via some basic functions that when called download and import the data. In the case of ACLED there may be some licensing issues that complicate things. The most straightforward solution I can think of in that case would be to utilize their recently added API and have users specify their own key but it may be necessary to seek guidance from ACLED regarding how best to handle that since you have to have an API key to even download their curated data files now. I may have some time to work on some of the API stuff next month if you're interested in adding that functionality.
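To make the "users specify their own key" idea concrete, here is one minimal sketch of how a key could be picked up without hard-coding it in scripts. The helper name `get_api_key()` and the environment variable name `ACLED_API_KEY` are both assumptions for illustration; nothing here reflects ACLED's actual API or any requirements they may impose.

```r
# Hypothetical helper: read a user-supplied API key from an environment variable.
# The variable name "ACLED_API_KEY" is an assumed convention, not an official one.
get_api_key <- function(.env_var = "ACLED_API_KEY") {
  .key <- Sys.getenv(.env_var, unset = "")
  # Fail early with an actionable message if the key has not been set
  if (!nzchar(.key)) {
    stop(
      "No API key found in environment variable '", .env_var,
      "'. Set it in your .Renviron file or with Sys.setenv().",
      call. = FALSE
    )
  }
  .key
}
```

Reading the key from the environment keeps it out of shared code and lets each user supply their own credentials.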