Workshop

Random forest is an ensemble of many decision trees. Each tree is trained on a random sample of the data and a random subset of the covariates, and the forest combines the trees' outputs into a single prediction. It belongs to a family of machine-learning techniques called ensemble methods, which combine many models so that the overall prediction is more accurate and more stable than that of any single model. A random forest also measures how much each covariate contributes to the prediction, so the covariates can be ranked from most to least important and each one counts for as much as it should in each situation.

Specifically, Nieves et al. use random-forest machine learning to predict global population density from many geospatial covariates. For each area, each covariate is weighted by how important it is to prediction in that particular area. The predicted densities are then scaled so that they add up to one. Some grid cells will have more people and some will have fewer, so they balance out and the total is preserved, which gives a more accurate count at the grid-cell level. The geospatial covariates that proved most important, meaning those given the most weight, were built-environment covariates: building footprints, nighttime lights imagery, and the LC Urban areas and LC Rural settlement covariates. These proved far more important than natural factors, which is consistent with more than 54% of the world's population living in urbanized areas.

Dasymetric population allocation is a process for producing more accurate, finer-scale data. It moves from a more general data set to a more specific one by using the predicted population densities to redistribute population from larger cells to smaller cells. A cell's probability of holding people is higher if it contains many building footprints and urban areas, and lower if it contains water or some other natural feature that makes it hard for people to live there.
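
The allocation step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure from Nieves et al.: the function name `dasymetric_allocate`, the even-split fallback, and the four example weights are all assumptions made up for the example. It shows the basic idea that predicted densities act as weights, are scaled to add up to one, and are then used to split a known population total across smaller cells.

```python
def dasymetric_allocate(unit_total, cell_weights):
    """Split a known population total across grid cells in proportion
    to each cell's weight (for example, a predicted density)."""
    if not cell_weights:
        raise ValueError("cell_weights must contain at least one cell")
    total_weight = sum(cell_weights)
    if total_weight <= 0:
        # No usable signal: fall back to an even split
        # (an assumption of this sketch, not a documented rule).
        return [unit_total / len(cell_weights)] * len(cell_weights)
    # Dividing by total_weight scales the weights so they sum to one,
    # which guarantees the cell counts add back up to unit_total.
    return [unit_total * w / total_weight for w in cell_weights]


# Hypothetical weights for four cells:
# dense urban, suburban, rural settlement, open water.
weights = [80.0, 15.0, 5.0, 0.0]
counts = dasymetric_allocate(1000, weights)
print(counts)
print(sum(counts))
```

The cell with the largest weight receives the largest share of the 1000 people, the water cell receives none, and the four counts together still equal the original total.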