# Exposure to Site Tool - User Manual and Technical Guide

```
---
Author: Graeme Weatherill
Affiliation: Seismic Hazard & Risk Dynamics (Section 2.6)
             GFZ German Research Centre for Geosciences
             14473, Potsdam, Germany
Email: graeme.weatherill@gfz-potsdam.de
License (source code): GNU General Public License v3.0 (GPLv3)
License (document): Creative Commons CC-By-4.0 
---
```

The exposure to site tool is a short(ish) Python 3 module for constructing the ESRM20 site amplification files for a given exposure model configuration. It supports several different workflows, depending on whether the user wishes to define the resulting site model from a regularly spaced grid, an existing exposure model, or simply a shapefile of administrative regions.

### Before you start

The tools require additional data files that are not included with the main code. These files contain the regional-scale European site input data; more details can be found in the description below. The data files are distributed under a Creative Commons CC-By-4.0 license and are available for download from:

<https://nextcloud.gfz-potsdam.de/s/93ZR4ky8D4mDXb9>


### Installation

The tool requires a standard set of Python libraries that can be installed via most common Python package managers. A `setup.py` file is included to facilitate installation. We recommend using a Python virtual environment for installing and running the tools (<https://docs.python.org/3.8/tutorial/venv.html>).

The required dependencies are:

(Required)

* Python >=3.7
* h5py >=2.10
* Numpy >=1.18
* Scipy >=1.3
* Pandas >=0.25
* Shapely >=1.7, < 2.0
* Pyproj >= 1.9
* matplotlib >= 3.0
* geopandas >= 0.9.0

(Additional)

* ipython >=7.2
* jupyter >=1.0.0


##### For Linux/OSX

Install Python virtualenv via the Python package manager, e.g. for Ubuntu:

```bash
sudo apt install git python3-venv
```

On OSX the equivalent package manager may be `brew` or `macports`.

Download the code from the Gitlab repository

```bash
git clone https://gitlab.seismo.ethz.ch/efehr/esrm20_sitemodel.git
```

Create and enter the virtual environment by

```bash
python3 -m venv exp2site
source exp2site/bin/activate
```

Update the python package manager `pip`:

```bash
pip install --upgrade pip
```

Then run:

```bash
pip install -e esrm20_sitemodel/
```

This should install the dependencies and the `exposure2site` package. Additional packages to help run the code can be installed via

```bash
pip install ipython
pip install jupyter
```

<i><b>Note</b>: some installations may also require `pygeos`. If an error is raised indicating that it is not available then it can be installed via:</i>

```bash
pip install pygeos
```


##### Windows

For users experienced in creating virtual environments on Windows we suggest following the steps above in the manner best suited to your own Python environment.

For other users we suggest installing and running the toolkit using Anaconda. Details on how to install and get started with Anaconda can be found here: <https://conda.io/projects/conda/en/latest/user-guide/getting-started.html>. 

To create the virtual environment via `conda` run:

```bash
conda create --name exp2site python=3.8
conda activate exp2site
```
 
Download the code from the EFEHR repository, either from the zip/tar.gz file or, preferably, via Git from the Conda command prompt:
 
```bash
git clone https://gitlab.seismo.ethz.ch/efehr/esrm20_sitemodel.git
```

Update the python package manager `pip`:

```bash
pip install --upgrade pip
```

Navigate to the directory containing the cloned repository and run the installation:

```bash
pip install -e esrm20_sitemodel/
```

This should install the dependencies and the `exposure2site` package. Additional packages to help run the code can be installed via

```bash
pip install ipython
pip install jupyter
```


***

## Required Datasets

In addition to the tools themselves, you will need several files containing pre-processed site data.

The zipped file containing all of the necessary data is available for download from here: <https://nextcloud.gfz-potsdam.de/s/93ZR4ky8D4mDXb9>

The downloaded data should be unzipped and stored in a directory named `site_data`; this directory should then be placed inside the `exposure2site` folder within the main code.
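
For orientation, the expected layout is sketched below (assuming the repository directory is named `esrm20_sitemodel` and using the data file names listed later in this section; the shapefiles are accompanied by their usual companion files, e.g. `.shx` and `.dbf`):

```
esrm20_sitemodel/
└── exposure2site/
    └── site_data/
        ├── GEOL_V8_ERA2.shp
        ├── europe_gebco_topography.hdf5
        ├── europe_2020_vs30.hdf5
        ├── volcanic_front_distances_5min.hdf5
        ├── ghs_global_250m_Europe
        └── ATTENUATION_REGIONS.shp
```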

The required datasets comprise 6 files:

1. `GEOL_V8_ERA2.shp`

	The European geological data from BRGM stored in shapefile format

2. `europe_gebco_topography.hdf5`

	An hdf5 binary containing all of the required site data for Europe on a 30 arc-second grid, bounded by longitudes -25˚E to 45˚E and latitudes 34˚N to 72˚N. The file contains the following attributes:

	`longitude`: Vector of `nlon` longitudes

	`latitude`: Vector of `nlat` latitudes

	`elevation`: Array (`nlat`, `nlon`) of GEBCO elevations (topography and bathymetry)

	`slope`: Array (`nlat`, `nlon`) of slopes derived from the `elevation` data set using the Horn algorithm available from the Generic Mapping Tools `grdgradient` function

	`built_density`: Built density from the Global Human Settlement database, rendered from the 250 m x 250 m grid to the 30 arc-second grid

	`geology`: Geological era code per 30 arc-second cell


3. `europe_2020_vs30.hdf5`

	Slice of the USGS Global Vs30 dataset for Europe limited to the region enclosed by longitudes -30˚E to 65˚E and latitudes 30˚N to 75˚N. 
	
4. `volcanic_front_distances_5min.hdf5`

	Distances of sites on a 5' x 5' grid from the volcanic front (i.e. forearc-backarc limiting zone). This is needed for the subduction ground motion models and can be safely interpolated from the reference grids to the desired sites.
	
5. `ghs_global_250m_Europe`

	The original built density dataset at 250 m x 250 m spacing in the Mollweide projection, limited to Europe and the Middle East. The global grid from which this is taken can be downloaded from the Global Human Settlement project, here: <https://ghsl.jrc.ec.europa.eu/download.php>

6. `ATTENUATION_REGIONS.shp`

	A shapefile containing the ESHM20 attenuation regions to which the sites are assigned
	
	 
Longitudes and latitudes in all 30 arc-second data sets refer to the locations of the <b>cell centres</b> beginning from the integer bounds such that the lower left cell centre is at (`llon + (1 / 240), llat + (1 / 240)`) and the upper right cell centre is at (`ulon - (1 / 240), ulat - (1 / 240)`), where `llon`, `llat`, `ulon` and `ulat` are the lower (left) longitude, lower (south) latitude, upper (right) longitude and upper (north) latitude respectively.  
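
As a minimal sketch of this convention, the cell-centre vectors of the 30 arc-second topography grid could be reconstructed and cross-checked against `europe_gebco_topography.hdf5` (assuming the quantities listed above are stored as HDF5 datasets under those names):

```python
import numpy as np
import h5py

# Bounds of the 30 arc-second grid described above (-25E to 45E, 34N to 72N)
llon, llat, ulon, ulat = -25.0, 34.0, 45.0, 72.0
spacing = 1.0 / 120.0  # 30 arc-seconds in decimal degrees

# Cell centres sit half a cell (1 / 240 degrees) inside the integer bounds
lons = np.arange(llon + 1.0 / 240.0, ulon, spacing)
lats = np.arange(llat + 1.0 / 240.0, ulat, spacing)

with h5py.File("site_data/europe_gebco_topography.hdf5", "r") as fle:
    # These should agree with the stored longitude/latitude vectors
    assert np.allclose(fle["longitude"][:], lons)
    assert np.allclose(fle["latitude"][:], lats)
    elevation = fle["elevation"][:]  # shape (nlat, nlon)
```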

***

# General Usage

The Ex2Site tool is mostly intended for command line usage (although Jupyter Notebooks illustrating how to call the functions directly are also provided). With the `exposure_to_site_tools.py` file in the current working directory, and the data files in the subdirectory `./site_data` the tool is run by:

```bash
exposure2site --run=choice_of_workflow --output-file=path/to/output_site_model.csv
```



## `grid`

The reference grid for the site information is 30 arc-seconds; however, we can downsample to a coarser grid by averaging the properties of the reference-grid cells contained within each cell of the target grid. This is illustrated in the figure below for the case in which we wish to downsample the reference 30 arc-second grid to a 90 arc-second grid.

![](images/downsampling_grid.png "General formulation for downsampling a 30'' grid to 90''")

The 90 arc-second grid cell is aligned with the set of 30 arc-second cells such that it contains 9 reference grid cells. Each grid cell is associated with a given property (e.g. slope, Vs30, etc.), \\(x_i\\) for \\(i = 1, 2, \ldots, 9\\). If we wish to treat all cells as evenly weighted, then the value of the property, \\(y_j\\), for the \\(j^{th}\\) 90 arc-second cell is the simple mean:

\\[
y_j = \frac{1}{9}\sum_i^9 x_i 
\\]    

Or, more generally for a grid spaced as a multiple, \\(\gamma\\), of 30 arc-seconds:

\\[
y_j = \frac{1}{\gamma^2}\sum_i^{\gamma^2} x_i 
\\]

The reference point for the coarser grid that will be used in the site model will always be the centre of the coarse grid (regardless of the weighting), as indicated by the red dot in the figure.
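
A minimal sketch of the evenly weighted case (a hypothetical NumPy helper, assuming the array dimensions are exact multiples of \\(\gamma\\)):

```python
import numpy as np

def downsample_mean(x, gamma):
    """Evenly weighted downsampling of a (nlat, nlon) 30 arc-second array
    to a grid gamma times coarser (e.g. gamma=3 for 90 arc-seconds)."""
    nlat, nlon = x.shape
    # Group each gamma x gamma block of reference cells together ...
    blocks = x.reshape(nlat // gamma, gamma, nlon // gamma, gamma)
    # ... then take the simple mean within each block
    return blocks.mean(axis=(1, 3))
```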

For the purposes of defining an <i>optimal</i> site model for the risk calculations we may wish to assign to the grid cell the site property that is closest to the site conditions affecting the greatest proportion of the population. One of the clearest cases where this may be relevant is when one wishes to define the slope, Vs30 and geology for a coarse cell containing a small town in an upland valley. By just taking the value at the centroid of the cell, it is merely a matter of chance whether that point is representative of a site in the valley (lower slope, lower Vs30 and potentially younger Quaternary geology) or of a hillside (higher slope, higher Vs30, potentially older geology). To <i>potentially</i> improve the assignment of the site property we can weight the high-resolution grid cells by their built environment density (\\(\rho_i\\)), as defined from the global human settlement index.

Each grid cell is associated with a built density, re-scaled here to a value from 0 (uninhabited) to 1 (densely inhabited) from the original 0 to 100 scale of the GHS dataset. With this information we can weight the averaging of the properties in the cell by the respective built density. To do so we define the weight for each cell by normalising the built density with respect to the total built density:

\\[
w_i = \frac{\rho_i}{\sum_i^{\gamma^2} \rho_i}
\\]

With a weight thus assigned to each cell, we can define the "average" properties in one of three ways (see the sketch after this list):

`MEAN` (<i>default</i>) Takes the weighted mean of the data such that
\\[
y_j = \frac{\sum_i^{\gamma^2} w_i \cdot x_i}{\sum_i^{\gamma^2} w_i} = \sum_i^{\gamma^2} w_i \cdot x_i
\\]

`MAX` Takes the value of \\(x_i\\) with the highest weight

`MEDIAN` Takes the weighted median defined by sorting \\(x_i\\) from lowest normalised weight to highest and taking the value of \\(x_i\\) for the first cell with a cumulative weight, \\(w_i^c\\), greater than or equal to 0.5
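
A minimal sketch of these three options for the \\(\gamma^2\\) sub-cell values \\(x_i\\) of a single coarse cell (a hypothetical helper, not the tool's own function):

```python
import numpy as np

def average_property(x, rho, method="MEAN"):
    """Built-density weighted "average" of the sub-cell values x in one coarse cell."""
    w = rho / rho.sum()  # normalised weights
    if method == "MEAN":
        return np.sum(w * x)  # weighted mean
    if method == "MAX":
        return x[np.argmax(w)]  # value at the highest-weighted sub-cell
    if method == "MEDIAN":
        # First cell (sorted by weight) whose cumulative weight reaches 0.5
        order = np.argsort(w)
        return x[order][np.searchsorted(np.cumsum(w[order]), 0.5)]
    raise ValueError("Unknown method %s" % method)
```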

#### The <i>Smoothed GHS</i> option

The GHS layer can be useful for applying weightings in order to bias the centroid location and/or properties toward the areas of highest urban density. There are some limitations to this process, however: the maximum value in the GHS is equal to 100, but there is not necessarily an easy way to distinguish between, for example, the downtown of a small town (which may have only a few cells equal to 100) and that of a large metropolitan region (where a large area of cells may all take the value of 100). This may be problematic when using the `MAX` or `MEDIAN` options, as the specific maximum or median cell is somewhat arbitrarily chosen when many cells in a region contain the maximum value. In order to distinguish the large dense urban areas from the small towns, the user can apply weighting from a <i>smoothed</i> version of the GHS that uses the following kernel

\\[
GHS^{SMOOTH} (x, y) = \sum_{i=-N}^{N} \sum_{j=-N}^{N} GHS (x + i,y + j) 
\\]

where N is the size of the `Moore Neighbourhood` (i.e. the number of adjacent cells to the target cell). The consequence of this kernel is that cells surrounded by cells of similarly high density will take a higher value than that of a high urban density cell surrounded by lower density cells. At present the smoothing can be applied assuming a Moore Neighbourhood of \\(N = 1, 3\\) or \\(5\\). 

To use the smoothed GHS the additional keyword argument can be added:

`--smooth-n #` where `#` is the N size of the Moore Neighbourhood (1, 3 or 5). 

Or if running the function from within a Jupyter Notebook, the optional keyword argument `smooth_n=#` should be used (which takes \\(N=0\\) by default).
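
As a minimal sketch, the kernel above amounts to a moving-window sum over the \\((2N + 1) \times (2N + 1)\\) neighbourhood, which could be computed with `scipy.ndimage` (the helper name is hypothetical):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def smooth_ghs(ghs, n):
    """Sum of the GHS values over the (2N + 1) x (2N + 1) Moore neighbourhood."""
    size = 2 * n + 1
    # uniform_filter returns the moving-window mean (zero-padded at the edges
    # with mode="constant"), so rescale to obtain the window sum
    return uniform_filter(ghs.astype(float), size=size, mode="constant") * (size ** 2)
```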

#### Handling Geological Data

The weighting approach described previously is applicable when the quantity is numerical and thus a "weighted average" can be calculated. For categorical data such as geology, however, the resulting property for the coarse cell should correspond to one of the categories represented in the set of sub-cells. For this purpose a geological era identifier, \\(z_i\\), is assigned to each cell on the 30 arc-second grid based on the location of the cell centre with respect to the European geological map. To assign the geological property to the coarser grid, the user can choose between one of three options (see the sketch after this list):

`MODE` (<i>default</i>) Assigns the modal geological unit found within the \\(\gamma^2\\) sub-cells.

`MAX` Assigns the geological unit found at the highest weighted sub-cell

`MEDIAN` Sorts the geological values from lowest to highest normalised weight and cumulates the weights for the cells from 0 to 1, assigning the geological unit of the cell whose cumulative weight is greater than or equal to 0.5
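
A minimal sketch of the default `MODE` option for the categorical codes \\(z_i\\) (hypothetical helper; `MAX` and `MEDIAN` follow the same logic shown above for numerical data, applied to the codes):

```python
import numpy as np

def modal_geology(z, weights=None):
    """Modal geological era code among the sub-cells, optionally weighted."""
    if weights is None:
        weights = np.ones(len(z))  # unweighted: plain mode
    codes = np.unique(z)
    # Select the code with the largest total weight across the sub-cells
    totals = np.array([weights[z == code].sum() for code in codes])
    return codes[np.argmax(totals)]
```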

### Required `grid` properties

`--bbox llon/llat/ulon/ulat` Defines the limits of the regular grid

`--spacing ##.#` Defines the spacing of the downsampled grid, which must be a multiple of 30 arc-seconds (i.e. 30, 60, 90, 120, 150 ...) 

### Optional `grid` properties

`--onshore-only True|False` Set whether to clip the resulting site model only to those sites onshore (`True`, <i>default</i>) or to return all grid cells. Onshore is defined as having an averaged elevation greater than -10 m above mean sea level; this value was determined by inspection and found to produce good agreement with the European coastlines

`--weighting True|False` Determine whether or not to apply weighting of the averaging by the built density (default=`True`)

`--averaging mean|median|max` For numerical data this indicates whether to define the "average" property as the `mean` (default), `median` or `max`. These averaging terms can be applied with or without weighting

`--geological-weighting mode|median|max` For the geological data, this controls how the geological unit is determined from the sub-cells, with options of `mode` (default), `median` or `max`

`--default-xvf ##.##` For sites outside the range at which the subduction forearc/backarc distance taper applies, this sets the default volcanic front distance. Mostly this can be ignored and the default value of -150 km used.

### `grid` Example

The following command would produce a 120 arc-second grid between 22˚E and 30˚E and 45˚N to 51.5˚N, clipping to onshore sites only, applying built density weighting and taking the resulting property as the maximum numerical value and modal geological category. The result will be exported to `example_site_model.csv`

```bash
exposure2site --run grid --output-file example_site_model.csv --bbox 22/45/30/51.5 --weighting True --averaging max --geological-weighting mode --onshore-only True
```
***


## `exposure`

The `exposure` workflow assumes that the target sites of an exposure model and their corresponding spatial polygons are predefined, and thus the intention is to assign properties to the target sites based on an averaging over the corresponding polygons. For the current purpose, it is assumed that the target sites for the exposure are stored in a csv file separate from the polygons, which are themselves stored in a shapefile. An example combination of exposure and target sites is shown below.

![](./images/belgium_admin2.png "Polygons for Belgium at the Admin 2 level")

In the tool, the underlying spatial data from the 30 arc-second grids are taken for an area enclosed by the bounding box of the exposure model. The points are then assigned to their respective polygons using standard spatial join methods. At present, the spatial join is done entirely in the WGS84 (longitude, latitude) space in order to reduce computational time re-projecting data (TODO: this may change in the future). Once the intersections have been performed, each of the N polygons contains \\(n_j\\) sites.  From these sites the averages can be taken in the same manner as those done for the `grid`, and can be weighted by the building density if needed. Therefore, the options `--weighting`, `--averaging`, `--geological-weighting` and `--default-backarc-distance` all control the definition of the site properties in the same manner as for the `grid` calculation.
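
A minimal sketch of this point-in-polygon step with GeoPandas (the file and column names are illustrative, not the tool's internal implementation):

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Illustrative 30 arc-second grid points with their site properties
grid = pd.DataFrame({"lon": [4.35, 4.70], "lat": [50.85, 50.88],
                     "slope": [0.01, 0.05], "vs30": [400.0, 520.0]})
points = gpd.GeoDataFrame(
    grid, geometry=[Point(xy) for xy in zip(grid.lon, grid.lat)], crs="EPSG:4326")

# Administrative polygons, kept in WGS84 (longitude, latitude) as described above
polygons = gpd.read_file("Country_Shapefiles/Adm2_Dummy.shp").to_crs("EPSG:4326")

# Assign each grid point to the polygon containing it
# (use predicate="within" in GeoPandas >= 0.10)
joined = gpd.sjoin(points, polygons, how="inner", op="within")

# Unweighted average of the numerical properties per polygon
averages = joined.groupby("index_right")[["slope", "vs30"]].mean()
```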

### Required `exposure` Arguments

`--input-exposure PATH/TO/input_exposure_model.csv` Enter the relative path to the csv exposure file. Within the exposure file the target sites must have the column headers `lon` and `lat`. <b>The country and administrative level are interpreted from the filename, which should take the form `Exposure_Type_Country.csv`</b>

`--shapefile-dir PATH/TO/SHAPEFILE_DIRECTORY` Inputs the path to the directory where the corresponding shapefiles are contained. The code will identify the shapefiles for the country and retrieve the polygons for the highest administrative level for which the exposure is defined.

### Optional `exposure` Arguments

All of the optional arguments specified for the `grid` option can be used here, with the exception of `--onshore-only`.

### `exposure` Example

The following command will read the exposure model in the file `Exposure_Res_Dummy.csv`, whose geometry can be found in the file `Adm2_Dummy.shp` in the directory `Country_Shapefiles`. Weighting based on built density will be applied, this time with the value of the site property at the maximum weighted location. Geology will be determined from the modal geological category within the polygon. The result will be written to the output xml file `Dummy_Site_Model.xml`

```bash
exposure2site --run exposure --output-file Dummy_Site_Model.xml --input-exposure ./Exposure_Res_Dummy.csv --shapefile-dir ./Country_Shapefiles --weighting True --averaging max --geological-weighting mode
```

***
## `admin`

The `admin` workflow is a variation of the `exposure` workflow in which only the administrative polygons of the regions are defined, but not the target sites. As the target sites are not defined in the input files, the user must determine how they are to be obtained from the geographical data. For this the user is provided with several options, controlled with the option `--centroid-weighting`. As in the `exposure` workflow, the sites on the regular 30 arc-second grid are assigned to each of the N polygons, such that there are \\(n_j\\) sites per polygon. If omitted, the target site is taken as the ordinary centroid of the polygon. Otherwise the centroid is determined from a grid weighted using the built density data set, with one of three options (see the sketch after this list):

`mean` Each site \\(X_i\\) (lon, lat) within the polygon is assigned a normalised weight \\(w_i\\) from the built density data set, as defined previously. The location of the weighted centroid \\(Y_j\\) is calculated from the locations \\(X_i\\) of the \\(n_j\\) sites via:

\\[
Y_j = \frac{\sum_i^{n_j} w_i \cdot X_i}{\sum_i^{n_j} w_i} = \sum_i^{n_j} w_i \cdot X_i
\\]

`median` Each site \\(X_i\\) within the \\(j^{th}\\) polygon is assigned the corresponding weight \\(w_i\\) as before. The sites are then ranked in order of weight and the site corresponding to the 50th percentile weight is selected.

`max` The site corresponding to the highest weight is selected.
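
A minimal sketch of the three options for the \\(n_j\\) grid points within one polygon (hypothetical helper):

```python
import numpy as np

def weighted_centroid(lons, lats, rho, method="mean"):
    """Built-density weighted centroid of the grid points in one polygon."""
    w = rho / rho.sum()  # normalised weights
    if method == "mean":
        return np.sum(w * lons), np.sum(w * lats)  # weighted mean location
    order = np.argsort(w)  # sites ranked from lowest to highest weight
    if method == "max":
        idx = order[-1]  # highest-weighted site
    elif method == "median":
        # Site at the 50th percentile of the cumulative weights
        idx = order[np.searchsorted(np.cumsum(w[order]), 0.5)]
    else:
        raise ValueError("Unknown method %s" % method)
    return lons[idx], lats[idx]
```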

A comparison of the centroids weighted by building density for the Administrative 2 level of Belgium is shown below. In many cases locational discrepancies can be seen between the definitions, often by a significant margin.

![](./images/belgium_centroids.png "Alternative Centroids for Belgium at the Admin 2 level")

Once the centroids have been defined, the `admin` workflow then proceeds in accordance with the `exposure` workflow in assigning the site properties on the basis of the "average" properties for the polygon.

### Required `admin` properties

`--input-shapefile path/to/input_shapefile.shp` Path to the input administrative shapefile

### Optional `admin` properties

`--admin-level #` Defines the administrative level at which to define the centroids. The administrative level must be less than or equal to the maximum administrative level in the input shapefile, and if not defined will take the highest administrative level in the shapefile.

`--centroid-weighting mean|median|max` Controls how to define the centroid per polygon as either the built density weighted `mean`, `median` or `max`, or the unweighted centroid if not defined.

Other configurable properties (as defined previously) are: `--weighting`, `--averaging`, `--geological-weighting` and `--default-backarc-distance`.


***

## `point`

The `point` workflow is a simple way of assigning site properties to a given location. It assumes that the user would like the properties exactly at the site locations specified in the input file; no weighting or averaging is undertaken. This may be useful in the case of industrial exposure, or any other type of exposure where facilities are placed at specific locations.

The required input file can be any comma-separated text file, but the target sites must be specified under the column names `lon` and `lat`, or alternatively `xcoord` and `ycoord`, respectively. 

<b>Note</b> that duplicate points in the resulting site model will be removed!
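
A minimal sketch of how such an input could be read and de-duplicated beforehand (not the tool's own code; `site_locations.csv` as in the example below):

```python
import pandas as pd

sites = pd.read_csv("site_locations.csv")
# Accept either of the two supported naming conventions
if "xcoord" in sites.columns:
    sites = sites.rename(columns={"xcoord": "lon", "ycoord": "lat"})
# Duplicate locations are collapsed to a single site
sites = sites.drop_duplicates(subset=["lon", "lat"])
```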

### Required `point` properties

`--input-file path/to/input_site_file.csv` Defines the relative path to the input csv file containing the target sites

### Optional `point` properties

`--onshore-only True|False` Clip the resulting site model to only those sites located onshore (elevation greater than -10 m relative to mean sea level) (default=`True`) or retain all input sites (`False`)

### `point` Example

The following command will define the site model from a simple csv file of longitudes and latitudes (`./site_locations.csv`) and export the resulting model to `./example_point_site_model.csv`, clipping only to onshore sites

```bash
exposure2site --run point --output-file ./example_point_site_model.csv --input-file ./site_locations.csv --onshore-only True
```

## `join`

This is not a workflow per se, but merely a command to combine multiple site model files into a single site model. Duplicate sites will be removed. The join can apply to a series of files specified by the user, or to an entire directory of site model files. The site models must be in the correct OpenQuake csv or xml format.
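
For csv inputs the operation amounts to concatenation followed by de-duplication, sketched here with pandas (assuming OpenQuake-style `lon`/`lat` columns; the tool itself also handles the xml format):

```python
import pandas as pd

files = ["site_model_countryA.csv", "site_model_countryB.csv"]
merged = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
# Duplicate sites are removed before export
merged = merged.drop_duplicates(subset=["lon", "lat"])
merged.to_csv("site_model_countriesAB.csv", index=False)
```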

### Required `join` arguments:

One of either:

`--input-dir PATH/TO/INPUT_DIRECTORY` Specifying the directory where the site model files are contained

`--files model1.csv+model2.csv+model3.xml` Enumerates the list of files to be concatenated, separated by the `+` operator (no spaces allowed)

### `join` example

The following command joins together the site model in the file `site_model_countryA.csv` and the site model in the file `site_model_countryB.xml`, exporting them to `site_model_countriesAB.csv`:

```bash
exposure2site --run join --output-file site_model_countriesAB.csv --files site_model_countryA.csv+site_model_countryB.xml
```



