Read Section 4.2.8 of Geospatial Analysis Online. You will have to search for the section, as there is no easy direct link. The section covers several shape metrics that are useful for considering shape complexity. These metrics allow one to ask whether one shape is more complex than another.

As part of this lab, we will calculate several "classic" shape complexity metrics based on relationships between perimeter and area. We will look at data from qocha reservoirs located in the northern Lake Titicaca Basin mesopotamia. To learn more about this area, take a look at Craig et al. 2011. Here we are looking at a slightly different question than the one addressed in the paper: we want to know whether there is more than one "kind" of field. Prior researchers divided qocha into three main types based on size and shape: very small qocha, large qocha, and linear qocha. We want to know whether there are "natural" groups with clear breaks between one type and another, or whether the range of variability is simply continuous.

For a given perimeter, a circle encloses the greatest possible area; equivalently, for a given area, a circle has the shortest perimeter. The more complex a shape becomes (as the sinuosity of its boundary increases), the more the perimeter increases in relation to area. We want to capture this variability and attempt to determine whether there are multiple "groups" that might represent different kinds of fields.
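To make the perimeter/area relationship concrete before opening ArcMap, here is a small Python sketch (not part of the lab workflow) that computes the metrics used in this lab, P1A, P2A, A_circle, and C, for three hypothetical shapes that all have an area of 100 square units. The function name `shape_metrics` is illustrative only, and it uses the exact value of 4π rather than the rounded 12.57 used in the field calculator.

```python
import math


def shape_metrics(perimeter: float, area: float) -> dict:
    """Perimeter/area shape metrics for a single polygon (illustrative sketch)."""
    p1a = perimeter / area                      # P1A: perimeter / area
    p2a = perimeter * perimeter / area          # P2A: perimeter squared / area
    a_circle = perimeter ** 2 / (4 * math.pi)   # area of a circle with the same perimeter
    c = math.sqrt(area / a_circle)              # C: 1 for a circle, smaller for complex shapes
    return {"P1A": p1a, "P2A": p2a, "A_circle": a_circle, "C": c}


# Hypothetical shapes, each with an area of 100 square units: (perimeter, area)
r = math.sqrt(100 / math.pi)                    # radius of a circle with area 100
shapes = {
    "circle": (2 * math.pi * r, 100.0),
    "square 10 x 10": (40.0, 100.0),
    "rectangle 50 x 2": (104.0, 100.0),
}

for name, (p, a) in shapes.items():
    m = shape_metrics(p, a)
    print(f"{name:18s} P1A={m['P1A']:.3f}  P2A={m['P2A']:.3f}  C={m['C']:.3f}")
```

The circle scores P2A = 4π ≈ 12.57 and C = 1, the square scores P2A = 16 and C ≈ 0.886, and the long, thin rectangle scores much higher on P2A and much lower on C. Note that P1A has units of 1/length, so it also changes with size alone, whereas P2A and C are unitless and respond only to shape.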

- Download the data for Lab 5.
- Add the shapefiles to ArcMap.
- Export Qocha_Polygon as a new shapefile and name it Qocha_Working.
- Open the attribute table for Qocha_Working and delete the last 4 columns. You will be recreating these, but you can use the original as a reference.
- In the Qocha_Working shapefile, add a field, make it a double, and name it P1A.
- Using the field calculator, apply the expression **[Shape_Leng] / [Shape_Area]**. This is the perimeter/area metric.
- Add another field, make it a double, and name it P2A.
- Using the field calculator, apply the expression **([Shape_Leng] * [Shape_Leng]) / [Shape_Area]**. This is the perimeter squared/area metric. Note the use of parentheses to control the order of operations. Note also that in the field calculator the Sqr() function returns the square root, not the square. Compare this with the Windows calculator, which uses sqr() to generate the square. This is odd, given that the field calculator uses Visual Basic, a Microsoft language, and the calculator is also a Microsoft product: the same name, sqr(), can produce very different results depending on where it is used. It is always good to spot-check calculations.
- Add another field, make it a double, and name it A_circle.
- Using the field calculator, apply the expression **([Shape_Leng] * [Shape_Leng]) / 12.57**. This is the perimeter squared divided by 4π (12.57 ≈ 4π), which gives the area of a circle with the same perimeter as the polygon.
- Add another field, make it a double, and name it C.
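- Using the field calculator, apply the expression **Sqr([Shape_Area] / [A_circle])**. This calculates the term C, the square root of the ratio between the polygon's area and the area of a circle with the same perimeter. C is approximately 1 for a circle and decreases toward 0 as the shape becomes more complex or elongated.
- Now export the table as a DBF file.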
- Open the exported file in Excel and select all.
- Open PAST and paste. You will also need to edit the labels (see the Edit labels check box): take the values from the first row and make these the headers.
- Once the headers are edited, remove the first row so that only data are now in the table.
- Describe the characteristics of both perimeter and area using the Statistics > Univariate option. PAST will report the following values:
- N: number of objects in the sample
- Min: the lowest value
- Max: the highest value
- Sum: all N values added together
- Mean: the average
- Std. error: The standard error of the estimate of the mean.
- Variance: the sample variance, calculated as $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- Stand. dev: The sample standard deviation
- Median: The median of the sample. For n odd, the given value such that there are equally many values above and below. For n even, the average of the two central values.
- 25 prcntil: The given value such that 25% of the sample is below, 75% above.
- 75 prcntil: The given value such that 75% of the sample is below, 25% above.
- Skewness: The sample skewness, zero for a normal distribution, positive for a tail to the right.
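- Kurtosis: The sample (excess) kurtosis, zero for a normal distribution, positive for heavier tails than a normal distribution.
- Geom. mean: The geometric mean, $(x_1 x_2 \cdots x_n)^{1/n}$, defined only for positive values.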
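If you want to spot-check PAST's output, the list above can be sketched with Python's standard `statistics` module. This is an assumption-laden sketch, not PAST's own code: the skewness and kurtosis here are simple moment-based estimates and the percentiles use `statistics.quantiles` defaults, so PAST (which may apply small-sample corrections or a different percentile method) can report slightly different values. The function name `univariate` and the sample numbers are hypothetical.

```python
import math
import statistics


def univariate(values):
    """Summary statistics in the spirit of PAST's Univariate output (sketch)."""
    x = sorted(values)
    n = len(x)
    mean = statistics.fmean(x)
    var = statistics.variance(x)              # sample variance, n - 1 denominator
    sd = math.sqrt(var)
    # Central moments with an n denominator, used for skewness and kurtosis
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    q1, _, q3 = statistics.quantiles(x, n=4)  # quartile cut points (default method)
    return {
        "N": n,
        "Min": x[0],
        "Max": x[-1],
        "Sum": math.fsum(x),
        "Mean": mean,
        "Std. error": sd / math.sqrt(n),
        "Variance": var,
        "Stand. dev": sd,
        "Median": statistics.median(x),
        "25 prcntil": q1,
        "75 prcntil": q3,
        "Skewness": m3 / m2 ** 1.5,            # 0 for a symmetric sample
        "Kurtosis": m4 / m2 ** 2 - 3,          # excess kurtosis, 0 for a normal
        "Geom. mean": statistics.geometric_mean(x),  # requires positive values
    }


if __name__ == "__main__":
    # Hypothetical toy values, not the qocha data
    for key, value in univariate([2, 4, 4, 4, 5, 5, 7, 9]).items():
        print(f"{key:>10}: {value:.4g}")
```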

- Are the distributions of area and perimeter "normal"? What metrics might one use to determine this?
- Now use PAST to generate histograms and box plots. Do this under the Plot function. Be sure to check the outliers box.
- Are there outliers? Where are the outliers located? How might one identify the outliers in the table of values? Hint: I'd use Excel. It might be useful to flag the outliers in the table so that they can be treated differently during later calculations.
- To me it looks as though the values exhibit right, or positive, skew. Try bringing the distribution closer to normal by applying a transformation. Drennan (2009) provides a figure that describes the action of several common transformations.
- Try applying the logarithm transformation to both perimeter and area. I often apply this function in Excel using the formula =log([cell]), where [cell] is the cell containing the value I want to transform (Excel's LOG uses base 10 by default).
- Once the function has been applied to both area and perimeter in Excel, create a new column in PAST for each and paste in the transformed values. It may be possible to apply similar functions directly in PAST; if someone figures out how to do this, please report back to class.
- Once the values are in PAST, use the Statistics > Univariate function to compare the skewness of the transformed and untransformed area and perimeter.
- For untransformed area, I got a skewness of 82.01.
- Once log transformed, area had a skewness of 0.356159.
- For untransformed perimeter, I got a skewness of 9.701.
- Once log transformed, perimeter had a skewness of 0.51.
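The effect of a transformation on skew can also be checked with a short Python sketch. The numbers below are hypothetical toy values chosen to be right-skewed, not the qocha data, and the skewness is a simple moment-based estimate that may differ slightly from PAST's. The sketch compares the raw values with a square-root transform (one of the milder options) and a base-10 log transform, and then shows one interpretive wrinkle of back-transforming logged values.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness (positive = long right tail)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed areas (toy numbers, NOT the qocha data)
areas = [12, 15, 18, 22, 25, 30, 41, 55, 90, 400]
log_areas = [math.log10(a) for a in areas]  # same as Excel's =LOG(cell), base 10
sqrt_areas = [math.sqrt(a) for a in areas]  # a milder transformation

for label, data in [("raw", areas), ("sqrt", sqrt_areas), ("log10", log_areas)]:
    print(f"{label:>5} skewness: {skewness(data):.3f}")

# Back-transforming the mean of the logs gives the geometric mean,
# which is smaller than the arithmetic mean of right-skewed data.
back_transformed = 10 ** statistics.fmean(log_areas)
print(f"10**mean(log10) = {back_transformed:.2f}, arithmetic mean = {statistics.fmean(areas):.2f}")
```

For these toy values the skewness drops from raw to square root to log, which mirrors the pattern reported above for the real data; how far each transformation goes will depend on the data.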

- To me it seems that the log transformation greatly reduces the skew. However, the logarithm is harder to work back from than some of the other transformations listed in Drennan's (2009) figure: for example, 10 raised to the mean of the logged values gives the geometric mean, not the arithmetic mean. Try some of the other transformations listed in Drennan and see if there is one that is easier to reverse and still makes the necessary adjustment.
- When might it be necessary to know the skew of a distribution and to be able to reduce it?
- Remember, the original question this lab started with was to evaluate whether there were multiple groups among the qocha. Based on ethnographic data, it was postulated that three groups were present. Two of these groups were based on size and one was based on a combination of size and shape. Typically, when I'm looking for multiple groups in a distribution, I pay close attention to the possibility of more than one mode. When multiple modes are present, this is a likely indicator of multiple groups.
- Look at the histograms of each of the shape metrics (P1A, P2A and C), do any of these histograms indicate more than one group? Are there any other indicators of the presence of multiple groups? Could outliers be considered a distinct group? If you thought outliers might constitute a distinct group, or different type of qocha, how would you identify these cases? Hint: I'd give some consideration to the mean and standard deviation.
- Is there a relationship between perimeter and area? To explore this, use the Model > Linear function. Does the relationship between perimeter and area change once the values have been log transformed? How about with the other transformations that you tried? Why might the relationship change after transformation?
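The last two questions can also be explored outside PAST. The Python sketch below assumes two simple techniques: flagging values more than k sample standard deviations from the mean (k = 2 is an arbitrary, hypothetical cutoff, and PAST's box plots use their own rule for outliers), and an ordinary least-squares line with r² fitted to log10(area) and log10(perimeter). The function names and all numbers are hypothetical illustrations, not results from the qocha data.

```python
import math
import statistics


def flag_outliers(values, k=2.0):
    """Return indices of values lying more than k sample SDs from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]


def ols(x, y):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)


# Hypothetical example 1: one value far from the rest
print(flag_outliers([10, 11, 9, 10, 12, 10, 9, 11, 10, 50]))  # prints [9]

# Hypothetical example 2: squares of different sizes (identical shape)
sides = [1, 2, 5, 10, 20]
log_area = [math.log10(s * s) for s in sides]
log_perim = [math.log10(4 * s) for s in sides]
a, b, r2 = ols(log_area, log_perim)
print(f"intercept={a:.3f}  slope={b:.3f}  r^2={r2:.3f}")
```

For shapes that keep the same form as they grow, perimeter scales with the square root of area, so raw perimeter plotted against raw area is curved while the log-log relationship is a straight line with slope 0.5 (as for the squares here). A fitted log-log slope clearly above 0.5 would be one hint that larger qocha tend to have more complex or elongated outlines; treat that as a prompt for discussion rather than a conclusion.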