Since all models are wrong the scientist cannot obtain a “correct” one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity. — George E. P. Box
Models of archaeological site potential are not new to archaeology or cultural heritage; however, the analytical tools used to develop, test, and refine those models have continued to evolve in recent years. A series of models was developed using scripts in GIS, Python, and R, coupled with archaeological and environmental data. These models are undergoing iterative testing and refinement to mitigate bias introduced by linear, block, and compartment-level compliance surveys.
Several assumptions built into the models warrant acknowledgment. First among them is that the archaeological site locations are correct. If site locations used in the model were recorded on paper maps, or with one of the many early generations of Global Positioning System (GPS) hardware and software, these data may introduce a fair amount of error into the models. This error may be acceptable in the first iteration, but should be mitigated where possible through site relocation efforts. Additional error can be introduced in the environmental layers. For instance, several field reports and personal experience indicated that the existing streams shapefile used for one of the models was problematic. To correct for this, LiDAR data were used to generate new stream and ordered stream shapefiles in advance of incorporating those data into the model.
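To make "ordered stream" concrete, here is a minimal sketch of how stream order can be computed once a network has been derived. It assumes Strahler ordering, which the text above does not specify, and it uses a hypothetical toy network rather than anything extracted from LiDAR. In practice this step would normally be handled by a GIS hydrology toolset working from the elevation surface.

```python
from collections import defaultdict


def strahler_orders(downstream):
    """Return {segment: Strahler order} for a stream network.

    `downstream` maps each segment id to the segment it flows into,
    or None for the outlet. The segment ids are hypothetical.
    Uses recursion, which is fine for a small illustrative network.
    """
    # Invert the mapping so each segment knows its direct tributaries.
    upstream = defaultdict(list)
    for seg, target in downstream.items():
        if target is not None:
            upstream[target].append(seg)

    orders = {}

    def order(seg):
        if seg in orders:
            return orders[seg]
        tributaries = [order(u) for u in upstream[seg]]
        if not tributaries:
            result = 1  # headwater segment
        else:
            top = max(tributaries)
            # Order increases only where two or more tributaries
            # of the same highest order meet.
            result = top + 1 if tributaries.count(top) >= 2 else top
        orders[seg] = result
        return result

    for seg in downstream:
        order(seg)
    return orders


# Hypothetical network: a and b join to form c; c and d join to form e.
network = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
orders = strahler_orders(network)
print(orders)
```

Here segments a, b, and d are headwaters (order 1), c becomes order 2 where a and b meet, and e stays at order 2 because only one of its tributaries is order 2.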
The machine learning package used to generate the models can be further tuned using a series of automated runs and evaluations in additional packages. This aids in maximizing the predictive ability of the models while avoiding overfitting.
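The idea behind automated tuning can be sketched with k-fold cross-validation: each candidate setting is scored only on data held out from fitting, which is what guards against overfitting. The sketch below is an illustration under stated assumptions, not the workflow used for these models. The classifier (a hand-rolled k-nearest-neighbours vote), the two predictors, and the synthetic presence/absence data are all stand-ins, since the packages actually used are not named here.

```python
import random


def knn_predict(train, point, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def cross_val_accuracy(data, k, n_folds=5):
    """Mean held-out accuracy of a k-NN classifier across n_folds folds."""
    scores = []
    for fold in range(n_folds):
        test = [row for i, row in enumerate(data) if i % n_folds == fold]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / n_folds


def tune(data, candidates, n_folds=5):
    """Score every candidate setting and return the best one with all scores."""
    results = {k: cross_val_accuracy(data, k, n_folds) for k in candidates}
    best = max(results, key=results.get)
    return best, results


# Synthetic stand-in data: two predictors scaled to [0, 1] and a
# presence (1) / absence (0) label with 10% of labels flipped as noise.
rng = random.Random(0)
data = []
for _ in range(100):
    slope, stream_dist = rng.random(), rng.random()
    label = 1 if (slope < 0.4 and stream_dist < 0.5) else 0
    if rng.random() < 0.1:
        label = 1 - label
    data.append(((slope, stream_dist), label))

candidates = [1, 3, 5, 7, 9]
best_k, results = tune(data, candidates)
print(best_k, results)
```

The same pattern applies to any tunable setting: loop over candidates, score each on held-out folds, and keep the one that generalizes best rather than the one that fits the training data most closely.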
Models can be used as an exploratory measure to generate testable hypotheses, and also to better manage and protect areas where cultural resources may be located. In this case, the suite of models was generated using site locations associated with temporal diagnostics, and will be used to posit possible temporal affiliations for sites lacking temporal diagnostics. Those hypotheses can then be tested through additional survey or excavation at the site, after which the new data, whether or not they support the current hypothesis, can be added to the next iteration of the model, improving both its accuracy and its utility.
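One way such a hypothesis might be generated is to compare the value of each period-specific model at an undated site and propose the period whose model scores highest. The sketch below assumes that approach, which is not described in detail here. The period names, scores, and the `min_margin` threshold are all hypothetical.

```python
def posit_affiliation(scores, min_margin=0.1):
    """Return the period whose model scores highest at a site.

    `scores` maps a period name to that period model's value at the
    site location. Returns None when the top two scores differ by less
    than `min_margin`, so that no hypothesis is offered in a near tie.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return ranked[0][0] if ranked else None
    (best_period, best), (_, runner_up) = ranked[0], ranked[1]
    if best - runner_up < min_margin:
        return None
    return best_period


# Hypothetical model values at two undated sites.
site_1 = {"Period A": 0.82, "Period B": 0.35, "Period C": 0.40}
site_2 = {"Period A": 0.50, "Period B": 0.45}

print(posit_affiliation(site_1))
print(posit_affiliation(site_2))
```

Declining to assign a period when the models are nearly tied keeps the output a hypothesis to be tested in the field rather than a forced classification.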