Exploratory Study of the Objective Scoring System, version 3
Poster #3 - Additional validation with LEPET and PCSOT screening exams
In this third study we investigated the accuracy of the OSS-3 algorithm with screening cases and developed decision rules for screening polygraphs in which multiple distinct investigation targets are tested in the absence of any known allegation or incident.
Screening polygraphs represent a substantial portion of field examinations, and include law enforcement pre-employment testing (LEPET), post-conviction sex offender testing (PCSOT) and counterintelligence/security screening. None of the samples from our two previous experiments included screening cases. All of the field examination data used for training, validation and cross validation of the OSS-3 method has consisted of confirmed case data from field investigations involving known or alleged crimes. For that reason, our previous experiments do not yet support the application of a form of the OSS-3 method to screening situations. Furthermore, we identified important mathematical differences between screening and diagnostic polygraph examinations, and we are doubtful that the same decision rules will apply to diagnostic and screening testing situations with equal effectiveness.
Polygraph screening examinations are conducted in the absence of any known incident or allegation and present distinct challenges compared with polygraph testing of known incidents or allegations. Screening examinations commonly attempt to investigate multiple distinct issues simultaneously, and it is conceivable that an examinee could be lying about involvement in one or more issues while being truthful about the other issues under investigation. For further discussion of the difference between screening and diagnostic polygraphs, see Krapohl and Stern (2003).
In this project, our objectives were to:
We used double bootstrap resampling and Bonferroni t-tests to evaluate the effectiveness of the OSS-3 algorithm with screening polygraphs. We defined decision rules to address the mathematical complications inherent in the classification of mixed-issues screening polygraphs. Though we did not anticipate the ability to determine truthfulness and deception simultaneously within a single examination, we did intend to develop a screening rule model that could efficiently classify screening polygraph results using a procedurally and mathematically structured approach.
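The resampling idea can be sketched as follows. This is a minimal single-level bootstrap of the difference between two outcome rates, not the double-bootstrap procedure used in the study, whose details are not given in this poster. The function names, the seed, and the toy outcome vectors are assumptions for illustration only, and the two sets are treated as independent.

```python
import random
from statistics import mean, stdev

def bootstrap_rates(outcomes, n_resamples, rng):
    """Resample a list of 0/1 outcomes with replacement and return the
    rate (mean) observed in each resampled set."""
    n = len(outcomes)
    return [mean(rng.choices(outcomes, k=n)) for _ in range(n_resamples)]

def bootstrap_difference(outcomes_a, outcomes_b, n_resamples=1000, seed=0):
    """Bootstrap the difference in rates between two sets of 0/1 outcomes.

    Returns the mean difference, its bootstrap standard error, and a
    percentile interval (2.5th and 97.5th percentiles)."""
    rng = random.Random(seed)
    rates_a = bootstrap_rates(outcomes_a, n_resamples, rng)
    rates_b = bootstrap_rates(outcomes_b, n_resamples, rng)
    diffs = sorted(a - b for a, b in zip(rates_a, rates_b))
    lo = diffs[int(0.025 * n_resamples)]
    hi = diffs[int(0.975 * n_resamples) - 1]
    return mean(diffs), stdev(diffs), (lo, hi)

# Hypothetical data (not from the study): 1 = inconclusive result,
# 0 = conclusive result, for two sets of 60 scored exams.
rule_a = [1] * 16 + [0] * 44
rule_b = [1] * 2 + [0] * 58

diff, se, (lo, hi) = bootstrap_difference(rule_a, rule_b)
print(f"difference in rate: {diff:.3f} (SE {se:.3f}), "
      f"95% interval [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero suggests the two rates differ. A formal test would additionally need the correction for multiple comparisons described in the text.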
We constructed three samples of screening polygraph examinations: 1) LEPET examinations, 2) PCSOT disclosure exams, and 3) PCSOT maintenance exams. Screening commonly involves the testing of larger numbers of test subjects for possible involvement in a number of concerns for which all, none, many or some of the subjects may be involved. While the confirmation of deceptive polygraph examination results is limited by the availability of physical or confessional evidence, truthful screening polygraph results are unlikely to be confirmed by external evidence or the confession of an alternative suspect. Though it would be ideal to have access to screening polygraphs that are confirmed by data external to the polygraph test, we recognized that much can still be learned from the sample data that are available for study. In our three screening samples, we included deceptive cases that were confirmed by the examinee's direct confession to the test questions. We included cases that were classified as truthful by the original examiner, with additional confirmation through quality assurance review of each case.
We constructed the LEPET sample as a matched sample of 30 deceptive and 30 truthful examinations conducted at two large law enforcement agencies after January 1, 2007. Deceptive cases were selected based on direct confessions to test questions, and truthful cases were selected for inclusion in the sample after confirmation through mandated quality control procedures at the agencies which conducted the examinations. LEPET screening exams were conducted according to the APA model policy on police applicant testing, and consisted of two to four relevant questions. See Table 1.
Table 1. LEPET
We constructed two matched samples of PCSOT cases,
conducted in a metropolitan area from January 1, 2003 to December 31,
2006. One PCSOT sample included disclosure examinations regarding the
subjects' reported histories of unknown/unreported sexual offenses, not
including the crime of conviction. The other PCSOT sample consisted of
maintenance polygraphs, regarding non-compliance with supervision and
treatment contracts and unreported sexual behaviors. Both samples
included 30 deceptive and 30 truthful cases. Deceptive cases were
selected on the basis of the subject's confession to the test
questions. Cases classified as truthful were included in the sample,
after being reviewed by two of the authors for proper administration
and interpretation of the test data. PCSOT examinations were conducted
according to the APA standards for PCSOT testing and the standards
of practice published by the Colorado Sex Offender Management Board.
Table 2. PCSOT Maintenance
Table 3. PCSOT
We completed a double-bootstrap Bonferroni t-test of M=602 resampled sets from the LEPET sample data, using both the spot scoring (MGQT) rules and the screening rules described above. Table 4 shows that our screening rules produced significantly fewer inconclusive classifications (p < .001) compared with the spot scoring rules, along with a significant improvement in specificity to truthfulness. Overall decision accuracy, sensitivity to deception, and the rates of false-negative and false-positive classifications did not differ significantly between the two decision rules. See Table 4 and Chart 1.
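The outcome measures reported in Table 4 can be computed from simple counts, as in the sketch below. The definitions are inferred from the arithmetic of Table 4 rather than stated in the poster: correct decisions appear to exclude inconclusive results from the denominator (for the spot rules, (83.4 + 57.2) / (83.4 + 57.2 + 3.4 + 3.2) is about 95.5%), while sensitivity, specificity, FN and FP appear to be proportions of all cases in the relevant criterion group. The function name and the counts are hypothetical.

```python
def screening_metrics(deceptive, truthful):
    """Compute outcome rates from per-group counts.

    Each argument is a dict with the number of 'correct', 'error' and
    'inconclusive' results among the confirmed cases of that group."""
    n_dec = sum(deceptive.values())
    n_tru = sum(truthful.values())
    decided = (deceptive["correct"] + deceptive["error"]
               + truthful["correct"] + truthful["error"])
    return {
        # Inconclusive results are excluded from this denominator (assumption).
        "correct": (deceptive["correct"] + truthful["correct"]) / decided,
        "inc": (deceptive["inconclusive"] + truthful["inconclusive"])
               / (n_dec + n_tru),
        "sensitivity": deceptive["correct"] / n_dec,
        "specificity": truthful["correct"] / n_tru,
        "fn": deceptive["error"] / n_dec,
        "fp": truthful["error"] / n_tru,
    }

# Hypothetical counts for a matched sample of 30 deceptive and 30 truthful cases.
metrics = screening_metrics(
    deceptive={"correct": 25, "error": 1, "inconclusive": 4},
    truthful={"correct": 17, "error": 1, "inconclusive": 12},
)
for name, value in metrics.items():
    print(f"{name:12s} {value:.1%}")
```

Under these definitions, sensitivity, FN, and the inconclusive share of the deceptive group sum to 100%, and likewise for specificity and FP in the truthful group.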
Table 4. Double-bootstrap t-test of M=602 resampled sets of N=60 LEPET screening exams.
                    Spot (MGQT) Rules   Screening Rules   sig.
Correct Decisions   95.5%               93.2%             .239†
INC                 26.4%               3.3%              <.001
Sensitivity         83.4%               93.3%             .044†
Specificity         57.2%               87.0%             <.001
FN                  3.4%                6.7%              .205†
FP                  3.2%                6.4%              .207†
Chart 1. LEPET Screening Sample, spot scoring and screening rules.
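A Bonferroni correction over the six comparisons in Table 4 can be sketched as below. The family-wise alpha of .05 and the choice to treat the six rows as one family are assumptions, since the poster does not state them. The function name is hypothetical, and the reported "<.001" values are entered as an upper bound of .001.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-comparison threshold (alpha divided by the number of
    comparisons) and a dict flagging which p-values fall below it."""
    threshold = alpha / len(p_values)
    flags = {name: p < threshold for name, p in p_values.items()}
    return threshold, flags

# Significance values as reported in Table 4 ("<.001" entered as .001).
table4_p = {
    "Correct Decisions": 0.239,
    "INC": 0.001,
    "Sensitivity": 0.044,
    "Specificity": 0.001,
    "FN": 0.205,
    "FP": 0.207,
}

threshold, significant = bonferroni(table4_p)
print(f"per-comparison threshold: {threshold:.4f}")
for name, flag in significant.items():
    print(f"{name:18s} {'significant' if flag else 'not significant'}")
```

Under these assumptions only the INC and Specificity comparisons pass the corrected threshold of about .0083, which agrees with the differences described in the text. The sensitivity value of .044 would pass an uncorrected .05 threshold but not the corrected one.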
To investigate the significance of any observed differences in the
accuracy outcomes of the three screening samples, we used a bootstrap
ANOVA of M=1000 resampled sets of size equivalent to the screening
samples (N=60) for each of the three screening sets. The data indicate
that decision accuracy did not differ significantly across the three
screening samples. See Table 5 and Chart 2.
Table 5. Bootstrap ANOVA constructed from 1000 resampled sets of N=60, using screening rules.
              (three screening samples)     F       sig.
Correct       94.6%    93.1%    93.2%       0.049   .952
INC           3.4%     1.6%     1.7%        0.174   .840
Sensitivity   93.1%    93.3%    90.0%       0.089   .915
Specificity   89.9%    89.9%    93.3%       0.086   .917
FN            6.9%     6.7%     6.7%        0.001   .999
FP            3.4%     6.8%     6.7%        0.130   .878
No significant differences were observed in any of the comparisons.
Chart 2. Bootstrap ANOVA with LEPET and PCSOT Samples.
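The F statistic behind a three-sample comparison of this kind can be sketched as below. This is a plain one-way ANOVA on toy data, not the bootstrap ANOVA procedure used in the study, whose details are not given here. The function names and the data are hypothetical. The p-value helper uses the closed-form upper tail of the F distribution that holds when the numerator has 2 degrees of freedom, as it does when three groups are compared.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA: return the F statistic and its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def p_value_two_numerator_df(f_stat, df_within):
    """Upper-tail p-value for F with 2 numerator degrees of freedom.

    In this special case the survival function has the closed form
    (1 + 2F/d2) ** (-d2/2), where d2 is the denominator df."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)

# Toy data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_between, df_within = one_way_f(groups)
p = p_value_two_numerator_df(f_stat, df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}, p = {p:.3f}")
```

With a large denominator df the closed form approaches exp(-F). For F = 0.049 that gives about .952, which agrees with the first row of Table 5, although the poster does not say how its significance values were obtained.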
To further investigate the performance of the OSS-3/Screening algorithm
with screening and diagnostic exams, we combined the data from the
three screening samples into a combined screening sample of N=180 cases.
We then used a double-bootstrap t-test of M=1802 resampled sets of
N=180 cases from the combined screening sample data, and M=2922
resampled sets of N=292 cases from the OSS training sample. Our
experiment revealed no significant differences between the
OSS-3/Screening and OSS-3/Senter methods in their intended
applications. See Table 6 and Chart 3.
                                          sig.
Correct Decisions   93.9%      93.2%      .337
INC                 4.5%       1.6%       .005
Sensitivity         90.6%      92.2%      .271
Specificity         88.8%      91.1%      .202
FN                  6.7%       6.7%       .497
Chart 3. Training sample and combined screening samples.
In this poster we assessed the potential accuracy of a third version of OSS-3 (screening decision rules). We conclude that this version of OSS-3 might provide satisfactory accuracy for use in field screening settings. We also recognize that these findings should be investigated further.
While the decision accuracy of traditional spot scoring rules was acceptable, our assumptions about the potential for excessive inconclusive rates were confirmed using our screening sample data. We are not yet convinced that inconclusive rates from field testing situations would be as low as those observed in our experiments, inasmuch as they might be an artifact of an unintended bias of our case confirmation criteria.
Kircher and Raskin (2002) suggested important differences in the discriminant function for pneumograph RLL values between probable-lie and directed-lie comparison question techniques. We would therefore advise against attempts to apply the present screening model to directed-lie screening tests without further study.
The authors would like to thank Walt Goodson of the Texas Department of Public Safety and Brian Sulla of the Houston Police Department for providing data for the LEPET sample, and H. Lawson Hagler, Jason Walker, and Mitch LaCost of Accountability Polygraph Services, Centennial, Colorado, for providing data for the two PCSOT samples.
Kircher, J.C., & Raskin, D.C. (2002). Computer methods for the psychophysiological detection of deception. In M. Kleiner (Ed.), Handbook of Polygraph Testing. London: Academic Press.
Krapohl, D.J., & Stern, B.A. (2003). Principles of multiple-issue polygraph screening: A model for applicant, post conviction offender, and counterintelligence screening. Polygraph, 32(4), 201-210.