An Exploratory Study of the Objective Scoring System, version 3,
with Mixed Issues Screening Polygraphs

by Raymond Nelson, Mark Handler, and Donald Krapohl



In this third study we investigated the accuracy of the OSS-3 algorithm with screening cases, and developed decision rules for screening polygraphs in which multiple distinct investigation targets are tested in the absence of any known allegation or incident.

Screening polygraphs represent a substantial portion of field examinations, and include law enforcement pre-employment testing (LEPET), post-conviction sex offender testing (PCSOT) and counterintelligence/security screening. None of the samples from our two previous experiments included screening cases. All of the field examination data used for training, validation and cross validation of the OSS-3 method has consisted of confirmed case data from field investigations involving known or alleged crimes. For that reason, our previous experiments do not yet support the application of a form of the OSS-3 method to screening situations. Furthermore, we identified important mathematical differences between screening and diagnostic polygraph examinations, and we are doubtful that the same decision rules will apply to diagnostic and screening testing situations with equal effectiveness.

Polygraph screening examinations are conducted in the absence of any known incident or allegation, and present distinct challenges compared with polygraph testing of known incidents or known allegations. To begin with, screening examinations commonly attempt to investigate multiple distinct issues simultaneously, and it is conceivable that an examinee could be lying regarding involvement in one or more issues while being truthful regarding involvement in other issues under investigation. For further discussion of the difference between screening and diagnostic polygraphs, see Krapohl and Stern (2003).

In this project, our objectives were to:

  • Develop a form of OSS-3 specifically for screening examinations, and cross validate it with samples of LEPET and PCSOT examinations,
  • Explore decision rules that would maximize the classification efficiency of the OSS method with screening polygraph examinations, and
  • Compare the accuracy of OSS-3 with screening cases to the results observed with the training sample.


We used double bootstrap resampling and Bonferroni t-tests to evaluate the effectiveness of the OSS-3 algorithm with screening polygraphs. We defined decision rules to address the mathematical complications inherent in the classification of mixed-issues screening polygraphs. Though we did not anticipate the ability to determine truthfulness and deception simultaneously within a single examination, we did intend to develop a screening rule model that could efficiently classify screening polygraph results using an approach that is both procedurally and mathematically structured.
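The resampling comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the source does not specify how the double bootstrap was implemented, so the function names, the toy outcome data (1 = correct decision, 0 = error), and the use of Welch's t statistic are all hypothetical choices made for the example.

```python
import math
import random
import statistics


def bootstrap_means(outcomes, n_resamples, rng):
    """Return the mean of each bootstrap resample.

    Each resample is drawn with replacement and has the same size
    as the original list of per-case outcomes.
    """
    n = len(outcomes)
    return [statistics.fmean(rng.choices(outcomes, k=n)) for _ in range(n_resamples)]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed test form)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


# Hypothetical per-case outcomes for 60 exams under two decision rules.
rng = random.Random(0)
rule_a = [1] * 50 + [0] * 10
rule_b = [1] * 56 + [0] * 4

# One bootstrap distribution of the accuracy estimate per decision rule.
dist_a = bootstrap_means(rule_a, 600, rng)
dist_b = bootstrap_means(rule_b, 600, rng)

t_stat = welch_t(dist_a, dist_b)
print(f"mean A = {statistics.fmean(dist_a):.3f}, mean B = {statistics.fmean(dist_b):.3f}")
print(f"t = {t_stat:.2f}")
```

A seeded `random.Random` instance keeps the resampling reproducible, which makes it easier to compare runs when the decision rules are changed.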

Screening Samples

We constructed three samples of screening polygraph examinations: 1) LEPET examinations, 2) PCSOT disclosure exams, and 3) PCSOT maintenance exams. Screening commonly involves testing large numbers of subjects for possible involvement in a number of concerns in which all, none, many or some of the subjects may be involved. While the confirmation of deceptive polygraph examination results is limited by the availability of physical or confessional evidence, truthful screening polygraph results are unlikely to be confirmed by external evidence or the confession of an alternative suspect. Though it would be ideal to have access to screening polygraphs that are confirmed by data external to the polygraph test, we recognized that much can still be learned from the sample data that are available for study. In our three screening samples, we included deceptive cases that were confirmed by the examinee's direct confession to the test questions. We included cases that were classified as truthful by the original examiner, with additional confirmation through quality assurance review of each case.

We constructed the LEPET sample as a matched sample of 30 deceptive and 30 truthful examinations conducted at two large law enforcement agencies after January 1, 2007. Deceptive cases were selected based on direct confessions to test questions, and truthful cases were selected for inclusion in the sample after confirmation through mandated quality control procedures at the agencies which conducted the examinations. LEPET screening exams were conducted according to the APA model policy on police applicant testing, and consisted of two to four relevant questions. See Table 1.

Table 1. LEPET examinations.

  Four Relevant Questions     37
  Three Relevant Questions    22
  Two Relevant Questions       1

We constructed two matched samples of PCSOT cases, conducted in a metropolitan area from January 1, 2003 to December 31, 2006. One PCSOT sample included disclosure examinations regarding the subjects' reported histories of unknown/unreported sexual offenses, not including the crime of conviction. The other PCSOT sample consisted of maintenance polygraphs, regarding non-compliance with supervision and treatment contracts and unreported sexual behaviors. Both samples included 30 deceptive and 30 truthful cases. Deceptive cases were selected on the basis of the subject's confession to the test questions. Cases classified as truthful were included in the sample, after being reviewed by two of the authors for proper administration and interpretation of the test data. PCSOT examinations were conducted according to the APA standards for PCSOT testing and the standards of practice published by the Colorado Sex Offender Management Board.

Table 2. PCSOT Maintenance

  Four Questions       4
  Three Questions     48
  Two Questions        8

Table 3. PCSOT Disclosure

  Four Questions      17
  Three Questions     40
  Two Questions        3


We completed a double-bootstrap Bonferroni t-test of M=602 resampled sets from the LEPET sample data, using both the spot scoring (MGQT) rules and the screening rules described above. Our screening rules produced significantly fewer inconclusive classifications (p < .001) than the spot scoring rules, along with a significant improvement in specificity to truthfulness. Overall decision accuracy, sensitivity to deception, and false negative and false positive classifications did not differ significantly between the two decision rules. See Table 4 and Chart 1.

Table 4. Double-bootstrap t-test of M=602 resample sets, of N=60 LEPET screening exams.

                      Spot (MGQT) Rules    Screening Rules     sig.
Correct Decisions          95.5%                93.2%          .239†
INC                        26.4%                 3.3%         <.001
Sensitivity                83.4%                93.3%          .044†
Specificity                57.2%                87.0%         <.001
FN                          3.4%                 6.7%          .205†
FP                          3.2%                 6.4%          .207†

† Not Significant.
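The significance markers in Table 4 can be checked with simple Bonferroni arithmetic. The sketch below is illustrative only: the source does not state the familywise alpha or the number of comparisons, so a familywise alpha of .05 spread over the six rows of the table is an assumption. Under that assumption the per-comparison threshold is .05 / 6 ≈ .0083, which would explain why the sensitivity difference (p = .044) is marked as not significant.

```python
ALPHA = 0.05  # assumed familywise error rate (not stated in the source)

# p-values as reported in Table 4; "<.001" is represented by its upper bound.
table4_p = {
    "Correct Decisions": 0.239,
    "INC": 0.001,
    "Sensitivity": 0.044,
    "Specificity": 0.001,
    "FN": 0.205,
    "FP": 0.207,
}

# Bonferroni correction: divide alpha by the number of comparisons.
threshold = ALPHA / len(table4_p)

# A comparison is flagged only if its p-value falls below the corrected threshold.
significant = {name: p < threshold for name, p in table4_p.items()}

for name, flag in significant.items():
    print(f"{name:18s} p={table4_p[name]:.3f}  significant={flag}")
```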

Chart 1. LEPET Screening Sample, spot scoring and screening rules.


To investigate the significance of any observed differences in the accuracy outcomes of the three screening samples, we used a bootstrap ANOVA of M=1000 resample sets of size equivalent to the screening samples (N=60) for each of the three screening sets. Data indicate that decision accuracy did not differ significantly across the three screening samples. See Table 5 and Chart 2.

Table 5. Bootstrap ANOVA constructed from 1000 resampled sets of N=60.

Using Screening Rules

                             PCSOT          PCSOT
               LEPET      Maintenance    Disclosure       F        Sig.
Correct        94.6%         93.1%          93.2%       0.049      .952
INC             3.4%          1.6%           1.7%       0.174      .840
Sensitivity    93.1%         93.3%          90.0%       0.089      .915
Specificity    89.9%         89.9%          93.3%       0.086      .917
FN              6.9%          6.7%           6.7%       0.001      .999
FP              3.4%          6.8%           6.7%       0.130      .878

No significant differences in any of the comparisons.
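The source does not describe how the bootstrap ANOVA was computed, so the sketch below shows only the ordinary one-way F statistic that such a test builds on, together with a closed-form upper-tail p-value that is exact when the numerator degrees of freedom equal 2 (three groups). The function names and the toy outcome data are hypothetical, and the denominator degrees of freedom of 177 (three groups of 60 cases) is an assumption. Under that assumption, F = 0.049 gives p ≈ .952, which agrees with the first row of Table 5.

```python
import statistics


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    means = [statistics.fmean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the cases around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def p_value_three_groups(f_stat, df_within):
    """Upper-tail p-value of F with numerator df = 2 (closed form)."""
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)


# Hypothetical per-case outcomes (1 = correct, 0 = error) for three samples of 60.
lepet = [1] * 57 + [0] * 3
maintenance = [1] * 56 + [0] * 4
disclosure = [1] * 56 + [0] * 4

f_stat = one_way_f([lepet, maintenance, disclosure])
p_val = p_value_three_groups(f_stat, df_within=177)
print(f"F = {f_stat:.3f}, p = {p_val:.3f}")
```

The closed form avoids the need for an F-distribution routine from a statistics library, but it applies only to three-group comparisons.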

Chart 2.  Bootstrap ANOVA with LEPET and PCSOT Samples.


To further investigate the performance of the OSS-3/Screening algorithm with screening and diagnostic exams, we pooled the data from the three screening samples into a combined screening sample of N=180 cases. We then used a double-bootstrap t-test of M=1802 resampled sets of N=180 cases from the combined screening sample data, and M=2922 resampled sets of N=292 cases from the OSS training sample. Our experiment revealed no significant differences in decision accuracy between the OSS-3/Screening and OSS-3/Senter methods in their intended applications. See Table 6 and Chart 3.

Table 6. Combined screening samples.

                      Training Sample    Combined Screening Samples
                       Senter Rules           Screening Rules
                         M=2922                   M=1802               sig.
Correct Decisions         93.9%                    93.2%               .337
INC                        4.5%                     1.6%               .005
Sensitivity               90.6%                    92.2%               .271
Specificity               88.8%                    91.1%               .202
FN                         6.7%                     6.7%               .497
FP                         4.9%                     6.6%               .214
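The rows reported in these tables can be computed from case-level results roughly as sketched below. The source does not define the measures, so the definitions here are assumptions: correct decisions are taken as a proportion of conclusive results only, while sensitivity, specificity, FN and FP use all cases of the relevant criterion status as the denominator. Under those assumptions, and with equal numbers of deceptive and truthful cases, the screening column of Table 6 is internally consistent: (92.2 + 91.1) / (92.2 + 91.1 + 6.7 + 6.6) ≈ 93.2%. The counts in the example are hypothetical and are not the study data.

```python
from collections import Counter


def accuracy_profile(cases):
    """Compute accuracy measures from (truth, call) pairs.

    truth is "D" (deceptive) or "T" (truthful);
    call is "D", "T", or "INC" (inconclusive).
    """
    counts = Counter(cases)
    tp, fn = counts[("D", "D")], counts[("D", "T")]
    tn, fp = counts[("T", "T")], counts[("T", "D")]
    inc = counts[("D", "INC")] + counts[("T", "INC")]
    n_deceptive = tp + fn + counts[("D", "INC")]
    n_truthful = tn + fp + counts[("T", "INC")]
    return {
        # Assumed definition: inconclusives are excluded from this denominator.
        "correct": (tp + tn) / (tp + tn + fn + fp),
        "inc": inc / (n_deceptive + n_truthful),
        "sensitivity": tp / n_deceptive,
        "specificity": tn / n_truthful,
        "fn": fn / n_deceptive,
        "fp": fp / n_truthful,
    }


# Hypothetical 180-case sample: 90 deceptive and 90 truthful examinations.
cases = (
    [("D", "D")] * 83 + [("D", "T")] * 6 + [("D", "INC")] * 1
    + [("T", "T")] * 82 + [("T", "D")] * 6 + [("T", "INC")] * 2
)

profile = accuracy_profile(cases)
for name, value in profile.items():
    print(f"{name:12s} {value:.1%}")
```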

Chart 3. Training sample and combined screening samples.



In this poster we assess the potential accuracy of a third version of OSS-3 (Screening decision rules). We conclude that this version of OSS-3 might provide satisfactory accuracy for use in field settings. We also recognize that these issues should be further investigated.

While the decision accuracy of traditional spot scoring rules was acceptable, our assumptions about the potential for excessive inconclusive rates were confirmed using our screening sample data. We are not yet convinced that inconclusive rates from field testing situations would be as low as those observed in our experiments, inasmuch as they might be an artifact of an unintended bias of our case confirmation criteria.

Kircher and Raskin (2002) suggested important differences in the discriminant function for pneumograph RLL values using probable-lie and directed-lie comparison question techniques. We would therefore advise against attempts to apply the present screening model to directed-lie screening tests without further study.

The authors would like to thank Walt Goodson of the Texas Department of Public Safety and Brian Sulla of the Houston Police Department for providing data for the LEPET sample, in addition to H. Lawson Hagler, Jason Walker, and Mitch LaCost of Accountability Polygraph Services, Centennial, Colorado, for providing data for the two PCSOT samples.


Kircher, J.C., & Raskin, D.C. (2002). Computer methods for the psychophysiological detection of deception. In M. Kleiner (Ed.), Handbook of Polygraph Testing. London: Academic Press.

Krapohl, D.J., & Stern, B.A. (2003). Principles of multiple-issue polygraph screening: A model for applicant, post conviction offender, and counterintelligence screening. Polygraph, 32(4), 201-210.