Communications of the ACM

Who Should Test Whom?


The construction of software engineering teams, the interaction between their members, and how individual personalities influence both have been a concern from the 1960s to the present day [5]. Nevertheless, despite claims from leading figures in the field that it is fundamentally people who make the difference between software success and failure, a corpus of knowledge and good practice has failed to emerge. While there have been some attempts to investigate these issues through the application of psychometric tests, the question of what personality analysis can or cannot offer software engineering is still open for debate [6, 9]. In this article we argue that the lack of progress in this field is due in part to the inappropriate use of psychological tests, frequently coupled with basic misunderstandings of personality theory by those who use them. To support this case we present our analysis of papers that focus on the empirical use of personality tests in a software engineering context. Our analysis is supported by the expertise of the first author, who is both a chartered psychologist and a trained administrator qualified in the use of the MBTI and 16PF psychometric tests. We conclude with a set of recommendations on test application and use for researchers, participants, and readers.


Analyzing Personality in Software Engineering Research

We surveyed papers published in the software engineering field relevant to the topic of personality testing, using digital libraries. This process generated 40 papers published between 1984 and 2004. From this pool, 13 distinct papers were identified that focused on the empirical use of personality tests in a software engineering context; this subset is used to illustrate our arguments (osiris.sunderland.ac.uk/~cs0hed/CACMdata provides access to the full data set). Our analysis of these papers concentrates on examining test selection to identify whether reliable and valid instruments have been used, whether the test chosen is appropriate for the purpose, and the extent to which the personality testing process used is explicitly reported and discussed. It is our contention that, as a minimum, a paper must account for these issues if there is to be any confidence in the resultant data analysis. The majority of the papers surveyed (25 out of 40) focused on the Myers-Briggs Type Indicator (MBTI); we will therefore confine our discussion to this tool. The MBTI classifies personality in terms of people's preferred ways of operating in the world. It categorizes individuals into one of 16 personality types. These types are derived from people's expressed preferences on four functions: (E)xtroversion vs. (I)ntroversion, (S)ensing vs. i(N)tuition, (T)hinking vs. (F)eeling, and (J)udging vs. (P)erceiving [2]. Each type has a number of positive features: there is no ideal type; all are equally valued. The eight functions and their focus are summarized in the table here.
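To make the classification scheme concrete, the four dichotomies can be sketched in code. This is purely illustrative and is in no way the licensed MBTI instrument or its scoring; the function and data below are our own, with only the letters and dichotomy names taken from the standard scheme [2].

```python
# Illustrative only: mapping four expressed preferences onto one of the
# 16 four-letter type labels. The standard MBTI letters are used, but the
# representation (booleans per dichotomy) is invented for this sketch.

DICHOTOMIES = [
    ("E", "I"),  # Extroversion vs. Introversion
    ("S", "N"),  # Sensing vs. iNtuition (N, since I is taken)
    ("T", "F"),  # Thinking vs. Feeling
    ("J", "P"),  # Judging vs. Perceiving
]

def type_label(preferences):
    """preferences: four booleans, True = first pole of each dichotomy."""
    if len(preferences) != len(DICHOTOMIES):
        raise ValueError("expected one preference per dichotomy")
    return "".join(first if p else second
                   for p, (first, second) in zip(preferences, DICHOTOMIES))

# Introversion, Sensing, Thinking, Judging -> "ISTJ"
label = type_label([False, True, True, True])
```

Note that the 2 x 2 x 2 x 2 combinations of poles are what yield the 16 types mentioned above.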


What Can Personality Testing Offer Software Engineering? Dispelling the Myths

In general, the MBTI has been used within software engineering research in one of two ways: to discover the personality type(s) that most typify good software engineers, or to identify the makeup of software development teams that are likely to work well together or to exhibit tensions.

Capretz [3] studied the MBTI types of 100 software engineers and found the largest type was ISTJ. While Capretz acknowledges there is no link between type and performance, and that other factors have a bearing on career choice, he goes on to state that these findings are important for employers looking for software engineering professionals. More recently, the MBTI was used to investigate the link between personality type and a code review task with a sample of 64 students [4]. In this study those with an NT (Intuition-Thinking) preference were seen to perform the task better than other types; the largest single type was ENTP. The authors expressed surprise as their results conflicted with those of Capretz. However, these findings do not tell us much about the ideal, or even adequate, software engineer, given that type is not normally distributed in the population. As Kerth et al. [9] correctly point out, personality tests cannot distinguish good software engineers from bad, nor can their results predict "on the job" performance; there is some evidence of the importance of other factors, such as work experience [11]. Where researchers wish to identify the personality factors related to software engineering, or those factors that typify a group of exceptional software engineers, a more appropriate approach would be to use a trait-based instrument (such as the 16PF) where comparisons to a normative sample can be made. The main barrier to this approach would be choosing, or most probably creating, a representative normative sample for comparison.

The relevance of the MBTI to identifying the makeup of software development teams has been limited, in that observable behavior is not always related to the underlying type. People can, and do, choose to operate in the non-preferred mode as situations dictate. The MBTI is a tool for the development of self-awareness and, when results are shared, awareness of others. Knowledge of personality type within a team allows people to expect others to react differently from themselves and equips them to cope more constructively with those differences. As such, the MBTI can be used to improve teamwork, with the hoped-for byproduct of improved productivity and quality, as long as the test is used properly.


Will the Real MBTI Please Stand Up?

The value of any psychometric instrument is directly related to the techniques used during construction; not all psychometric tests are created equal. Test publishers describe the precise methods of test development, in particular, statistical data relating to test reliability and validity. They do this because to ignore such issues would render a test worthless: a poor test will yield poor results. However, the importance of this appears to be lost on many of those who use such tests. Unfortunately, the casual reader of many of the articles discussed here would not see the significance of this point, or its likely bearing on the validity of the research, because in the majority of cases details of the specific tests used are glossed over, and in some cases, misrepresented. Even when researchers have used the real MBTI, for example [1, 11], details of the administration process are missing.

Karn and Cowling's [8] study of the interactions of personality types during software development claims to have used the MBTI to identify the individual personality types of two teams of student software developers. In fact, the MBTI was not used: a later technical report [7] reveals that a freely available test was used instead (www.humanmetrics.com). While it is claimed that there are "no significant statistical differences between this test and the MBTI" [7], the argument is not convincing. On inspection of www.humanmetrics.com, no data is provided on the methods of test construction, no reliability or validity data, and no MBTI correlation data. Moreover, in our opinion, the content and style of the site itself are hardly indicative of a professional organization: no physical contact details are provided and there is no firm evidence of credentials. The site offers an interesting range of other free tests including "find your perfect partner" (perhaps the basis for a new slant on the concept of "pair programming"?). Although this site might offer some amusement, the potential effects on the subject group are not so lighthearted. A critical part of the administration process is gaining client acceptance and willingness to answer honestly. The testing environment in this case could in no way guarantee that the subjects took the process seriously.

Miller and Yin [10] discuss the use of the MBTI in the construction of software inspection teams. They claim to use the MBTI within this study, but then comment that they use the "standard approach of online specialized questionnaires." We were interested to discover precisely which version of the MBTI had been used within this study and through personal communication with the authors it was established that rather than using the full MBTI, they had in fact constructed their own test: no details of test construction and validity were provided.

You might ask: So what? Well, the development of a robust personality measure is a time-consuming, iterative process that can take many years, not least because personality measures present particular problems during construction. For a test to be of any value it must be both reliable and valid. Test reliability is the extent to which a test is consistent within itself, and over time. That is, the degree to which a test will give the same score or personality type for an individual on retesting. Test validity is the extent to which a test measures what it is intended to measure. Reputable tests such as the MBTI provide statistical data on these factors and details of the methods and samples used to gather this data. To ignore these factors when choosing a test will increase the possibility of acquiring misleading data.
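As an illustration of what test-retest reliability means in practice, the sketch below computes the Pearson correlation between scores from two administrations of the same hypothetical test to the same respondents. The scores are invented for the example; a correlation close to 1.0 indicates a test that is consistent over time.

```python
# Illustrative only: test-retest reliability as the Pearson correlation
# between two administrations of the same test. Score data is invented.
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

first  = [12, 18, 9, 22, 15, 11]   # hypothetical scores, administration 1
second = [13, 17, 10, 21, 14, 12]  # same respondents, some weeks later

r = pearson(first, second)  # close to 1.0: consistent on retesting
```

Test manuals report exactly this kind of coefficient (along with the sample and interval used), which is why its absence from a free online test should give pause.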

In addition, respondents may attempt to distort their profiles; for example, by responding to items in ways they believe will create a favorable impression. Care must be taken therefore during item development to limit the insight a respondent may have, and to ensure that one pole of the preference does not appear more appealing or acceptable than the other. Standardized tests such as the MBTI are developed in a way that will limit the effects of such response sets. However, this peace of mind comes at a price: tests such as the MBTI can be relatively expensive to purchase. Freely available tests generally do not provide data on reliability and validity, nor do they offer insight into the test construction process or comment on the possible effects of response sets and how the test design limits them. Taken together, the lack of detail provided on test construction and the absence of validity and reliability data severely limit our ability to trust the results of such tests. A test is worthless if we cannot be sure that it measures what it is supposed to measure and that its results are consistent over time.


Test Administration and Feedback

All tests, including the MBTI, have a degree of error in their accuracy, and this error may be amplified by external factors, the most potent being the administration process. Therefore, all standardized tests provide administration procedures and it is important that these procedures are followed. Administration is not a simple process of issuing instructions and asking people to complete question booklets—it involves a degree of skill to ensure that the need for standardization is met and that clients are at ease and have a good understanding of personality theory and the underlying assumptions of the instrument in question. Test publishers are fully aware of this and consequently the purchase and use of standardized tests is restricted to qualified users who have undergone specific training.

In the case of personality assessment an important aspect of the process is providing feedback to clients. With the MBTI feedback is an absolute necessity, as it involves type verification and the process of "best fit." The MBTI questionnaire provides an indication or estimate of an individual's personality type (the "reported type"). Type verification occurs during a client feedback session; in some instances type verification cannot be resolved in a single session.

The feedback process involves the test administrator explaining the history and aims of the MBTI, along with a description of the four functions; the client then self-assesses their type. This is done through a process of open discussion during which the test administrator asks questions that will facilitate reflection. It is essential that this process takes place, as there may be discrepancies between an individual's reported type and their self-assessment of their type. A number of studies have investigated discrepancies between reported and self-assessed type. Generally speaking, disagreements with reported type occur more frequently in dichotomies where the expressed clarity index is weak. For example, Walck [12] found that, out of a sample of 256 people, 75% agreed on all four letters and 97% agreed on three out of the four. The clarity index is a score that represents how sure the respondent is that he or she prefers one dimension over the other. It also provides an indication of those preferences where the client may not agree with their reported type. However, it does not, as Hardiman (in response to [9]) suggests, provide an indication that the respondent has a degree of preference ranging on a continuum. The process of helping a client reach "best fit" can only be done by a qualified test administrator. Untrained individuals might bias this process through inappropriate or leading questions, their own misunderstandings of the MBTI and Jungian theory, or the influences of their own personality type. If this process is not undertaken we cannot be sure the respondent would agree with the reported type.
This is an important point since the clarity index data reported in some papers, for example [8], suggests the reported type for some respondents on some dimensions is extremely weak, whereas Bradley and Hebert [1] have examples of equal clarity indices for a pair of functions, but no discussion of how the choice for one over the other was made (for example, N rather than S).
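The kind of agreement analysis reported by studies such as Walck [12] can be sketched as follows. The reported/self-assessed type pairs here are invented for illustration; the sketch simply tallies how many respondents agree with their reported type on all four letters, or on at least three.

```python
# Illustrative only: comparing reported types against self-assessed
# ("best fit") types, letter by letter. The sample pairs are invented.

def letters_agreeing(reported, verified):
    """Count how many of the four letters two type labels share."""
    return sum(a == b for a, b in zip(reported, verified))

pairs = [  # (reported type, self-assessed type), hypothetical data
    ("ISTJ", "ISTJ"),
    ("ENTP", "INTP"),  # disagreement on E/I, perhaps a weak clarity index
    ("ESFJ", "ESFJ"),
    ("INFP", "INFP"),
]

full_agreement = sum(letters_agreeing(r, v) == 4 for r, v in pairs)
at_least_three = sum(letters_agreeing(r, v) >= 3 for r, v in pairs)
share_all_four = full_agreement / len(pairs)  # cf. Walck's reported 75%
```

Without the feedback session that produces the second column, an analysis like this is impossible, which is precisely why reporting only unverified types is problematic.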

None of the papers reviewed discussed the administration and feedback process in any detail. Therefore, even if we discount the problems of using inappropriate tests, the basic concept of involving the individual in the process of "best fit" has been ignored, and we cannot be certain of the accuracy of the types identified in the research papers. These aspects may not in fact have been neglected; but if they were carried out, they were not seen as worthy of discussion (despite being central to the effective and acceptable use of the MBTI).


Conclusion

Our analysis of what is required for effective use of psychometric tests leads us to the following recommendations aimed at potential participants, researchers, and readers.

We recommend to a potential participant (whether work or research related) that you ask the following of your tester:

1. What test is to be used? Press for the specific test and its version; a qualified tester will be able to be precise.

2. Is it a recognized and validated test? If it is not, either politely refuse or, if you agree to be involved, be aware of the limitations of the test and its results.

3. Are the testers fully trained and qualified to administer the process? If this question causes bemusement then the answer will be "no."

4. How will the process be administered? A valid approach will be relatively time consuming since for each individual there will be pre-test and post-test discussions in addition to the test time.

To a researcher whose aim is to investigate the impact of personality in a software team we suggest the following:

1. Become qualified, or team up with a qualified tester, and use standardized tests: use the flowchart provided in the figure here to help identify the appropriate test, and process, for your work.

2. Ensure in publications that you are clear about the tests and the process used, and that the following information is provided: the test used; the administration process; who the qualified testers were; how feedback was given; and whether the types in the paper are reported types or verified types (derived after feedback).

To a reader of such articles we suggest the following:

1. Look for explicit details of types of test used, administration process, the qualifications of the testers.

2. Don't assume that because a paper has been published in a prestigious journal it is flawless.

Finally, people are entitled to develop questionnaires of their own to test their hypotheses, gather data, and report results. Such approaches are not invalid or necessarily suspect, and we are not warning against such work. However, those who claim to be using personality testing are making claims of authenticity and validity that are often not warranted. Software engineers often complain about those outside the discipline who, in the course of their work, do some programming in support of their professional activities: the claim being that such individuals are not professionals and do not understand the discipline. The same can be said of those who adopt psychological approaches without the relevant qualifications and background.


References

1. Bradley, J.H. and Hebert, F.J. The effect of personality type on team performance. Journal of Management Development 16, 5 (1997), 337–353.

2. Briggs Myers, I., McCaulley, M.H., Quenk, N.L., and Hammer, A.L. MBTI® Manual: A Guide to the Development and Use of the Myers-Briggs Type Indicator®. Third Edition, CPP Publications, Palo Alto, CA, 1998.

3. Capretz, L.F. Personality types in software engineering. International Journal of Human Computer Studies 58, 2 (2003), 207–214.

4. Devito Da Cunha, A. and Greathead, D. Code review and personality: Is performance linked to MBTI type? Technical Report Series, CS TR 837, School of Computing Science, University of Newcastle upon Tyne, UK, April 2004.

5. Gorla, N. and Lam, W.Y. Who should work with whom? Building effective software project teams. Commun. ACM 47, 6 (June 2004).

6. Hardiman, L.T. Personality types and software engineers. IEEE Computer 30, 10 (Oct. 1997), 10.

7. Karn, J.S. and Cowling, A.J. A study into the effect of disruptions on the performance of software engineering teams. Research Memorandum CS-04-07, Department of Computer Science, Sheffield University, 2004.

8. Karn, J.S. and Cowling, A.J. An initial observational study of the effects of personality type on software engineering teams. In Proceedings of the Eighth International Conference on Empirical Assessment in Software Engineering (EASE 2004), IEE, Edinburgh, UK, 2004.

9. Kerth, N.L, Coplien, J., and Weinberg, J. Call for the rational use of personality indicators. IEEE Computer 31, 1 (Jan. 1998), 146–147.

10. Miller, J. and Yin, Z. A cognitive-based mechanism for constructing software inspection teams. IEEE Transactions on Software Engineering (Nov. 2004).

11. Turley, R.T. and Bieman, J.M. Competencies of exceptional and non-exceptional software engineers. Journal of Systems and Software 28, 1 (1995), 19–38.

12. Walck, C.L. The relationship between indicator type and true type: Slight preferences and the verification process. Journal of Psychological Type 23 (1992), 17–21.


Authors

Sharon McDonald (sharon.mcdonald@sunderland.ac.uk) is a senior research lecturer in the School of Computing and Technology at the University of Sunderland, U.K., and a chartered psychologist trained in the administration and interpretation of the MBTI and 16PF personality scale.

Helen M. Edwards (helen.edwards@sunderland.ac.uk) is a professor of software engineering in the School of Computing and Technology at the University of Sunderland, U.K., a fellow of the British Computer Society, and a chartered engineer.


Figures

UF1Figure. Guidelines for the testing process.


Tables

UT1Table. The MBTI functions and their focus.



©2007 ACM  0001-0782/07/0100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.



 
