Communication and organization in software development: An empirical study
This empirical study addresses communication among members of a software development organization. Interactions between participants were evaluated. The question studied is whether the relationships among participants affect the amount of communication effort expended. Both qualitative and quantitative methods were used for data collection and analysis.
This study is based on data collected by observing 10 review meetings (mostly design and code reviews) and by interviewing the participants. The study has no formal hypothesis; its goal is to generate theory, and one of its contributions will be a set of hypotheses to guide further study.
This study took place at the IBM Software Solutions Laboratory in Toronto, Canada. The development project studied was DB2* (DATABASE 2*), an IBM commercial database system with several versions for different platforms. The review process was chosen for study because it was well defined in DB2, involved a lot of communication between participants, and could be observed.
The description of the DB2 environment has three major parts, corresponding to the three aspects of the environment most relevant to the study: the review process, the organizational structure, and information flow.
Process: In the official review process, participants plan, prepare, meet, rework, and follow up. Reviewers read the material individually to discover any defects the author may have overlooked, then meet to discuss them. The rework is the responsibility of the original author, and one reviewer is designated chief reviewer and is responsible for the follow-up.
Organization: A first-line manager is responsible for the developers, who are in turn responsible for maintaining specific collections of software components, facilitating the release of DB2 products, or providing services. Some developers are organized into teams headed by a task leader. A second-line manager is responsible for the whole of DB2 development.
Information Flow: This third part consists of the types of interactions, or instances of communication, that take place between members of the organization. Thirteen types of interaction were identified, but only a few are mentioned in the results section. One type is the defects interaction, in which the reviewers discuss defects the author may have overlooked.
Study design This empirical study examines the role of organizational structure in process communication among software developers. Qualitative methods were used to collect data, which were then coded into variables and analyzed using quantitative methods.
The unit of analysis in this study is the interaction. Data were collected through participant observation and structured interviews. After collection, the data were transformed into a set of quantitative variables, such as the medium of communication used and the number of people present in the meeting.
Data collection The data for this study were collected during an initial visit to IBM in Toronto in June 1994 and two follow-up visits in November 1994 and April 1995. E-mail communication took place between visits. During each review, each separate discussion was timed, with its beginning and ending times recorded. Other sources of data included administrative data (date, time, participants, material reviewed, preparation time).
Variables The variables chosen for analysis fall into three categories. First is the dependent variable, communication effort. Second, a set of independent variables represents organizational structure. Third, a large set of intervening variables threatens to confound the results if not taken into account. 1- Communication effort, labeled CE, is defined as the amount of effort, in person-minutes, expended to complete an interaction. How CE is determined depends on the interaction type. For the defects interaction type, for example, it is derived from the information recorded during observation of the discussion, plus any information about preparation time. 2- There are four organizational structure variables:
a- The first two are XOD and MOD. XOD is the maximum organizational distance between any two participants on the organization chart, whereas MOD is the median organizational distance. Since IBM changed its management structure frequently, the organization chart reflected working relationships accurately.
b- The next two variables are familiarity and physical distance. Familiarity reflects whether the participants know each other, have worked together in the past, and know the basics of each other's work. Physical distance reflects how many walls, buildings, or cities separate the participants.
3- The third category, intervening variables, covers factors outside the scope of this study, such as personality traits and culture, that may affect the significance of the results. A few of the intervening variables most relevant to the results presented in this paper are listed below:
Data Analysis The method of analysis consisted of dividing the data into subsets based on the values of one or more variables and then analyzing those subsets. The subsets of interactions that were analyzed were:
For each of these subsets, scatter plots and histograms were generated. To test relationships, Spearman correlation coefficients were calculated. Another two-variable relationship that we explored with scatter plots was the relationship between CE and the number of participants (N). ANOVA tests were done on some combinations of variables for some subsets, but they yielded no meaningful results because there was not enough information. Mann-Whitney tests were also used to test some special hypotheses about combined effects of organizational distance.
Results Data characterization. First, as can be seen in Figure 1, the distribution of the dependent variable, communication effort, is highly skewed toward the low end. The box plot at the top shows another view of this distribution. The box itself is bounded by the 25th and 75th percentiles, and the 90th percentile and upper extreme are shown as vertical lines. The diamond indicates the mean of the data. Ninety percent of the interactions had a CE of less than 600 person-minutes. The maximum amount of effort any interaction required was 1919 person-minutes, and the minimum was three person-minutes. The median was 38 and the mean was about 190.
Table 1 shows the numbers and cumulative percentages of data points at each level of MOD and XOD (recall that there are exactly 100 data points, so simple percentages are not shown). About 60 percent of the interactions had a median organizational distance (MOD) of two or less, and more than three-fourths had a maximum organizational distance (XOD) of four or higher. If we look at MOD and XOD together, as in Table 2, we see that most of the data fall into three categories:
Familiarity and physical distance. In this subsection, we present two hypotheses. Hypothesis: Interactions tend to require more effort when the participants are not previously familiar with one another's work. Hypothesis: Interactions tend to require more effort when the participants work in physically distant locations. Two of the organizational structure variables, familiarity and physical distance, exhibit straightforward relationships with communication effort. Figures 2 and 3 show their distributions. In Figure 4 they are both plotted against communication effort. A box plot is also shown for each level of each independent variable. The top and bottom boundaries of the boxes indicate the 75th and 25th percentiles. The median and the 90th and 10th percentiles are also shown as short horizontal lines (the median and 10th percentiles are not really visible on most boxes). The width of each box (and of the partitions on the horizontal axis) reflects the number of data points in that level. It appears from Figure 4 that high effort is associated with low familiarity and with high physical distance (the latter observation being the strongest). That is, interactions tend to require more effort when the participants are not previously familiar with one another's work. This observation is consistent with Krasner's findings [9] about "common internal representations." As for physical distance, interactions tend to require more effort when the participants work in physically distant locations. Curtis [1] and Allen [5] have had similar findings. However, it must be noted that most interactions have low familiarity and high physical distance, as shown in Figures 2 and 3. The Spearman correlation coefficients, which reflect the strength of the relationships between each independent variable and the dependent variable, are shown in Table 3. 
Physical distance has the highest coefficient, which implies that it has a stronger direct effect on communication effort than any of the other variables.
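As an illustration of how a Spearman coefficient like those in Table 3 is obtained, the following is a minimal sketch that ranks both variables (averaging tied ranks) and then takes the Pearson correlation of the ranks. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
from statistics import mean

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: an ordinal physical-distance level per interaction
# and the corresponding communication effort in person-minutes.
physical_distance = [1, 1, 2, 2, 3, 3, 4, 4]
effort = [5, 12, 9, 40, 38, 150, 200, 600]
print(round(spearman(physical_distance, effort), 3))
```

Because the coefficient depends only on ranks, it is not distorted by the heavy skew in CE, which is presumably one reason a rank-based measure suits data of this shape.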
Organizational distance. We now present a hypothesis on organizational distance. Hypothesis: More effort is required when the set of participants includes mostly organizationally close members, but has a few organizationally distant members. The results pertaining to organizational distance are both more complex and more interesting. Figure 5 shows two scatter plots, each with communication effort on the vertical axis, and one of the two versions of the organizational distance variable on the horizontal axis. Box plots for each level of each independent variable also show the 10th, 25th, 75th, and 90th percentiles, as well as the median. From Figure 5, we can observe that the highest-effort interactions are those with a relatively low median organizational distance (MOD) and relatively high maximum organizational distance (XOD). This category is the second described in the subsection on data characterization, in the discussion of the distributions of MOD and XOD. This observation implies that groups require more effort to communicate when they include a few (but not too many) members who are organizationally distant from the others. Less effort is required when the group is composed of all organizationally close members (low MOD and low XOD), or all or nearly all organizationally distant members (high MOD and high XOD).
We tested the statistical significance of this result by calculating the Mann-Whitney U statistic. This is a nonparametric test meant to indicate whether or not two independent samples exhibit the same distribution with respect to the dependent variable (CE). In this case, the two groups were those interactions falling into the high XOD/low MOD category, and those that did not. The test yielded a significant value, even at the p < .01 significance level. This result contrasts with Curtis [1], who hypothesized that the relationship between organizational distance and communication ease is more straightforward.
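The Mann-Whitney U statistic mentioned here can be sketched directly from its definition: for every pair drawn from the two groups, count how often the first group's value is larger, with ties counting one half. The sample values below are hypothetical, not the study's data, and the sketch computes only U. Judging significance also needs a table or an approximation, which a library routine such as SciPy's `scipy.stats.mannwhitneyu` would supply.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by direct pairwise comparison; ties count as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two statistics always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical CE values (person-minutes) for interactions in the
# low MOD/high XOD category versus all other interactions.
low_mod_high_xod = [520, 640, 880, 1200, 310]
others = [12, 38, 45, 90, 150, 310, 20]
u_a, u_b = mann_whitney_u(low_mod_high_xod, others)
print(u_a, u_b)
```

A U value close to the total number of pairs (or close to zero) indicates that one group's values are almost always larger than the other's.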
If we examine in more detail the subset of interactions that were effort-intensive, we find further evidence supporting this hypothesis. The 11 highest-effort interactions all required a communication effort greater than 500 person-minutes. The distributions of organizational distance in this subset are shown in Table 4. None of the high-effort interactions had a MOD greater than two, and none had an XOD less than four. In fact, all of the interactions in this high-effort subset belong to the second category (low MOD/high XOD) described above.
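To make the MOD and XOD definitions concrete, here is a small sketch that computes both for one set of participants. It assumes that organizational distance between two people is the number of reporting links on the path between them through their nearest common manager. That measure, the chart, and all names in it are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import median

# Hypothetical organization chart: each person maps to their manager
# (None marks the top of the chart).
MANAGER = {
    "second_line": None,
    "first_line_a": "second_line",
    "first_line_b": "second_line",
    "dev1": "first_line_a",
    "dev2": "first_line_a",
    "dev3": "first_line_a",
    "dev4": "first_line_a",
    "dev5": "first_line_b",
}

def chain_to_top(person):
    """List of people from `person` up to the top of the chart, inclusive."""
    chain = []
    while person is not None:
        chain.append(person)
        person = MANAGER[person]
    return chain

def org_distance(p, q):
    """Reporting links on the chart path between p and q (assumed measure)."""
    depth_in_q = {name: i for i, name in enumerate(chain_to_top(q))}
    for steps_from_p, name in enumerate(chain_to_top(p)):
        if name in depth_in_q:  # nearest common point in both chains
            return steps_from_p + depth_in_q[name]
    raise ValueError("no common manager")

def mod_and_xod(participants):
    """Median (MOD) and maximum (XOD) distance over all participant pairs."""
    distances = [org_distance(p, q) for p, q in combinations(participants, 2)]
    return median(distances), max(distances)

# Four developers under one manager plus one from another group.
mod, xod = mod_and_xod(["dev1", "dev2", "dev3", "dev4", "dev5"])
print(mod, xod)
```

In this example most pairs share a manager, so the median stays low, while the single outside participant raises the maximum. That is the shape of the low MOD/high XOD category.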
Three of the types of interactions that were most effort-intensive were the defects, questions, and discussion interaction types. Most of the interactions of type defects involved a set of participants that fell into the second category (low MOD/high XOD), including all of those with CE above the mean. Recall that defects interactions are those in which review participants raise and discuss defects during the review meeting. The questions interaction refers to questions raised and discussed during the review meeting. Again, the highest-effort interactions of this type fall into the second category of participant sets (low MOD/high XOD). The discussion interactions (which include other types of technical discussion during the review meeting) tend to be less effort-intensive than the questions or defects interactions, but still require more effort than most interaction types. Discussion interactions exhibit the same patterns in organizational distance as mentioned above for the questions and defects interactions.
Meeting interactions. Two hypotheses are presented for meetings. Hypothesis: Interactions that take place during a meeting (a verbal request and an unprepared reply) tend to require more effort than other interactions, especially when they involve communication technology. Hypothesis: Having more participants tends to make interactions more effort-intensive, even when the effort is normalized by the number of participants. Nearly all of the high-effort interactions involved a verbal request for information (Mr = verbal) and no written preparation of the information (Mp = verbal), and were executed using some sort of communications technology (Mt = videoconference or conference call). These patterns in the use of communication media, shown in Figure 6, differ dramatically from the patterns seen in the data as a whole.
Interactions that involved a verbal request and no preparation usually took place during a face-to-face meeting in which many people were present, which implicitly increases the communication effort. In those meetings in which conference calling or videoconferencing was used, the technology actually slowed down the process. Significant amounts of time were spent waiting for remote participants to find the right page, to clarify issues for remote participants, etc. Also, the communication technology was unfamiliar to some participants. This result implies, however, that the meeting participants did not, or could not, keep the meeting from running long by cutting short their discussions. The defects, questions, and discussion interactions constitute all of the technical communication that takes place during a review meeting. The effort recorded for these interactions includes the effort required to prepare for, carry out, and digest this technical information. Since these interactions form the core of the work of a review, it is comforting to know that they are the ones that require the most effort. In fact, over all 10 of the reviews studied, 70 percent of the total communication effort expended was consumed by interactions of these three types. One other variable deserves a little more attention. The median number of participants in high-effort interactions is 10, but the median in the larger set of interactions is about half that (5.5). This result is not so straightforward as it might seem, however, because the variable N (number of participants) is not completely independent from communication effort. For some interactions, in fact, N is used in the calculation of CE. For example, CE for an interaction of type discussion is calculated by multiplying the amount of time spent in general discussion during the review meeting by N. 
To investigate whether or not the number of participants has an independent effect on effort, we normalized communication effort by dividing it by N. Then we picked the 15 interactions with the highest normalized CE (15 was the smallest number that included the 11 interactions we analyzed before as the highest effort). The median number of participants in this subset is 8, lower than 10, but still considerably higher than the median of the data as a whole (5.5). So it appears that the highest-effort interactions involve more participants than interactions in general, regardless of which way effort is calculated. This result should not be surprising, given that it is consistent with the theoretically quadratic growth of the number of communication channels among n people. That is, there are n(n - 1)/2 pairs among n interaction participants, so, intuitively, the effort to communicate in a group of n should also grow faster than n.
Defects interactions included anywhere from 4 to 15 participants, with a median of 10 participants. All of the defects interactions with (normalized or unnormalized) CE above the mean had seven or more participants. The same was true for questions interactions.
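The arithmetic behind the normalization and the pair count can be sketched as follows. The function names and the example figures are hypothetical. Only the formulas (CE for a discussion as time multiplied by N, normalized CE as CE divided by N, and n(n - 1)/2 pairs) come from the text.

```python
def discussion_effort(minutes, n_participants):
    """CE for a discussion interaction: meeting time multiplied by N."""
    return minutes * n_participants

def normalized_effort(ce, n_participants):
    """CE divided by N, the normalization used to factor out group size."""
    return ce / n_participants

def communication_pairs(n):
    """Number of distinct pairs among n participants: n(n - 1)/2."""
    return n * (n - 1) // 2

# Hypothetical example: a 30-minute discussion with 10 participants.
ce = discussion_effort(30, 10)
print(ce, normalized_effort(ce, 10))

# The pair count grows much faster than the group size itself.
for n in (5, 8, 10):
    print(n, communication_pairs(n))
```

For a discussion interaction, dividing by N simply recovers the elapsed minutes, so any remaining association between normalized CE and N cannot be an artifact of the multiplication by N.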