Ann. For. Sci.
Volume 66, Number 8, December 2009
Article Number 808
Number of page(s) 12
Published online 25 November 2009

© INRA, EDP Sciences, 2009


Stem volume can be estimated from forest inventory data with volume tables, geometric formulae or regression equations that convert tree measurements to some measure of stem volume. Accurate estimation of stem volume makes it possible to estimate the monetary value of one of the many commodities and services that forests provide to society, i.e. timber. Studies of forest resource sustainability and national and international assessments of forest resources also require reporting of growing-stock volumes (e.g., FAO, 2006).

Volume prediction to any merchantable limit has been carried out by several methods, most usually involving the use of volume-ratio and stem taper equations. Volume-ratio equations predict merchantable volume as a percentage of total tree volume (e.g., Burkhart, 1977; Clutter, 1980; Reed and Green, 1984). Taper equations (e.g., Kozak, 1988, 2004; Newnham, 1992) are mathematical formulae that describe the stem shape; integration of the taper equation from the ground to any height provides an estimate of the volume to that height. In both cases, merchantable stand volume is obtained by summing the merchantable volume of all the trees within a stand. Although merchantable volume equations developed from volume-ratio equations are very easy to use and develop, those obtained from taper functions are preferred nowadays, perhaps because they allow estimation of diameter at a given height (Diéguez-Aranda et al., 2006). Furthermore, if tree form can be accurately described, then volume for any merchantability limit can be accurately predicted (Jordan et al., 2005). Once the volume of a portion of a tree is defined by a stem profile model, the quantity of wood products of any dimension can be derived by analytical geometry and input variables describing the technique for wood processing (e.g., saw blade kerf) (Zakrzewski and MacFarlane, 2006).

Taper functions not only provide estimates of diameter at any height along the stem and total stem volume, but also the merchantable volume and height to any top diameter from any stump height, and individual volumes for logs of any length at any height from the ground (Kozak, 2004).

Although tree taper has long been studied through the development of taper functions by many forest researchers throughout the world (e.g., Bi, 2000; Demaerschalk, 1972; Kozak, 1988; Max and Burkhart, 1976; Newnham, 1992), the topic is still relevant, perhaps because no single theory has been developed that adequately explains the variation in stem form for all kinds of trees (Newnham, 1988).

Introduced by Clutter (1963), compatibility means that an integrated model can be obtained by summation of the differential model; thus, for a given merchantable volume equation there is an intrinsically defined compatible taper function (Clutter, 1980), implying that integration of the taper function from the ground to total height of the tree would provide the appropriate volume, and subsequently the merchantable volume equation (Jordan et al., 2005). One common approach has been to develop a compatible volume system that ensures compatibility between the taper function and an existing total volume equation by including the latter in the former and constraining the parameters so that integration of the taper equation from zero to total height provides the total volume of the tree (e.g., Fang and Bailey, 1999; Fang et al., 2000; Goulding and Murray, 1976). The usefulness of this approach is apparent, as total volume equations already exist and will continue to be used in the future, although such systems can also be used with new volume equations (Diéguez-Aranda et al., 2006).

Scots pine (Pinus sylvestris L.) is one of the most important forest species in Spain. Forests of this species cover vast extensions in the main mountain ranges, providing important environmental and social benefits, as well as high quality timber and other forest products. The total area covered by Scots pine in Spain is approximately 1.28 million hectares, with a total volume of more than 91 million cubic meters and an annual growth of almost 3.7 million cubic meters (MMA, 2005).

Individual total tree volume estimation is currently based on existing volume tables and equations developed for Spain, and volume systems have been developed only for Galicia (Diéguez-Aranda et al., 2006; Novo et al., 2003) and for the north of Spain (Lizarralde and Bravo, 2003). Of these, only the volume system developed by Diéguez-Aranda et al. (2006) is compatible. With ever-changing market conditions, there is a need for accurate estimates of tree volumes to multiple upper-stem merchantability limits, which are not currently possible with existing full-stem volume tables and equations for this species.

The objective of this study was to develop a compatible merchantable volume system that provides a good description of the stem profile, thus providing accurate estimates of any portion of the stem volume for Scots pine in the major mountain ranges of Spain.


2.1. Study area and data description

The data used in this study correspond to trees felled in stands located in the three main areas where Scots pine occurs naturally in Spain (the Pyrenees, the Iberian Mountain Range and the Central Mountain Range), and from a group of trees from stands planted in Galicia (northwestern Spain). The data cover 9 of the 17 regions of origin of this species in Spain, as defined by Catalán et al. (1991), although they cover all areas where Scots pine is common. A total of 2682 trees were felled, 679 in the Central Mountain Range (60 trees from the origin region of Sierra de Gredos – No. 11 –, and 619 from the origin region of Sierra de Guadarrama – No. 10 –), 421 in the Pyrenees (179 trees from the origin region of Pirineo montano húmedo Aragonés – No. 5 –, and 242 from the origin region of Pirineo montano húmedo Catalán – No. 6 –), 1354 in the Iberian Mountain Range (80 trees from the origin region of Alto Ebro – No. 2 –, 475 from the origin region of Montes Universales – No. 12 –, 44 from the origin region of Sierra de Gúdar – No. 14 –, 170 from the origin region of Sierras de Tortosa y de Beceite – No. 15 –, and 585 from the origin region of Montaña Soriano-Burgalesa – No. 8 –), and 228 in plantations in Galicia.

The data were grouped into mountain ranges (MR), in order to study differences in stem taper for the main areas where this species occurs in Spain (Fig. 1). Data correspond to 80 trees for the Northern Iberian Range (MR1), 679 trees for the Central Range (MR2), 228 trees for the Galician Mountains (MR3), 689 trees for the Southern Iberian Range (MR4), 585 trees for the Soria and Burgos Mountains (MR5), and 421 trees for the Pyrenees (MR6).

The trees were selected to ensure a representative distribution by diameter and height classes. Diameter at breast height (D – cm –, 1.3 m above ground level) was measured to the nearest 0.1 cm in each tree. The trees were later felled, leaving stumps of average height of 0.11 m, and total bole length, i.e. total height (H, m), was measured to the nearest 0.01 m. The trees were then cut into logs at 1 to 2.5 m intervals, measured to the nearest cm. Two perpendicular diameters over bark were measured in each cross section (at height h – m – from ground level), to the nearest 0.1 cm, and then geometrically averaged (d, cm). Log volumes were calculated in cubic meters with Smalian’s formula. The top section was treated as a cone. Over-bark total stem volume (above stump) was obtained by summing the over-bark log volumes and the volume of the top of the tree.
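The volume calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the sample measurements are hypothetical, and Smalian's formula is taken in its usual form (mean of the two end cross-sectional areas multiplied by log length).

```python
import math

K = math.pi / 40000.0  # converts a squared diameter in cm^2 to an area in m^2


def smalian_log_volume(d_lower_cm, d_upper_cm, length_m):
    """Smalian's formula: mean of the two end areas times log length (m^3)."""
    return K * (d_lower_cm ** 2 + d_upper_cm ** 2) / 2.0 * length_m


def cone_volume(d_base_cm, length_m):
    """Volume (m^3) of the top section, treated as a cone."""
    return K * d_base_cm ** 2 * length_m / 3.0


def stem_volume(heights_m, diameters_cm, total_height_m):
    """Over-bark stem volume above the stump (m^3).

    heights_m and diameters_cm describe the measured cross sections from the
    stump upward; the piece above the last section is treated as a cone.
    """
    volume = 0.0
    for i in range(len(heights_m) - 1):
        length = heights_m[i + 1] - heights_m[i]
        volume += smalian_log_volume(diameters_cm[i], diameters_cm[i + 1], length)
    volume += cone_volume(diameters_cm[-1], total_height_m - heights_m[-1])
    return volume


# Hypothetical tree: stump at 0.11 m, sections every 2 m, total height 9.5 m.
heights = [0.11, 2.11, 4.11, 6.11, 8.11]
diameters = [24.0, 20.5, 17.0, 12.0, 6.0]
print(round(stem_volume(heights, diameters, 9.5), 4))
```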

Figure 1

Mountain range data groups and distribution of Pinus sylvestris L. in Spain (Ceballos, 1966).

The scatter plot of relative diameter (d/D) against relative height (h/H) was examined visually for each mountain range to detect possible anomalies in the data. Extreme data points were observed in all mountain ranges, and therefore the systematic approach proposed by Bi (2000) for detecting abnormal data points was applied to increase the efficiency of the process. This involved local quadratic fitting with a smoothing parameter of 0.3 for all the mountain ranges, which was selected after iterative fitting and visual examination of the smoothed taper curves overlaid on the data (Fig. 2). By use of this approach, the number of extreme values accounted for between 0.18% and 1.9% of the total taper measurements among the six mountain ranges, and about 0.73% for all the mountain ranges together.

A small portion of the extreme data points were caused by mistakes in measuring the bole sections or in transcription of field notes, but most of these data points corresponded to stem deformations brought about by fire damage, large knots and other physical damage such as partial death of the stem and wood, growth deformations, etc. Since taper functions are not intended for deformed stems, these data points (but not the whole tree) were excluded from further analysis. Summary statistics of the final data set used in this study are shown in Table I.

2.2. Selected stem taper functions

Relatively simple taper functions can effectively describe the general taper of trees; however, they do not describe the entire stem accurately. They may provide reasonable estimates in the mid-portion of the bole, but are usually less accurate for estimating the profile in the butt or upper stem segments (Demaerschalk and Kozak, 1977; Max and Burkhart, 1976; Newnham, 1992).

To deal with this, two types of equations (segmented and variable-exponent taper equations) were used. Although a large number of taper functions of these kinds have been developed and many describe the diameter along the stem quite well (e.g., Bi, 2000; Bruce et al., 1968; Kozak, 1988; Max and Burkhart, 1976; Muhairwe, 1999; Newnham, 1992; Riemer et al., 1995), the segmented function of Fang et al. (2000) and the variable-exponent function of Kozak (2004) have shown very good results in many studies of several species of Pinus and other genera in Spain (Barrio et al., 2007; Castedo-Dorado et al., 2007; Diéguez-Aranda et al., 2006; Rojo et al., 2005) and in Mexico (Corral-Rivas et al., 2007), and behaved better than others in preliminary analyses. They were therefore selected for further analysis.

Figure 2

Refined data points of relative diameter and relative height plotted with a local regression loess smoothing curve (smoothing factor = 0.3) for each mountain range studied.

Table I

Summary statistics for the tree data set.

In describing tree taper, it is generally accepted that a tree stem can be divided into three geometric shapes: the top is considered as a cone, the central section a frustum of a paraboloid, and the butt a frustum of a neiloid (Husch et al., 1982). Valentine and Gregoire (2001), however, reported that for some species the segments did not conform to these shapes.

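Segmented models, first introduced by Max and Burkhart (1976), describe these shapes by fitting each with a different equation, and then mathematically joining the segments to produce an overall segmented function. The segmented compatible model developed by Fang et al. (2000) assumes three sections whose form factor differs between sections but is constant within each. The expression of this model is:

$$d = c_1 \sqrt{H^{(k-b_1)/b_1} (1-q)^{(k-\beta)/\beta} \alpha_1^{I_1+I_2} \alpha_2^{I_2}} \qquad (1)$$

where:

$I_1 = 1$ if $p_1 \le q \le p_2$, 0 otherwise; $I_2 = 1$ if $p_2 < q \le 1$, 0 otherwise,

$\beta = b_1^{1-(I_1+I_2)} b_2^{I_1} b_3^{I_2}$,

$\alpha_1 = (1-p_1)^{(b_2-b_1)k/(b_1 b_2)}$, $\alpha_2 = (1-p_2)^{(b_3-b_2)k/(b_2 b_3)}$,

$r_0 = (1 - h_{st}/H)^{k/b_1}$, $r_1 = (1-p_1)^{k/b_1}$, $r_2 = (1-p_2)^{k/b_2}$,

$c_1 = \sqrt{\dfrac{a_0 D^{a_1} H^{a_2 - k/b_1}}{b_1 (r_0 - r_1) + b_2 (r_1 - \alpha_1 r_2) + b_3 \alpha_1 r_2}}$,

$k = \pi/40000$, $q = h/H$, $h_{st}$ is the stump height (m),

$p_1 = h_1/H$ and $p_2 = h_2/H$ ($h_1$ and $h_2$ are the heights from ground level where the two inflection points assumed in the model occur),

$a_i$, $b_i$ and $p_i$ are the parameters to be estimated.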

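Fang et al. (2000) also derived a compatible model for merchantable (v) and total volume (V) from stump height by direct integration of the taper model. Their expressions are:

$$v = c_1^2 H^{k/b_1} \left( b_1 r_0 + (I_1+I_2)(b_2-b_1) r_1 + I_2 (b_3-b_2) \alpha_1 r_2 - \beta (1-q)^{k/\beta} \alpha_1^{I_1+I_2} \alpha_2^{I_2} \right) \qquad (2)$$

$$V = a_0 D^{a_1} H^{a_2} \qquad (3)$$

Although the development of this compatible system is based on equation (3), any other volume equation can be used as input into the system.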
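The following Python sketch illustrates how such a compatible system can be evaluated. It assumes the form in which the Fang et al. (2000) equations are commonly published (taper function, merchantable volume, and an allometric total volume equation V = a0·D^a1·H^a2); the function name `fang_system` and all parameter values are hypothetical and are not the fitted values of Table II.

```python
import math

K = math.pi / 40000.0  # k in the text


def fang_system(h, D, H, a, b, p, h_st=0.0):
    """Return (d, v, V) for one tree.

    d: diameter over bark (cm) at height h (m)
    v: volume (m^3) from stump height h_st up to height h
    V: total volume (m^3) above the stump
    a = (a0, a1, a2), b = (b1, b2, b3), p = (p1, p2) are model parameters.
    """
    a0, a1, a2 = a
    b1, b2, b3 = b
    p1, p2 = p
    q = h / H
    # Indicator variables selecting the stem segment that contains q.
    I1 = 1 if p1 <= q <= p2 else 0
    I2 = 1 if p2 < q <= 1 else 0
    beta = b1 ** (1 - (I1 + I2)) * b2 ** I1 * b3 ** I2
    alpha1 = (1 - p1) ** ((b2 - b1) * K / (b1 * b2))
    alpha2 = (1 - p2) ** ((b3 - b2) * K / (b2 * b3))
    r0 = (1 - h_st / H) ** (K / b1)
    r1 = (1 - p1) ** (K / b1)
    r2 = (1 - p2) ** (K / b2)
    V = a0 * D ** a1 * H ** a2
    c1_sq = (a0 * D ** a1 * H ** (a2 - K / b1)
             / (b1 * (r0 - r1) + b2 * (r1 - alpha1 * r2) + b3 * alpha1 * r2))
    join = alpha1 ** (I1 + I2) * alpha2 ** I2  # keeps segments continuous
    d = math.sqrt(c1_sq * H ** ((K - b1) / b1)
                  * (1 - q) ** ((K - beta) / beta) * join)
    v = c1_sq * H ** (K / b1) * (
        b1 * r0 + (I1 + I2) * (b2 - b1) * r1 + I2 * (b3 - b2) * alpha1 * r2
        - beta * (1 - q) ** (K / beta) * join)
    return d, v, V


# Hypothetical parameters and tree (D = 30 cm, H = 20 m, stump at 0.1 m).
a = (5e-5, 1.9, 1.0)
b = (1.0e-5, 3.3e-5, 2.9e-5)
p = (0.1, 0.67)
d_mid, v_mid, V_total = fang_system(10.0, 30.0, 20.0, a, b, p, h_st=0.1)
print(round(d_mid, 2), round(v_mid, 4), round(V_total, 4))
```

Because the taper function is constrained through c1, evaluating the merchantable volume at h = H returns the total volume equation exactly, which is the compatibility property described in the text.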

Variable-exponent taper equations were introduced by Kozak (1988), and describe the stem shape with a changing exponent or variable, from the ground to the top of the tree, to represent the neiloid, paraboloid, conic and several intermediate forms (Kozak, 1988; Newnham, 1988). They are basically allometric functions of the form $y = kx^c$, where y and x are the dependent and independent variables, respectively, k is a constant and c is the exponent term descriptive of tree form. This approach is based on the assumption that the stem form varies continuously along the length of a tree and eliminates the need to develop segmented taper functions for different portions of the stem. In comparison with single and segmented taper models, this approach usually provides the lowest degree of local bias and the greatest precision in taper predictions (e.g., Bi, 2000; Kozak, 1988; Sharma and Zhang, 2004), although these equations have the disadvantage that they cannot be analytically integrated to calculate total stem or log volumes. Kozak (2004) developed two new models based on his original 1988 model in order to reduce the associated multicollinearity. The comparison of these models against the original and another one proposed by the same author in 1997 (Kozak, 1997) verified that the most stable, and therefore the most consistent taper model was the following:

$$d = a_0 D^{a_1} H^{a_2} x^{\,b_1 q^4 + b_2 (1/e^{D/H}) + b_3 x^{0.1} + b_4 (1/D) + b_5 H^{w} + b_6 x} \qquad (4)$$

where $x = w / [1 - (1.3/H)^{1/3}]$ and $w = 1 - q^{1/3}$.
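Since a variable-exponent equation has no closed-form integral, volumes must be obtained numerically. The sketch below is an illustration under stated assumptions: it uses the commonly cited form of the Kozak (2004) model with x = w / (1 − (1.3/H)^(1/3)) and w = 1 − q^(1/3), a simple trapezoidal rule for the integration, and hypothetical parameter values (not those of Table II).

```python
import math


def kozak_2004_diameter(h, D, H, a, b):
    """Diameter over bark (cm) at height h (m) from a variable-exponent model."""
    a0, a1, a2 = a
    b1, b2, b3, b4, b5, b6 = b
    q = min(h / H, 1.0)  # clamp so rounding never pushes q above the tip
    w = 1.0 - q ** (1.0 / 3.0)
    x = w / (1.0 - (1.3 / H) ** (1.0 / 3.0))
    exponent = (b1 * q ** 4 + b2 / math.exp(D / H) + b3 * x ** 0.1
                + b4 / D + b5 * H ** w + b6 * x)
    return a0 * D ** a1 * H ** a2 * x ** exponent


def numeric_volume(h_low, h_high, D, H, a, b, steps=2000):
    """Trapezoidal integration of cross-sectional area between two heights (m^3)."""
    k = math.pi / 40000.0
    width = (h_high - h_low) / steps
    areas = [k * kozak_2004_diameter(h_low + i * width, D, H, a, b) ** 2
             for i in range(steps + 1)]
    return width * (sum(areas) - 0.5 * (areas[0] + areas[-1]))


# Hypothetical parameters and tree (D = 30 cm, H = 20 m).
a = (1.0, 0.98, 0.03)
b = (0.45, -0.5, 0.55, 1.5, 0.03, -0.2)
print(round(kozak_2004_diameter(1.3, 30.0, 20.0, a, b), 2))
print(round(numeric_volume(0.1, 20.0, 30.0, 20.0, a, b), 4))
```

At breast height x equals 1, so the predicted diameter reduces to a0·D^a1·H^a2; at the tip x equals 0 and the predicted diameter is zero.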

2.3. Model fitting

To avoid problems in the estimation of the parameters of the system of Fang et al. (2000) when h = H, i.e., when d = 0, a small value, lower than the appreciation limit used in the data collection, was reassigned to the diameters equal to zero. A value lower than the corresponding height appreciation limit was also subtracted from the heights equal to the total height. A similar approach was used by Fang et al. (2000) in order to avoid the logarithm of 0 in model fitting. This approach allows the use of the entire data set for fitting and does not significantly change the parameter estimates (Diéguez-Aranda et al., 2006; Fang et al., 2000).

Although there are several possibilities for fitting the compatible system of Fang et al. (2000), selection of the fitting option will depend on the forest manager, who should decide if the major use for the system will be for estimating total volume followed by volumes in size assortments (or vice versa), or a mixture between them (Diéguez-Aranda et al., 2006). Here the difficulty lies in the model of Kozak (2004), which does not have a closed-form integral. Under certain conditions it is still possible to do simultaneous fitting when one of the models is an integral (e.g., Parresol and Thomas, 1996; Thomas et al., 1995). However, actual volumes are rarely known, because they are usually calculated by Smalian's or Huber's formula, which provide only approximations of the actual volume and have been shown to overestimate the volumes, especially in the butt region (Husch et al., 1982; Kozak, 1988; Martin, 1984). The option of estimating the parameters of the taper function and recovering the implied total volume equation was therefore used for fitting the compatible system of Fang et al. (2000). This also allowed a more rigorous comparison between the estimations of this system and the variable-exponent taper equation of Kozak (2004), as in both cases the same variable (d) was optimized during the fitting process.

The models were fitted by use of the least squares technique. However, there are several problems associated with stem taper function analysis that violate the fundamental least squares assumptions of independence of errors, of which multicollinearity and autocorrelation are the most important (Kozak, 1997). Appropriate statistical procedures should thus be used in model fitting to avoid problems of autocorrelated errors, and models with low multicollinearity should be selected whenever possible (Kozak, 1997), because these problems may seriously affect the standard errors of the coefficients, invalidating statistical tests using t or F distributions and confidence intervals (Neter et al., 1990, p. 300), and because the least squares estimates of regression coefficients are no longer efficient (there is no upper bound on the variance of the estimators), although they remain unbiased (Myers, 1990, p. 392).

Multicollinearity refers to the existence of strong correlations among the independent variables in multiple linear or nonlinear regression analysis, because some of the variables represent or measure similar phenomena. To evaluate the presence of multicollinearity among variables in the models analyzed, the condition number (CN), which is defined as the square root of the ratio of the largest to the smallest eigenvalue of the correlation matrix, was used. According to Belsley (1991, pp. 139–141), if the CN is 5–10, collinearity is not a major problem, if it is in the range of 30–100, then there are problems associated with collinearity, and if it is in the range of 1000–3000 there are severe problems associated with collinearity. Myers (1990, p. 370) suggests a CN greater than 32 as indicating problems associated with multicollinearity.
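As an illustration of the definition above, the condition number can be computed with NumPy as follows. This is a sketch, not the procedure used in the study: the function name is hypothetical, and the columns of `X` are assumed to hold the regressors (for a nonlinear model, the partial derivatives of the model with respect to each parameter).

```python
import numpy as np


def condition_number(X):
    """Square root of the ratio of the largest to the smallest eigenvalue
    of the correlation matrix of the columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # symmetric matrix: real eigenvalues
    return float(np.sqrt(eigenvalues.max() / eigenvalues.min()))


# Two nearly collinear hypothetical regressors give a large condition number.
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])
print(round(condition_number(X), 2))
```

Uncorrelated regressors give a condition number of 1; the value grows without bound as the regressors approach exact linear dependence.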

Since the database contains multiple observations for each tree, it is reasonable to expect that the observations within each tree are spatially correlated, which violates the assumption of independent error terms. Thus, a continuous autoregressive error structure (CAR(x)), which accounts for the distance between measurements, was used to account for the inherent autocorrelation of the hierarchical structure of the data. This error structure expands the error term to (Zimmerman and Núñez-Antón, 2001):

$$e_{ij} = \sum_{k=1}^{x} I_k \, \rho_k^{\,h_{ij} - h_{i,j-k}} \, e_{i,j-k} + \varepsilon_{ij} \qquad (5)$$

where $e_{ij}$ is the jth ordinary residual of the ith individual, $e_{i,j-k}$ is the (j−k)th ordinary residual of the ith individual, $I_k = 1$ when $j > k$ and 0 when $j \le k$, $\rho_k$ is the k-order continuous autoregressive parameter to be estimated, and $h_{ij} - h_{i,j-k}$ is the distance separating the jth from the (j−k)th observation within each tree i, with $h_{ij} > h_{i,j-k}$. In this case $\varepsilon_{ij}$ is an independent, normally distributed error term with a mean value of zero.
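The autoregressive part of a CAR(x) structure can be sketched as below. The code assumes the usual form of the structure (each previous residual is weighted by ρ_k raised to the distance between the two measurements); the function name and the sample values are hypothetical, and the parameters ρ_k would in practice be estimated jointly with the taper model.

```python
def car_expected_residual(residuals, heights, j, rho):
    """Autocorrelated part of the j-th residual (0-based index) of one tree.

    residuals: ordinary residuals of the tree, ordered from the base upward
    heights:   measurement heights (m) matching the residuals
    rho:       [rho_1, ..., rho_x], continuous autoregressive parameters
    """
    total = 0.0
    for k, rho_k in enumerate(rho, start=1):
        if j >= k:  # indicator I_k: a k-th previous observation must exist
            distance = heights[j] - heights[j - k]
            total += rho_k ** distance * residuals[j - k]
    return total


# Hypothetical tree with three measurements and a CAR(2) structure.
residuals = [0.5, 0.2, -0.1]
heights = [0.5, 1.5, 3.5]
rho = [0.5, 0.25]
print(car_expected_residual(residuals, heights, 2, rho))
```

The independent error term is then the observed residual minus this autocorrelated part.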

To test for the presence of autocorrelation and the order of the CAR(x) to be used, plots representing residuals versus residuals from previous observations (lag-residuals) within each tree were examined visually. Appropriate fits for the models with correlated errors were achieved by including the CAR(x) error structure in the MODEL procedure of SAS/ETS® (SAS Institute Inc., 2004), which allows for dynamic updating of the residuals.

2.4. Model comparison

Numerical and graphical analyses of the residuals were used as criteria for judging the performance of the taper functions. Three goodness-of-fit statistics were used: the coefficient of determination (R²) (Ryan, 1997, pp. 419 and 424), the root mean square error (RMSE), and the Bayesian Information Criterion (BIC) (Schwarz, 1978). Although there are several shortcomings associated with the use of R² in non-linear regression, the general usefulness of some global measure of model adequacy appears to override some of those limitations (Ryan, 1997, p. 424); nevertheless, it must not be used as the only criterion for selecting the best model (Myers, 1990, p. 166). The RMSE is useful because it is expressed in the same units as the dependent variable, and thus shows the mean error of the model; in addition, it penalizes the models with more parameters, in agreement with the general principle of scientific simplicity. The BIC is a criterion for model selection also based on the residual sum of squares and the number of parameters to be estimated; given any two estimated models, the model with the lower value of BIC is the one to be preferred. The expressions of these statistics can be summarized as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \qquad (6)$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - p}} \qquad (7)$$

$$BIC = n \ln\left(\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n}\right) + p \ln n \qquad (8)$$

where $Y_i$, $\hat{Y}_i$ and $\bar{Y}$ are the measured, estimated and average values of the dependent variable, respectively; n is the total number of observations used to fit the model; and p is the number of model parameters.
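The three statistics can be computed as in the following sketch. It assumes their usual definitions: R² as one minus the ratio of residual to total sum of squares, RMSE with n − p in the denominator (hence the penalty for additional parameters), and BIC as n·ln(SSE/n) + p·ln(n). The function name and the data are hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_params):
    """Return (R2, RMSE, BIC) for a fitted model with n_params parameters."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    sst = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / (n - n_params))
    bic = n * math.log(sse / n) + n_params * math.log(n)
    return r2, rmse, bic


# Hypothetical diameters (cm): measured versus predicted by a 1-parameter model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmse, bic = fit_statistics(observed, predicted, n_params=1)
print(round(r2, 3), round(rmse, 3), round(bic, 2))
```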

Ordinary residuals are measures of quality of fit and do not necessarily assess the quality of future prediction (Myers, 1990, p. 168). For this purpose, validation of the model must be carried out, and for this process, only newly collected data will help somewhat (Kozak and Kozak, 2003). Because of the scarcity of such data, several methods have been proposed (e.g., splitting the data set or cross-validation, double cross-validation), although they seldom provide any additional information compared with the respective statistics obtained directly from models built from entire data sets (Kozak and Kozak, 2003). Thus, because decisions have to be made with available information, we decided to wait to obtain new data before such validation is carried out.

Although single indices of overall prediction (R² and RMSE) are good indicators of the effectiveness of a taper function, they may not indicate the best model for practical purposes. Therefore, the taper models were further assessed by use of box plots of d residuals against position (percentage relative height points along the stem, i.e., 5%, 15%, 25%, and so on up to 95%). The same was done for h residuals by relative diameters. These graphs, calculated by position, are very important for showing areas for which the taper functions provide especially poor or good predictions (Kozak, 2004; Kozak and Smith, 1993).

2.5. Comparison of taper functions between regions

To compare the differences in the analyzed taper functions between different mountain ranges, the non-linear extra sum of squares method (Bates and Watts, 1988, pp. 103–104) was used. This method is based on the likelihood-ratio test for detecting simultaneous homogeneity among parameters, and requires the fitting of reduced and full models. This test has frequently been applied in forestry to analyze differences between different geographic regions (e.g., Castedo et al., 2005; Huang et al., 2000; Pillsbury et al., 1995).

Figure 3

An example, for mountain range 2, of d residuals plotted against: Lag1-residuals (left column), Lag2-residuals (middle column), and Lag3-residuals (right column) for the model of Fang et al. (2000) fitted without considering the autocorrelation parameters (first row), and with continuous autoregressive error structures of first and second order (second and third rows, respectively).

Table II

Parameter estimates (approximate standard errors in brackets) of the models analyzed.

The reduced model corresponds to the same set of parameters for all the mountain ranges. The full model corresponds to different sets of parameters for each mountain range, and it is obtained by expanding each global parameter with an associated parameter and a dummy variable to differentiate the six mountain ranges. The appropriate test statistic uses the following expression:

$$F = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F} \qquad (9)$$

where $SSE_R$ is the error sum of squares of the reduced model, $SSE_F$ is the error sum of squares of the full model, and $df_R$ and $df_F$ are the degrees of freedom of the reduced and full model, respectively. The non-linear extra sum of squares follows an F-distribution.

If the above F-test results reveal that there are no differences among the taper equations for different mountain ranges, only a composite model, fitted to the combined data, is needed. If the F-test results show that there are differences among the taper equations (P < 0.05), further tests are needed to evaluate whether the differences are caused by as few as two or as many as all the mountain ranges. For this purpose, a full model for each of the 15 possible paired comparisons of mountain ranges should be compared with the corresponding reduced model by use of the F-test. Only when a non-significant F-value is obtained (P > 0.0033, i.e. 0.05/15 after Bonferroni's correction) should the taper functions for these two mountain ranges be considered similar and combined.
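The test statistic and the Bonferroni-corrected significance level can be sketched as follows. The function names and the sums of squares are hypothetical; the statistic is the standard extra-sum-of-squares F ratio, and the P-value would be obtained from an F-distribution with (df_R − df_F, df_F) degrees of freedom.

```python
from itertools import combinations


def extra_ss_f(sse_reduced, sse_full, df_reduced, df_full):
    """Extra sum of squares F statistic for nested (reduced vs full) models."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    return numerator / (sse_full / df_full)


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison significance level for all pairwise group comparisons."""
    n_pairs = len(list(combinations(range(n_groups), 2)))
    return alpha / n_pairs


# Hypothetical sums of squares and degrees of freedom.
print(extra_ss_f(sse_reduced=120.0, sse_full=100.0, df_reduced=110, df_full=100))
# Six mountain ranges give 15 pairs, so the per-pair level is 0.05 / 15.
print(round(bonferroni_alpha(6), 4))
```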


Initially, the models were fitted without expanding the error terms to account for autocorrelation. A similar trend in residuals of the taper model as a function of the distance between the measurements along the stem within the same tree was apparent in all of the models analyzed. An example of the observed autocorrelation with the model of Fang et al. (2000) is shown in Figure 3 (first row). After correcting for autocorrelation with a modified second-order continuous autoregressive error structure, the trends in residuals disappeared (Fig. 3, third row).

All the parameters were significant at P < 0.05 (Tab. II), except the following parameters in the model of Kozak (2004): b5 for MR1, b2 and b4 for MR2, and b2, b4 and b5 for MR4. All the models provided reasonably good data fits (Tab. III), and explained more than 97% of the total variance of d, with RMSEs between 0.73 and 1.59 cm, depending on the mountain range.

The multicollinearity of both models was moderate, as inferred from the condition number (Tab. III), with values of 23–49 and 40–78, depending on the mountain range, for the models of Fang et al. (2000) and Kozak (2004), respectively. This indicates that on some occasions slight problems with multicollinearity may appear (as observed by the fact that some parameters were not significant); however, these problems are not really important, at least for practical use of the models.

The box plots of d residuals against relative height classes (Fig. 4) and of h residuals against relative diameter classes (Fig. 5) did not show any clear systematic tendency that indicates deficient behavior of the models, or any clear differences between the two models analyzed.


Detailed information is now available on the functions and methodologies for correct estimation of diameters at different heights and of total or merchantable stem volume, as indicated by the high percentage of explained variability obtained in this and previous studies (e.g., Barrio et al., 2007; Diéguez-Aranda et al., 2006).

Single taper models were found to represent stem shape quite accurately (e.g., Bruce et al., 1968; Goulding and Murray, 1976; Kozak et al., 1969), although more flexible models were later introduced (Kozak, 1988; Max and Burkhart, 1976), in an attempt to provide a better description of the stem profile, especially in the high-volume butt region (Cao et al., 1980).

Merchantable volume equations derived from stem taper functions that are compatible with total volume equations are usually preferred. The use of the total volume equation simplifies the calculations and makes the model more suitable for practical purposes when classification of the products by merchantable sizes is not required (Diéguez-Aranda et al., 2006).

In the present study the compatible volume system of Fang et al. (2000), which uses a segmented model to describe the stem shape, and the variable-exponent taper model of Kozak (2004) were fitted to Scots pine stem data, in an attempt to estimate the stem shape, and thus merchantable and total volume, as accurately as possible. The models were fitted by use of a continuous autoregressive error structure to deal with the problem of autocorrelation associated with the use of repeated measures within an individual. Although accounting for autocorrelation does not improve the predictive ability of the model, it prevents underestimation of the covariance matrix of the parameters, thereby making it possible to carry out the usual statistical tests (West et al., 1984), i.e., it improves interpretation of the statistical properties. The model estimations were not significantly different from those obtained with models fitted without considering such correction; therefore, autocorrelation parameters are disregarded in practical applications unless one is working with several diameter measurements at different heights on the same tree.

The autocorrelation may be explained by the effect of stand conditions (particularly stand density) on stem form (Larson, 1963), although some studies found this influence to be variable (e.g., Sharma and Zhang, 2004, only found density differences in one of the three species they analyzed). Correlated errors can also be caused by lack of fit, and tests for autocorrelation have also been suggested for testing for this (Draper and Smith, 1998). However, the R² values obtained (Tab. III) suggest that lack of fit is not an issue in this study.

To study possible differences among different locations, the data were grouped into mountain ranges, and different models were fitted for each. The statistics of fit showed little difference between the two models; the model of Kozak (2004) provided lower errors for 4 out of the 6 mountain ranges, with reductions from 0.8 to 0.1 cm in the RMSE (3.9 to 0.7%). As regards the plots of d residuals against relative height classes (Fig. 4), there was no difference between the two models. In general, for relative heights between 0–10% and 65–85%, both models showed larger standard errors of the estimates than at other height intervals. These relative height classes may be associated with stem butt swell and the point that was equivalent to the base of the live crown for most sample trees (Jiang et al., 2005). For relative heights over 90% there was a slight tendency to underestimate the diameter for mountain ranges 2, 4, 5 and 6. These tendencies were slightly more evident in the model of Kozak (2004). Because stem analysis was usually stopped at a 7.5 cm top diameter and few measurements existed in the top sections, these results should be considered carefully. However, as the latter part of the stem is the least valuable, and the part that accumulates least volume, these results do not have a great impact on the overall performance and applied use of the models. For sections closer to the ground, both models provided good estimates with no bias (Fig. 4). Accurate predictions of diameter of these sections are important because the basal log is particularly important from a commercial point of view.

Table III

Goodness-of-fit statistics and condition number of the models analyzed.

As regards the plots of h residuals against relative diameter classes (Fig. 5), there was generally a slight tendency to overestimate h for relative diameter classes under 25%, particularly for mountain ranges 2 and 5 with both models.

All these statistics and plots show no clear advantage of one model against the other. However, the model of Fang et al. (2000) has the advantage that it is compatible with a merchantable and a total volume equation; furthermore, a new or preexisting volume equation can be used as input into the system, making its application more flexible. In this case, the precision of the taper model obviously depends on the precision of the volume equation used (Diéguez-Aranda et al., 2006).

Figure 4

Box plots of d residuals (Y-axis, cm) against relative height classes (X-axis, in percent) for the different models and mountain ranges. The plus signs represent the mean of prediction errors for the corresponding relative height classes. The boxes represent the interquartile ranges. The maximum and minimum diameter over bark prediction errors are represented respectively by the upper and lower small horizontal lines crossing the vertical lines.

Figure 5

Box plots of h residuals (Y-axis, m) against relative diameter classes (X-axis, in percent) for the different models and mountain ranges. The plus signs represent the mean of prediction errors for the corresponding relative diameter classes. The boxes represent the interquartile ranges. The maximum and minimum height prediction errors are represented respectively by the upper and lower small horizontal lines crossing the vertical lines.

Finally, multicollinearity was also considered. As previously explained, this problem is not a decisive factor, although models with less severe multicollinearity should be used whenever possible (Kozak, 1997). As inferred from the condition number (Tab. III), the model of Fang et al. (2000) showed slightly weaker multicollinearity than the model of Kozak (2004). Therefore, as in previous studies in which these models have been compared (e.g., Barrio et al., 2007; Corral-Rivas et al., 2007), the compatible volume system of Fang et al. (2000) is proposed as the most adequate for describing the stem profile and predicting stem volume of Scots pine in the major mountain ranges of Spain. In addition, all parameters of the model of Fang et al. (2000) were significant, which is another reason for choosing it over the model of Kozak (2004).

Table IV

F-test of the differences among mountain ranges obtained with the model of Fang et al. (2000).

Results of the fitting process for full and reduced forms of the model of Fang et al. (2000) with the combined data are shown in Table IV (first row). The F-statistic calculated with equation (9) was 92.9, and the probability of such a value under the null hypothesis was lower than 0.001. There were, therefore, differences among the taper equations for different mountain ranges. Since the differences may be caused by as few as two or as many as all the mountain ranges, F-tests were also carried out for each pair of mountain ranges so that the source of the differences could be identified (Tab. IV). All 15 possible paired comparisons produced significant F-values, suggesting that significantly different equations (Eq. (1) with the parameters of Tab. II) are required for the six mountain ranges. The greatest differences (as inferred from the F-values) were between mountain range 3 and mountain ranges 4, 5 and 6. These differences may be due to the different origin of the stands in mountain range 3 (plantations). All the models have the first inflection point at around 10% of total height and the second inflection point at around 67% of total height, except the model for mountain range 5, for which the second inflection point was at around 20% of total height, suggesting that the trees from that region differ in form from the trees of the other regions. Similar differences were obtained for other models (e.g., Castedo et al., 2005; Saunders and Wagner, 2008), which have been found to be specific to localities, site fertility and/or structural stand types.
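The comparison of full and reduced models can be sketched as follows. Equation (9) is not reproduced in this section, so the code assumes the usual extra sum of squares form of the F-statistic for nested nonlinear models (e.g., Bates and Watts, 1988); the function name and all numerical values are hypothetical and are not the results of this study.

```python
from itertools import combinations

def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """F-statistic comparing a reduced model with a full model.

    The reduced model uses one set of parameters for all groups; the full
    model uses a separate set per group. df_* are residual degrees of
    freedom. Assumed form:
        F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
    """
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# Hypothetical sums of squared errors and degrees of freedom.
f_value = extra_ss_f(sse_reduced=1500.0, df_reduced=995,
                     sse_full=1200.0, df_full=970)
print(round(f_value, 2))

# Six mountain ranges give 15 pairwise comparisons.
mountain_ranges = [1, 2, 3, 4, 5, 6]
pairs = list(combinations(mountain_ranges, 2))
print(len(pairs))
```

Each of the pairwise tests would repeat the same calculation with the full and reduced models fitted only to the data of the two mountain ranges concerned.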

Diéguez-Aranda et al. (2006) also fitted the model of Fang et al. (2000) to data from artificial Scots pine stands in Galicia and found it to be the best among several other models in terms of estimating diameters along the stem and total stem volume. They simultaneously fitted the equations to predict diameter and volume, and their parameter estimates were therefore slightly different from those obtained in the present study. They explained 98.7% of the variation in d, with a RMSE of 1.1 cm, as in the present study, showing that the differences obtained with different fitting methods are actually very small. As both models provided similar results, either can be used for accurate estimation of merchantable volume in the region of Galicia.


Data for the present study were provided by research plots belonging to the Instituto Nacional de Investigaciones Agrarias (INIA), by management plans from Cercedilla, Navacerrada and El Paular, and by Felipe Bravo’s doctoral thesis. The authors thank all who provided data. We also thank Dr. Christine Francis for correcting the English grammar of the text.


  • Barrio M., Diéguez-Aranda U., Castedo-Dorado F., Álvarez J.G. and von Gadow K., 2007. Merchantable volume system for pedunculate oak in northwestern Spain. Ann. For. Sci. 64: 511–520.
  • Bates D.M. and Watts D.G., 1988. Nonlinear regression analysis and its applications, Wiley, New York.
  • Belsley D.A., 1991. Conditioning diagnostics, collinearity and weak data in regression, Wiley, New York.
  • Bi H., 2000. Trigonometric variable-form taper equations for Australian eucalyptus. For. Sci. 46: 397–409.
  • Bruce R., Curtis L. and van Coevering C., 1968. Development of a system of taper and volume tables for red alder. For. Sci. 14: 339–350.
  • Burkhart H.E., 1977. Cubic-foot volume of loblolly pine to any merchantable top limit. South. J. Appl. For. 1: 7–9.
  • Cao Q.V., Burkhart H.E. and Max T.A., 1980. Evaluations of two methods for cubic-foot volume prediction of loblolly pine to any merchantable limit. For. Sci. 26: 71–80.
  • Castedo F., Barrio M., Parresol B.R. and Álvarez J.G., 2005. A stochastic height-diameter model for maritime pine ecoregions in Galicia (northwestern Spain). Ann. For. Sci. 62: 455–465.
  • Castedo-Dorado F., Diéguez-Aranda U. and Álvarez-González J.G., 2007. A growth model for Pinus radiata D. Don stands in north-western Spain. Ann. For. Sci. 64: 453–465.
  • Catalán G., Gil P., Galera R.M., Martín S., Agúndez D. and Alía R., 1991. Las regiones de procedencia de Pinus sylvestris L. y Pinus nigra Arn. subsp. salzmannii (Dunal) Franco en España, ICONA, Ministerio de Agricultura, Pesca y Alimentación, Madrid.
  • Ceballos L., 1966. Mapa Forestal de España, 1:400,000, Dirección General de Montes, Caza y Pesca Fluvial, Ministerio de Agricultura, Madrid.
  • Clutter J.L., 1963. Compatible growth and yield models for loblolly pine. For. Sci. 9: 354–371.
  • Clutter J.L., 1980. Development of taper functions from variable top merchantable volume equations. For. Sci. 26: 117–120.
  • Corral-Rivas J.J., Diéguez-Aranda U., Corral S. and Castedo F., 2007. A merchantable volume system for major pine species in El Salto, Durango (Mexico). For. Ecol. Manage. 238: 118–129.
  • Demaerschalk J., 1972. Converting volume equations to compatible taper equations. For. Sci. 18: 241–245.
  • Demaerschalk J.P. and Kozak A., 1977. The whole-bole system: a conditioned dual-equation system for precise prediction of tree profiles. Can. J. For. Res. 7: 488–497.
  • Diéguez-Aranda U., Castedo F., Álvarez J.G. and Rojo A., 2006. Compatible taper function for Scots pine plantations in northwestern Spain. Can. J. For. Res. 36: 1190–1205.
  • Draper N.R. and Smith H., 1998. Applied Regression Analysis, 3rd ed., Wiley, New York.
  • Fang Z. and Bailey R.L., 1999. Compatible volume and taper models with coefficients for tropical species on Hainan Island in Southern China. For. Sci. 45: 85–100.
  • Fang Z., Borders B.E. and Bailey R.L., 2000. Compatible volume-taper models for loblolly and slash pine based on a system with segmented-stem form factors. For. Sci. 46: 1–12.
  • FAO, 2006. Global Forest Resources Assessment 2005. Progress towards sustainable forest management, FAO Forestry Paper 147, Rome.
  • Goulding C.J. and Murray J., 1976. Polynomial taper equations that are compatible with tree volume equations. N. Z. J. For. Sci. 5: 313–322.
  • Huang S., Price D., Morgan D. and Peck K., 2000. Kozak’s variable-exponent taper equation regionalized for white spruce in Alberta. West. J. Appl. For. 15(2): 75–85.
  • Husch B., Miller C.I. and Beers T.W., 1982. Forest Mensuration, 3rd ed., Krieger Publishing Company, Malabar, Florida.
  • Jiang L., Brooks J.R. and Wang J., 2005. Compatible taper and volume equations for yellow-poplar in West Virginia. For. Ecol. Manage. 213: 399–409.
  • Jordan L., Berenhaut K., Souter R. and Daniels R.F., 2005. Parsimonious and completely compatible taper, total and merchantable volume models. For. Sci. 51: 578–584.
  • Kozak A., 1988. A variable-exponent taper equation. Can. J. For. Res. 18: 1363–1368.
  • Kozak A., 1997. Effects of multicollinearity and autocorrelation on the variable-exponent taper functions. Can. J. For. Res. 27: 619–629.
  • Kozak A., 2004. My last words on taper functions. For. Chron. 80: 507–515.
  • Kozak A. and Kozak R.A., 2003. Does cross validation provide additional information in the evaluation of regression models? Can. J. For. Res. 33: 976–987.
  • Kozak A. and Smith J.H.G., 1993. Standards for evaluating taper estimating systems. For. Chron. 69: 438–444.
  • Kozak A., Munro D. and Smith J., 1969. Taper functions and their application in forest inventory. For. Chron. 45: 278–283.
  • Larson P.R., 1963. Stem form development of forest trees. For. Sci. Monogr. 5: 1–42.
  • Lizarralde I. and Bravo F., 2003. Crown and taper equations for Scots pine (Pinus sylvestris L.) in northern Spain. In: Vacik H., Lexer M.J., Rauscher M.H., Reynolds K.M., and Brooks R.T. (Eds.), Decision support for multiple purpose forestry. A transdisciplinary conference on the development and application of decision support tools for forest management, April 23–25, 2003, University of Natural Resources and Applied Life Sciences, Vienna, Austria, CD-Rom Proceedings.
  • Martin A.J., 1984. Testing volume equation accuracy with water displacement techniques. For. Sci. 30: 41–50.
  • Max T.A. and Burkhart H.E., 1976. Segmented polynomial regression applied to taper equations. For. Sci. 22: 283–289.
  • MMA, 2005. Anuario de Estadística Forestal, Dirección General para la Biodiversidad, Ministerio de Medio Ambiente, Madrid.
  • Muhairwe C.K., 1999. Taper equations for Eucalyptus pilularis and Eucalyptus grandis for the north coast in New South Wales, Australia. For. Ecol. Manage. 113: 251–269.
  • Myers R.H., 1990. Classical and modern regression with applications, 2nd ed., Duxbury Press, Belmont, California.
  • Neter J., Wasserman W. and Kutner M.H., 1990. Applied linear statistical models: regression, analysis of variance and experimental designs, 3rd ed., Irwin, Boston, Massachusetts.
  • Newnham R., 1988. A variable-form taper function. For. Can., Petawawa Natl. For. Inst., Inf. Rep. PI-X-83.
  • Newnham R., 1992. Variable-form taper functions for four Alberta tree species. Can. J. For. Res. 22: 210–223.
  • Novo N., Rojo A. and Álvarez J.G., 2003. Funciones de perfil del tronco y tarifas de cubicación con clasificación de productos para Pinus sylvestris L. en Galicia. Investig. Agrar. Sist. Rec. For. 12: 123–136.
  • Parresol B.R. and Thomas C.E., 1996. A simultaneous density-integral system for estimating stem profile and biomass: slash pine and willow oak. Can. J. For. Res. 26: 773–781.
  • Pillsbury N.H., McDonald P.M. and Simon V., 1995. Reliability of tanoak volume equations when applied to different areas. West. J. Appl. For. 10: 72–78.
  • Reed D. and Green E., 1984. Compatible stem taper and volume ratio equations. For. Sci. 30: 977–990.
  • Riemer T., von Gadow K. and Sloboda B., 1995. Ein Modell zur Beschreibung von Baumschäften. Allg. Forst Jagdztg. 166: 144–147.
  • Rojo A., Perales X., Sánchez-Rodríguez F., Álvarez-González J.G. and von Gadow K., 2005. Stem taper functions for maritime pine (Pinus pinaster Ait.) in Galicia (Northwestern Spain). Eur. J. For. Res. 124: 177–186.
  • Ryan T.P., 1997. Modern regression methods, John Wiley & Sons Inc., New York.
  • SAS Institute Inc., 2004. SAS/ETS 9.1 user’s guide, SAS Institute Inc., Cary, N.C.
  • Saunders M.R. and Wagner R.G., 2008. Height-diameter models with random coefficients and site variables for tree species of Central Maine. Ann. For. Sci. 65: 203.
  • Schwarz G., 1978. Estimating the dimension of a model. Ann. Stat. 6(2): 461–464.
  • Sharma M. and Zhang S.Y., 2004. Variable-exponent taper equations for jack pine, black spruce, and balsam fir in eastern Canada. For. Ecol. Manage. 198: 39–53.
  • Thomas C.E., Parresol B.R., Lê K.H.N. and Lohrey R.E., 1995. Biomass and taper for trees in thinned and unthinned longleaf pine plantations. South. J. Appl. For. 19: 29–35.
  • Valentine H.T. and Gregoire T.G., 2001. A switching model of bole taper. Can. J. For. Res. 31: 1400–1409.
  • West P.W., Ratkowsky D.A. and Davis A.W., 1984. Problems of hypothesis testing of regressions with multiple measurements from individual sampling units. For. Ecol. Manage. 7: 207–224.
  • Zakrzewski W.T. and MacFarlane D.W., 2006. Regional stem profile model for cross-border comparisons of harvested Red pine (Pinus resinosa Ait.) in Ontario and Michigan. For. Sci. 52: 468–475.
  • Zimmerman D.L. and Núñez-Antón V., 2001. Parametric modeling of growth curve data: an overview (with discussion). Test 10: 1–73.

All Tables

Table I

Summary statistics for the tree data set.

Table II

Parameter estimates (approximate standard errors in brackets) of the models analyzed.

Table III

Goodness-of-fit statistics and condition number of the models analyzed.

All Figures

Figure 1

Mountain range data groups and distribution of Pinus sylvestris L. in Spain (Ceballos, 1966).

Figure 2

Refined data points of relative diameter and relative height plotted with a local regression loess smoothing curve (smoothing factor = 0.3) for each mountain range studied.

Figure 3

An example, for mountain range 2, of d residuals plotted against: Lag1-residuals (left column), Lag2-residuals (middle column), and Lag3-residuals (right column) for the model of Fang et al. (2000) fitted without considering the autocorrelation parameters (first row), and with continuous autoregressive error structures of first and second order (second and third rows, respectively).
