In the previous post, I was completing an analysis of a central composite design using the load amount, load conductivity, and load pH as inputs for the study. The initial results identified that all three factors were important; however, there were no pairwise interactions present (nor higher terms, such as quadratic). In the original paper, the authors point out that one of their runs had an abnormally low recovery percentage and so was omitted. Let's take a look at how the model changes when this is done.
First, notice where the point (red triangle) falls in the plot of the actual vs predicted:
It's pretty clear this point is having a significant impact on the model through its leverage. When the main-effects-only model is applied to the results, the ANOVA results are:
| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 2 | 0.06547415 | 0.032737 | 8.7720 | 0.0124* |
| Error | 7 | 0.02612395 | 0.003732 | | |
| C. Total | 9 | 0.09159810 | | | |
and the parameter estimates, along with their probabilities, are
| Term | | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|---|
| Intercept | | 0.8820829 | 0.019367 | 45.55 | <.0001* |
| Load(93,312) | | 0.1042488 | 0.025469 | 4.09 | 0.0046* |
| Load cond(10,30) | Biased | -0.028196 | 0.024983 | -1.13 | 0.2963 |
| Load pH(6,7.5) | Zeroed | 0 | 0 | . | . |
The load conductivity and load pH are no longer statistically significant; however, looking at the actual vs predicted, several points are observed to be outside the confidence intervals.
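As a quick check on the parameter table above, each t ratio is just the estimate divided by its standard error. The sketch below recomputes them from the printed values; the dictionary layout and the `t_ratio` helper are my own illustration, not part of the JMP output.

```python
# Estimates and standard errors copied from the main-effects table above.
# The zeroed Load pH term is left out because it has no standard error.
terms = {
    "Intercept": (0.8820829, 0.019367),
    "Load(93,312)": (0.1042488, 0.025469),
    "Load cond(10,30)": (-0.028196, 0.024983),
}


def t_ratio(estimate, std_error):
    """Return the t ratio: the estimate measured in units of its standard error."""
    return estimate / std_error


t_ratios = {name: t_ratio(est, se) for name, (est, se) in terms.items()}

for name, t in t_ratios.items():
    print(f"{name:18s} t = {t:6.2f}")
```

A large |t| (as for load) means the estimate sits many standard errors away from zero, which is what drives the small Prob>|t| in the table.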
As this was a CCD, we are justified in adding some higher-order terms. Restarting the analysis with a response surface model, the ANOVA shows the following results:
| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 5 | 0.08642340 | 0.017285 | 13.3609 | 0.0132* |
| Error | 4 | 0.00517470 | 0.001294 | | |
| C. Total | 9 | 0.09159810 | | | |
with the parameters estimated as
| Term | | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|---|
| Intercept | | 0.9453904 | 0.020768 | 45.52 | <.0001* |
| Load(93,312) | | 0.1032109 | 0.017412 | 5.93 | 0.0041* |
| Load cond(10,30) | Biased | -0.026721 | 0.01594 | -1.68 | 0.1690 |
| Load pH(6,7.5) | Zeroed | 0 | 0 | . | . |
| Load*Load | | -0.135078 | 0.052907 | -2.55 | 0.0631 |
| Load*Load cond | Biased | 0.0373873 | 0.017758 | 2.11 | 0.1030 |
| Load cond*Load cond | Biased | 0.023048 | 0.043967 | 0.52 | 0.6278 |
| Load*Load pH | Zeroed | 0 | 0 | . | . |
| Load cond*Load pH | Zeroed | 0 | 0 | . | . |
| Load pH*Load pH | Zeroed | 0 | 0 | . | . |
Now, what to do? The dominant effect is the load and maybe the load*load term (p=0.0631). Some camps say a model should contain only the statistically significant terms, while others argue that the entire model should be fit. Keeping only the load and load*load terms, the ANOVA was
| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 2 | 0.07851582 | 0.039258 | 21.0059 | 0.0011* |
| Error | 7 | 0.01308228 | 0.001869 | | |
| C. Total | 9 | 0.09159810 | | | |
What is interesting to note is how the F ratio has increased as we reach our best fit (8.77, then 13.36, then 21.01). Relative to the response surface model, this comes with more degrees of freedom for the error term: the final model uses only two degrees of freedom (load, load^2), leaving the error term 7 degrees of freedom to estimate the mean square error (0.001869). The estimated root mean square error (RMSE) is the square root of the mean square error, about 0.043.
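The arithmetic behind the three ANOVA tables can be reproduced from the sums of squares and degrees of freedom alone. The sketch below does that in plain Python; the `anova_summary` helper and the labels in `fits` are hypothetical names of mine. One caveat: the p-value uses a closed form of the F distribution that holds only when the model has exactly 2 degrees of freedom, so it is left as `None` for the 5-DF response surface fit (a statistics library would be needed there).

```python
import math


def anova_summary(ss_model, df_model, ss_error, df_error):
    """Recompute mean squares, F ratio, and RMSE from sums of squares and DF."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    f_ratio = ms_model / ms_error
    rmse = math.sqrt(ms_error)  # root mean square error
    # Special case only: with 2 numerator DF, P(F > f) = (1 + 2f/df_error)^(-df_error/2).
    p_value = None
    if df_model == 2:
        p_value = (1 + 2 * f_ratio / df_error) ** (-df_error / 2)
    return {
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_ratio": f_ratio,
        "rmse": rmse,
        "p_value": p_value,
    }


# (SS model, DF model, SS error, DF error) copied from the three tables above.
fits = {
    "main effects": (0.06547415, 2, 0.02612395, 7),
    "response surface": (0.08642340, 5, 0.00517470, 4),
    "load and load*load": (0.07851582, 2, 0.01308228, 7),
}

results = {name: anova_summary(*values) for name, values in fits.items()}

for name, r in results.items():
    p_text = "n/a" if r["p_value"] is None else f"{r['p_value']:.4f}"
    print(f"{name:20s} F = {r['f_ratio']:7.4f}  RMSE = {r['rmse']:.4f}  p = {p_text}")
```

Laying the three fits side by side makes the trade-off visible: the response surface model has the smallest error sum of squares, but with only 4 error degrees of freedom its F ratio is still lower than that of the two-term model.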
Finally, the Prediction Profiler may be used to predict how the recovery changes as a function of load:
The sweet spot is around 200 g/L; above that point, higher loads give only a modest improvement in recovery. Of note: look at how quickly the recovery drops below it! At 175 g/L, it's down to 90% and then continues to plummet.
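For readers without the profiler, a rough sketch of the same idea is to evaluate the quadratic in load by hand. Two assumptions are baked in here, so treat the output as illustrative only. First, I am reading the `Load(93,312)` notation as a coded factor that maps 93 g/L to -1 and 312 g/L to +1. Second, the coefficients for the reduced two-term model are not listed above, so the sketch borrows the intercept, load, and load*load estimates from the full response surface table with load conductivity held at its coded center; the numbers will therefore not match the profiler exactly. The function names are mine.

```python
# Assumed coding range, read from the "Load(93,312)" term label.
LOAD_LOW, LOAD_HIGH = 93.0, 312.0

# Assumption: estimates borrowed from the full response surface table,
# not from the reduced (load, load*load) model, whose estimates are not listed.
B0, B1, B11 = 0.9453904, 0.1032109, -0.135078


def coded_load(load_g_per_l):
    """Map load in g/L onto the coded -1..+1 scale."""
    midpoint = (LOAD_LOW + LOAD_HIGH) / 2
    half_range = (LOAD_HIGH - LOAD_LOW) / 2
    return (load_g_per_l - midpoint) / half_range


def predicted_recovery(load_g_per_l):
    """Evaluate the quadratic in coded load (load conductivity at coded 0)."""
    x = coded_load(load_g_per_l)
    return B0 + B1 * x + B11 * x * x


for load in (150, 175, 200, 225, 250):
    print(f"{load} g/L -> predicted recovery {predicted_recovery(load):.3f}")
```

Because the load*load coefficient is negative, the curve is concave: recovery falls off increasingly fast as the load moves down from the center of the range, which is the behavior the profiler shows.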