by **Boyd** » Tue Mar 21, 2017 11:32 pm

HERE IS SOME USEFUL MATERIAL.

REGARDS

LEO LINGHAM

====================================

1. Calculate the mean, median and mode from the following data relating to production of a steel mill for 60 days

Production (in tons per day):  21-22   23-24   25-26   27-28   29-30
Number of days:                    7      13      22      10       8

Mean, median, and mode are three kinds of "averages". There are many "averages" in statistics, but these are, I think, the three most common, and are certainly the three you are most likely to encounter in your pre-statistics courses, if the topic comes up at all.

The "mean" is the "average" you're used to, where you add up all the numbers and then divide by the number of numbers. The "median" is the "middle" value in the list of numbers. To find the median, your numbers have to be listed in numerical order, so you may have to rewrite your list first. The "mode" is the value that occurs most often. If no number is repeated, then there is no mode for the list.
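For a plain (ungrouped) list of numbers, Python's standard statistics module computes all three directly. A small sketch with made-up values (not from the problem above):

```python
import statistics

values = [4, 7, 4, 9, 6, 4, 8]  # hypothetical list of numbers

mean = statistics.mean(values)      # add them all up, divide by the count
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # value that occurs most often
```

For this list the mean and median are both 6 and the mode is 4 (it appears three times).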

• Find the mean, median, and mode for the following grouped data:

1. [21-22] TONNES PER DAY------7 DAYS

2. [23-24] TONNES PER DAY------13 DAYS

3. [25-26] TONNES PER DAY------22 DAYS

4. [27-28] TONNES PER DAY------10 DAYS

5. [29-30] TONNES PER DAY------8 DAYS

1. midpoint 21.5 × 7 days  = 150.50 tonnes

2. midpoint 23.5 × 13 days = 305.50 tonnes

3. midpoint 25.5 × 22 days = 561.00 tonnes

4. midpoint 27.5 × 10 days = 275.00 tonnes

5. midpoint 29.5 × 8 days  = 236.00 tonnes

TOTAL = 1528.00 TONNES over 60 days

============================================

The mean is the usual average, so:

Mean = GRAND TOTAL 1528 / 60 DAYS = 25.47 TONNES PER DAY (approx.)

=========================================

These are grouped data, so the median is the value of the (N/2)th = 30th observation, found from the cumulative frequencies 7, 20, 42, 52, 60. The 30th observation falls in the 25-26 class (boundaries 24.5-26.5, class width h = 2), so:

Median = L + ((N/2 − CF) / f) × h = 24.5 + ((30 − 20) / 22) × 2 ≈ 25.41 tonnes per day

where L is the lower boundary of the median class, CF the cumulative frequency below it, and f its frequency.

For grouped data the mode is estimated from the modal class, i.e. the class with the highest frequency — here 25-26 (22 days):

Mode = L + ((f1 − f0) / (2·f1 − f0 − f2)) × h = 24.5 + ((22 − 13) / (44 − 13 − 10)) × 2 ≈ 25.36 tonnes per day

where f1 is the modal-class frequency and f0, f2 are the frequencies of the classes just before and after it.
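The whole calculation can be sketched in a few lines, using the standard grouped-data formulas for mean, median, and mode (class boundaries assumed to be at the x.5 points, width h = 2):

```python
# Mean, median, and mode for the grouped steel-mill data.
classes = [(21, 22), (23, 24), (25, 26), (27, 28), (29, 30)]
freqs = [7, 13, 22, 10, 8]
n = sum(freqs)  # 60 days
h = 2           # class width (boundaries 20.5-22.5, 22.5-24.5, ...)

# Mean: sum of (midpoint * frequency), divided by n.
mids = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(m * f for m, f in zip(mids, freqs)) / n  # 1528 / 60

# Median: L + ((n/2 - CF) / f) * h on the class holding the 30th value.
cum = 0
for i, f in enumerate(freqs):
    if cum + f >= n / 2:
        median = (classes[i][0] - 0.5) + (n / 2 - cum) / f * h
        break
    cum += f

# Mode: L + (f1 - f0) / (2*f1 - f0 - f2) * h on the modal class.
m = freqs.index(max(freqs))
f0, f1, f2 = freqs[m - 1], freqs[m], freqs[m + 1]
mode = (classes[m][0] - 0.5) + (f1 - f0) / (2 * f1 - f0 - f2) * h
```

This yields a mean of about 25.47, a median of about 25.41, and a mode of about 25.36 tonnes per day.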

#######################################################

2. A restaurant is experiencing discontentment among its customers. It analyses that there are three factors responsible viz. food quality, service quality and interior décor. By conducting an analysis, it assesses the probabilities of discontentment with the three factors as 0.40, 0.35 and 0.25 respectively. By conducting a survey among the customers, it also evaluated the probabilities of a customer going away discontented on account of these factors as 0.6, 0.8 and 0.5, respectively. With this information, the restaurant wants to know that, if a customer is discontented, what are the probabilities that it is so due to food, service or interior décor?

3. The monthly incomes of a group of 10,000 persons were found to be normally distributed with mean equal to 15,000 and standard deviation equal to 1000. What is the lowest income among the richest 250 persons?

NO EXPERTISE IN Q2, Q3
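The reply above passes on these two, but for completeness here is a sketch of the standard approaches: Q2 is a direct application of Bayes' theorem, and Q3 asks for the 97.5th percentile of a normal distribution (the richest 250 out of 10,000 are the top 2.5%). The numbers come straight from the problem statements; the code itself is an editor's sketch, not part of the original answer.

```python
from statistics import NormalDist

# Q2 -- Bayes' theorem: P(cause | discontented customer).
priors = {"food": 0.40, "service": 0.35, "decor": 0.25}
likelihoods = {"food": 0.6, "service": 0.8, "decor": 0.5}
p_discontented = sum(priors[k] * likelihoods[k] for k in priors)  # 0.645
posteriors = {k: priors[k] * likelihoods[k] / p_discontented for k in priors}

# Q3 -- the richest 250 of 10,000 form the top 2.5%, so the lowest income
# among them is the 97.5th percentile of N(15000, 1000).
cutoff = NormalDist(mu=15000, sigma=1000).inv_cdf(1 - 250 / 10000)
```

Running this gives posterior probabilities of roughly 0.37 (food), 0.43 (service), and 0.19 (décor), and a cutoff income of about 16,960 (i.e., 15,000 + 1.96 × 1,000).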

4. Write short notes on the following:

a. Test of goodness of fit

The idea behind the chi-square goodness-of-fit test is to see whether a sample comes from a population with the claimed distribution — in other words, whether the frequency distribution fits a specific pattern. Two values are involved: the observed frequency of a category in the sample, and the expected frequency, calculated from the claimed distribution. The derivation of the formula is very similar to that of the variance done earlier (chapter 2 or 3). If the observed frequency is really close to the claimed (expected) frequency, the square of the deviation will be small. Each squared deviation is divided by the expected frequency to weight the frequencies: a difference of 10 may be very significant if 12 was the expected frequency, but isn't significant at all if the expected frequency was 1200. If the sum of these weighted squared deviations is small, the observed frequencies are close to the expected frequencies and there is no reason to reject the claim that the sample came from that distribution. Only when the sum is large is there reason to question the distribution. Therefore, the chi-square goodness-of-fit test is always a right-tailed test.

The test statistic has a chi-square distribution when the following assumptions are met:

• The data are obtained from a random sample.
• The expected frequency of each category is at least 5. You are approximating a multinomial experiment (a discrete distribution) with the chi-square (a continuous) distribution, and if each expected frequency is at least five the approximation is adequate (much like the normal approximation to the binomial).

The following are properties of the goodness-of-fit test:

• The data are the observed frequencies — only one data value for each category.
• The degrees of freedom is one less than the number of categories, not one less than the sample size.
• It is always a right-tailed test.
• It has a chi-square distribution.
• The value of the test statistic doesn't change if the order of the categories is switched.
• The test statistic is χ² = Σ (O − E)² / E, summed over all categories.

Interpreting the Claim

There are four ways you might be given a claim.

1. The values occur with equal frequency. Other words for this are "uniform", "no preference", or "no difference". To find the expected frequencies, total the observed frequencies and divide by the number of categories; this quotient is the expected frequency for each category.

2. Specific proportions or probabilities are given. To find the expected frequencies, multiply the total of the observed frequencies by the probability for each category.

3. The expected frequencies are given to you. In this case, you don't have to do anything.

4. A specific distribution is claimed, for example "the data is normally distributed". To work a problem like this, group the data and find the frequency for each class. Then find the probability of being within that class by converting the scores to z-scores and looking up the probabilities. Finally, multiply the probabilities by the total observed frequency. (It's not really as bad as it sounds.)

==============================================
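As a concrete sketch of case 1 ("equal frequency"), with hypothetical die-rolling counts rather than data from the original post:

```python
# Chi-square goodness-of-fit sketch: is a die fair?
# Hypothetical observed counts for faces 1-6 over 60 rolls.
observed = [8, 9, 19, 5, 8, 11]
n = sum(observed)                       # 60 rolls in total
expected = [n / len(observed)] * 6      # uniform claim: 10 per face

# Test statistic: sum of (O - E)^2 / E over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi_sq = 11.6 with 6 - 1 = 5 degrees of freedom; the 5% critical
# value for df = 5 is about 11.07, so this sample would lead us to
# reject the claim that the die is fair (right-tailed test).
```

Note that every expected frequency here is 10, comfortably above the minimum of 5 required by the test's assumptions.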

b. Critical Region of a test

When you formulate and test a statistical hypothesis, you compute a test statistic (a numerical value from a formula that depends on the test). If the test statistic falls in the critical region, you reject the hypothesis; if it does not, you do not reject it. The critical region is an interval (or, for a two-tailed test, a pair of intervals) of possible values of the test statistic.

Suppose we have a null hypothesis H0 and an alternative hypothesis H1. We consider the distribution given by the null hypothesis and perform a test to determine whether or not the null hypothesis should be rejected in favour of the alternative hypothesis.

There are two different types of tests that can be performed. A one-tailed test looks for an increase or a decrease in the parameter, whereas a two-tailed test looks for any change in the parameter (either an increase or a decrease).

We can perform the test at any level(usually 1%, 5% or 10%). For example, performing the test at a 5% level means that there is a 5% chance of wrongly rejecting H0.

If we perform the test at the 5% level and decide to reject the null hypothesis, we say "there is significant evidence at the 5% level to suggest the hypothesis is false".

One-Tailed Test

We choose a critical region. In a one-tailed test, the critical region has just one part. If our sample value lies in this region, we reject the null hypothesis in favour of the alternative. Suppose we are looking for a definite decrease; then the critical region lies entirely in the left tail of the distribution, and no observed value is too high to be consistent with the null hypothesis.

Example

Suppose we are given that X has a Poisson distribution and we want to carry out a hypothesis test on the mean, λ, based upon a sample observation of 3.

Suppose the hypotheses are:

H0: λ = 9
H1: λ < 9

We want to test if it is "reasonable" for the observed value of 3 to have come from a Poisson distribution with parameter 9. So what is the probability that a value as low as 3 has come from a Po(9)?

P(X ≤ 3) = 0.0212 (from a Poisson table)

The probability is less than 0.05, so there is less than a 5% chance that a value this low has come from a Po(9) distribution. We therefore reject the null hypothesis in favour of the alternative at the 5% level.

However, the probability is greater than 0.01, so we would not reject the null hypothesis in favour of the alternative at the 1% level.
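The table value can be checked directly from the Poisson probability mass function, P(X = k) = e^(−λ)·λ^k / k!, summed for k = 0 to 3:

```python
import math

# P(X <= 3) for X ~ Po(9), computed instead of read from a table.
lam = 9
p = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(4))
# p ≈ 0.0212: below 0.05, so reject H0 at the 5% level,
# but above 0.01, so do not reject at the 1% level.
```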

Two-Tailed Test

In a two-tailed test, we are looking for either an increase or a decrease. So, for example, H0 might be that the mean is equal to 9(as before). This time, however, H1 would be that the mean is not equal to 9. In this case, therefore, the critical region has two parts:

Example

Lets test the parameter p of a Binomial distribution at the 10% level.

Suppose a coin is tossed 10 times and we get 7 heads. We want to test whether or not the coin is fair. If the coin is fair, p = 0.5 . Put this as the null hypothesis:

H0: p = 0.5

H1: p ≠ 0.5

Now, because the test is 2-tailed, the critical region has two parts. Half of the critical region is to the right and half is to the left. So the critical region contains both the top 5% of the distribution and the bottom 5% of the distribution(since we are testing at the 10% level).

If H0 is true, X ~ Bin(10, 0.5).

If the null hypothesis is true, what is the probability that X is 7 or above?

P(X ≥ 7) = 1 − P(X < 7) = 1 − P(X ≤ 6) = 1 − 0.8281 = 0.1719

Is this in the critical region? No, because the probability that X is at least 7 is not less than 0.05 (5%), which is what we need it to be. So there is not significant evidence at the 10% level to reject the null hypothesis.

==========================================
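The tail probability can be computed directly from the binomial formula, P(X = k) = C(n, k)·p^k·(1 − p)^(n−k):

```python
import math

# P(X >= 7) for X ~ Bin(10, 0.5), computed instead of read from a table.
n, p = 10, 0.5
p_at_least_7 = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                   for k in range(7, n + 1))
# p_at_least_7 = 176/1024 ≈ 0.1719 > 0.05, so 7 heads does not fall in
# the upper 5% tail and we do not reject H0: p = 0.5 at the 10% level.
```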

c. Exponential Smoothing Method

Simple Exponential Smoothing Method

This forecasting method is one of the most widely used of all forecasting techniques and requires little computation.

This method is used when the data pattern is horizontal (i.e., there is neither cyclic variation nor trend in the historical data).

The equation to calculate an exponential smoothing is:

Ft = α·At-1 + (1 − α)·Ft-1

where

Ft – forecast for period t,

At-1 – actual value of the time series in the prior period,

Ft-1 – forecast made for the prior period,

α – smoothing constant between zero and one.

The value of α determines the degree of smoothing and how responsive the model is to fluctuations in the time-series data. The value for α is somewhat arbitrary and is determined both by the nature of the data and by the forecaster's feeling as to what constitutes a good response rate. A smoothing constant close to zero leads to a stable model, while a constant close to one is highly reactive. Typically, constant values between 0.01 and 0.3 are used.

Using Excel
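The recurrence above can be sketched in a few lines of Python. The demand figures are hypothetical, and the first forecast is seeded with the first actual value, a common convention:

```python
# Simple exponential smoothing: F_t = alpha * A_(t-1) + (1 - alpha) * F_(t-1)
def exponential_smoothing(series, alpha):
    forecasts = [series[0]]  # seed the first forecast with the first actual
    for t in range(1, len(series) + 1):
        f = alpha * series[t - 1] + (1 - alpha) * forecasts[-1]
        forecasts.append(f)
    return forecasts  # the last entry is the forecast for the next period

demand = [10, 12, 11, 13]  # hypothetical actual values
fcasts = exponential_smoothing(demand, alpha=0.3)
```

With α = 0.3 the forecasts are 10, 10, 10.6, 10.72, and finally about 11.40 for period 5 — you can see the series reacting slowly to each new actual, which is exactly the stabilizing effect of a small α.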

About the Exponential Smoothing dialog box

Input Range

Enter the cell reference for the range of data you want to analyze. The range must contain a single column or row with four or more cells of data.

Damping factor

Enter the damping factor you want to use as the exponential smoothing constant. Note that Excel's damping factor equals 1 − α, where α is the smoothing constant in the formula above; it is a corrective factor that minimizes the instability of data collected across a population. The default damping factor is 0.3 (i.e., α = 0.7).

Labels

Select if the first row and column of your input range contain labels. Clear this check box if your input range has no labels; Microsoft Excel generates appropriate data labels for the output table.

Output Range

Enter the reference for the upper-left cell of the output table. If you select the Standard Errors check box, Microsoft Excel generates a two-column output table with standard error values in the right column. If there are insufficient historical values to project a forecast or calculate a standard error, Microsoft Excel returns the #N/A error value.

Note The output range must be on the same worksheet as the data used in the input range. For this reason, the New Worksheet Ply and New Workbook options are unavailable.

Chart Output

Select to generate an embedded chart for the actual and forecast values in the output table.

Standard Errors

Select if you want to include a column that contains standard error values in the output table. Clear if you want a single-column output table without standard error values.

#####################################################