Math 1530: Project 2
Math & Computer Science Building

Data & Statistics Projects

Project 2  

Minitab

Group Expectations

Meetings:  Each group should meet at least twelve hours before each report is due. 

1st Draft:  Before meetings and at least 48 hours before the report's due date on the syllabus, each group member should post a draft version of their part of the report in their D2L group discussions.  Before meetings and at least 24 to 36 hours before each report is due, the group leader should post a draft of the entire report to the group project's dropbox.  All group members will be responsible for grading the dropbox-posted project with the rubric and editing the project wherever the rubric is not met.  The group should go through the entire report, discussing each part and matching each part to the rubric.  Each group member is responsible for the accuracy of all calculations; the clarity, completeness, and accuracy of the answers to all questions; and the overall report presentation, style, and quality.  If any group member feels a part of the report does not meet rubric specifications, is incorrect, or is lacking in any way, that part of the report should be discussed and revised by the group at this group meeting.

Evaluate your peers by submitting one peer review for each member of your group.  Individual grades will depend greatly on turning in your own evaluations by the due date and on the evaluations of you by fellow group members.

The group leader will immediately (within 2 to 3 days of each previous project or pre-project being due) assign all parts to the various group members, keeping the Wrapping It Up part for himself or herself and splitting the remaining parts as equally as possible based on the point distribution in the rubric and the part divisions already assigned in the project description.  The group leader is responsible for ensuring group members post their respective assigned parts of the project at least two days prior to the project due date - or even earlier if the leader specifies an earlier time when assigning parts, at the leader's discretion.  If a group member is late, the leader will reassign that member's part to active group members and will let the instructor know immediately.

Other members will be responsible for completing all assigned parts at least two days prior, or even earlier if specified, and posting those parts to the group discussion board.  If the group leader has not assigned roles for the project within two to three days of the previous project or pre-project being due, then the group should let the instructor know immediately.  Any student who does not completely understand his or her assigned part should immediately ask group members for clarification and help.  Any member turning in his or her part late may receive a zero for the current project if this part was reassigned to and completed by another member because of the lateness.

Report Expectations

In formal, professional reporting style, the full report for Project 2 needs to be written to include everything in the parts below, although the questions themselves should not be included. This report should read as if the reader has no knowledge of these questions nor any knowledge that the questions have prompted the writing of the report. The entire report should be in complete sentences and full paragraphs with the exception of tables, charts, graphs, equations, and lists of statistics. Everything from formatting to grammar to style should be as professional as possible.

To be successful with this project, students should:

  1. Use only Minitab, since Minitab will be used in Projects 2 and 3.  The focus for this project will be on Minitab, so 100% of calculations are expected to be computed using only Minitab with the only exception being simple arithmetic.
  2. Only save the project in Minitab when finished with the entire project:  Parts A, B, and C.  Minitab has been known previously not to save all files if the project is saved multiple times at different stages, which is why saving only once at the end is recommended.
  3. Never close graphs or windows in the project, though they may be minimized to get them out of the way.
  4. When rounding, give at least three significant digits for final answers and at least four significant digits for answers that will be used in future calculations.
  5. Present a neat, organized and clearly communicated project document.  Students have the latitude to present work on this project in any professional manner.
  6. Don't assume what the question is asking unless certain. Looking up items can dramatically improve the group grade.  There is an index in the back of your book and at the end of the digital textbooks in CourseCompass where one can easily jump to the page or pages where each term is discussed.  Please feel free to ask questions in the Data & Statistics Projects discussion, especially when clarity is needed on what is being asked.
  7. Students who don't complete and post assignments to group discussions at least two days prior to the syllabus due date for the report will receive between 0% and 50% of the group project grade.
  8. Reports should be professionally written and completely without reference to this set of instructions or questions. Even though all questions and parts listed here should be addressed using the same or similar terminology within the report, preferably in order, the reader of the report should not be aware that the report was guided by a set of questions or instructions.

***This video was made in a previous term, and some of the functions you are to do have been updated since then.  Be sure to read and explicitly follow the instructions in Part A.


Instructions for Minitab 17 Only

Please note that all instructions for Minitab Express, which works on both Mac and PC, are located within D2L Content under Project 2 within the descriptions of the download links for Express. Please email your instructor if you are having trouble finding these instructions.

Click in the Minitab session window & type the following in the session window:

            Your Group Name
            List of Group Members' Names
            MATH 1530 Elements of Statistics
            Project #2: Minitab
            Due Date:  (actually type the due date)
  

1 - Look at all of your data from your appendix in Project 1.  For all of the quantitative data, make sure there is nothing but numbers in the cells.  In other words, if you have dollar signs, dashes, or any special characters, you must remove those.  If you have data ranges such as 18-22, 23-27, 28-32 and so on, this will not be acceptable.  Take an average of the class by taking the lower limit of the class you are in (e.g., 18) and the upper limit (e.g., 22) and averaging these two:  20.  Replace all such data ranges with single values.
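If you have many cells to clean, the idea in this step can be sketched in a few lines of Python before you paste anything into Minitab.  This is only an illustration, not part of the required Minitab work.  The function name to_number and the input formats shown ("18-22", "$1,250") are assumptions made for the example; your own data may need different handling.

```python
import re

# Matches a class written as "low-high", for example "18-22".
# This input format is an assumption made for the illustration.
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")


def to_number(cell: str) -> float:
    """Turn one survey cell into a plain number (hypothetical helper)."""
    match = _RANGE.match(cell)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        # Replace the class with its midpoint: (lower limit + upper limit) / 2.
        return (low + high) / 2
    # Otherwise remove dollar signs, commas, and spaces, then convert.
    return float(re.sub(r"[$,\s]", "", cell))


if __name__ == "__main__":
    for raw in ["18-22", "23-27", "$1,250"]:
        print(raw, "->", to_number(raw))
```

The midpoint rule is the same one described above:  the class 18-22 becomes (18 + 22) / 2 = 20.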

Once all of your quantitative data is purely numerical with no symbols or letters at all, highlight all of your table in the appendix except for the row and column headers, copy, and paste into an open Minitab project in the spreadsheet area in the white cells starting in Row 1 Column C1.  After pasting, your Row 1 will correspond with data from person 1, Row 2 with person 2, and so on.  Your columns will be C1 or C1-T, followed by C2 or C2-T, and so on.  The -Ts mean you have text in your columns, and these columns can only be used as categorical variables.  If you have quantitative data in a -T column, you have done something wrong, and you should start again with a new project.  If you have too much trouble, you may need to type things in, but I hope it won't get to that point.

Once all data are in and all columns correctly have -T or no -T, you should add concise column labels to represent the data.  For example, I'll transform my hypothetical pre-project report questions into column titles.  Here are the questions:

1.  On a scale of 0 to 10 with 0 being epic fail (bad) and 10 being made of awesome (good), how would you rate the Harry Potter and the Deathly Hallows, Part I movie?

2.  On that same scale, how would you rate the book, Deathly Hallows, if you've read the book?

3.  On a scale of 0 to 10 with 0 being indifferent and 10 being hopelessly devoted, how big of a Harry Potter fan are you?

4.  Who is your favorite character in the movie?

5.  How old are you, if you don't mind disclosing?

6.  Harry/Hermione or Harry/Ginny?

7.  If you could do one spell, what would it be?

And my column titles would be:  Movie Rating, Book Rating, Fanaticism, Favorite Character, Age, H/H or H/G, Spell.

Note:  Be sure that, for categorical data columns, each category is spelled exactly the same way with exactly the same spacing and capitalization.  If not, this will result in duplicate categories.
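One way to catch these problems before they show up as duplicate categories is to look for labels that differ only in capitalization or spacing.  The sketch below is an optional Python illustration; the function name inconsistent_labels and the sample answers are hypothetical.

```python
from collections import defaultdict


def inconsistent_labels(values):
    """Group labels that match once case and extra spaces are ignored.

    Returns only the groups that contain more than one spelling.
    """
    groups = defaultdict(set)
    for value in values:
        # Collapse repeated spaces, trim the ends, and ignore capitalization.
        key = " ".join(value.split()).casefold()
        groups[key].add(value)
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


if __name__ == "__main__":
    answers = ["Hermione", "hermione ", "Harry", "Harry"]  # made-up answers
    print(inconsistent_labels(answers))
```

Any group this reports should be retyped so that every entry uses one identical spelling.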

2 - Stats, Histograms, and Boxplots:  Do Stat > Basic Statistics > Display Descriptive Statistics.  Inside variables, select all possible quantitative variables listed.  Click on Statistics and select Interquartile range, sum of squares, and skewness.  Press OK once.  Click on Graphs.  Select Histogram of data and Boxplot of data.  Select OK.  Select OK again. 

Note:  Check to make sure that each categorical variable is only listed one time.  If you find that a categorical variable appears multiple times, this is probably because of some spacing, spelling, or capitalization difference in the way you have this variable written in your data column.  You will want to go through and correct this problem so that each category is spelled exactly the same way with exactly the same spacing and capitalization.  Then repeat this Step 2.

This step will give you 1) a histogram, 2) a boxplot, and 3) columns of detailed statistics in the session window for every quantitative variable.  Each histogram and boxplot will have its own window.
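To see what a few of these session-window statistics represent, here is a minimal Python sketch using only the standard library.  It is an illustration, not a substitute for the Minitab output.  The function name describe and the sample values are hypothetical, and quartile conventions differ between programs, so these quartiles should not be assumed to match Minitab's in every case.

```python
import statistics


def describe(data):
    """Return a few descriptive statistics for one quantitative variable."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,  # interquartile range
    }


if __name__ == "__main__":
    ratings = [2, 4, 4, 5, 7, 9, 10]  # made-up ratings
    for name, value in describe(ratings).items():
        print(name, value)
```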

3 -  Bar Charts:  Graph > Bar Chart....  From the dropdown menu, select a function of a variable.  Select Cluster from the top row of options, One Y, and choose OK.  The function in the dropdown should already say Mean, which is what we want.  Choose all of your quantitative variables for graph variables and one of your categorical variables for categorical grouping.  Select OK.

4 - Frequency Tables:  Stat > Tables > Tally Individual Variables.  Select all quantitative and categorical variables.  Display Counts and Percents.  Select OK.
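A frequency table simply pairs each distinct value with its count and its percent of the total.  The short Python sketch below illustrates that calculation; the function name tally and the sample answers are hypothetical.

```python
from collections import Counter


def tally(values):
    """Return {value: (count, percent)} for one variable."""
    counts = Counter(values)
    total = len(values)
    # most_common() lists the values from most frequent to least frequent.
    return {label: (count, 100 * count / total)
            for label, count in counts.most_common()}


if __name__ == "__main__":
    answers = ["H/G", "H/H", "H/G", "H/G"]  # made-up answers
    for label, (count, percent) in tally(answers).items():
        print(label, count, percent)
```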

5 - Pie Chart:  Graph > Pie Chart. Leave the chart of unique values selected.  Under categorical variables, choose all your categorical variables.  Select the Pie Options button and choose Decreasing volume.  Select Labels, choose the Slice Labels tab, and select category name and percent.  Select the Multiple Graphs button and choose On separate graphs.

6 - To determine the linear regression model and to see it drawn on a scatterplot of data, select Stat > Regression > Fitted Line Plot.  We want to model the relationship between two of your quantitative variables.  So, in the window, double-click your response variable as the response variable and your explanatory variable as the predictor variable.  Since we are working with linear regression, under Type of Regression Model, select Linear.  Click OK.

The linear regression model equation for your variables will be displayed at the top of your graph window AND a graph of the scatterplot of the data (red points) with the regression line (blue line) will be drawn.  Don't forget the regression model line contains points that represent the perfect or model data. 
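For reference, the line Minitab draws is the least-squares line.  The Python sketch below shows the standard least-squares formulas for the slope and intercept and how the resulting equation turns an explanatory value into a predicted response.  The function names fit_line and predict and the sample data are hypothetical; all graded calculations should still come from Minitab.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_value):
    """Predicted response for one explanatory value."""
    return intercept + slope * x_value


if __name__ == "__main__":
    fanaticism = [1, 2, 3, 4]    # made-up explanatory values
    movie_rating = [3, 5, 7, 9]  # made-up response values
    b0, b1 = fit_line(fanaticism, movie_rating)
    print("intercept:", b0, "slope:", b1)
    print("prediction at x = 2.5:", predict(b0, b1, 2.5))
```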

7 - Compute the correlation of your variables by selecting Stat > Basic Statistics > Correlation.  Click on the box labeled Variables.  In the box to the left, double-click your explanatory and then your response variables.  Click OK.  The correlation coefficient, r, for your two variables will be displayed in your session window and will be called the Pearson correlation.
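The Pearson correlation can be written out directly from its definition, which may help when interpreting the number Minitab reports.  The sketch below is an optional Python illustration; the function name pearson_r and the sample data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # r is always between -1 and 1; its sign gives the direction.
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # points on a rising line
    print(pearson_r([1, 2, 3], [3, 2, 1]))        # points on a falling line
```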

In this part of the project, choose your quantitative data that looks the most bell shaped, when comparing all of your histograms. 

8 - Construct a normal probability plot of this variable by selecting Graph > Probability Plot.  Click on the Single picture and click OK.  In the box to the left, double-click your best bell-shaped quantitative variable.  Click OK.

9 - Prepare a column for receiving the standardized data by clicking in the gray cell just below the first empty column and typing z.  To standardize, select CALC > Standardize, choose your best bell-shaped quantitative variable for input columns, choose the z column for storing results, and click OK.  You should now have new data in the z column, which are the standardized values (z-scores) of every data value listed in your best bell-shaped quantitative data variable.  
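A z-score is the data value minus the mean, divided by the standard deviation.  The sketch below illustrates that calculation in Python, assuming the sample standard deviation is the one used; the function name z_scores and the sample values are hypothetical.

```python
import statistics


def z_scores(data):
    """Standardize each value: (value - mean) / sample standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # assumption: sample standard deviation
    return [(value - mean) / sd for value in data]


if __name__ == "__main__":
    ages = [2, 4, 6]  # made-up values
    print(z_scores(ages))
```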

10 - Assume your best bell-shaped quantitative data are normally distributed with a mean equal to the computed sample mean and a standard deviation equal to the computed sample standard deviation.  Click CALC > Probability Distributions > Normal, select the circle in front of Cumulative Probability, and type the mean and standard deviation in the appropriate boxes.  Click the circle in front of Input Constant.  Type the constant (data value) representing the first quartile, Q1, of your data in the box next to Input Constant.  Click OK.  The probability will be displayed in the session window.  Repeat using Q3 instead of Q1 as the input constant.
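A cumulative probability is the area under the normal curve to the left of the input constant, so a "greater than" probability is 1 minus the cumulative probability.  The Python sketch below illustrates both with the standard library's NormalDist.  The function names prob_below and prob_above and the mean, standard deviation, and input values are hypothetical.

```python
from statistics import NormalDist


def prob_below(mean, sd, value):
    """P(X <= value) for a normal distribution with the given mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(value)


def prob_above(mean, sd, value):
    """P(X > value): the complement of the cumulative probability."""
    return 1 - prob_below(mean, sd, value)


if __name__ == "__main__":
    # Made-up mean, standard deviation, and input constants.
    print(prob_below(100, 15, 90))
    print(prob_above(100, 15, 110))
```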

Save your project as Project2_GroupName (File > Save Project As) and write down where you saved it!  Submit only Project2_GroupName.mpj to the Minitab dropbox, and triple check that the name of the file you've loaded ends in .mpj.  If the file name does not end in .mpj, you may not receive credit for the Minitab part.

After loading your file, reopen straight from the dropbox to make sure that:

  1. All graphs (scatterplot, probability plot, histograms, bar charts, pie charts) are included,
  2. The worksheet with z-scores is included,
  3. The heading information with your name, date, and so on is included, and
  4. All Minitab analysis is included (frequency tables, descriptive statistics, regression analysis and two cumulative distribution functions).

The Report

PART A

Choose the same quantitative variable from Project 1.  Make a bolded heading for the first variable - the variable that you analyzed in Part B of Project 1.  Under that heading, copy and paste that variable's frequency table from Minitab. Copy the statistics row for that data from the Descriptive Statistics in the session window of Minitab, including column headings.  (Descriptive Statistics should have rows for each data variable, but delete all of the rows except for this first quantitative variable you've chosen.)  Copy the histogram and boxplot for this data.  Do the frequency table, stats, histogram, and boxplot in this Project 2 agree with what you found in Project 1?  If not, what is different and why? 

Next, copy the bar chart for this data. For the bar chart, tell whether there seems to be an association looking at the clustered bars.  Note:  Clustered bars will be the same heights (a uniform distribution) for data when no association exists.  If bar heights are significantly different, an association between the displayed variables is implied.

Choose the same categorical variable from Project 1.  Make a bolded heading for the categorical variable.  Under that heading, copy and paste that variable's frequency table from Minitab. Does the frequency table agree with what you found in Project 1?  If not, what is different and why?  Copy the pie chart for this variable.  Does the pie chart look similar to Project 1's pie chart?  If not, what is different and why?

PART B

Copy and paste the scatterplot.  Describe what the scatterplot looks like, the strength of the association of the variables from the scatterplot, and the direction of the association (positive or negative).  Look up Q1 and Q3 for your explanatory variable.  Using the linear regression model equation from the scatterplot, it is predicted that the response variable will be what when the explanatory variable is Q1? What will the response be when the explanatory variable is Q3?

Copy and paste the correlation analysis.  The correlation coefficient, r, for your two variables is ???.   Interpret the meaning of the correlation coefficient, r.  Your interpretation must address both strength & direction of association of the relationship between the variables. 

PART C

In order to use the normal distribution and its associated area under the curve to compute expected percentages or probabilities, we must assume the data are reasonably normally distributed:  unimodal, symmetric, without skew.  However, a histogram is not reliable enough when the data set is small because there may be too few data values to show the shape of the distribution.  In situations such as these, another graph can be produced:  a normal probability plot.  This plot is a scatterplot that places your data on one axis and what has already been determined to be normally distributed data on the other axis.  If our data line up well with the normally distributed data, we can be safe in assuming our data are also normally distributed. (Hint:  Lining up well means that the data form a single, reasonably straight line.  Multiple, disconnected lines do not mean that the data line up well.)
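To make the idea of the plot concrete, the Python sketch below pairs each sorted data value with the value a standard normal distribution would place at the same position.  It is an illustration only.  The function name normal_plot_points is hypothetical, and the plotting positions (i - 0.5) / n are one common convention assumed for the example; Minitab may use a different convention and a different axis layout.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (expected normal score, data value) pairs for sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        # Assumed plotting position for the i-th smallest value.
        position = (i - 0.5) / n
        points.append((standard_normal.inv_cdf(position), value))
    return points


if __name__ == "__main__":
    for score, value in normal_plot_points([5, 1, 3]):  # made-up data
        print(score, value)
```

If the data are close to normal, these pairs fall near one straight line when plotted.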

Copy and paste in the histogram from the data you used for Part C, which is the quantitative data that was your most bell-shaped.  Describe this histogram in terms of 1) modality, 2) symmetry, and 3) skewness. 

Copy and paste your normal probability plot.  Do the data in this plot line up well with each other in one single line?  (Note:  Lining up in multiple, perfectly straight lines is not what we are talking about, here.  Rather, do the data roughly form one single, approximately straight line?)  Does the normal probability plot indicate the data is normal, approximately normal, or not normal? 

Copy and paste the standardized values (z-scores) from the z column you created.  Using the Empirical Rule's definition of outliers, are any of the data values potentially extreme values (outliers)?  If so, which ones are outliers?  Explain why you were able to conclude that you do or do not have outliers.
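Checking z-scores against a cutoff can be sketched as follows.  The cutoff of 3 standard deviations used here is an assumption made for the example; use whatever cutoff your textbook gives with the Empirical Rule.  The function name flag_outliers and the sample values are hypothetical.

```python
def flag_outliers(data, z_values, threshold=3.0):
    """Return the data values whose z-score is beyond the threshold.

    threshold=3.0 is an assumed cutoff, not one stated in this project.
    """
    return [value
            for value, z in zip(data, z_values)
            if abs(z) > threshold]


if __name__ == "__main__":
    ages = [10, 12, 50]            # made-up data values
    z_column = [-0.4, -0.2, 3.4]   # made-up z-scores for those values
    print(flag_outliers(ages, z_column))
```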

Copy and paste the cumulative distribution functions for Q1 and Q3.  The probability that a data value selected at random would be less than Q1 is ???.  The probability that a data value selected at random would be greater than Q3 is ???.

If the data were perfectly normal, what would the probability be of selecting a data value less than Q1?  More than Q3?  (Hint:  In other words, what proportion of the data do we expect to be below Q1 and above Q3 by the definitions of Q1 and Q3?)


Wrapping It Up

Have a paragraph at the end detailing what exactly each group member did to contribute to the entire group effort.

You will want the document to have a title page with a title for the paper, Math 1530, the date, the name of the group, and a list of the group members. 

Save your MS Word document as Project2_GroupName.doc or Project2_GroupName.docx

Load the files to the dropboxes.  Open the files straight from the dropboxes so that you can grade the files that are in the dropboxes.  Print the group project rubric and grade your report, correcting anything in the report that fails to fully satisfy the rubric.

After grading and correcting your report, load the report to D2L's Project 2 Report dropbox.  Every single group member needs to reopen the MS Word report straight from the dropbox to make sure that all parts are completely answered and that this report is the latest, best edited version.  Every single group member will be held responsible for making sure that the report is accurate, complete, and of the highest quality by grading the report with the rubric. 

Evaluate your peers by submitting one peer review for each member of your group.  Individual grades will depend greatly on turning in your own evaluations by the due date, on the evaluations of you by fellow group members, and on evidence of what you did in that last report paragraph and in the discussion forum.