Full Report Writeup

This page contains the full written report and findings for this project.

Introduction

For many students, choosing a college is one of the biggest financial and personal decisions they will ever make. Tuition prices continue to rise, student debt remains a major issue, and families increasingly want to know whether attending a certain school will lead to stronger financial outcomes after graduation. At the same time, colleges vary significantly in ranking, tuition, size, acceptance rate, and overall prestige. Because of this, we wanted to explore a question that many students already think about during the college search process: how strongly do college characteristics relate to graduate earnings?

Our project focuses on the relationship between institutional characteristics and salary outcomes for graduates across colleges in the United States. Specifically, we examined how factors such as college ranking, tuition costs, enrollment size, and acceptance rates relate to median graduate earnings several years after students leave school. Rather than looking at only one factor in isolation, we wanted to compare multiple institutional characteristics together and determine whether certain trends consistently appeared throughout the data.

This topic felt especially relevant to us as college students because these are the same types of questions we hear constantly from students, parents, and counselors. Many people assume that a higher-ranked or more expensive college will automatically produce higher salaries, but we wanted to see whether the data actually supports those assumptions. We also wanted to create a project that was visually interactive and easy to understand for users without a technical background.

Our final project combines data cleaning, exploratory analysis, and interactive visualizations to better understand the connection between higher education characteristics and financial outcomes after graduation. The goal was not simply to create charts, but to create a professional and informative data story that allows users to interact with the data and form their own conclusions.

Research Questions and Goals

The central research question for our project was:
“How do college characteristics such as ranking, tuition, enrollment size, and acceptance rate relate to graduate salary outcomes?”

From this main question, we developed several smaller goals.

One of our biggest goals was accessibility. We wanted the project to be understandable to someone without a strong background in statistics or data science. Instead of overwhelming users with technical language, we focused on creating clear visualizations and concise written explanations that communicate the main findings effectively.

Another important goal was professionalism. Throughout the semester, we learned that a good visualization depends not only on the data it shows, but also on readability, organization, and user experience. We tried to design our website so that it felt polished, modern, and interactive rather than like a simple collection of classroom graphs.

Data Sources

Our project primarily used data from the U.S. College Scorecard dataset along with ranking information collected from external ranking datasets. The College Scorecard dataset contains detailed information about colleges in the United States, including tuition, enrollment, acceptance rate, and post-graduation earnings.

The main variables used in our project included college ranking, tuition cost, enrollment size, acceptance rate, and median graduate earnings.

One challenge during the project was combining information from multiple datasets. Different datasets often used slightly different school names, which created matching issues during the merging process. For example, some schools used abbreviations while others used full names. Cleaning and standardizing the data became an important step before visualization could begin.
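One way this kind of name standardization could look in Python is sketched below. It is an illustration rather than the exact cleaning code used in the project: the function name `normalize_school_name` and the abbreviation map are hypothetical, and a real map would be built by inspecting the names that fail to match.

```python
import re

# Hypothetical abbreviation map (an assumption for illustration only).
ABBREVIATIONS = {
    "univ": "university",
    "inst": "institute",
    "coll": "college",
}


def normalize_school_name(name: str) -> str:
    """Return a lowercase, punctuation-free key for matching school names."""
    key = name.lower().replace("&", " and ")
    key = re.sub(r"[^a-z0-9\s]", " ", key)  # replace punctuation with spaces
    words = [ABBREVIATIONS.get(word, word) for word in key.split()]
    return " ".join(words)  # also collapses repeated whitespace


# Two spellings of the same (made-up) school map to one key.
print(normalize_school_name("Example Univ."))
print(normalize_school_name("Example University"))
```

Both datasets would be given a key column built this way, and the merge would then use that key instead of the raw name. Names that still fail to match need manual review.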

We also removed missing or incomplete observations when necessary to avoid misleading conclusions. This was especially important for variables such as rankings and salary outcomes because incomplete rows could distort trends shown in the visualizations.

Data Cleaning and Preparation

A significant portion of the project involved cleaning and organizing the data before any visualizations were created. Although data visualizations often appear simple on the surface, we learned that preparing the data properly is one of the most time-consuming parts of the process.

We used Python and Jupyter Notebook to clean and process the datasets. Pandas was used heavily for filtering, renaming columns, merging datasets, and handling missing values. We also standardized naming conventions between datasets so that schools could merge correctly.
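The merge and missing-value steps described above can be sketched with pandas as follows. The column names (`school_key`, `median_earnings`, `rank`) and the rows are invented for illustration and are not the project's actual schema or data.

```python
import pandas as pd

# Toy stand-ins for the two sources; all values are made up.
scorecard = pd.DataFrame({
    "school_key": ["example university", "sample college", "demo institute"],
    "median_earnings": [52000.0, None, 61000.0],
})
rankings = pd.DataFrame({
    "school_key": ["example university", "sample college", "other school"],
    "rank": [40, 120, 75],
})

# An outer merge with indicator=True records where each row came from,
# which makes unmatched school names easy to list and fix.
merged = scorecard.merge(rankings, on="school_key", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]

# Keep only matched schools that have both variables needed for analysis.
analysis = (
    merged[merged["_merge"] == "both"]
    .drop(columns="_merge")
    .dropna(subset=["median_earnings", "rank"])
    .reset_index(drop=True)
)

print(unmatched[["school_key", "_merge"]])
print(analysis)
```

Reviewing `unmatched` before dropping anything helps separate true gaps in the data from naming mismatches that cleaning can still fix.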

One issue we encountered involved ranking data across different years. Some schools had rankings available only for certain years, while others had multiple ranking values across time. To solve this, we created a “most recent rank” variable that prioritized the newest available ranking for each school.
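A minimal sketch of how a "most recent rank" variable could be derived with pandas is shown below. It assumes the ranking data is in long format with one row per school and year. The column names and values are hypothetical.

```python
import pandas as pd

# Toy long-format ranking data; all values are made up.
ranks_long = pd.DataFrame({
    "school_key": ["a", "a", "b", "c", "c"],
    "year": [2021, 2023, 2020, 2022, 2023],
    "rank": [50, 45, 110, 80, None],  # school "c" has no rank for 2023
})

most_recent = (
    ranks_long.dropna(subset=["rank"])                  # ignore years with no rank
    .sort_values("year")                                # oldest to newest
    .drop_duplicates(subset="school_key", keep="last")  # newest row per school
    .rename(columns={"rank": "most_recent_rank", "year": "rank_year"})
    .sort_values("school_key")
    .reset_index(drop=True)
)

print(most_recent)
```

Keeping `rank_year` next to the rank makes it visible when a school's "most recent" value is several years older than another's.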

We also had to decide which variables were most useful for our analysis. Initially, we considered including many additional variables, but eventually narrowed the project to variables that were both meaningful and visually interpretable. This helped keep the project focused and prevented the visualizations from becoming cluttered.

After cleaning, the final dataset allowed us to compare schools consistently across multiple institutional characteristics and earnings measures.

Visualization Design and Website Development

An important part of our project was not only creating visualizations, but also presenting them in a way that felt interactive and engaging. Instead of submitting static charts alone, we built a website that organized our findings into a cleaner and more professional format.

When designing the visualizations, we focused heavily on readability and avoiding misleading representations. We selected graph types that matched the data and made patterns visually clear.

One of the strongest parts of the project was the interactive nature of the website itself. Rather than simply describing conclusions in text, users can directly explore the data and identify patterns themselves. This creates a more engaging experience and aligns well with the purpose of data visualization.

We also spent considerable time refining the visual appearance of the website. Early versions of the project looked much more basic, but through multiple revisions, we improved the layout, spacing, labeling, and overall presentation quality. We wanted the final product to look like a professional portfolio-quality project rather than a rough classroom assignment.

Findings and Analysis

Relationship Between College Rank and Earnings

One of the clearest trends in our project was the relationship between institutional ranking and graduate earnings. Graduates of higher-ranked schools tended to have higher median earnings.

However, the relationship was not perfectly linear. While elite institutions often appeared near the top of the salary distributions, there were also many mid-ranked schools with strong salary outcomes. This suggests that ranking alone does not completely determine graduate success.
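A relationship that is consistent in direction but not linear can be summarized with a Spearman rank correlation, which measures only whether earnings tend to fall as rank number rises. The sketch below uses invented toy numbers, not results from our dataset, and the column names are assumptions.

```python
import pandas as pd

# Toy data for illustration only (rank 1 = best).
schools = pd.DataFrame({
    "most_recent_rank": [5, 20, 60, 110, 150],
    "median_earnings": [90000, 70000, 75000, 50000, 55000],
})

# Spearman correlation is the Pearson correlation of the ranked values.
rho = schools["most_recent_rank"].rank().corr(schools["median_earnings"].rank())

print(round(rho, 2))
```

A value near -1 would mean earnings almost always fall as rank number increases. A value closer to 0 would reflect the kind of overlap between mid-ranked and elite schools described above.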

The visualization also showed clusters of schools with similar rankings but significantly different earnings outcomes. This was one of the more interesting findings because it demonstrates that students should not rely exclusively on rankings when evaluating colleges.

In many cases, specialized institutions or schools with strong STEM and business programs appeared to outperform schools with similar overall rankings. This highlights how program focus and career pathways may influence earnings beyond institutional prestige alone.

Tuition and Salary Outcomes

Another major area of analysis involved tuition costs and graduate earnings. Initially, we expected that more expensive colleges would consistently produce higher salaries. The data partially supported this idea, but the relationship was more complicated than expected.

Many high-tuition schools did show strong earnings outcomes, especially private universities and highly selective institutions. However, the data also revealed substantial overlap between expensive and lower-cost schools.

Some public universities demonstrated relatively high graduate earnings despite lower tuition levels. This finding was particularly important because it suggests that strong financial outcomes are not limited exclusively to the most expensive institutions.

This visualization also raised questions about return on investment. A school with slightly lower earnings outcomes may still provide better overall value if tuition costs are dramatically lower. Although our project did not calculate full return-on-investment metrics, the visualizations clearly showed that tuition alone is not enough to predict future salary outcomes.
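To make the value comparison concrete, the sketch below computes a rough earnings-to-cost ratio. This is not a metric our project calculated and it is far from a full return-on-investment measure, since it ignores financial aid, living costs, and time to degree. The function name, the four-year assumption, and the numbers are all hypothetical.

```python
def earnings_to_cost_ratio(median_earnings: float, annual_tuition: float,
                           years: int = 4) -> float:
    """Median annual earnings divided by total sticker tuition over `years`."""
    total_tuition = annual_tuition * years
    if total_tuition <= 0:
        raise ValueError("total tuition must be positive")
    return median_earnings / total_tuition


# Made-up schools: A has higher earnings, B has much lower tuition.
ratio_a = earnings_to_cost_ratio(median_earnings=70000, annual_tuition=55000)
ratio_b = earnings_to_cost_ratio(median_earnings=62000, annual_tuition=12000)

print(round(ratio_a, 2), round(ratio_b, 2))
```

In this toy case the lower-tuition school has the higher ratio even though its graduates earn less, which is the situation the paragraph above describes.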

Acceptance Rate and Earnings

Acceptance rate also showed an interesting relationship with graduate salaries. Schools with lower acceptance rates tended to have higher graduate earnings, which aligns with the idea that more selective institutions often have stronger reputations and resources.

At the same time, there were important exceptions. Some schools with moderate or even relatively high acceptance rates still showed strong earnings outcomes. This suggests that selectivity is correlated with salary outcomes but does not determine them.

One interesting pattern was the large spread of salaries among schools with middle-range acceptance rates. This indicates that factors beyond selectivity, such as academic specialization, geographic location, internship opportunities, and alumni networks, may play major roles in shaping graduate success.
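One way to check a spread pattern like this numerically is to group schools into acceptance-rate bands and compare the earnings range within each band. The sketch below uses pandas with invented values. The band cutoffs (30% and 70%) and the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data for illustration only.
schools = pd.DataFrame({
    "acceptance_rate": [0.08, 0.15, 0.35, 0.45, 0.55, 0.60, 0.85, 0.90],
    "median_earnings": [88000, 80000, 45000, 72000, 50000, 66000, 48000, 52000],
})

# Assign each school to a selectivity band (cutoffs are an assumption).
bands = pd.cut(
    schools["acceptance_rate"],
    bins=[0.0, 0.3, 0.7, 1.0],
    labels=["low", "middle", "high"],
)

# Summarize the spread of earnings inside each band.
spread = schools.groupby(bands, observed=True)["median_earnings"].agg(
    ["count", "min", "max"]
)
spread["range"] = spread["max"] - spread["min"]

print(spread)
```

With real data, an interquartile range would be a sturdier measure of spread than max minus min, because a single outlier school can stretch the range.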

Enrollment Size and Institutional Differences

The relationship between enrollment size and earnings was less direct than some of the other variables. Larger universities did not automatically produce higher or lower salaries.

Instead, the visualization suggested that institutional size alone is not a strong predictor of graduate earnings. Some very large public universities showed excellent salary outcomes, while some smaller schools also performed extremely well.

This finding reinforced one of the major themes of our project: no single variable fully explains graduate earnings outcomes. Those outcomes are shaped by a combination of institutional characteristics rather than one defining factor.

Interpretation and Broader Insights

One of the biggest takeaways from our project is that college outcomes are far more complex than simple rankings or prestige labels. While higher-ranked and more selective schools often perform well financially, the data consistently showed overlap between different categories of institutions.

This is important because many students assume there is a single “best” path toward financial success after graduation. Our analysis suggests that the reality is much more nuanced. Students should consider multiple factors, including cost, academic programs, career goals, geographic location, and long-term financial value.

Another important insight is that data visualization itself can make complicated information much easier to understand. Tables of numbers alone would not communicate these trends nearly as effectively. Through interactive visualizations, users can quickly identify patterns, outliers, and relationships that would otherwise be difficult to recognize.

The project also demonstrated how visualization design choices affect interpretation. Careful choices involving scale, labeling, color, and layout helped ensure that the data remained readable and accurate without becoming visually overwhelming.

Challenges and Limitations

Although the final project was successful, we faced several challenges throughout the process.

The largest challenge involved data preparation and merging datasets from multiple sources. School naming inconsistencies created matching issues that required significant cleaning and manual adjustments.

Another challenge was deciding how much information to include. We initially explored a larger number of variables, but eventually realized that too many variables made the visualizations difficult to interpret. Narrowing the project focus improved the clarity of the final product.

We also encountered technical challenges while building the website itself. Formatting visualizations, improving responsiveness, and organizing the layout required multiple revisions before the final design felt polished.

There are also limitations to the project. Median earnings alone cannot fully measure the value of a college education. Salary outcomes are influenced by many external factors, including major choice, geographic region, economic conditions, and personal career decisions.

Additionally, institutional ranking systems themselves are imperfect and subjective in some ways. Because of this, the project should not be interpreted as a definitive ranking of schools, but rather as an exploration of relationships between institutional characteristics and financial outcomes.

References

U.S. Department of Education. College Scorecard Data. https://collegescorecard.ed.gov/data/.

Ranking datasets and institutional information collected from U.S. News.

Python libraries used: pandas (data cleaning and merging in Jupyter Notebook).

Tableau was used for interactive visualizations.