Data Science Option

The ability to manipulate and understand data is increasingly critical to discovery and innovation. The vast majority of scientific and engineering disciplines, including molecular engineering, have entered an era in which discovery is limited not by the collection and processing of data, but by its management, analysis, and visualization. Therefore, the next generation of scientists needs to be prepared to manipulate and understand large, dynamic data sets.


The Molecular Engineering (MolE) Graduate Program offers a Data Science Option (DSO) to MolE graduate students so that they can receive credentialed training in the analysis of large datasets. Students who fulfill the requirements (outlined below) will have the option included as part of the degree title that appears on their transcript.

GOAL: Introduce students to the foundations of data science, and provide them with techniques and tools that they can apply to their own research.

This option is primarily designed for students with little or no background in data science, computer science, or coding, and is directed towards students who want to become proficient “tool users” as opposed to “tool builders”.


Students must complete approximately 11-14 credits in total: 3 courses at 3-4 credits each (9-12 course credits) plus 2 seminar credits.
Note: Many of the data science courses can also be used to satisfy core MolE PhD requirements, so the additional overall course load is limited.

I. Take a course from two of the following three areas:

1. Software development for data science

Recommended courses:
   – Software Development for Data Scientists (CSE 583)
   – Software Engineering for Molecular Data Scientists (ChemE 546)
   – Data Science and Materials Informatics (MSE 542)

2. Statistics and machine learning

Recommended courses:
   – Introduction to Machine Learning (CSE 416/STAT 416)
   – Introduction to Statistical Machine Learning (STAT 435)
   – Materials and Device Modeling (MSE 543)

3. Data management and data visualization

Recommended courses:
   – Introduction to Database Systems (CSE 414)
   – Data Visualization (CSE 512/CSE 412)
   – Information Visualization (HCDE 411/511)
   – Interactive Information Visualization (INFX 562)
   – Big Data for Materials Science (MSE 544)

II. Attend the weekly eScience Community Seminar for at least 2 quarters

III. Fulfill the core MolE Program Research Facet: “Theory, Computation and Modeling”.

Due to the interdisciplinary nature of the MolE PhD, the list of “Theory, Computation and Modeling” courses fluctuates based on the offerings of partner departments, but it regularly includes courses from BioE, Chem, ChemE, ECE, CSE, MSE, MechE, and Physics. These classes are offered quarterly and are part of the current requirements for MolE graduate students.

eScience Institute

The MolE data science option is supported by the eScience Institute. Students interested in data science should also check out other activities organized by the eScience Institute, such as tool- and method-oriented workshops and speaker series. Visit the eScience Institute website for more information.


All full-time PhD students in the MolE program who are in good standing are eligible to participate in the data science option.

Questions should be emailed to the MolE Graduate Program Advisor (Paul Neubert,