Data Science Option

The ability to manipulate and understand data is increasingly critical to discovery and innovation. The vast majority of scientific and engineering disciplines, including molecular engineering, have entered an era in which discovery is no longer limited by the collection and processing of data, but by the management, analysis, and visualization of data. Therefore, the next generation of scientists needs to be prepared to manipulate and understand large, dynamic data sets.


The Molecular Engineering (MolE) Graduate Program offers a Data Science Option (DSO) to MolE graduate students so that they can receive credentialed training in the analysis of large data sets. The goal of this option is to introduce students to the foundations of data science and provide them with techniques and tools that they can apply to their own research. This option is primarily designed for students with little or no background in data science, computer science, or coding, and is directed towards students who want to become proficient “tool users” as opposed to “tool builders”.

Students who complete the requirements outlined below will have the option included as part of the degree title that appears on their transcript.


Students must complete approximately 11-14 credits in total: 9-12 course credits (3 courses at 3-4 credits each) and 2 seminar credits. Many of the data science courses can also be used to satisfy core MolE PhD requirements, so the additional overall course load is limited.

I. Students must take a course from two of the following three areas:

1. Software development for data science

Highly recommended courses:
   – Software Development for Data Scientists: (CSE 583)
   – Software Engineering for Molecular Data Scientists: (ChemE 546)

2. Statistics and machine learning

Highly recommended courses:
   – Introduction to Machine Learning: (CSE 416/STAT 416)
   – Introduction to Statistical Machine Learning: (STAT 435)

3. Data management and data visualization

Highly recommended courses:
   – Introduction to Database Systems: (CSE 414)
   – Data Visualization: (CSE 512/CSE 412)
   – Information for Visualization: (HCDE 411/511)
   – Interactive Information Visualization: (INFX 562)

II. Register for and attend the weekly eScience Community Seminar for at least 2 quarters.

III. Fulfillment of the core MolE Program Research Facet: “Theory, Computation and Modeling”.

Due to the interdisciplinary nature of the MolE PhD, the list of “Theory, Computation and Modeling” courses fluctuates based on the offerings of partner departments, but regularly includes courses from BioE, Chem, ChemE, ECE, CSE, MSE, MechE, and Physics. These classes are offered quarterly and are part of the current requirements for MolE graduate students.

eScience Institute

The MolE data science option is supported by the eScience Institute. Students interested in data science should also check out other activities organized by the eScience Institute, such as tool- and method-oriented workshops and speaker series. Visit the eScience Institute website for more information.


All full-time Ph.D. students in the MolE program who are in good standing are eligible to participate in the data science option.

Questions should be emailed to the MolE Graduate Program Advisor (Paul Neubert,