Problem: Organizing an Inexperienced Team
In my final year at Georgia Tech, I had three research assistants who had the potential to be productive laboratory technicians, but they lacked experience and confidence. I needed to create an environment where they could hit the ground running gathering high-quality experimental data, documenting their work, and organizing their data without too much micromanagement. Meanwhile, I had publications and a dissertation to write and needed to ensure the quality of my assistant’s work without spending too much time looking over their shoulders.
Solution: Data Management as a Means to Automate Team Managment
A data management strategy can do more than just make data easier to navigate. A strategy which incorporates automated progress updates allows the team members to focus on their delegated roles rather than administrative tasks. Technicians are freed to focus more on their laboratory experiments, analysts can be updated the instant new data is added, and team meetings become more efficient and focused on the next step of the project.
So I began building a data management strategy based on the needs of our project. I would:
- Design a set of experiments to explore a parameter space.
- Create a data organization system based on that parameter space.
- Build tools to automatically track data placed into the system, eliminating the need for my assistants to report their progress.
- Expand upon those tools to automate data analysis.
I used Design of Experiments (DOE) principles as the basis of our experiment design and my data management strategy. I started by writing a Python script which built a directory tree which organized according to the parameters of each experiment. Each level of the directory tree corresponded to one parameter from the experiment design.
For example: I developed this data management strategy for a study of fracture toughness in various thin-sheet metallic specimens. Thus, the parameter space had three axes: material, thickness, and type of crack. Thus, the directory structure for the project had 3 layers. For example, an experiment on a copper sheet 100 µm thick with a single edge notch would be stored in the directory .\Cu\B100um\SE.
The ordering of the parameters / directory layers was arbitrary, and I simply chose an arrangement that made it easy for my assistants to find where to put the result of their experiments. For the purpose of organizing our data, the directory tree offered an intuitive, human-readable structure. This structure also made tracking progress straightforward.
Once we started collecting data, I wrote a directory crawler script automatically generated reports on my team’s progress based on the data they collected. This eliminated the need to interrupt my assistants’ laboratory work for progress updates and allowed me to observe our large experimental dataset taking shape in real-time and plan our next move. Finally, I augmented the directory crawler to retrieve and analyze experimental data on-demand.
Once the directory crawler could retrieve data, it was straightforward to adapt the tool to analyses. For analytical purposes, it was not always advantageous to view the data in the way it was organized in the directory tree. So I added the ability to filter the data retrieved by the directory crawler based on any arbitrary criteria.
For example, consider our project exploring different materials, sheet thicknesses, and notch types. I could retrieve and compare data from any combination of the three parameters we explored. I could look at all of the single edge-notched experiments on all materials and all thicknesses. Or I could narrow my focus to only single-edge notched specimens from aluminum sheets. All of the data retrieval and visualization was automatic, and I merely needed to identify the scope I wished to examine.
Impact: An Efficient Research Team
Developing my automated team management strategy catalyzed my research team’s progress to such a degree that we accomplished more in one summer than in the previous two or three years. Tasks, like getting a progress report from my assistants, developing a work plan for the week, or applying a new analysis to old data, took only a few minutes, rather than hours of work and meetings. My management framework eliminated my need to micromanage my team, helped my assistants see how their work fit into the larger project and allowed everybody to focus on the science rather than the minutia of organizing the group’s efforts.
My new tools helped me created an optimal environment for my assistants where they always had clear goals, instant feedback on their progress, and a clear record of how their work contributed to the project. At the beginning of every week, I used my directory crawler scripts to compile a big-picture report of the team’s data, identify target areas for the coming week, and set goals for my assistants. I could check on my team’s progress at any time over the course of the week without needing to interrupt their work and then deliver praise or constructive feedback at lunch or the end of the day. This view of my assistant’s daily work even helped me with “sanity management” – making sure my assistants’ days had enough variety and challenges to keep them engaged.
The new management strategy created a highly efficient data collection and analysis pipeline which let me stop worrying about how to collect enough data for the project and shift my focus on developing new theories and models. I had built an efficient data pipeline, and the challenge of analyzing my pipeline’s output sparked my interest in data science. In one summer, my undergraduates collected more high-quality data than some grad students collect in their entire dissertation research. The dataset is so thoroughly documented and well-organized that more than a year later, my old research group is still mining the data for new insights.
Modern data acquisition technology has lowered the barrier to collecting data to such a degree that we can afford to collect data from experiments without every measurement necessarily being aimed at testing a particular hypothesis. Our success that summer hinged on a data management strategy which had the experiment design baked in. By exploring our parameter space in a systematic, disciplined way, we created a dataset conducive to developing and testing new hypotheses and models based on the comprehensive data we had already collected.
Design of experiments allowed us to avoid the traditional hypothesis – experiment – refinement – hypothesis … cycle of the scientific method, where progress is limited by the rate at which experiments can be run. Instead, we explored a large parameter space, and our physical science project became a data science problem. Our progress was only limited by how quickly and insightfully we could develop new models to understand our data.
Postscript: Experiment Specifics
I have kept this post focused on my research team management strategy with minimal discussion of actual research we were working on. I left those details out to avoid complicating the narrative. But for the curious, I thought I would write out some notes on the parameter space we were exploring.
Develop a new fracture toughness parameter for thin, ductile metal sheets where traditional analyses such as K and J cannot be applied.
Experiment Parameter Space
- Sheet material
- Sheet thickness (varied depending on material)
- Starting notch type
- Single edge notch
- Middle notch
- No notch (tensile)
Raw Data Acquired
- Optical micrograph sequences (videos of the experiment)
- Fractographs (micrographs of the fracture surfaces)
The analyses we developed were targeted at characterizing the driving force and resistance to crack growth in the ductile sheets. The goal was to find analytical approaches which were insensitive to certain parameters, such as the starting notch type. (I.e. we could use forensic fractography to identify certain parameters as good negative controls.)
But the real beauty of our approach is that we were able to run multiple hypothesis – antithesis – synthesis cycles without the rate-limiting step of waiting to complete more physical experiments. The dataset was large and robust enough that we could simply test different analyses over many different scopes – from single experiments to a comprehensive analysis of the entire set and everything in between. I suspect that there may be a few more Ph.D. dissertations worth of insight still waiting to be discovered in our dataset.
Here are some examples of analyses I developed. The parenthetical statements list the raw data for each analysis.
- Crack length (micrographs)
- Stress (force & micrographs)
- Crack propagation stress evolution (force & micrographs)
- ANOVA of crack propagation stress convergence (force, micrographs & set groupings based on parameter space)
- Strain distributions (digital image correlation deformation mapping)
- Work of fracture (force, displacement, & crack length)
- Deformation energy distribution (force & deformation maps)
- Specific deformation energy accumulation rate (force, deformation maps, and crack length)