September 9, 2019
Yellowbrick Advisory Board Meeting Held on September 9, 2019 from 2030-2230 EST via Video Conference Call. Rebecca Bilbro presiding as Chair, Benjamin Bengfort serving as Secretary and Edwin Schmierer as Treasurer. Minutes taken by Benjamin Bengfort.
Attendees: Benjamin Bengfort, Larry Gray, Rebecca Bilbro, Tony Ojeda, Kristen McIntyre, Prema Roman, Edwin Schmierer, Adam Morris, Eunice Chendjou
Agenda
A broad overview of the topics for discussion in the order they were presented:
Welcome (Rebecca Bilbro)
Summer 2019 Retrospective (Rebecca Bilbro)
GSoC Report (Adam Morris)
YB API Audit Results (Benjamin Bengfort)
OpenTeams Participation (Eunice Chendjou)
Fall 2019 Contributors and Roles
Yellowbrick v1.1 Milestone Planning
Project Roadmap through 2020
Other business
Votes and Resolutions
There were no votes or resolutions during this board meeting.
Summer 2019 Retrospective
In traditional agile development, sprints are concluded with a retrospective to discuss what went well, what didn’t go well, what were unexpected challenges, and how we can adapt to those challenges in the future. Generally these meetings exclude the major stakeholders so that the contributors to the sprint can speak freely. Because the Board of Advisors are the major stakeholders of the Yellowbrick project, there was an intermediate retrospective with just the maintainers of the Summer 2019 semester so they could communicate anonymously and frankly with the Board. Their feedback as well as additions by the advisors follow.
Accomplishments
tl;dr we had a very productive summer!
Larry stepped into the new coordinator role, setting the tone for future coordinators
We had two new maintainers: Prema and Kristen
Approximately 12 new contributors
65 Pull Requests were merged
35 tweets were tweeted
Only 95 issues remain open (down from over 125, not including new issues)
3 Yellowbrick talks in DC, Texas, and Spain
We completed our first Google Summer of Code successfully
Perhaps most importantly, Version 1.0 was released. This release included some big things from new visualizers to important bug/API fixes, a new datasets module, plot directives and more. We are very proud of the result!
Shoutouts
To Larry for being the first coordinator!
To Prema for shepherding in a new contributor from the PyCon sprints who enhanced our SilhouetteScores visualizer!
To Kristen for working with a new contributor to introduce a brand new NFL receivers dataset!
To Nathan for working with a new contributor on the cross-operating system tests!
To Benjamin for shoring up the classifier API and the audit!
To Adam and Prema for spearheading the GSoC application review and to Adam for serving as guide and mentor to Naresh, our GSoC student!
To Rebecca for mentoring the maintainers!
Challenges
It was difficult to adjust to new contributors/GSoC
“Contagious issues” made it tough to parallelize some work
Maintainer vacation/work schedules caused communication interruptions
Balancing between external contributor PRs and internal milestone goals
This milestone may have been a bit over-ambitious
Moving Forward
These are some of the things that worked well for us and that we should keep doing:
Make sure the “definition of done” is well defined/understood in issues
Balancing PR assignments so that no one gets too many
Using the “assignee” feature in GitHub to assign PRs so that it’s easier to see who is working on what tasks
Use the maintainer’s Slack channel to unify communications
Communication in general – make sure people know what’s expected of them and what to expect of us
Getting together to celebrate releases!
Pair reviews of PRs (especially for larger PRs)
Semester and Roadmap
The fall semester will be dedicated to completing Yellowbrick Version 1.1. The issues associated with this release can be found in the v1.1 Milestone on GitHub.
The primary milestone objectives are as follows:
Make quick methods prime time (and extend the oneliners page)
Add support for sklearn Pipelines and FeatureUnions
The secondary objectives are at the discretion of the core contributors but should be along one of the following themes:
A neural-network specific package for deep learning visualization
Adding support for visual pipelines and other multi-image reports
Creating interactive or animated visualizers
The maintainers will create a Slack channel and discuss with the Fall contributors what direction they would like to go in, to be decided no later than September 20, 2019.
Fall 2019 Contributors
Name | Role |
---|---|
Adam Morris | Coordinator |
Prema Roman | Maintainer |
Kristen McIntyre | Maintainer |
Benjamin Bengfort | Maintainer |
Nathan Danielsen | Maintainer |
Lawrence Gray | Core Contributor |
Michael Chestnut | Core Contributor |
Prashi Doval | Core Contributor |
Saurabh Daalia | Core Contributor |
Bashar Jaan Khan | Core Contributor |
Rohan Panda | Core Contributor |
Pradeep Singh | Core Contributor |
Mahkah Wu | Core Contributor |
Thom Lappas | Core Contributor |
Stephanie R Miller | Core Contributor |
Coleen W Chen | Core Contributor |
Franco Bueno Mattera | Core Contributor |
Shawna Carey | Core Contributor |
George Krug | Core Contributor |
Aaron Margolis | Core Contributor |
Molly Morrison | Core Contributor |
Project Roadmap
With the release of v1.0, Yellowbrick has become a stable project, and we would like to see its usage increase. The only urgent remaining task is the quick methods, which will be addressed in v1.1. Beyond v1.1, we concluded that it would be wise to understand who is really using the software and to get feature ideas from them. We do have a few themes we are considering:
Add a neural package for ANN-specific modeling. We already have a text package for natural language processing; as deep learning becomes more important, Yellowbrick should help with the interpretability of these models as well.
Reporting and data engineering focused content. We could consider a text output format (like .ipynb) that allows easy saving of multiple visualizers to disk in a compact format that can be committed to GitHub, stored in a database, and redrawn on demand. This theme would also include model management and maintenance tasks including detecting changes in models and tracking performance over time.
Visual optimization. This task employs optimization and learning to enhance the quality of the visualizers, for example by maximizing white space in RadViz or ParallelCoordinates, detecting inflection points as with the kneed port in KElbow, or adding layout algorithms for better clustering visualization in ICDM or the inclusion of word maps or trees.
Interactive and Animated visualizers. Adding racing bar charts or animated TSNE to provide better interpretability to visualizations, or adding an Altair backend to create interactive JavaScript plots or other model visualization tools like pyldaviz.
Publication and conferences. We would like to continue to participate in PyCon and other conferences. We might also submit proposals to O’Reilly to do Yellowbrick/Machine Learning related books or videos.
These goals are all very high level but we also want to ensure that the package makes progress. Lower level goals such as adding 16 new visualizers in 2020 should be discussed at the January board meeting. To that end, advisors should look at how they’re using Yellowbrick in their own work to consider more detailed roadmap goals.
Minutes
In her welcome, Rebecca explained that the goal of this second governance meeting was first to talk about how things went over the summer, to celebrate our successes with the v1.0 launch, and to highlight specific activities such as GSoC and the audit. The second part of the meeting was to be used to discuss our plans for the fall, which should take up more than half of the conversation. In so doing, she set a technical tone for the mid-year meetings that will hopefully serve as a good guideline for future advisory meetings.
Google Summer of Code
Adam reported that Naresh successfully completed the GSoC period, and that Adam wrote a positive review for him and shared the feedback we discussed during the v1.0 launch. You can read more about Naresh's summer at his blog, which documents his journey.
Naresh completed the following pull requests/tasks:
Added train alpha and test alpha to residuals
Added an alpha parameter to PCA
Added a stacked barchart helper and stacking to PoSVisualizer
Updated several visualizers to use the stacked barchart helper function
Updated the DataVisualizer to handle target type identification
Added a ProjectionVisualizer base class
Updated Manifold and PCA to extend the ProjectionVisualizer
Added final tweaks to unify the functionality of PCA, Manifold, and other projections
We will work on sending Naresh a Yellowbrick T-shirt to thank him and have already encouraged him to continue to contribute to Yellowbrick (he is receptive to it). We will also follow up with him on his work on effect plots.
If we decide to participate in GSoC again, we should reuse the idea list for the application, though it may be easiest to collaborate with matplotlib for GSoC 2020.
API Audit Results
We conducted a full audit of all visualizers and their bases in Yellowbrick and categorized each as red (needs serious work), yellow (has accumulated technical debt), or green (production-ready). A summary of these categorizations is as follows:
There are 14 base classes, 1 red, 3 yellow, and 10 green
There are 36 visualizers (7 aliases), 4 red, 7 yellow, 25 green
There are 3 other visualizer utilities, 2 red, 1 green
There are 35 quick methods, 1 for each visualizer (except manual alpha selection)
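As a quick sanity check on the summary above, the per-category counts can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary layout and the `summarize` helper are not part of Yellowbrick, and the zero yellow count for the other utilities is inferred from the "2 red, 1 green" breakdown rather than stated outright.

```python
# Counts as reported in the audit summary, ordered (red, yellow, green).
# The yellow count of 0 for "other utilities" is an inference, since only
# red and green are listed for that category.
audit = {
    "base classes": (1, 3, 10),
    "visualizers": (4, 7, 25),
    "other utilities": (2, 0, 1),
}


def summarize(counts):
    """Total the red/yellow/green columns and compute the green share."""
    red, yellow, green = (sum(column) for column in zip(*counts.values()))
    total = red + yellow + green
    return {
        "red": red,
        "yellow": yellow,
        "green": green,
        "total": total,
        "green_share": green / total,
    }


summary = summarize(audit)
print(summary)
```

Under these numbers, 36 of the 53 audited items are green, or roughly two thirds.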
Through the audit process, we clarified our API and ensured that the visualizers conformed to it:
fit() returns self, transform() returns Xp, score() returns [0,1], draw() returns ax, and finalize() returns None (we also updated poof() to return ax).
No _ suffixed properties should be set in __init__()
Calls to plt should be minimized (and we added fig to the visualizer)
Quick methods should return the fitted/scored visualizer
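The conventions listed above can be sketched with a toy class. This is a minimal illustration rather than Yellowbrick code: `ToyVisualizer`, `FakeAxes`, `toy_visualizer`, and `learned_attrs` are hypothetical names, and `FakeAxes` stands in for a matplotlib axes object so that the example runs without any plotting library.

```python
class FakeAxes:
    """Hypothetical stand-in for a matplotlib Axes (keeps the sketch dependency-free)."""

    def __init__(self):
        self.title = None
        self.points = []


class ToyVisualizer:
    """Hypothetical visualizer that follows the return-value conventions."""

    def __init__(self, ax=None):
        # Only configuration is stored here; no trailing-underscore
        # (learned) attributes are created in __init__().
        self.ax = ax if ax is not None else FakeAxes()

    def fit(self, X, y=None):
        # Learned state carries a trailing underscore and is set in fit().
        self.mean_ = sum(X) / len(X)
        self.draw(X)
        return self  # fit() returns self

    def transform(self, X):
        return [x - self.mean_ for x in X]  # transform() returns Xp

    def score(self, X, y):
        # Fraction of points whose side of the mean matches a 0/1 label.
        hits = sum((x > self.mean_) == bool(t) for x, t in zip(X, y))
        return hits / len(X)  # score() returns a value in [0, 1]

    def draw(self, X):
        self.ax.points.extend(X)
        return self.ax  # draw() returns ax

    def finalize(self):
        self.ax.title = "Toy Visualizer"
        # finalize() returns None (implicitly)


def toy_visualizer(X, y, ax=None):
    """Quick method: returns the fitted and scored visualizer."""
    viz = ToyVisualizer(ax=ax)
    viz.fit(X, y)
    viz.score(X, y)
    viz.finalize()
    return viz


def learned_attrs(obj):
    """List public attributes that end with an underscore."""
    return [n for n in vars(obj) if n.endswith("_") and not n.startswith("_")]
```

A helper like `learned_attrs` gives a simple way to check the `__init__()` rule: it should return an empty list for a freshly constructed visualizer and a non-empty one after `fit()`.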
Additionally, we took into account the number/quality of tests for each visualizer, the documentation, and the robustness of the visualization implementation to rank the visualizers.
Along the way, a lot of technical debt was cleaned up, including unifying formatting with black and flake8 style checkers, updating headers, unifying scattered functionality into base classes, and more.
In the end, the audit should give us confidence that v1.0 is a production-ready implementation and that it is a stable foundation to grow the project on.
OpenTeams
Eunice Chendjou, COO of OpenTeams, joined the meeting to observe Yellowbrick as a model for successful open source community governance, and to let the Advisory Board know about OpenTeams. OpenTeams is designed to highlight the contributions and work of open source developers and to help support them by assisting them in winning contracts and finding funding. Although currently it is in its initial stages, they have a lot of big plans for helping open source teams grow.
Please add your contributions to Yellowbrick by joining OpenTeams. Invite others to join as well!
Action Items
Add your contributions to the Yellowbrick OpenTeams project (all)
Send invitations to those interested in joining the 2020 board (all)
Begin considering who to nominate for January election of board members (all)
Send Naresh a Yellowbrick T-shirt or thank you (Adam)
Create the Fall 2019 contributors Slack channel (Benjamin)
Start thinking about how to guide the 2020 roadmap (all)
Publications task group for O’Reilly content (Kristen, Larry)