Oct 28, 2021
I gave a talk today at PyData Global about how Git works under the
hood. If you want to think through how to build your own version
control system, and learn more about how Git stores things and why,
check out my slides here!
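As a small taste of how Git stores things, here is a minimal sketch of how Git names a file's contents in a repository that uses SHA-1 object IDs. The function name `git_blob_hash` is made up for this illustration and is not taken from the talk.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would give a blob with this content.

    Git hashes a short header ("blob", a space, the content length in
    bytes, and a NUL byte) followed by the raw content itself.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Hypothetical sample content, used only for illustration.
    print(git_blob_hash(b"hello, version control\n"))
```

Because the ID depends only on the content, two files with identical bytes get the same ID. Running `git hash-object` on a file with the same bytes should print the same value.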
Oct 28, 2020
Earlier today, I gave a talk on Business Skills for Data Science at
ODSC West (virtual). There are lots of talks about technical skills
for data scientists (I’ve given them myself), but I think business
skills are equally important in doing impactful work. Check out my
slides here:
May 7, 2019
I went to my third PyCon last weekend, and gave a talk about pre-mortems and post-mortems, and how teams can use them to learn from failure. As promised in that talk, I’m making my slides available for anyone who wants to reference them. Those slides are here:
Oct 18, 2016
Note: This post is a follow-up to another post I wrote, which is a
more general introduction to heuristic optimization algorithms. I
recommend you read my earlier post before this one.
I wrote this post with helpful input from the awesome ladies
at a Chicago Write/Speak/Code meetup! Kara Carrell wrote a great
post on The Four A’s, and I’ll link to what others wrote as it
comes online.
Aug 29, 2016
Last weekend I gave a talk at PyData Chicago! It was called
“Evolutionary Algorithms: Perfecting the Art of ‘Good Enough’” (props
to my thesis advisor, Stefano Allesina, for the catchy title). It was
heavily based on my blog post and workshops I’ve given. Heuristic
optimizers are a fun topic for me because they are so general and
useful, but they’re not really a hot topic, and I think a lot of
people have just never encountered them. They’re a great
addition to a data scientist’s toolkit.
Jun 3, 2016
The pipeline for an analysis project can get complicated and
confusing, especially if you’re simulating your own data. I often
create pipelines with several different scripts in different
languages, and it’s easy to forget a step. So a couple of months ago,
I wrote myself a little Markdown file that looks something like this*:
Mar 31, 2016
Sometimes I spend significant time in R or Python trying to do
something that is trivial in bash. Bash is especially useful when I’m
working with very large files that would take a long time to read
in. Why read in an entire file to get the last line, when I could just
use tail -n 1? Or if I want the line count, why read it in when wc -l
will get the job done faster?
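If you are already inside a Python session, one way to borrow these tools is to call them through the standard library's `subprocess` module. This is a minimal sketch that assumes `tail` and `wc` are available on your system; the helper names `last_line` and `line_count` are made up for this example.

```python
import subprocess


def last_line(path):
    """Return the last line of a file by delegating to `tail -n 1`."""
    result = subprocess.run(
        ["tail", "-n", "1", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


def line_count(path):
    """Return the number of newline characters in a file via `wc -l`."""
    result = subprocess.run(
        ["wc", "-l", path],
        capture_output=True, text=True, check=True,
    )
    # `wc -l FILE` prints the count followed by the file name.
    return int(result.stdout.split()[0])
```

Neither helper loads the file into Python, so the cost of reading a very large file stays with the shell tools.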
Mar 17, 2016
The Recurse Center puts out a quarterly
publication called Code Words, which publishes articles that try to
capture the fun of digging into
a problem and learning about programming. I wrote a piece on the
grammar of graphics, and how it can provide a language for exploring
and talking about data visualization. It uses examples from R’s
ggplot package, but the ideas are more general. Check out my
article, and other great articles from RC alums, in the
sixth issue of Code Words!
Dec 10, 2015
As far as I can tell, the R community has no generally accepted style
guide. Google and Hadley Wickham
both have style guides, but across and even within CRAN packages,
different naming and spacing conventions abound. You’re likely to find
variables named in camelCase, snake_case, or, interestingly, dot.case. This
last convention is unusual, because unlike many languages, R does
not enforce specific syntactic meaning for dots. Dots can denote
methods for S3 classes, but they don’t have to. This means that R only
cares about dots sometimes, with confusing results.
Sep 8, 2015
Last month I wrote a post on
the SQLAlchemy engine and session. Now
I’m going to describe how you can set up a mapping for your schema so
that you can populate and query your database.
Aug 31, 2015
Today I was trying to solve a SQLAlchemy bug in Python when I
discovered a strange behavior. I would set up logging to file, run
some SQLAlchemy code, and look at the log. Then I would open the file in Emacs and
clear out the log if there wasn’t anything interesting. But then, when
I ran more SQLAlchemy code, the logging wouldn’t work anymore. Python
didn’t throw an error; it just stopped logging altogether. Even if I
tried to reset my logging by setting up a new session, I couldn’t get
any more log statements without quitting and restarting IPython.
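The excerpt above does not say what caused this, so the following is only a guess at one possible explanation. A plain file handler keeps its original file open, and if an editor saves by replacing the file on disk rather than writing into it, later log records go to the old, now-invisible file. Under that assumption, the standard library's `logging.handlers.WatchedFileHandler` is one way to cope, since it reopens the log whenever the file at that path has changed. The helper name `make_logger` and the logger name are invented for this sketch.

```python
import logging
import logging.handlers


def make_logger(log_path, name="sketch"):
    """Build a logger that writes to log_path and survives file replacement.

    WatchedFileHandler checks the file before each record and reopens it
    if the path now points at a different file (or at no file at all).
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.WatchedFileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

This handler is aimed at Unix-like systems. It would not help if the logging stopped for some other reason.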
Aug 4, 2015
I’m spending the summer at the Recurse Center, where I’m
working alongside a group of other awesome programmers, studying
programming full-time for three months.
Aug 4, 2015
Optimization algorithms are one of those things that you might learn about
in an undergraduate CS class, then quickly forget. But if you need a
good answer to a computationally intensive problem, there’s really no
substitute for them. There are optimization algorithms with a strong
mathematical basis (such as gradient descent), but these are generally based on certain
assumptions of how the problem is defined and what your fitness
landscape is like. Heuristic algorithms (such as hill climbers,
simulated annealing, and evolutionary algorithms) make few assumptions
but no guarantees. They are fairly agnostic to the shape and structure of
your solution space and fitness function, but they make no promises
that you will ever find the best solution.
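To make the idea concrete, here is a minimal sketch of a hill climber, the simplest of the heuristics named above. It is not code from the post: the function names, the step size, and the toy fitness function are all assumptions chosen for illustration.

```python
import random


def hill_climb(fitness, start, step=0.1, iterations=1000, seed=0):
    """Maximize fitness by repeatedly trying small random moves.

    A candidate replaces the current solution only when it scores higher,
    so the search can stall on a local peak. That is the "no guarantees"
    part of the trade-off.
    """
    rng = random.Random(seed)
    best = start
    best_score = fitness(best)
    for _ in range(iterations):
        candidate = best + rng.uniform(-step, step)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def example_fitness(x):
    """Toy fitness landscape (an assumption): a single peak at x = 3."""
    return -(x - 3.0) ** 2


best_x, best_score = hill_climb(example_fitness, start=0.0)
```

The climber never looks at the formula for the fitness function. It only asks for scores, which is why the same loop works on very different problems.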