This past Saturday, I spent the day teaching about 60 women the basics of programming using the Processing language. It was a lot of fun, and a bit nerve-wracking since it was my first time.

Since that class, I’ve been thinking that a few code examples showing how to use Processing for some straightforward tasks might be helpful to people getting started with the Processing language and with programming in general. The project below is the first in what I hope will become a series (let’s say 3-5?), and we’ll see if they’re useful and used.

On to the Project!

This example project will show you how to do some basic parsing and cleaning of a text file, and then save the result.

The idea for this project came out of a problem I had at work, where I needed to generate a clean list of names from a text file in which each name was listed nine times in a row. Additionally, each line had some unnecessary data at the end that I wanted to remove.

I’ve created a text file to use with this example project, called “MajorCities.txt”. It contains 1000 city names, each listed nine times in a row (9000 lines in total). The first few lines are shown below:

Source file, “MajorCities.txt”:
“TOKYO, Japan” - [425083]
“TOKYO, Japan” - [456198]
“TOKYO, Japan” - [504920]
“TOKYO, Japan” - [527316]
“TOKYO, Japan” - [541281]
“TOKYO, Japan” - [598284]
“TOKYO, Japan” - [560637]
“TOKYO, Japan” - [546684]
“TOKYO, Japan” - [576558]
“JAKARTA, Indonesia” - [244097]
“JAKARTA, Indonesia” - [286973]
“JAKARTA, Indonesia” - [317604]
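Because every city appears exactly nine times in a row, the first copy of each one sits at line 0, 9, 18, and so on, so the 9000 lines shrink to 1000. Here is a minimal sketch of that selection in plain Java (Processing is built on Java, so the same loop works inside a sketch). It is not the code from the GitHub project: the class and method names are made up for illustration, and it assumes every city really is repeated exactly nine times.

```java
import java.util.Arrays;

public class EveryNinthLine {

    // Returns lines[0], lines[step], lines[2 * step], ... in a new array.
    // With step = 9 this keeps the first copy of each city, assuming every
    // city is repeated exactly nine times in a row.
    static String[] selectEveryNth(String[] lines, int step) {
        // Round up so a short final group (if any) still contributes one line.
        int count = (lines.length + step - 1) / step;
        String[] picked = new String[count];
        for (int i = 0; i < count; i++) {
            picked[i] = lines[i * step];
        }
        return picked;
    }

    public static void main(String[] args) {
        // A tiny stand-in for MajorCities.txt: two cities, nine copies each.
        String[] lines = new String[18];
        for (int i = 0; i < 9; i++) {
            lines[i] = "\"TOKYO, Japan\" - [" + i + "]";
            lines[9 + i] = "\"JAKARTA, Indonesia\" - [" + i + "]";
        }

        String[] unique = selectEveryNth(lines, 9);
        System.out.println(unique.length);            // prints 2
        System.out.println(Arrays.toString(unique));
    }
}
```

If a city ever appeared eight or ten times, this approach would quietly drift out of step, which is why it only suits files with a fixed repeat count like this one.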

This is what we are trying to create:

“TOKYO, Japan”
“JAKARTA, Indonesia”
“New York (NY), United States”
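Turning one source line into one of these output lines comes down to cutting off everything from the ` - [` separator onward. Below is a minimal sketch of that step, again in plain Java with hypothetical names. It assumes the separator always looks exactly as it does in the sample (space, hyphen, space, opening bracket) and never appears inside a city name. `indexOf()` and `substring()` are ordinary Java `String` methods, so they are available in a Processing sketch too.

```java
public class CleanCityLine {

    // Everything from this marker onward is treated as extra data.
    // Assumption: " - [" never appears inside a city name.
    static final String MARKER = " - [";

    // Turns "\"TOKYO, Japan\" - [425083]" into "\"TOKYO, Japan\"".
    static String cleanLine(String line) {
        int cut = line.indexOf(MARKER);
        if (cut == -1) {
            // No marker found: keep the line, minus any stray spaces.
            return line.trim();
        }
        return line.substring(0, cut).trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanLine("\"TOKYO, Japan\" - [425083]"));     // prints "TOKYO, Japan" (with the quotes)
        System.out.println(cleanLine("\"JAKARTA, Indonesia\" - [244097]"));
    }
}
```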

These are the steps to get from the source file to a cleaned-up list:

  1. Read in a text file from your local hard drive.
  2. Select every ninth line so we have each city listed just once and put it into a new array.
  3. Clean out the extraneous information at the end of each city name.
  4. Save the cleaned version to a text file on your hard drive.
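Put together, the four steps might look something like the sketch below. It is written as a standalone Java program that uses `java.nio.file.Files` for reading and writing, so it can be run outside Processing; inside a Processing sketch, steps 1 and 4 would more naturally use Processing’s `loadStrings()` and `saveStrings()` functions. This is a rough sketch, not the project’s actual code: the class name, the helper methods and the output file name `CleanCities.txt` are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CleanCities {

    static final int COPIES_PER_CITY = 9;   // each city repeats nine times
    static final String MARKER = " - [";    // assumed start of the extra data

    // Step 2: keep lines 0, 9, 18, ... (the first copy of each city).
    static List<String> everyNth(List<String> lines, int step) {
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += step) {
            picked.add(lines.get(i));
        }
        return picked;
    }

    // Step 3: drop everything from " - [" onward, then trim spaces.
    static String clean(String line) {
        int cut = line.indexOf(MARKER);
        return (cut == -1 ? line : line.substring(0, cut)).trim();
    }

    // Steps 2 and 3 together, kept apart from the file I/O so it is easy to test.
    static List<String> process(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : everyNth(lines, COPIES_PER_CITY)) {
            result.add(clean(line));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : "MajorCities.txt");
        // Hypothetical output name; pass a second argument to change it.
        Path output = Paths.get(args.length > 1 ? args[1] : "CleanCities.txt");

        // Step 1: read every line of the source file.
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);

        if (lines.size() % COPIES_PER_CITY != 0) {
            System.err.println("Warning: line count is not a multiple of " + COPIES_PER_CITY);
        }

        List<String> cities = process(lines);

        // Step 4: write one cleaned city per line.
        Files.write(output, cities, StandardCharsets.UTF_8);
        System.out.println("Wrote " + cities.size() + " cities to " + output);
    }
}
```

To try it, compile with `javac CleanCities.java` and run `java CleanCities MajorCities.txt CleanCities.txt` from the folder containing the text file. Keeping `process()` separate from the reading and writing makes it easy to check the cleaning logic on a handful of lines before running it on all 9000.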

I’ve put a lot of comments into the code to explain what each section and line of code is doing.

The code is much more verbose than what I would normally write. This is on purpose, so that it is (hopefully) easier for beginners to follow along and understand.

So where’s the code?! You can find everything on GitHub.

Alternatively, you can download the entire project, place it in your Processing sketchbook, and then open it in Processing.
Project code on GitHub

Good luck!!

If you’ve made it this far, then I hope this post has been helpful to you. Please let me know if you have any suggestions or if you found this useful.


Caveat

I’m aware that there are many tools better suited to working with text than Processing (Perl, sed/awk…). However, this post is for people getting up and running with Processing.