Open Source Posts
Andrew Dunstan: Redis FDW Singleton Key tables
From Planet PostgreSQL. Published on May 25, 2013.
I recently mentioned the possibility of tables being drawn from a single Redis object, so you would get one row per key/value pair in the named hash, or one row per element in the named scalar, set, list or zset. This has now been committed, for use with Release 9.2 and later. There are examples in the regression test files. This is going to be particularly important when we get to writable tables, which is the next order of business. For those of you who were in my talk yesterday, the breakage I mentioned has now been fixed.
Hans-Juergen Schoenig: CREATE TABLE – the fancy way
From Planet PostgreSQL. Published on May 25, 2013.
One tiny little feature many users of PostgreSQL have often forgotten is the ability to create similar tables. It happens quite frequently that you want to create a table that is just like some other one. To achieve that most people will do … CREATE TABLE x AS SELECT … LIMIT 0; This works nicely, [...]
Ian Barwick: Custom Background Worker: a practical example
From Planet PostgreSQL. Published on May 24, 2013.
A while back I posted some SQL which helps keep track of changes to the PostgreSQL settings file. I've found it useful when benchmarking with different settings, but unfortunately the pg_settings_log() function needs to be run manually after each setting change. However, that sounds like something which a custom background worker (new in 9.3) could handle - basically all the putative background worker would need to do is execute the pg_settings_log() function whenever the server starts (or restarts) or receives SIGHUP.
This turned out to be surprisingly easy to implement. Based off the example contrib module and Michael Paquier's excellent posts, this is the code. Basically all it does is check for the presence of the required database objects (a function and a table) on startup, execute pg_settings_log() on startup, and add a signal handler for SIGHUP which also calls pg_settings_log().
Andrew Dunstan: Redis talk slides
From Planet PostgreSQL. Published on May 24, 2013.
Here are the slides from my talk on PostgreSQL and Redis.
Selena Deckelmann: The People of Postgres: Tom Lane
From Planet PostgreSQL. Published on May 24, 2013.
This post was originally posted on Medium, a new blogging platform made up mostly of people who aren’t necessarily subscribed to Planet. So, please forgive the obvious statements, as the target audience are people who don’t know very much about Postgres. 
Wednesday May 23, with no fanfare, Tom Lane’s move to Salesforce.com was made public on the Postgres developer wiki.
For 15 years, Tom has contributed code to Postgres, an advanced open source relational database that started development around the same time as MySQL but has lagged behind it in adoption amongst web developers. Tom’s move is part of a significant pattern of investment by large corporations in the future of Postgres.
For the past few years, Postgres development has accelerated. Built with developer addons in mind, things like PLV8 and an extensible replication system have held the interest of companies like NTT and captured the imagination of Heroku.
Tom has acted as a tireless sentry for this community. His role for many years, in addition to hacking on the most important core bits, was to defend quality and a “policy of least surprise” when implementing new features.
Development for this community is done primarily on a mailing list. Tom responds to so many contributor discussions that he’s been the top overall poster on those mailing lists since 2000, with over 85k messages.
Really, he’s a cultural touchstone for a community of developers that loves beautiful, correct code.
Someone asked: “What does [Tom’s move] mean for Postgres?”
You probably don’t remember this:
“Salesforce.com bases its entire cloud on Oracle database,” Ellison said, “but its database platform offering is PostgreSQL. I find that interesting.”
When I read that last October, I was filled with glee, quickly followed by terror. I love my small database community, my friends and my job. What if Oracle shifted its attention to our community and attacked it, directly? So far, that hasn’t happened.
Instead, Salesforce advertised they were hiring “5 new engineers…and 40 to 50 more people next year” for a “huge PostgreSQL project.”
Tom’s move probably won’t change much for the day-to-day operation of Postgres itself. Hopefully, things are about to get real at Salesforce.
I’m a major contributor to Postgres. I started in 2006, learning about relational databases through work at a small bike parts manufacturer and ERP. My contributions include code, starting conferences, encouraging user group leaders and introducing Postgres to communities that otherwise would never hear from us. I’m a data architect at Mozilla.
Robert Haas: Query Planning Gone Wrong
From Planet PostgreSQL. Published on May 23, 2013.
Over the past few years, I've been making notes on pgsql-performance postings, specifically those postings which relate to query performance issues. Today, I gave a talk at PGCon on the data I've been able to gather. If you attended the talk, please leave feedback through the PGCon web site or feel free to leave a comment below with your thoughts. If not, you can find the slides on my Presentations web page. A few people asked me to post the raw data on which the talk was based, including links to the original threads. I have created a Query Performance section on my Google Site and posted the information there.
The version posted on the web site incorporates a few minor corrections as compared to what I presented in the talk; and I have left out (for the sake of politeness) the cases I attributed to user error. There were actually only 2 such cases, not 3 as I said in the talk, but either way it seems more polite not to post specific links. Please contact me if you find other mistakes in what I have posted and I will correct them.
Many thanks to all those who said nice things about my talk!
Bruce Momjian: New Presentation Online
From Planet PostgreSQL. Published on May 23, 2013.
I delivered my presentation "Nulls Make Things Easier?" today at PGCon, so I have placed my slides online. The presentation is based on a series of eleven blog posts about NULLs I did a few months ago.
Greg Smith: Seeking Revisited: Intel 320 Series and NCQ
From Planet PostgreSQL. Published on May 23, 2013.
Running accurate database benchmark tests is hard. I’ve managed to publish a good number of them without being embarrassed by errors in procedure or results, but today I have a retraction to make. Last year I did a conference talk called “Seeking PostgreSQL” that focused on worst case situations for storage. And that, it turns out, had a giant error. The results for the Intel 320 Series SSD were much lower in some cases than they should have been, because the drive’s NCQ feature wasn’t working properly. When presenting this talk I had a few people push back that the results looked weird, and I was suspicious too. I have a correction to publish now, and I think the way this slipped by me is itself interesting. The full updated SeekingPostgres talk is also available, with all of the original graphs followed by an “Oops!” section showing the new data.
Native Command Queueing is an important optimization for seek heavy workloads. When trying to optimize work for a mechanical disk drive, it’s very important to know where the drive is currently at when deciding where to go next. If you have a read for that same area of the drive in the queue, you want to read that one now, get the I/O out of the way while you’re nearby, and then move to another physical area of the disk.
However, on an SSD, you might think that re-ordering commands isn’t that important. If reads are always inexpensive, taking a constant and small period of time on a flash device, their order doesn’t matter, right? Well, that’s wrong on a few counts. The idea that reads always take the same amount of time on SSD is a popular misconception. There’s a bit of uncertainty around what else is happening in the drive. Flash cells are made of blocks larger than a single database read. What happens if you are reading 8K of a cell that is being rewritten right now, because someone is updating another 8K section? Coordinating that is likely to pause your read for a moment. It doesn’t take much lag at SSD speeds to result in a noticeable slowdown. Partially due to contention concerns, and partially due to the nature of I/O, keeping the command queue full is still very important to keeping the drive usefully busy all of the time.
On the 120GB Intel 320 Series drive I used for testing, the drive tops out at around 28MB/s of transfers if you’re not pipelining requests via NCQ. It goes a whole lot faster than that once the queue is full:

You might think such a huge difference would be immediately obvious in all test results, right? It’s not though, and that’s how the error slipped by me. Normally all of my tests are done on two similar machines, and then I validate they match. I did that for some of the Seeking Postgres results, such as the write heavy tests. For comparison, here are results from the database’s pgbench tool executing its standard, TPC-B-like write test:

The write rate test is barely impacted by whether NCQ is turned on or off, so it wasn’t obvious that one drive had the feature enabled while the other didn’t. I was using this to validate that my test server was operating similarly to a second system with one of these drives. But I picked the one test here where NCQ doesn’t really matter.
The general conclusion of the original presentation is that the Intel SSDs are much faster than regular disk, but still a good bit slower than the more expensive FusionIO flash. That I knew to be true from real-world workloads, so I’d have been surprised if things didn’t turn out that way. But it turns out that is true whether or not NCQ is working. The Intel 320 line in these results is better with NCQ than without, but the relative ranking isn’t any different now. It’s just the case that the Intel SSD is more competitive in some tests than I gave it credit for.
The seeking read results show a much larger gap with NCQ enabled:

You might notice a small drop in TPS on that brown line at low scales. That’s a test error I can’t correct for at this point. The original server I used for these tests was gone before I figured out what was wrong. The replacement has the same type of CPU chip, but it’s clocked a bit slower. (Was an Intel i7 870, now is an Intel i7 860) That’s why the CPU limited results at low scales dropped. On any of the I/O limited tests, that original CPU and the slower new one are almost identical, so I still think I’m being fair here.
Finally, I turned the random seek throughput into a business oriented question by asking how long it would take to refill all of RAM after something like a server reboot. My original test placed the Intel drive as taking 5 minutes to read 16GB of random data with 32 clients reading. This is exactly what NCQ helps with, and the correctly working drive only takes 1 minute to refill cache:

Thankfully, I don’t have to say I was completely wrong before. The relative ranking of the various storage options is still the same. The FusionIO drive I tested was and still is at the top of the heap, especially if you need high write throughput. But the worst case for reading on the Intel 320 series drives (and the very similar 710 series) is much closer to specifications than my tests showed.
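As a rough sanity check on the cache refill times quoted above, the average read rates they imply can be worked out directly. This is a minimal sketch that assumes 1GB = 1024MB and a constant rate over the whole run; the function name is hypothetical, and the results are derived arithmetic rather than measurements from the tests.

```python
def average_read_rate_mb_s(gigabytes, minutes):
    """Average MB/s needed to read `gigabytes` in `minutes` (assumes 1GB = 1024MB)."""
    return gigabytes * 1024 / (minutes * 60)


# 16GB of random reads: about 5 minutes originally, about 1 minute with NCQ working.
rate_without_ncq = average_read_rate_mb_s(16, 5)
rate_with_ncq = average_read_rate_mb_s(16, 1)

print("without NCQ: %.1f MB/s" % rate_without_ncq)  # 54.6 MB/s
print("with NCQ:    %.1f MB/s" % rate_with_ncq)     # 273.1 MB/s
```

Under those assumptions the refill times correspond to roughly a fivefold difference in average random read throughput.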
With this old territory sorted out, next up I’m testing Intel’s latest enterprise drive, the DC S3700, which replaces the 710 drives in their lineup. Initial test results look great so far; detailed ones are coming soon.
Andrew Dunstan: Blackhole FDW
From Planet PostgreSQL. Published on May 23, 2013.
My Blackhole FDW talk seemed to go well. The line about no unintentional data loss got the best laugh. Here are the slides. Besides being a bit of fun, this did have a serious purpose - creating a skeleton for building an FDW, including the writable API. The code has the contents of the docs on writing an FDW as comments in the appropriate functions, to help a new FDW writer.
The code is on bitbucket.
Pierre Ducroquet: Review – “Instant PostgreSQL Starter”
From Planet PostgreSQL. Published on May 23, 2013.
Thanks to Shaun M. Thomas, I have been offered a digital copy of the “Instant PostgreSQL Backup” book from Packt publishing, and was provided with the “Instant PostgreSQL Starter” book to review. Considering my current work situation, doing a lot of PostgreSQL advertising and basic teaching, I was interested in reviewing this one…
As the Instant collection’s motto says, it’s short and fast. I kind of disagree with the “focused” for this one, but it’s perfectly fine considering the aim of that book.
Years ago, when I was a kid, I discovered databases with a tiny MySQL-oriented book. It taught the basics: how to install, basic SQL queries, some rudimentary PHP integration. This book looks a bit like its PostgreSQL-based counterpart. It’s a quick tour of installation, basic manipulation, and the (controversial) “Top 9 features you need to know about”. And that’s exactly the kind of book we need.
So, what’s inside? I’d say what you need to kick-start with PostgreSQL.
The installation part is straightforward: download, click, done. Now you can launch pgadmin, create a user, a database, and you’re done. Next time someone tells you PostgreSQL ain’t easy to install, show him that book.
The second part is a fast SQL discovery, covering a few PostgreSQL niceties. It’s damn simple: Create, Read, Update, Delete. You won’t learn about indexes, functions, advanced queries here. For someone discovering SQL, it’s what needs to be known to just start…
The last part, “Top 9 features you need to know about”, is a bit harder to describe. PostgreSQL is an RDBMS with batteries included, so choosing 9 features must have been really hard for the author, and I think nobody can be blamed for not choosing this or that feature you like: too much choice… The author spends some time on pgcrypto, the RETURNING clause with serial, hstore, XML, even recursive queries… This is, from my point of view, the troublesome part of the book: mentioning all these features means introducing complicated SQL queries. I would never teach someone how to do recursive queries before teaching him joins; it’s like going from elementary school to university in forty pages. But the positive part is that an open-minded and curious reader will have a great teaser and nice tracks to follow to increase his knowledge of PostgreSQL. Mentioning hstore is really cool; that’s one of the PostgreSQL features one has to know…
To sum up my point of view about this book: it’s a nice book for beginners, especially considering the current NoSQL movement and people forgetting about SQL and databases. It’s a bit sad we don’t have more books like this one about PostgreSQL. I really hope Packt publishing will try to have a complete collection, from introduction (this book) to really advanced needs (PostgreSQL High Performance comes to mind) through advanced SQL queries, administration tips and so on… They have a book about PostgreSQL Server Programming planned next month, and I’m really looking forward to this one.
Andrew Dunstan: Buildfarm download location
From Planet PostgreSQL. Published on May 23, 2013.
It was just pointed out to me that the download link on the buildfarm server front page wasn't updated when I fixed the other links after switching to publishing them on the buildfarm server itself. That's been fixed now. The only valid link for downloading the client is http://www.pgbuildfarm.org/downloads/. Sorry for any confusion.
Andrew Dunstan: Developer meeting went well
From Planet PostgreSQL. Published on May 23, 2013.
There seems to be a consensus, which I share, that the annual PostgreSQL Developers Meeting went much better this year than in the previous couple of years. One item of note: the commit fest managers are going to be much more vigilant about making sure that if you have signed up for a review you will work on it right away, and about removing reviewers who are not producing reviews. So you will be able to have much more confidence that if someone is signed up as a reviewer for a patch they will actually be doing the work.
After the meeting and the obligatory visit to the Royal Oak, a number of us went out and had a pleasant Indian meal, and then I came back to the hotel, fixed a customer problem, and wrote up some slides for my proposed lightning talk. More on this later.
Now, on to the conference proper!
Heikki Linnakangas: pg_rewind, a tool for resynchronizing after failover
From Planet PostgreSQL. Published on May 23, 2013.
I’ve been hacking on a tool to allow resynchronizing an old master server after failover. Please take a look: https://github.com/vmware/pg_rewind.
Bruce Momjian: PgCon Developer Meeting Concluded
From Planet PostgreSQL. Published on May 22, 2013.
We just concluded the PgCon Developer Meeting. There were two big items from me. First, EnterpriseDB has dedicated staff to start work on parallelizing Postgres queries, particularly in-memory sorts. I have previously expressed the importance (and complexity) of parallelism. Second, EnterpriseDB has also dedicated staff to help improve Pgpool-II. Pgpool is the Swiss Army knife of replication tools, and I am hopeful that additional development work will further increase its popularity.
The Developer Meeting notes (summary) have lots of additional information about the big things coming from everyone next year.
Michael Paquier: Postgres 9.3 feature highlight: new verbose error fields
From Planet PostgreSQL. Published on May 22, 2013.
PostgreSQL is already pretty useful for application developers when returning error messages to the client, providing a certain level of detail with multiple distinct fields such as the position in the code where the error occurred. However, database object names were lacking, forcing the client application to deparse the error string returned by [...]
THANK YOU!
By DjangoCon Europe from Planet Django. Published on May 22, 2013.

Wow, what an incredible week. We still can’t believe we actually pulled it off.
We spent the last year planning, organizing and working to deliver you the best possible experience. More importantly, we were having a lot of fun and we took a lot of risks that paid off. It’s been an amazing year and we will never forget this last week with you.
We owe a huge THANK YOU to everyone who helped us along the way, and words can’t express how grateful we are. We couldn’t have made it without you. Thank you for trusting our ideas and backing us so generously from the very beginning. We were just crazy kids without any background who wanted to do fun things for the Django community, and your trust gave us the power to do the impossible.
You are DjangoCon. You made the last week unforgettable for us and everyone else. The conference would not even be near this level of awesomeness if it wasn’t for you. Thanks for being so awesome. We think that since last week, the word ‘awesome’ belongs to DjangoCon forever.
If you still can’t fight your post-DjangoCon-depression, make sure to remember Django Circus Story, watch videos and see all the photos. We will keep you posted with new fun videos this week and publish all the talks in 3 weeks on our Youtube channel. If you posted a review or blogpost about DjangoCon somewhere, do let us know about that!
Other than that, we still think we can do even better. If you want to see us raising the bar even higher, make sure to follow Makerland.
See you soon, somewhere!

South 0.8, Migrations and DjangoCon
By Andrew Godwin from Planet Django. Published on May 22, 2013.
A new release of an old friend, and more news on django.db.migrations.
I've wanted to get a new release of South out for ages, so I'm delighted that I've finally done so. South 0.8 is now available on PyPI - there aren't a great many new changes, the most notable (and the reason for the major version bump) being Python 3 support.
Aymeric Augustin was instrumental in getting that support implemented, so I'd like to thank him for his work on it. On a related note, support for Python 2.5 is being dropped - if you still need that, you'll need to stick with the 0.7.x series.
The other notable change is support for index_together, one of the new improvements in Django 1.5 and something that should have been released a while ago. There's still no first-party support for AUTH_USER_MODEL - it'll work fine as long as you're not distributing third-party apps with migrations. The overall solution to that is something that will have to be implemented in the rewrites that are underway.
db.migrations
Those rewrites are coming along well, however. Last week I was at DjangoCon EU, in Warsaw, Poland, and I had a fantastic time, as you can read in my blog post about it. In particular, I had some good discussions with fellow core developers and Django and South users, to clear up some more thoughts I was having.
At the sprints, I got quite a bit more code implemented for db.migrations - as always, you can see the progress on my GitHub branch.
Most progress was on the "state" code and field freezing, so I'd like to discuss that.
State
The "state" part of db.migrations is the part which is responsible for the in-memory running of migrations to build correct versions of models.
In essence, it runs each of the actions in your migrations on fake versions of models (represented by a class called ModelState) in memory, and at the end it can then render those states into full models, to use for a data migration or pass to the schema migration functions.
The basic format is reasonably simple - there's just a class that represents a model, with attributes for all the things models can have, like their options (the things you put in Meta) and their name.
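To make the idea concrete, here is a toy sketch of running operations against an in-memory model description. Everything in it is hypothetical: this ModelState, the add_field and remove_field helpers, and the use of plain strings for field types are illustration only, and the real classes in the branch are certainly richer.

```python
class ModelState:
    """Toy in-memory description of a model (not the branch's actual class)."""

    def __init__(self, name, fields=None, options=None):
        self.name = name
        self.fields = dict(fields or {})    # field name -> field type (a string here)
        self.options = dict(options or {})  # the things you would put in Meta


def add_field(state, field_name, field_type):
    state.fields[field_name] = field_type


def remove_field(state, field_name):
    del state.fields[field_name]


# Replay a list of migration-like operations, in order, against the fake model.
author = ModelState("Author", fields={"id": "AutoField"})
operations = [
    (add_field, ("name", "CharField")),
    (add_field, ("legacy_code", "IntegerField")),
    (remove_field, ("legacy_code",)),
]
for operation, arguments in operations:
    operation(author, *arguments)

print(sorted(author.fields))  # ['id', 'name']
```

The point is only that no database is involved: the final state is reached purely by replaying operations in memory.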
Fields, however, are more tricky. The problem South has faced since its inception is how you take a set of fields and serialise them - something that has finally been fixed.
The Good, The Bad and the Source Code Parser
You see, there's no way, given an instance of a Field, to tell how you reconstruct it. Sure, you can tell what class it is, and some values are obvious (like field.max_length), but getting the value that you passed in to a ForeignKey for its relationship is trickier.
The first versions of South solved this in a very simple way - they opened up your models.py file, read the source code, and chopped out the field definition using string manipulation. Needless to say, this was very fragile, and didn't work with any kind of conditional around fields.
The next (and currently shipping) approach was to inspect the fields' attributes using something called modelinspector. This is a built-in set of rules South uses to work out a field's arguments just by inspecting its attributes.
While this works well for core Django fields, there's no way of knowing how third-party fields work without shipping rules for them with South (which a few apps have) or by declaring them yourself when you declare the field.
The way these custom rules were declared was difficult to understand and not immediately obvious, and so there have been a lot of complaints with the current method about custom fields and South not really playing well together.
In particular, South wouldn't just accept a custom field even if it was a simple subclass - you had to tell South it was safe to use using a list of regular expressions on field path names. While it's worked till now, it's clearly not the best solution.
Introducing deconstruct()
The new solution is now in my branch - passing this responsibility onto the fields themselves. The API a field is required to provide has grown an additional function: deconstruct().
This function takes no arguments, and returns the four values needed to recreate the field: its attribute name (what field name it was assigned to on the model), a path to import it from, positional arguments and keyword arguments.
The base implementation of this on Field is the most complex one and handles all the default arguments. New field classes will just need a simpler override, like the one for DecimalField, which adds on the new arguments.
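A minimal sketch of the pattern, using toy stand-in classes rather than Django's real Field and DecimalField. The constructor arguments and the rule of only reporting non-default values are assumptions made for illustration; only the four-part return value comes from the description above.

```python
class Field:
    """Toy stand-in for a model field; not Django's real Field class."""

    def __init__(self, null=False, default=None):
        self.name = None  # set when the field is attached to a model
        self.null = null
        self.default = default

    def deconstruct(self):
        # Assumption: only arguments that differ from the defaults are reported.
        kwargs = {}
        if self.null:
            kwargs["null"] = True
        if self.default is not None:
            kwargs["default"] = self.default
        path = "%s.%s" % (self.__class__.__module__, self.__class__.__name__)
        return self.name, path, [], kwargs


class DecimalField(Field):
    """Subclass whose override just adds its own arguments to the base result."""

    def __init__(self, max_digits=None, decimal_places=None, **kwargs):
        super().__init__(**kwargs)
        self.max_digits = max_digits
        self.decimal_places = decimal_places

    def deconstruct(self):
        name, path, args, kwargs = super().deconstruct()
        kwargs["max_digits"] = self.max_digits
        kwargs["decimal_places"] = self.decimal_places
        return name, path, args, kwargs


field = DecimalField(max_digits=8, decimal_places=2, null=True)
field.name = "price"
name, path, args, kwargs = field.deconstruct()

# The returned values are enough to build an equivalent field again.
rebuilt = DecimalField(*args, **kwargs)
```

The useful property is the round trip: whatever deconstruct() returns can be fed straight back into the constructor.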
I'll be writing up full documentation on this into the Django docs as part of my branch, but just keep in mind that all custom fields will need to provide this method soon, or they will not be usable with migrations. I plan to submit pull requests to a decent number of third-party apps that use custom fields with this method implemented for them, to help kickstart adoption.
Back to State
This all means that the state tracking can now work - it has methods to take either a model or a whole AppCache and turn it into a ModelState or ProjectState object, which can then rebuild models or AppCaches respectively.
This is what will power the autodetection - South will render the most recent version of the models it has, and compare them to the ones you currently have in your project. If there are any material database differences - a new field, a model has gone, db_table is changed - it will generate the appropriate migration.
Some changes don't affect the database, of course. verbose_name never touches the database, and much to people's surprise, neither does default - Django implements all defaults in Python rather than in the database, as otherwise there's no way to allow arbitrary callables as a default value (something which is causing some pain when serialising, let me tell you).
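A small sketch of why callable defaults have to live in Python: the value is only known at the moment the callable is run. The resolve_default helper here is hypothetical and is not Django's implementation.

```python
import datetime


def resolve_default(default):
    """Return the value to use for a new row: call callables, pass constants through."""
    return default() if callable(default) else default


constant_default = resolve_default(0)
list_default = resolve_default(list)                     # a fresh list on every call
date_default = resolve_default(datetime.date.today)      # evaluated at call time

print(constant_default, list_default, date_default)
```

A constant like 0 could be expressed as a database-level default, but there is no general way to hand the database an arbitrary Python callable, which is also what makes such defaults awkward to serialise.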
Context Managers
The other change that might affect users is that I've changed SchemaEditor to be a context manager, as suggested by a few people last week. That means that you now use it like so:
with connection.schema_editor() as editor:
    editor.create_model(Foo)
    editor.delete_model(Bar)
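For readers unfamiliar with the protocol, a context manager is just an object with __enter__ and __exit__ methods. The sketch below uses a hypothetical ToySchemaEditor that only records what happens; what the real SchemaEditor does on entry and exit is not described here, so the "begin", "commit" and "rollback" steps are assumptions.

```python
class ToySchemaEditor:
    """Hypothetical stand-in; the real SchemaEditor's behaviour may differ."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("begin")
        return self  # this is the object bound by "as editor"

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, whether it finished normally or raised.
        self.log.append("rollback" if exc_type else "commit")
        return False  # do not swallow exceptions

    def create_model(self, name):
        self.log.append("create %s" % name)

    def delete_model(self, name):
        self.log.append("delete %s" % name)


with ToySchemaEditor() as editor:
    editor.create_model("Foo")
    editor.delete_model("Bar")

print(editor.log)  # ['begin', 'create Foo', 'delete Bar', 'commit']
```

The appeal of this shape is that setup and cleanup are guaranteed to run around the block, so callers cannot forget the final step.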
What's next?
Now that's all in place, the work of getting migrations to load from disk, create in-memory models and then run them through the schema editor is next - essentially, bringing together the past few weeks' work into a functioning whole.
Some of that code is already in place - a disk loader already reads classes from disk, and a recorder already has code to mark migrations as applied or not - but there's some more work in deciding the user interface for migrations in terms of commands.
Should the migrate command stay? Should it all be rolled into syncdb? Should they both go in favour of a third option? Some planning is needed. Any opinions are welcome, either via email or Twitter.
Hans-Juergen Schoenig: Kostal Pico to PostgreSQL
From Planet PostgreSQL. Published on May 22, 2013.
Everybody needs a little toy to play with, so I thought: Why not buy a toy helping me to get rid of my energy bill? So, I ordered a 10.5 kWp photovoltaic system for my house. The system was shipped with a Kostal Pico inverter to make sure electricity can be used directly by the [...]
Jignesh Shah: How to do Postgres Replication and Failover in VMware vFabric Data Director 2.7?
From Planet PostgreSQL. Published on May 21, 2013.
Denish Patel: Inserting JSON data into Postgres using JDBC driver
From Planet PostgreSQL. Published on May 21, 2013.
One of OmniTI’s clients requested help with a sample application to insert JSON data into Postgres using the Java JDBC driver. I’m not a Java expert, so it took a while for me to write simple Java code to insert data. TBH, I had help writing the test application from one of our Java engineers at OmniTI. Now the test application is ready, and the next step is to make it work with the JSON datatype! After struggling a little to find a workaround for string escaping in the Java code, I stumbled upon a data type issue! Here is the test application code to connect to my local Postgres installation and insert JSON data into the sample table:
postgres=# \d sample
Table "public.sample"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
data | json |
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
Something exploded running the insert: ERROR: column "data" is of type json but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 42
After some research, I found out that there is no standard JSON type on the Java side, so adding support for json to the Postgres JDBC driver is not straightforward! A StackOverflow answer helped me test JSON datatype handling at the psql level. As Craig mentioned in the answer, the correct way to solve this problem is to write a custom Java mapping type that uses the JDBC setObject method. This can be tricky, though. A simpler workaround is to tell PostgreSQL to cast implicitly from text to json:
postgres=# create cast (text as json) without function as implicit;
CREATE CAST
The WITHOUT FUNCTION clause is used because text and json have the same on-disk and in-memory representation; they’re basically just aliases for the same data type. AS IMPLICIT tells PostgreSQL it can convert without being explicitly told to, allowing things like this to work:
postgres=# prepare test(text) as insert into sample (data) values ($1);
PREPARE
postgres=# execute test('{}');
INSERT 0 1
postgres=# select data from sample;
data
----
{}
(1 row)
Awesome! That worked.
Let’s try a similar approach in the Java application code.
denishs-MacBook-Air-2:java denish$ export CLASSPATH=/usr/share/postgresql/java/postgresql-9.2-1002.jdbc4.jar:
denishs-MacBook-Air-2:java denish$ javac -classpath $CLASSPATH PgJSONExample.java
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
postgres=# select * from sample;
id | data
----+------------------------------------------------------------------------
1 | {"username":"denish","posts":10122,"emailaddress":"denish@omniti.com"}
(1 row)
Yay! It worked as well
Next on my list is figuring out how to install PL/Java on Mac and/or Linux! Let me know if you have instructions for installation and a test application using PL/Java.
Leo Hsu and Regina Obe: KNN GIST with a Lateral twist: Coming soon to a database near you
From Planet PostgreSQL. Published on May 21, 2013.
One of the things that really frustrated me about the KNN GIST distance box box centroid operators that came in PostgreSQL 9.1 and PostGIS 2.0 was the fact that one of the elements needed to be constant to take advantage of the index. In PostGIS speak, this meant you couldn't put it in the FROM clause and could only enjoy it in one of two ways.
Continue reading "KNN GIST with a Lateral twist: Coming soon to a database near you"
Ian Barwick: "Instant PostgreSQL Starter" review
From Planet PostgreSQL. Published on May 21, 2013.
Having recently posted some thoughts on Shaun Thomas' "PostgreSQL Backup and Restore How-to", I was asked by Packt if I'd like to review the new "Instant PostgreSQL Starter" by Daniel K. Lyons, and they kindly provided me with access to the ebook version. As I'm happily in a situation where I may need to introduce PostgreSQL to new users, I was interested in taking a look and here's a quick overview.
It follows the same "Instant" format as the backup booklet, which I quite like as it provides a useful way of focussing on particular aspects of PostgreSQL without being bogged down in reams of tl;dr documentation. "Instant Pg Starter" is divided into three sections:
- Installation
- Quick start – creating your first table
- Top 9 features you need to know about
Shaun M. Thomas: Winning (Free eBooks) is Everything
From Planet PostgreSQL. Published on May 21, 2013.
It occurs to me I forgot to congratulate the winners of the free ebooks. So without further ado:
- SAB, who seems to host a nice blog geared toward server administration.
- Stephan, who’s looking to improve existing strategies.
- Jeff and his growing PostgreSQL cluster.
- Pierre, who apparently has an experimental PostgreSQL backend for MySQL. Interesting.
Congrats to the winners. But more, I call upon them to pay it forward by contributing to the community, either by corresponding with the excellent PostgreSQL mailing lists, or maybe submitting a patch or two to the code. There’s a lot of ground to cover, and more warm bodies always help.
Thanks again, everyone!
Josh Berkus: PostgreSQL New Development Priorities 5: New User Experience
From Planet PostgreSQL. Published on May 21, 2013.
So, I started this looking for our five major goals for future PostgreSQL development. The last goal is more nebulous, but I think equally important with the other goals. It's this: improve the "new user experience". This is not a new goal, in some ways. Improving installation, one of our previous 5 goals, was really about improving the experience for new users. But the new user experience goes beyond installation now, and competition has "raised the bar". That is, we matched MySQL, but now that's not good enough; we need to match the new databases. It should be as easy to get started on a dev database with PostgreSQL as it is with, for example, Redis. Let me give you a summary of the steps to get up, running, and developing an application in the two platforms:
Redis:
- install Redis, either from packages or multiplatform binaries. No root access is required for the binaries.
- read a 1-page tutorial
- run redis-server
- run redis-cli or install drivers for your programming language
- start developing
- when your app works, deploy to production
- in production, tune how much RAM Redis gets.
PostgreSQL:
- install PostgreSQL from packages or the one-click installer. Root/Admin access is usually required.
- search the documentation to figure out how to get started.
- figure out whether or not your packages automatically start Postgres. If not, figure out how to start it. This may require root access.
- Install drivers for your programming language.
- Figure out how to connect to PostgreSQL. This may require making changes to configuration files.
- Read more pages of documentation to learn the basics of PostgreSQL's variety of SQL, or how to program an ORM which works with PostgreSQL.
- Start developing.
- Deploy to production.
- Read 20 pages of documentation, plus numerous blogs, wiki pages and online presentations in order to figure out how to tune PostgreSQL.
- Tune PostgreSQL for production workload. Be unsure if you've done it right.
So, what can we do about it? Well, a few things:
- better new user tutorials, such as the ones on postgresguide.org
- better autotuning, made a lot easier to implement as of version 9.3.
- a "developer mode PostgreSQL"
Those are the five things I can see which would greatly expand the market for PostgreSQL and keep us competitive against the new databases. Yes, I'm talking really big features, but any two out of the five would still make a big difference for us. There may be others; now that you've seen the kind of big feature I'm talking about, put your suggestions below.
Transactions for web developers - Aymeric Augustin
By Reinout van Rees from Planet Django. Published on May 20, 2013.
Initially he didn't know a lot about transactions, so he researched them in depth. A quote by Christophe Pettus: "transaction management tools are often made to seem like a black art".
He moves from the database (postgres and sqlite) to the interface (psycopg2 and sqlite3) to the framework (django).
Database
A definition: an SQL transaction is a sequence of SQL statements that is atomic with respect to recovery. In SQL 92, a transaction begins with a transaction-initiating statement (almost everything can start a transaction) and it ends with a commit, an explicit rollback (ROLLBACK) or an implicit rollback.
SQL 1999 changed this a bit. It has savepoints. After a savepoint, you can rollback to that savepoint, to a previous savepoint or you can set a new savepoint. Oh, and there is an explicit transaction start statement (START TRANSACTION).
Key findings:
- Statements always run in transactions.
- Transactions are opened automatically.
- Transactions are advanced technology.
Remember the dreaded "current transaction is aborted, commands ignored until end of transaction block" postgresql fault? What it actually means is "a previous statement failed, the application must perform a rollback". You cannot let postgres do any auto-recovery, that would break transactional integrity. It is your application that needs to do it (and it should always do it).
(I didn't hear what the actual solution is). Update: Diederik says in his comment that the solution is to just switch on autocommit for postgres in the database settings.
There's also AUTOCOMMIT. Most databases default to it. It commits every single statement automatically. Normally, you are either in auto-commit mode or inside transactions.
Interface: the python client libraries
Psycopg2 and sqlite3 are wrappers around C libraries. They use the DB API 2.0, PEP 249. It defines connections and cursors. Connections implement transactions, cursors do fetching and setting.
Note: the PEP wants the auto-commit to be off, initially!
Psycopg2 handles it by inserting a BEGIN before every statement, unless there's already a transaction in progress. Even for SELECTs.
Sqlite3 also inserts BEGIN, but not for a SELECT. All other statements get a COMMIT. Even a statement like SAVEPOINT: this is broken by design ("documentation issue").
Key findings:
- The DB API requires the same transactional behaviour as the SQL standard.
- Client libraries for databases that always autocommit have to emulate this behaviour.
- But you can turn it off and use autocommit
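As a small illustration of the client-library behaviour described above, here is a sketch using Python's standard sqlite3 module with an in-memory database. The table name `notes` is made up for the example, and the sketch relies on the module's default (legacy) transaction handling. It reads `Connection.in_transaction` to show when an implicit BEGIN has been issued, and then repeats the INSERT on a connection opened in autocommit mode.

```python
import sqlite3

# Default connection: the module opens transactions implicitly (PEP 249 style).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")  # hypothetical table

conn.execute("SELECT * FROM notes").fetchall()
after_select = conn.in_transaction   # sqlite3 does not insert BEGIN for a SELECT

conn.execute("INSERT INTO notes VALUES ('hello')")
after_insert = conn.in_transaction   # an implicit BEGIN preceded the INSERT

conn.commit()
after_commit = conn.in_transaction   # the explicit commit closed the transaction

# isolation_level=None switches the module to autocommit: no implicit BEGIN.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE notes (body TEXT)")
auto.execute("INSERT INTO notes VALUES ('hello')")
auto_after_insert = auto.in_transaction

print(after_select, after_insert, after_commit, auto_after_insert)
```

Psycopg2 is not shown here because it needs a running PostgreSQL server; per the talk, it would also report an open transaction after the SELECT.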
Framework (django)
Django 1.5 and earlier runs with an open transaction. For updates/deletes/saves, django does a commit. More or less auto-commit.
There's transaction middleware. One http request = one transaction. Commit on success, roll back on exception. It only works for the default database, though. And depending on the order of your middleware, it may or may not apply.
Django provides a couple of high-level APIs. with transaction.autocommit():, with transaction.commit_on_success():, with transaction.commit_manually():. There is also a low-level API for doing stuff manually.
Key findings:
- OK to forget it, it will change in 1.6.
- The middleware is a reasonable idea.
- The decorators/context managers don't work well, they often cannot be nested.
Django 1.6 uses database-level autocommit, which is what you'd normally expect. There are atomic transactions for requests: only for the view functions. Again, one transaction per HTTP request.
The high level API is now called atomic. Usable as a decorator and as a context manager. It can be safely nested.
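To make "can be safely nested" concrete, here is a minimal sketch of a nestable context manager built on savepoints. It is not Django's implementation: the `atomic(conn)` helper, the savepoint naming and the table `t` are all hypothetical, and it uses the standard sqlite3 module in autocommit mode so that the only transaction control statements are the ones written out below.

```python
import sqlite3
from contextlib import contextmanager
from itertools import count

_savepoint_ids = count(1)  # gives every block its own savepoint name


@contextmanager
def atomic(conn):
    """Hypothetical helper: run a block in a savepoint, undo it on error."""
    name = f"sp_{next(_savepoint_ids)}"
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield
    except Exception:
        # Undo only the work done since this savepoint, then discard it.
        conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
        conn.execute(f"RELEASE SAVEPOINT {name}")
        raise
    else:
        # Releasing the outermost savepoint commits the transaction in SQLite.
        conn.execute(f"RELEASE SAVEPOINT {name}")


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE t (x INTEGER)")

with atomic(conn):
    conn.execute("INSERT INTO t VALUES (1)")
    try:
        with atomic(conn):
            conn.execute("INSERT INTO t VALUES (2)")
            raise ValueError("inner block fails")
    except ValueError:
        pass  # the outer block carries on; only the inner insert is undone

rows = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY x")]
print(rows)
```

The inner failure rolls back to its own savepoint, so the outer block can still commit its work. That is the property the older context managers lacked.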
Key learnings:
- django 1.6 will have sane transaction-related functionality.
- Read the documentation at https://docs.djangoproject.com/en/dev/topics/db/transactions/
By Steve Schwarz from Planet Django. Published on May 20, 2013.
Bruce Momjian: Video Interview
From Planet PostgreSQL. Published on May 20, 2013.
I did a video interview while I was at ConFoo, and it is now online.
Josh Berkus: PostgreSQL New Development Priorities 4: Parallel Query
From Planet PostgreSQL. Published on May 20, 2013.
Parallel query is the first priority from those suggested in the comments that I agree should be a major PostgreSQL development priority. I think that Joel Jacobson summarized it neatly: Bring Back Moore's Law. Vertical scaling has always been one of PostgreSQL's strengths, but we're running into hard limits as servers are getting more cores but not faster cores. We need to be able to use a server's full CPU capacity. (Note: this series of articles is my personal opinion as a PostgreSQL core team member.)
The benefits to having some kind of parallel query are obvious to most users and developers today. Mostly, people tend to think of analytics and parallel query across terabyte-sized tables, and that's definitely one of the reasons we need parallel query. But possibly a stronger reason, which isn't much talked about, is CPU-heavy extensions -- chief among them, PostGIS. All of those spatial queries are very processor-heavy; a location search takes a lot of math, a spatial JOIN more so. While most users of large databases would like parallel query in order to do things a bit faster, PostGIS users need parallelism yesterday.
Fortunately, work on parallelism has already started. Even more fortunately, parallel query isn't a single monumental thing which has to be done as one big chunk; we can add parallelism piecemeal over the next few versions of Postgres. Roughly, parallel query breaks down into parallelizing all of the following operations:
- Table scan
- Index scan
- Bitmap scan
- In-memory sort
- On-disk sort
- Hashing
- Merge Join
- Nested loop join
- Aggregation
- Framework for parallel functions
Most of these features can be worked on independently, in any order -- dare I say, developed in parallel? Joins probably need to be done after sorts and scans, but that's pretty much it. Noah Misch has chosen to start with parallel in-memory sort, so you can probably expect that for version 9.4.
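For readers who want a picture of what a parallel in-memory sort looks like in general, here is a toy sketch: split the input into chunks, sort the chunks concurrently, then merge the sorted runs. It says nothing about how PostgreSQL will actually implement it. The `parallel_sort` function and its `workers` parameter are made up for the example, and because it uses Python threads it shows the structure of the technique rather than a real CPU speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, workers=4):
    """Sort chunks concurrently, then merge the sorted runs (illustrative only)."""
    if not values:
        return []
    chunk_size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_runs = list(pool.map(sorted, chunks))  # one sorted run per chunk
    return list(heapq.merge(*sorted_runs))  # k-way merge of the runs


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
result = parallel_sort(data, workers=3)
print(result)
```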
Selena Deckelmann: Distributed databases: a series of posts including 2-phase commit in Postgres
From Planet PostgreSQL. Published on May 20, 2013.
There’s a fantastic set of blog posts about distributed databases and network partitioning, starting with this post explaining the perils of trying to “communicate with someone who doesn’t know you’re alive.”
The next post is about Postgres and 2-phase commit. And there are four additional posts in the series.
The whole series is worth reading for anyone interested in data stores, consistency and Postgres!
Why I left Heroku, and notes on my new AWS setup
By Adrian Holovaty from Planet Django. Published on May 20, 2013.
On Friday, we migrated Soundslice from Heroku to direct use of Amazon Web Services (AWS). I'm very, very happy with this change and want to spread the word about how we did it and why you should consider it if you're in a similar position.
My Heroku experience
Soundslice had been on Heroku since the site launched in November 2012. I decided to use it for a few reasons:
- Being a sysadmin is not my thing. I don't enjoy it, and I'm not particularly good at it.
- Soundslice is a two-man operation (developer and designer), and my time is much better spent working on the product than doing sysadmin work.
- Heroku had the promise of easy setup and easy scaling in cases of high traffic.
While I was getting Soundslice up and running on Heroku, I ran into problems immediately. For one, their automatic detection of Python/Django didn't work. I had to rejigger my code four or five times ("Should settings.py go in this directory? In a subdirectory? In a sub-subdirectory?") in order for it to pick up my app -- and this auto-detection stuff is the kind of thing that's very hard to debug.
Then I spent a full day and a half (!) trying to get Django error emails working. I verified that the server could send email, and all the necessary code worked from the Python shell, but errors just didn't get emailed out from the app for some reason. I never did figure out the problem -- I ended up punting and using Sentry/Raven (highly recommended).
These experiences, along with a few other oddities, made me wary of Heroku, but I kept with it.
To its credit, Heroku handled the Soundslice launch well, with no issues -- and using heroku ps:scale from the command line was super cool. In December, Soundslice made it to the Reddit homepage and 350,000 people visited the site in a period of a few hours. Heroku handled it nicely, after I scaled up the number of dynos.
But over the next few months, I got burned a few more times.
First, in January, they broke deployment. Whenever I tried to deploy, I got ugly error messages. I ended up routing around their bug by installing a different "buildpack" thanks to a tip from Jacob, but this left a sour taste in my mouth.
Then, one April evening, I deployed my app, and Heroku decided to upgrade the Python version during the deploy, from 2.7.3 to 2.7.4. (In itself, that's vaguely upsetting, as I didn't request an upgrade. But my app code worked just as well on the new version, so I was OK with it.) When the deployment was done, my site was completely down -- a HARD failure with a very ugly Heroku error message being shown to my users. I had no idea what happened. I raced through my recent commits, looking for problems. I looked at the Heroku log output, and it just said some stuff about my "soundslice" package not being found. I ran the site locally to make sure it was working. It was working fine. I had deployed successfully earlier in the day, and I had made no fundamental changes to package layout.
After several minutes of this futzing around, with the site being completely down, after I had just sent the link to some potential partners who, for all I know, were evaluating the site that very moment -- I deployed again and the site worked again. So it was nothing on my end. Clearly just something busted with the Heroku deployment process.
That's when Heroku lost my trust. From then on, whenever I deployed, I got a little nervous that something bad would happen, out of my control.
Around the same time, Soundslice began using some Python modules with compiled C extensions and other various non-Python code that was not deployable on Heroku with their standard requirements.txt process. Heroku offers a way to compile and package binaries, which I used successfully, but it was more work using that proprietary process than running a simple apt-get command on a server I had root access to.
With all of this, I decided it was time to leave Heroku. I'm still using Heroku for this blog, and I might use it in the future for small/throwaway projects, but I personally wouldn't recommend using it for anything more substantial. Especially now that I know how easy it is to get a powerful AWS stack running.
My AWS setup
I'm lucky to be friends with Scott VanDenPlas, who was director of dev ops for the Obama reelection tech team -- you know, the one that got a ton of attention for being awesome. Scott helped me set up a fantastic infrastructure for Soundslice on AWS. Despite having used Amazon S3 and EC2 a fair amount over the years, I had no idea how powerful Amazon's full suite of services really was until Scott showed me. Unsolicited advertisement: You should definitely hire Scott if you need any AWS work done. He's one of the very best.
The way we set up Soundslice is relatively simple. We made a custom AMI with our code/dependencies, then set up an Elastic Load Balancer with auto-scaling rules that instantiate app servers from that AMI based on load. I also converted the app to use MySQL. In detail:
Step 1: "Bake" an AMI. I grabbed an existing vanilla Ubuntu AMI (basically a frozen image of a Linux box) and installed the various packages Soundslice needs with apt-get and pip. I also compiled a few bits of code I needed that aren't in apt-get, and I got our app's code on there by cloning our Git repository. After that instance had all my code/dependencies on it, I created an AMI from it ("Create Image (EBS AMI)" in the EC2 dashboard).
Step 2: Set up auto-scaling rules. This is the real magic. We configured a load balancer (using Amazon ELB) to automatically spawn app servers based on load. This involves setting up things called "Launch configurations" and "scaling policies" and "metric alarms." Check out my Python code here to see the details. Basically, Amazon constantly monitors the app servers, and if any of them reaches a certain CPU usage, Amazon will automatically launch X new server(s) and associate them with the load balancer when they're up and running. Same thing applies if traffic levels go down and you need to terminate an instance or two. It's awesome.
Step 3: Change app not to use shared cache. Up until the AWS migration, Soundslice used memcache for Django session data. This introduces a few wrinkles in an auto-scaled environment, because it means each server needs access to a common memcache instance. Rather than have to deal with that, I changed the app to use cookie-based sessions, so that session data is stored in signed cookies rather than in memcache. This way, the web app servers don't need to share any state (other than the database). Plus it's faster for end users because the app doesn't have to hit memcache for session data.
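For reference, switching Django to cookie-based sessions is normally a settings change. The fragment below is a sketch of what that could look like, not the actual Soundslice configuration: `SESSION_ENGINE` and the signed-cookies backend are standard Django, while the secret value is a placeholder.

```python
# settings.py (fragment)

# Store session data in signed cookies instead of a shared cache backend.
SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"

# Signed-cookie sessions are signed with SECRET_KEY, so every app server
# must be configured with the same value (placeholder shown here).
SECRET_KEY = "replace-with-a-shared-secret"
```

Because the cookie is signed rather than encrypted, users can read its contents, so it should not hold anything sensitive.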
Step 4: Migrate to MySQL. Eeeek, I know. I have been a die-hard PostgreSQL fan since Frank Wiles showed me the light circa 2003. But the only way to use Postgres on AWS is to do the maintenance/scaling yourself...and my distaste for doing sysadmin work is greater than my distaste for MySQL. :-) Amazon offers RDS, which is basically hosted MySQL, with point-and-click replication. I fell in love with it the moment I scaled it from one to two availability zones with a couple of clicks on the AWS admin console. The simplicity is amazing.
Step 5: Add nice API with Fabric. Deployment was stupidly simple with Heroku, but it's easy to make it equally simple using a custom AWS environment -- I just had to do some upfront work by writing Fabric tasks. The key is, because you don't know how many servers you have at a given moment, or what their host names are, you query the Amazon API (using the excellent boto library) to get the hostnames dynamically. See here for the relevant parts of my fabfile.
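The linked fabfile is not reproduced here. As a rough sketch of the idea, the function below pulls the host names of running instances out of a describe-instances style response. It assumes the response layout returned by the newer boto3 library's `describe_instances()` call rather than the original boto API, and the `running_hostnames` helper and the sample host names are made up for the example.

```python
def running_hostnames(response):
    """Collect public DNS names of running instances from an EC2
    describe-instances style response (reservations containing instances)."""
    hosts = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == "running":
                name = instance.get("PublicDnsName")
                if name:  # skip instances without a public name
                    hosts.append(name)
    return hosts


# Hand-written sample in the assumed response shape; host names are invented.
sample = {
    "Reservations": [
        {"Instances": [
            {"PublicDnsName": "app-1.example.com", "State": {"Name": "running"}},
            {"PublicDnsName": "", "State": {"Name": "terminated"}},
        ]},
        {"Instances": [
            {"PublicDnsName": "app-2.example.com", "State": {"Name": "running"}},
        ]},
    ]
}

hosts = running_hostnames(sample)
print(hosts)
```

A deployment task would then use the resulting list as its set of target hosts instead of a hard-coded one.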
Ongoing: Update AMI as needed. Whenever there's a new bit of code that my app needs -- say, a new apt-get package -- I make a one-off instance of the AMI, install the package, then freeze it as a new AMI. Then I associate the load balancer with the new AMI, and each new app server from then on will use the new AMI. I can force existing instances to use the new AMI by simply terminating them in the Amazon console; the load balancer will detect that they're terminated and, based on the scaling rules, will bring up a new instance with the new AMI.
Another approach would be to use Chef or Puppet to automatically install the necessary packages on each new server at instantiation time, instead of "baking" the packages into the AMI itself. We opted not to do this, because it was unnecessary complexity. The app is simple enough that the baked-AMI approach works nicely.
Put all this together, and you have a very powerful setup that I would argue is just as easy to use as Heroku (once it's set up!), with the full power of root access on your boxes, the ability to install whatever you want, set your scaling rules, etc. Try it!
A Rhapsody In Warsaw
By Andrew Godwin from Planet Django. Published on May 20, 2013.
A field, a tent, and a large amount of Polish food - the makings of a great conference.
DjangoCon and I have a long history. The very first DjangoCon, back in 2008, was also my very first conference - and I've achieved the slightly dubious honour of having attended every single one.
They are not, of course, the only conferences that I go to; these days I try to speak at a variety of events. I've seen a lot of venues and they're all variations on a theme. That theme, of course, is large rooms full of chairs.
DjangoCon EU 2013, hosted last week in Warsaw, bucked that trend and was probably the best yet - and that's not something I say lightly. Ola Sitarska and the rest of her team went for an inspired gamble that really paid off.

The stage, and Craig Kerstiens. From flickr.com/photos/patrick91
When I first heard of the plans to host this year's DjangoCon EU in a circus tent, I was a little sceptical - after all, conference venues have evolved over many decades to serve the many needs of a large-scale event. Seating, airflow, power, networking, A/V, catering and toilets are all needs of a modern conference population.
The end result, however, was impressive. The circus tent had been outfitted with power, WiFi, lighting, a stage, projectors, audio and even chillers full of drinks, and beats many indoor venues I've spoken in. One entire side of the tent was open to the outside, providing easy airflow and access without making the inside too bright.
There were a couple of small niggles - the flight path of the nearby airport had changed the week before, meaning planes would occasionally interrupt talks, there wasn't quite enough toilet capacity at peak times, and WiFi was sluggish - which is normal for tech conferences. These were all outweighed by the positives, though - and what positives there were.
One of my tweets describes the conference as "like a music festival", which gives some idea of the wonderful attitude everyone had. Hammocks, deckchairs and bean bags were some of the seating options, there was a plentiful supply of free popcorn, and in between talks you could wander down to the fountain, relax on picnic blankets under the trees or dip into an entire freezer of ice cream.
Everyone was in a very good mood, and very relaxed. DjangoCon has always gone for a laid back approach, and here it worked incredibly well. I mostly come to DjangoCon to socialise and meet new people, rather than to learn from the talks, and in that environment it worked very well.

Danny's handstand lessons are almost a DjangoCon staple. From flickr.com/photos/patrick91
I'd like to highlight the catering in particular - it makes such a difference to have snacks and drinks available throughout the day rather than at set times. It was possible at any point in the afternoon to go and get some ice cream, fruit juice or even one of the sandwiches from breakfast.
Not only that, but the catering during the sprints wasn't the usual case of just ordering pizza or sandwiches for everyone - there were proper hot meals, with desserts. Portion size suffered a little since the sprints were so popular, but it was still very tasty, and I'm pleased to see healthier food at a sprint event.
As a speaker, the slick execution of this conference began before I even arrived. As DjangoCon is a community-run conference, staffed entirely by volunteers, speakers are generally expected to pay their own way, sometimes including some of the ticket price. This time, however, not only was admission free but the organisers picked us up from the airport and sorted out the hotel.
Of course, it's not that unusual for a conference to do this at all - I've experienced it many times before - but to do it while still keeping the prices low was impressive, and Ola and her team did it well, keeping us up to date and taking feedback into account very quickly.
One thing I wish more conferences did for their speakers is providing a local SIM card with data - this is especially useful in non-English-speaking countries, where getting one can be tricky. DjangoCon provided one right in the welcome basket, and I used it all week - a data connection is invaluable for navigating a foreign city.

Approaching the circus through the trees
I continue to be really impressed by the way the Django community evolves each year. DjangoCon EU itself gets more impressive every year - and previous years have set a high standard indeed.
Conferences are a very important part of bringing a community together and fostering the cohesion that really helps keep a project like Django going. I'm so glad that DjangoCon exists, and that each one helps push forward projects both old and new. It's somewhat unfortunate that the tech scene is mostly governed by who you know, rather than what, but events like these offer a way to improve both at once.
I'm also amazed how each year another group of volunteers tirelessly steps up to host, and this year was no different - a team of French volunteers stepped up to host it in the French Riviera next year. I can't wait - I wish them the best of luck.
I have many friends who I only ever see at conferences, and so the return of conference season each year is always a delight. It's a privilege to be able to attend and enjoy all these events each year, and to the organisers - not only of DjangoCon EU but all similar events - I'd like to say one thing: dziękuję!
DjangoCon EU 2013
By Horst Gutmann from Planet Django. Published on May 19, 2013.
DjangoCon EU has always been something special. The very first European DjangoCon back in 2009 in Prague was a great start, and over the years, with Berlin, Amsterdam and Zurich in between, it grew and grew and got better and better. This year, DjangoCon made a stop in Warsaw and, to be honest, it was the greatest conference I've ever been to.
Back when the first plans for this year's event started to leak I was all "what on Earth???", but the local team and all the helpers including Ola Sitarska, Ola Sendecka, Kuba Kucharski, Tomek Paczkowski and Jarek Piotrowski made it happen: A tech conference in a park, in a tent, that worked!!!
The conference
Over the years, conferences have evolved into those perfectly organized events where usually only one thing breaks: the WiFi. They usually take place in perfectly AC'd hotels or conference centers and depending on the price you pay you get served hot and cold beverages by people in black & white & bow-ties. Well, DjangoCon EU has always been different, being a conference by the community for the community, but this year was like something from a whole different planet, with the organizers taking a huge risk and it working out perfectly.
The DjangoCon Circus in all its glory!
The tent is probably the most risky venue you can have. If the weather changes and you end up getting hit by rain from all directions, your conference venue might become a nightmare (depending naturally also to some degree on your attendees). But the exact opposite happened last week: Summer!
All week was perfect summer weather. Naturally, the tent became slightly heated up but still quite bearable and there was ice cream at the buffet whenever you wanted some :D
Then there were the talks. My main objectives when going to conferences are to meet up with people I haven't seen for a while and getting new ideas. No point in talking about the fun-part since ... heck, we were in a tent in a park in Warsaw! And right on the first day I moved PyGraz.org from being managed by supervisord to circus after seeing Tarek Ziadé's talk about it. So much for new ideas ;-)
Probably the biggest theme here was databases, though, with at least 5 talks being about how to store your data (mostly in PostgreSQL) the right way:
- Advanced PostgreSQL in Django
- Migrating the Future
- Taming multiple databases with Django
- Getting past the Django ORM limitations with Postgres
- Enterprise Django: transactions for web developers
Good stuff! :-)
So, the conference was awesome, but so was the after-show party: There was a large grill with exceptionally long queues for food and beer. Sadly the local fauna took that as an opportunity to assault us with all the mosquitos it had. Seems like for most this wasn't really a problem but Ulrich and myself were sadly hit rather hard so we left early while the others still had a blast into the morning. But at least I still saw Rob Spectre making the crowd go wild ;-)
Rob Spectre doing a live performance ;-)
The sprints
After everyone had gotten out of bed on Saturday, the sprint was the next point on the agenda. I guess more than 200 people attending a Django sprint will get its own big entry in the Django history book. The venue was in the heart of Warsaw at the HardGamma offices, with enough power and WiFi for everyone, and despite there being nearly twice as many people as anticipated there was also enough food for everyone!
Big thanks to the organizers for such an awesome week! And see you all again next year in France!
Travel appendix
- Get a Play prepaid data SIM card. 20 PLN for 1GB is a steal!
- For 30 PLN per day for a room for two the Hostel Słuzewiec is quite nice.
- OMG, the burger at Meat Love!
- ... while the burger at the Hard Rock Café is overrated ;-)
- Besides TripAdvisor, FourSquare is surprisingly useful to find good food in a city you don't know :-)
Hubert 'depesz' Lubaczewski: Explaining the unexplainable – part 4
From Planet PostgreSQL. Published on May 19, 2013.
In this, hopefully 2nd to last, post in the series, I will cover the rest of usually happening operations that you can see in your explain outputs. Unique Name seems to be clear about what's going on here – it removes duplicate data. This might happen, for example, when you're doing: SELECT DISTINCT FIELD FROM [...]
Gabriele Bartolini: Configuring retention policies in Barman
From Planet PostgreSQL. Published on May 19, 2013.
In our previous article we went through describing what retention policies are and how they can be enforced on your PostgreSQL server backups with Barman 1.2. In this post, we will go through the configuration aspects.
For the sake of simplicity, we assume a typical scenario which involves taking full backups once a week through the “barman backup” command. Suppose you want to automatically keep the latest 4 backups and let Barman automatically delete the old ones (obsolete).
The main configuration option for retention policies in Barman is “retention_policy”, which can be defined at either global or server level. If you want all your servers by default to keep the last 4 periodical backups, you need to add the following line to the general section of Barman’s configuration file:
[barman]
... // General settings
retention_policy: REDUNDANCY 4
When the next “barman cron” command is executed (every minute if you installed Barman using RPMs or Debian/Ubuntu packages), Barman checks the number of available full periodical backups for every server, orders them in descending chronological order (from the most recent to the oldest one) and deletes backups from the 5th position onwards.
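The selection rule just described can be sketched in a few lines of Python. This is only an illustration of the ordering-and-cutoff logic, not Barman's code: the `obsolete_by_redundancy` function and the sample weekly backup dates are made up.

```python
from datetime import datetime


def obsolete_by_redundancy(backup_times, redundancy):
    """Return the backups beyond the newest `redundancy` ones, newest first."""
    ordered = sorted(backup_times, reverse=True)  # most recent first
    return ordered[redundancy:]  # everything from position redundancy + 1 on


# Six hypothetical weekly full backups.
weekly = [
    datetime(2013, 4, 7), datetime(2013, 4, 14), datetime(2013, 4, 21),
    datetime(2013, 4, 28), datetime(2013, 5, 5), datetime(2013, 5, 12),
]

to_delete = obsolete_by_redundancy(weekly, redundancy=4)
print(to_delete)
```

With REDUNDANCY 4 and six backups, the two oldest ones are the candidates for deletion.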
In case you have several servers backed up on the same Barman host and you want to differentiate the retention policy for a specific server, you can simply edit that server configuration section (or file, see “Managing the backup of several PostgreSQL servers with Barman“) and define a different setting:
[malcolm]
description = Malcolm Rocks
ssh_command = ssh malcolm
conninfo = host=malcolm port=5432 user=postgres dbname=postgres
retention_policy: REDUNDANCY 8
However, Barman also allows systems administrators to manage retention policies based on time, in terms of recovery window and point of recoverability. For example, you can configure another server so that it can be recovered to any point in time in the last 3 months:
[angus]
description = Angus Rocks
ssh_command = ssh angus
conninfo = host=angus port=5432 user=postgres dbname=postgres
retention_policy: RECOVERY WINDOW OF 3 MONTHS
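A recovery window can be sketched along the same lines. The sketch below reflects a general reading of how such windows work, not Barman's actual implementation: to recover to the oldest point in the window you still need the newest full backup taken at or before the start of the window, so only backups older than that one are treated as obsolete. The `obsolete_by_recovery_window` function is hypothetical, and "3 months" is approximated as 90 days.

```python
from datetime import datetime, timedelta


def obsolete_by_recovery_window(backup_times, now, window):
    """Return backups no longer needed to recover to any point in the window."""
    window_start = now - window
    older = sorted(t for t in backup_times if t <= window_start)
    # The newest backup at or before window_start is still needed as the
    # base for replaying WAL up to the start of the window, so keep it.
    return older[:-1]


# Hypothetical monthly full backups.
backups = [
    datetime(2013, 1, 6), datetime(2013, 2, 3), datetime(2013, 3, 3),
    datetime(2013, 4, 7), datetime(2013, 5, 5),
]

obsolete = obsolete_by_recovery_window(
    backups, now=datetime(2013, 5, 19), window=timedelta(days=90)
)
print(obsolete)
```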
Make sure you have enough space on the disk to store all the WAL files for every server you back up, and always monitor “barman check” through your alerting tools (such as Nagios/Icinga/Zabbix/etc.).
The current implementation of retention policies in Barman has some limitations: retention policies are managed only automatically (not manually – this would require creating a “barman delete --obsolete” command, for example) and there is no decoupling yet between full backups and WAL archive transactional logs (we have already thought of the “wal_retention_policy” option, but at the moment it is not handled).
More detailed information on retention policies can be found on Barman’s documentation website.
Josh Berkus: PostgreSQL New Development Priorities 3: Pluggable Parser
From Planet PostgreSQL. Published on May 18, 2013.
Really, when you look at the long-term viability of any platform, pluggability is where it's at. A lot of the success of PostgreSQL to date has been built on extensions and portability, just as the success of AWS has been built on their comprehensive API. Our future success will be built on taking pluggability even further. In addition to pluggable storage, a second thing we really need is a pluggable parser interface. That is, it should be possible to generate a statement structure, in binary form, and hand that off to libpq for execution. There was recently some discussion about this on -hackers.
If there were a way to hand off expression trees directly to the planner, then this would allow creating extensions which actually had additional syntax, without having to fork PostgreSQL. This would support most of those "compatibility" extensions, as well as potentially allowing extensions like SKYLINE OF which change SQL behavior.
It would also help support PostgreSQL-based clustered databases, by allowing all of the parsing for a particular client to happen on a remote node and get passed to the clustered backends. The pgPool2 project has asked for this for several years for that reason.
More intriguingly, it would allow for potentially creating an "ORM" which doesn't have to serialize everything to SQL, but can instead build expression trees directly based on client code. This would both improve response times, and encourage developers to use a lot of PostgreSQL's more sophisticated features since they could access them directly in their code.
Taking things a step further, we could extend this to allow users to hand a plan tree directly to the executor. This would fix things for all of the users who actually need query hints (as opposed to those who think they need them), as well as taking efficiency a step beyond cached plans.
There are a lot of reasons this would be just as difficult to do as pluggable storage. Currently parsing depends on a context-dependent knowledge of system catalogs, including things like search_path. So I have no idea what it would even look like. But a parser API is something that people who hack on Postgres and fork it will continue to ask for.
Announcement
By DjangoCon Europe from Planet Django. Published on May 18, 2013.
It has been brought to our attention that there may have been a violation of the code of conduct at the DjangoCon speakers dinner. We have started an investigation of the incident. If our investigation reveals that a violation did occur, we will make a further announcement regarding the action we will take.
Selena Deckelmann: Migrations with Alembic: a lightspeed tour
From Planet PostgreSQL. Published on May 17, 2013.
I’ve got a Beer & Tell to give about alembic. Alembic is a migration tool that works with SQLAlchemy. I’m using it for database migrations with PostgreSQL.
So, here’s what I want to say today:
- Written by SQLAlchemy wiz Mike Bayer
- Here’s the tutorial. Socorro is now using alembic in production with SQLAlchemy 0.6.x. I’m hoping to get us upgraded to 0.8.x soon.
- Here’s what running an upgrade in production for Socorro looks like. Awesome right?
- Here’s what a migration looks like.
- Here’s a configuration file.
- Generating a migration from the command line might look something like:
alembic revision -m "bug XXXXXX Add a new table" --autogenerate
The most difficult thing to deal with so far has been the many User Defined Functions that we use in Socorro. This isn’t something that any of the migration tools I tested deal with well.
Happy to answer questions! And I’ll see about making a longer talk about this transition soon.
Joe Abbate: Pyrseas contributions solicited
From Planet PostgreSQL. Published on May 17, 2013.
Do you use PostgreSQL and truly believe it’s “the world’s most advanced open source database” and that its upcoming 9.3 release will make it even more awesome?
Do you also use Python and believe it’s “an easy to learn, powerful programming language” with “elegant syntax” that makes it an ideal language for developing applications and tools around PostgreSQL, such as Pyrseas?
Then we could use your help. For starters, we want to add support for the MATERIALIZED VIEWs and EVENT TRIGGERs coming up in PG 9.3.
We have also been requested to add the capability to load and maintain “static data” (relatively small, unchanging tables) as part of yamltodb, so that it can be integrated more easily into database version control workflows.
And for the next release, Pyrseas 0.7, we’d like to include the first version of the database augmentation tool which will support declarative implementation of business logic in the database–starting off with audit trail columns. Some work has been done on this already, but it needs integration with the current code and tests.
Or perhaps coding is not your forte, but you’re really good at explaining and documenting technical “stuff”. Then you could give us a hand with revamping the docs, maybe writing a tutorial so that users have a smooth ride using our tools.
Or maybe you have your own ideas as to how improve the PostgreSQL version control experience. We’d love to hear those too.
If you’d like to help, you can fork the code on GitHub, join the mailing list and introduce yourself, or leave a comment below.
Filed under: PostgreSQL, Python, Version control
Lightning talks day 3 - Djangocon.eu
By Reinout van Rees from Planet Django. Published on May 17, 2013.
html5lib
Browsers are terribly forgiving. Python's parsers don't deal with everything, even valid html5 docs. The old html5lib was a problem: it was hosted on Google code and was not python 3 compatible.
The new html5lib supports python 3. Github, readthedocs, works fine!
Real time web - Aymeric Augustin
He looked at web sockets in django. He played with tulip, Guido's library for async python. He had 1000 processes calculating a 'game of life' screen and django connected with them just fine and pushed the result to the browser.
PyWaw
PyWaw is a python community in Warsaw. They have now had 24 meetings with about 55 attendees. At the last meeting they even had 100 people attending.
They are not alone in Poland, there are other user groups.
So... go back to your cities and start user groups!
Scrapy
Screen scraping is when you need to get structured information from the web, quickly and with no hassle.
Scrapy takes the hassle out of screen scraping. It takes away the pain of parsing horrible html.
It has perfect documentation and a helpful community.
You can even scrape from amazon, even including logging in.
What can you do? Convert SVG to VML. Stock checker for a market place. Testing your own website.
Motivating users - Aaron Bassett
How to motivate kids aged 7-17 to learn online. Don't give rewards. If you give rewards, that means that the task must be really shit. Rewards don't scale either. After initial success, do you increase the reward?
Everyone is addicted to dopamine, the stuff you get in your head when you like something. Don't give a reward always, because that turns play (which kids like) into work.
They did some tests: random rewards do seem to work. So that's something you can look at.
Better model inheritance - Craig de Stigter
There are three kinds of model inheritance now in Django:
- Proxy models.
- Abstract models.
- Multi-table.
None fit exactly with his usecase.
What he made was django-typed-models. A bit like proxy models, but it does store the type of the object in a type field, so you can figure out what you actually are.
They even use python magery for re-casting objects as a different type: self.__class__ = NewClass :-)
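The re-casting trick can be shown without Django. The sketch below is an assumption-laden illustration, not django-typed-models' code: the classes, the `type` attribute and the `TYPE_REGISTRY` lookup are hypothetical stand-ins for a model with a stored type field.

```python
class Animal(object):
    def __init__(self, name, type_name):
        self.name = name
        self.type = type_name  # stands in for the stored "type" field

    def speak(self):
        return "..."


class Dog(Animal):
    def speak(self):
        return "Woof"


class Cat(Animal):
    def speak(self):
        return "Meow"


# Hypothetical registry mapping stored type values to classes.
TYPE_REGISTRY = {"dog": Dog, "cat": Cat}


def recast(obj):
    """Switch the instance to the class named by its type field."""
    obj.__class__ = TYPE_REGISTRY[obj.type]
    return obj


pet = recast(Animal("Rex", "dog"))
print(type(pet).__name__, pet.speak())
```

Assigning to `__class__` only works cleanly when the classes share the same instance layout, which is the case when the subclasses add behaviour but no extra storage.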
Django-fluent CMS - Diederik van der Boor
It is a CMS he built for his own CMS. Many CMSs are, in the end, monolithic.
He made a CMS that consists of separate parts. If you just want a tree of pages, use django-fluent-pages. If you just want an editable main part of a page, use another app. Etcetera.
See https://pypi.python.org/pypi/django-fluent-pages/, https://pypi.python.org/pypi/django-fluent-contents/
And you can also use django-fluent-dashboard, a more beautiful admin skin.
Update: he's got a website now: http://django-fluent.org/
Adventurer in the land of production environment - Maciej Pasternacki
Is your production environment up? Use a monitor like http://pingdom.com.
Django should not run as root. Run it with gunicorn and nginx, for instance.
Get immune to surprise upgrades: pip freeze.
Amulet of life saving: respawn after death with supervisord.
Stun immunity: a crontab with @reboot, for instance.
Acquire skill: chef, puppet.
The final battle: the slashdot effect. Gear up: autoscaling, self-healing.
New core committer
Marc Tamlyn is the new core committer!
Invisible and intentional management - Darin Swanson
Your project is not code, your project is your people. Make them happy. Make them do the best they can, no more, no less. Keep them leveling up.
Reward teamwork. It is not about the individual. Don't have individual goals, have team roles instead. Talk about "we" and "us". Lead by example. Help the team. Help everyone do better.
If you're a manager, try to be invisible. You're behind the scenes. Multiplying your impact behind the scenes. Don't take credit, the credit is for the team.
Move people to autonomy. Stay away from command and control. Set degrees of freedom and let people grow.
There's an implicit contract between you and your team member: I'll give you freedom, you'll share status and information with me.
Discard what doesn't work, double down on what does. Especially regarding teamwork.
Do try to become better yourself, too. Find a mentor, read books, talk with others.
Relationships - Daniele Procida
There are 7E9 people in the world. Is your relationship really the best choice? Mathematically not. Don't worry. Instead, commit to what you have already chosen and make it the best relation for you.
Same with web frameworks. There are so many... Stop worrying about making the wrong choice, stick with the one you have already chosen and make it the best for you.
Another subject: he wrote https://github.com/evildmp/django-inspector to report on all sorts of pages that his users have added to his system. Status codes and so.
Arduino.loal
He hacked his landlord's garage door opener. They only had one and there were multiple people that needed to use it. So hack the opener, add an arduino and a webserver to control the garage door. They also added django-social-auth.
An enterprise level garage door opener!
SPDY - Emanuele Palazetti
How to deploy Django over SPDY. How to get that to work? Run django inside jython and thus inside java and SPDY push actually works.
3 simple ways to make your site load faster - Filip Wasilewski
- Database connection pooling. Creating a connection can take quite some time. Connection pooling will come in django core 1.6.
- Cache templates. Especially if you use something like django-crispy-forms that uses lots of small templates. You only need to enable a template cacher in TEMPLATE_LOADERS in your settings.
- pjax. Push state ajax. That helps a lot.
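For the template caching tip, a settings fragment along these lines wraps the normal loaders in Django's cached loader. It is a sketch for Django versions of that era, which use a top-level TEMPLATE_LOADERS setting; the two inner loaders are the usual defaults and are an assumption about your project.

```python
# settings.py fragment: wrap the regular loaders in the cached loader,
# so each template is read and compiled once per process.
TEMPLATE_LOADERS = (
    ('django.template.loaders.cached.Loader', (
        'django.template.loaders.filesystem.Loader',
        'django.template.loaders.app_directories.Loader',
    )),
)
```

Cached templates are not reloaded when the file changes, so this is typically enabled for production only.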
Salt stack - Chris Reeves
He used to use Puppet, but didn't like the DSL. It was quite slow and he wanted something better, stronger, faster.
They came across Salt. Written in Python. Very fast. It is explicit, you control everything from the master, the clients don't call home themselves.
In your configuration templates you can use jinja2 for loops and so.
See http://docs.saltstack.com/
His verdict: it is concise and clean.
Your webpages are too big
Why should you care?
- Less developed countries.
- Mobile users.
- Overloaded wifi at django conferences.
What can you do?
- gzip compression on the server.
- django-htmlmin for html minification. It is still young and quite buggy at the moment.
- css/js minification. Look at django-pipeline.
- Do you need the full jquery? jquip has 90% of the functionality and 10% of the size. If you need the full version, use a CDN.
- Bootstrap css: don't hit "download"; go to "customize" and make yourself a smaller version.
- https://github.com/samastur/image-diet to optimize your images. Works out of the box with easy-thumbnails.
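To get a feel for what gzip compression buys you on repetitive markup, here is a small sketch using only Python's standard library. The HTML snippet is made up for the example; real pages will compress less dramatically.

```python
import gzip

# Made-up, highly repetitive markup, similar to a long list on a page.
html = ('<li class="item"><a href="/talks/">Lightning talk</a></li>\n' * 500).encode("utf-8")

compressed = gzip.compress(html)

# Print the size before and after compression.
print(len(html), len(compressed))
```

In a Django project the same effect is normally obtained in the web server configuration or with `django.middleware.gzip.GZipMiddleware`, not by hand.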
Being a community member - Mark Steadman
He sucks at people stuff. Small groups are OK, but bigger groups are a problem. So that's hard when trying to integrate in a community, also the django community.
He works now on bambu-tools, a huge collection of small useful tools and components. But it needs work and fixes to make it useful for everyone.
Which is, see the first paragraph, hard for him. He'll be at the sprints and he'll do his best!
Django and vagrant and PyCharm
Three kinds of magic:
- Django is beautiful. There's magic inside, but it is beautiful magic.
- Vagrant is non-understandable magic.
- PyCharm was already magic in 2010 and it is even better now.
He now has something even better than magic. He has a miracle. He showed vagrant working inside PyCharm. Looks quite nice. The debugger even works when the code runs inside the virtual machine.
Jukebox - loci
What to do when different people have different music styles, for instance at a party? Time for democracy. A website running locally on your laptop allows you to log in and vote for numbers. The highest-voted songs will be played first :-)
Classy class based views
Very handy when working with Django's class based views: http://ccbv.co.uk
(Note: I already used a link here in my summary of Russell's class based views talk. See http://ccbv.co.uk/projects/Django/1.5/django.views.generic.edit/UpdateView/ )
Ideas don't solve problems - Lukasc Balcerzak
His first computer came with logo, you could move the cursor with it to draw lines. Infinite possibilities, so no goal.
There are a lot of open source projects. Does it reinvent the wheel? Does it solve a relatively simple problem? Those are two ways to rate projects.
Example one:
- Just try reading a URL with Python. Which built-in library to use? Hard.
- The "requests" library is a small library that solves one relatively simple problem.
Example two:
- Django-guardian extends Django's auth and has shortcuts for basic stuff. Much simpler.
- Django's auth itself is quite elaborate and hard.
Testing class based views - Benoît Chesneau
You can use django.test.client, but that is an integration test. All the middleware and so is used.
For unittests, you can use a request factory. You still test the system, the callable.
We can also do focused unittesting. We can mimic as_view():
view.request = request
view.args = args
view.kwargs = kwargs
With this, you can test your code much more focused. And you gain speed!
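A minimal sketch of that idea, kept free of Django so it runs on its own: `setup_view` is a hypothetical helper, and `FakeRequest` and `GreetingView` are stand-ins for a request-factory request and a real class based view.

```python
def setup_view(view, request, *args, **kwargs):
    """Attach what as_view() would normally set, without running dispatch."""
    view.request = request
    view.args = args
    view.kwargs = kwargs
    return view


class FakeRequest(object):
    """Hypothetical stand-in for a request built by a request factory."""
    def __init__(self, user):
        self.user = user


class GreetingView(object):
    """Hypothetical stand-in for a class based view."""
    def get_context_data(self):
        text = "Hello %s (%s)" % (self.request.user, self.kwargs["lang"])
        return {"greeting": text}


# Test one method in isolation: no URL resolving, no middleware, no template.
view = setup_view(GreetingView(), FakeRequest("alice"), lang="en")
context = view.get_context_data()
print(context)
```

Because only one method runs, a failure points straight at that method instead of at the whole request cycle.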
Further reading: http://tech.novapost.fr/django-unit-test-your-views-en.html
Django client certificates - Deni Bertovic
Why would you use client SSL certificates? Isn't user/passwd enough?
The advantage: nginx takes care of authentication.
See https://github.com/denibertovic/django-client-certificates
Arduino - Swift
Normally you have to code in C. But now you can also do it in Python.
See https://github.com/theycallmeswift/BreakfastSerial
He demo'ed it. Very nice! Looks useful and usable and simple. Perfect.
Next conference - Remco Wendt
This is now the fifth year. We have a tradition now! High quality conferences organized for programmers by programmers. Not for profit. Great! And now the fun factor is there, too.
The fun will stay! Next year it'll be France, on the beach, in the south! (They don't know the exact city yet).
Prehistorical Python: patterns past their prime - Lennart Regebro
By Reinout van Rees from Planet Django. Published on May 17, 2013.

Dicts
This works now:
>>> from collections import defaultdict
>>> data = defaultdict(list)
>>> data['key'].append(42)
It was added in python 2.5. Previously you'd do a manual check whether the key exists and create it if it is missing.
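Side by side, the old manual check and the defaultdict version look like this. The sample data is made up for the illustration.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")]

# Old pattern: check for the key by hand and create the list if needed.
manual = {}
for key, value in pairs:
    if key not in manual:
        manual[key] = []
    manual[key].append(value)

# With defaultdict the missing list is created on first access.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(manual == dict(grouped))
```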
Sets
Sets are very useful. Sets contain unique values. Lookups are fast. Before you'd use a dictionary:
>>> d = {}
>>> for each in list_of_things:
... d[each] = None
>>> list_of_things = d.keys()
Now you'd use:
>>> list_of_things = set(list_of_things)
Sorting
You don't need to turn a set into a list before sorting it. This works:
>>> something = set(...)
>>> nicely_sorted = sorted(something)
Previously you'd turn the set into a list first and then call some_list.sort() on it.
Sorting with cmp
This one is old:
>>> def compare(x, y):
...     return cmp(x.something, y.something)
>>> sorted(xxxx, cmp=compare)
New is to use a key. That gets you one call per item. The comparison function takes two items, so you get a whole lot of calls. Here's the new:
>>> def get_key(x):
...     return x.something
>>> sorted(xxxx, key=get_key)
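The difference in the number of calls is easy to measure. The sketch below counts them on 100 shuffled integers; it uses `functools.cmp_to_key` because the `cmp=` argument no longer exists in Python 3, and the counters are just an illustration device.

```python
import random
from functools import cmp_to_key

items = list(range(100))
random.Random(0).shuffle(items)  # fixed seed so the run is repeatable

calls = {"cmp": 0, "key": 0}


def compare(x, y):
    calls["cmp"] += 1
    return (x > y) - (x < y)  # what cmp(x, y) used to return


def get_key(x):
    calls["key"] += 1
    return x


by_cmp = sorted(items, key=cmp_to_key(compare))
by_key = sorted(items, key=get_key)

print(calls)
```

The key function is called once per item, while the comparison function is called once per comparison the sort has to make.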
Conditional expressions
This one is very common!
This old one is hard to debug if blank_choice itself evaluates to false (None, for instance):
>>> first_choice = include_blank and blank_choice or []
There's a new syntax for conditional expressions:
>>> first_choice = blank_choice if include_blank else []
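The pitfall shows up as soon as the intended value is falsy. In this sketch an empty string plays the role of blank_choice; the values are made up for the example.

```python
include_blank = True
blank_choice = ""  # a falsy value that we nevertheless want to get back

# Old and/or idiom: falls through to [] because blank_choice is falsy.
old_style = include_blank and blank_choice or []

# Conditional expression: only include_blank decides which branch is used.
new_style = blank_choice if include_blank else []

print(repr(old_style), repr(new_style))
```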
Constants and loops
Put constant calculations outside of the loop:
>>> const = 5 * a_var
>>> result = 0
>>> for each in some_iterable:
...     result += each * const
Someone suggested that this is an outdated pattern: you can put the calculation inside the loop, python will detect that and work just as fast. He tried it out and it turns out to depend a lot on the kind of calculation, so just stick with the above example.
String concatenation
Which of these is faster:
>>> ''.join(['some', 'string'])
>>> 'some' + 'string'
It turns out that the first one, that most of us use because it is apparently faster, is actually slower! So just use +.
Where does that join come from then? Here. This is slow:
>>> result = ''
>>> for text in make_lots_of_tests():
...     result += text
And this is fast:
>>> result = ''.join(make_lots_of_tests())
The reason is that in the first example, the result text is copied in memory over and over again.
So: use .join() only for joining lists. This also means that you effectively do what looks good. Nobody will concatenate lots of separate strings over several lines in their source code. You'd just use a list there. For just a few strings, just concatenate them.
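If you want to check this on your own machine, a rough timing sketch could look like the following. The helper names and sizes are arbitrary, and the numbers depend on the interpreter: CPython can sometimes optimize `+=` on strings in place, so the gap may be smaller than expected.

```python
import timeit


def make_lots_of_texts(n=10000):
    return ["text%d" % i for i in range(n)]


def concat_loop(parts):
    result = ''
    for text in parts:
        result += text  # may copy the growing string each time
    return result


def concat_join(parts):
    return ''.join(parts)  # builds the result in one pass


parts = make_lots_of_texts()
print("loop:", timeit.timeit(lambda: concat_loop(parts), number=20))
print("join:", timeit.timeit(lambda: concat_join(parts), number=20))
```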
That's the nice thing of Python: if you do what looks good, you're mostly ok.
Django 1.5 Cheat Sheet
By Mercurytide: Django articles from Planet Django. Published on May 17, 2013.
At Mercurytide, we know all too well the difficulties of memorizing shortcuts when you work in different frameworks. You think you’ve mastered it… and then a new version comes along. Mercurytide’s developers have been working with Django 1.5 since its release in February 2013. Our skilled developers have created a solid, quick-start cheat sheet with an easy to reference layout.
Dynamic models in Django - Juergen Schackmann
By Reinout van Rees from Planet Django. Published on May 17, 2013.
The classical approach in django is:
- Code development
- You create models.
- Deployment
- Tables and columns are created with syncdb.
- Runtime
- Models and db tables are populated.
This means that models are pretty much static. There is no way to modify them at runtime based on user interactions. You can get something working with for instance hstore in postgresql (see the postgresql talk).
His usecase is for medical forms. The contents of those forms should be able to be defined inside the system. There are strict processes for installing medical software, so you cannot just release a new version with a new field. So you must get it to work at runtime.
The solution could be to use dynamic models, models created at runtime. Sometimes configuration by subject matter experts is better than code customization by developers. Also, dynamic models reduce the number of deployment cycles.
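The underlying Python mechanism for "a class created at runtime" is the three-argument form of `type()`. The sketch below uses plain classes, not Django models, and the helper and field names are hypothetical; a real dynamic model would also need Django fields and database tables.

```python
def build_form_class(name, field_defaults):
    """Create a class at runtime from a name and a dict of field defaults."""
    def __init__(self, **values):
        for field, default in field_defaults.items():
            setattr(self, field, values.get(field, default))

    attrs = {"__init__": __init__, "field_names": sorted(field_defaults)}
    return type(name, (object,), attrs)


# The definition could come from a subject matter expert's configuration.
BloodPressureForm = build_form_class(
    "BloodPressureForm", {"systolic": None, "diastolic": None})

record = BloodPressureForm(systolic=120, diastolic=80)
print(BloodPressureForm.__name__, record.systolic, record.diastolic)
```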
He has some criteria:
- Performance.
- Queryability, which means the standard django query stuff should work.
- Django standard tool integration (admin, cache, and so).
- Supported DB backends. If possible, support all django DB backends.
- Complexity/maintainability.
There are a couple of possible solutions:
- Entity attribute value (EAV)
- Columns are stored in separate table rows. Instead of a table having attributes, a table has an attribute table with values. There are at least two apps that provide this. The performance is a problem here.
- Serialized dictionary
- For instance one of the Django JsonField apps. A lot of what is normally database work is now moved to the application. You'll really have to create custom logic in your app to take care of it.
- Runtime schema updates
- Update models at runtime with syncdb or some South functionality or Andrew's new schema migrations for Django work. There are a couple of apps that do this. He also created his own one. The best one seems to be django-mutant.
- Database-specific solutions
- Hstore, django-nonrel. Drawback: it doesn't work with all database backends, of course.
In the end, the runtime schema updates approach looks like the best one.
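To make the entity attribute value option from the list above more concrete, here is a minimal sketch with the standard library's sqlite3. The table and column names are made up; the Django EAV apps have their own schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, kind TEXT);
    CREATE TABLE attribute_value (
        entity_id INTEGER REFERENCES entity(id),
        attribute TEXT,
        value TEXT
    );
""")

# One form instance; each "column" becomes a row in attribute_value.
conn.execute("INSERT INTO entity (id, kind) VALUES (1, 'form')")
conn.executemany(
    "INSERT INTO attribute_value (entity_id, attribute, value) VALUES (?, ?, ?)",
    [(1, "systolic", "120"), (1, "diastolic", "80")],
)


def load_entity(conn, entity_id):
    """Collect all attribute rows of one entity into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM attribute_value WHERE entity_id = ?",
        (entity_id,),
    )
    return dict(rows.fetchall())


print(load_entity(conn, 1))
```

New attributes need no schema change, but every value is stored as text and every object costs several rows, which hints at where the performance problem comes from.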
For more reading: https://code.djangoproject.com/wiki/DynamicModels
Does your stuff scale? - Steven Holmes
By Reinout van Rees from Planet Django. Published on May 17, 2013.
They grew from a two-person company to a 70-person one in two years. Central to that growth were Django and google app engine.
Scalability means both load scalability and functional scalability. You also have to deal with organizational scalability and geographical scalability if you want to grow your organization.
1: Running Django on app engine
It is easy to get confused. Is app engine real? Is it a joke? How to run your django stuff on it?
Their reasons to use it:
- Auto-scaling. They build high-profile stuff and it needs to scale. They had a valentine day site that got a lot of attention on that day and it automatically scaled up without a change in the app. The day after it scaled down automatically, too.
- Services and APIs.
- No sysadmin needed.
Some caveats with app engine: it is a sandbox. You cannot do "pip install". The filesystem isn't there in the traditional sense; there is a blob storage instead. And it is lock-in, mostly; portability is an issue.
They could work around these issues and ended up with a better application as a result.
There are three ways (that they use) of running Django on app engine:
Django non-rel. A ported version of Django, modified for NoSQL. Github, open source. It has a familiar API to Django, so you'll feel at home. It works in production.
A drawback is that the familiarity can be misleading. So you might do things that won't work like M2M relations. And it can feel heavy. Because of the fork/port, it might feel hacky.
Djappengine. A lightweight skeleton around app engine. You don't use django's models. It aims to be the best of both worlds. It also supports NDB, which is app engine's new fast data storage layer.
Drawback: you need to learn a new database API, so you have a higher learning curve.
Django appengine + cloudSQL. You get a fully supported django.
Drawback: there's more setup and it is probably not as scalable as a datastore.
Now to scalability. App engine will already do a lot for you. Some things you yourself must do:
- Plan.
- Cache the hell out of it.
- Offline tasks out of the request loop.
- Prepare load tests and do profiling.
Functional scaling provided by app engine (apart from what django provides):
- Memcache
- Taskqueue
- Mapreduce
- Search
- Images
And you get up to 10 testable versions per app. http://0.yourapp, http://1.yourapp (the previous version) and so on. You can do A/B testing and traffic splitting. It blew his mind when he first discovered it.
2: Scaling an organization + culture
Part of it is organizational culture:
Be a minimalist.
Remove bottlenecks and overhead. Don't get in the way.
Just make good things. You can try (new) things out. You have freedom.
Internal apps. From a pool score app to steering deployments. They also have a big wiki with lots of info in it. It works well for them.
They also built a small Django app to handle all the incoming emailed job applications. One small app built in an afternoon on a beach in Thailand now helps them to hire better people more quickly :-)
You can work from everywhere. Plane, pub, train, at home, in an office, at a beach, whatever. The minimalism helps in scaling.
Important question: what if google shuts it down? Answer: for them, the advantages outweigh the risks. (Note: ouch, this shows what closing Reader did for google's perceived reliability... Everyone in the room was applauding the question...)
The path to continuous deployment - Òscar Vilaplana
By Reinout van Rees from Planet Django. Published on May 17, 2013.
If you've got continuous deployment, you've got stable servers. You make big changes in small increments.
Continuous deployment forces you to do many good things:
- Good tests.
- Repeatable build.
- Well-configured identical machines.
- Automated deployment.
- Migrations and rollbacks
- Etc.
Lots of good things. But let's compare it with lion taming.
Originally, lions were beaten into submission, confused and kept in line with whips. Likewise you'll be beaten if you dare to touch the production machine as it might break.
Now lions are understood better. Conditioning, behavior/signal mapping, reward and trust are the methods now. We understand that deployment is hard. We have behaviour/signal mapping with code/test/green/deploy. Etc.
Continuous deployment: everyone is responsible. Everyone deploys. You automatically learn. Everybody uses the same environment locally for test deployments. The same as on the server.
Testing is core. Slow tests are killing. Fast tests. And all types of tests: unit, functional and acceptance tests. Also automatic code checkers. The light must stay green. Quality must stay high, also test quality.
You need a repeatable build. And it should include not just code, but also configuration and infrastructure. And... always follow the pipeline.
Even in emergencies, follow the pipeline. Peer review, tests, and then the deployment. Don't do manual steps.
Rollback. You must be able to switch back to the previous version.
You can take a canary approach. Canary in the coalmine. Show the new version to a few users. "User testing" in a sense.
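One common way to pick those few users is to hash a stable user id into a bucket. The sketch below is an assumption about how a canary split could be implemented, not something shown in the talk; the function name and the 5% figure are made up.

```python
import hashlib


def in_canary(user_id, percent):
    """Return True if this user falls in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


# The same user always gets the same answer, so nobody flips between versions.
canary_users = [u for u in range(1000) if in_canary(u, 5)]
print(len(canary_users))
```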
Rollbacks in databases; keep it backwards compatible. Never drop columns, for instance. (After a long time, you can remove them safely, of course).
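A backwards compatible change means the previous version of the code keeps working against the new schema. This sqlite3 sketch adds a column instead of changing or dropping one; the table and the `old_code_read` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talk (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO talk (title) VALUES ('Continuous deployment')")


def old_code_read(conn):
    """What the previous release does: it only knows id and title."""
    return conn.execute("SELECT id, title FROM talk").fetchall()


before = old_code_read(conn)

# The migration only adds a column with a default; nothing is removed.
conn.execute("ALTER TABLE talk ADD COLUMN speaker TEXT DEFAULT ''")

after = old_code_read(conn)
print(before == after)
```

Because the old queries still work, rolling the code back does not require rolling the database back.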
Small changes. Frequent releases mean less risk: if something breaks, you know where to search.
Some tips:
Split your stuff in components. A component is something that has a good API and that can be switched out for a different component. It can also be separately deployed.
This helps with testing, too.
Rehearse releases. Get very good at them!
You need good infrastructure. You must manage it and test it well.
Keep all environments equal. Use vagrant.
Automate everything! And if it is not possible: document it. But know that that's something that's not quite correct yet!
Principle philosophy - Swift
By Reinout van Rees from Planet Django. Published on May 17, 2013.
Principle philosophy: a way to discuss our rules and beliefs that govern our actions. He tells it from his personal experience.
His parents wanted to raise him as a good person. So they taught him good principles (like don't be a quitter, don't steal, etc). This is quite black/white though. We are all more gray/gray.
What about the question "how can I be a good programmer"? Programmers use logic, which sounds black/white again: write tests, don't repeat yourself. Sigh.
Talking about things like this is impossible without Immanuel Kant. He differentiates between reason and instinct. If "be happy" were our life goal, we'd just follow our instincts. So what is reason for, then, apart for doing good? Reason has to do with moral. There are three ways of looking at "doing good":
- Duty. Good things can come from duty. Duty can also lead to non-good things, though. Hm, so this is not it.
- Make a difference between the goal and the outcome. The outcome might be bad even though the goal could be worthy.
- Universal lawfulness. Only do something if you know that everybody thinks it is a good idea.
Does this help with a question like "is testing good"?
Gandhi said that a man is the sum of his actions.
In a sense we are the sum of our experiences. So increase the amount of experience that you have. Either have the experiences yourself, or share them like on this conference. Everything looks different from the trenches: learn from each other.
Some lessons he learned from a little baseball league experience (where he sucked) as a kid:
- Swing for the fences. Aim for a home run. It allows you to take great risks (because you have great goals). It motivates you.
- Set reasonable goals, too. Incremental intermediate goals. Those intermediate goals help you progress.
- You suck... and that's totally OK. You're not good at everything. It gives you a different perspective. And you can still give it your best. Also to that almost-unused old project that you get a bug report for.
Some take-aways:
- Build a strong foundation of principles.
- One size doesn't fit all
- Learn from your experiences and share them.
- Build a great network.
- Ask all the right questions.
The advantages of diversity - Steve Holden
By Reinout van Rees from Planet Django. Published on May 17, 2013.
Open source is great. It is absolutely amazing.
We live in a multi-dimensional world, though it is often presented otherwise.
Some present a simple line-based worldview. Bad-Good for instance. Where do you want to be on the line? Republican-democrat? Ruby-Python? Foreigner-native? Once you think along those lines (...) you tend to start thinking in opposites.
This is the basis for many invalid world views. Just draw a line, cluster according to your preference and you're ready. Linear concepts are not useful. The issue is polarization. In a one-dimensional world, there is no room for complexity.
What about a Venn-diagram based worldview? It allows for a bit more subtlety, but there's still a line on the outside...
The open source world has a lot to teach the rest of the world. It is focused, mostly, on outcomes and results. But it is not representative. It is not even representative of the tech industry generally. In tech, 20% are women, in open source it is more like 2%, for instance.
And... we need diversity! The biggest resource in open source is people. So you'd rather not exclude many people. The most common diversity areas, to give you an idea:
- Ethnicity
- Religion
- Gender
- Culture
- Socio-economic background
Diversity is desirable because each individual is limited. We are all good at some things and bad at others. A group can solve a bigger range of problems. And you don't want the group to be too homogeneous.
Typical open source projects will tend to focus on the actual programming and it'll ignore technical writers, designers, training, etc. Django stands out with its documentation. But Python's documentation isn't that good. There's no real emphasis on it in the current Python team. If we don't broaden our community with different skill sets and roles, we'll fall behind. Python is poised to be the #1 language of choice, but we need to improve some things before that can happen.
We ought to involve the community more as open source projects. We should run our projects more professionally. Be more open to involve all of the community more.
We should not accept it anymore to have to read through half-finished documentation and having to fall back to reading source code. "But that takes time to rectify". Well, yes. So involve more people. Get more people with more diverse skill sets to help. Perhaps you can then focus on what you're good at.
It is up to us all. The python world does have an awesome community. But we might just be a bit too smug about how wonderful we are. We should not get complacent and we should keep aiming at increasing our diversity.
Class based views: untangling the mess - Russell Keith-Magee
By Reinout van Rees from Planet Django. Published on May 17, 2013.
Russell is a Django core dev.
Class based views were introduced two years ago, but they weren't greeted with universal acclaim. So he's here to clear up the mess and hopefully make it all more clear for everyone.
History
In the beginning of Django, there were only views. Function-based views. No generic views.
Next, because of DRY, don't repeat yourself, several generic views were added. Listing objects, editing an object, for instance. Editing something happens so often that a generic view inside Django seemed like a good idea.
There are some problems here, though. The configuration you can do is limited by the arguments you can give in your URL configuration. No control over the view logic. You can't pass in an alternative view. There's no re-use between views.
You could "fix" this by adding more and more arguments and allowing callables to be passed in and so on, but in the end you're almost building what you'd already get with object oriented class inheritance... So...
Next: class based views. It landed in Django 1.3 after it didn't work out to get it in 1.1 or 1.2.
What went wrong?
Then the wheels fell off. What went wrong?
Fundamental confusion over purpose. There were two problems being solved at the same time. The two: class based views and class based generic views.
Class based views are only a class-based variant on function-based views that handles get/post/put/delete. Class based views will give you a lot for free. Automatic OPTIONS requests handling. And naive HEAD handling. You wouldn't have that with a function based view. And you can modify it.
Class based generic views use class based views as a base. They're re-writes of the existing function-based generic views. But a bit better and especially much more extensible.
Confusion over implementation choices. The reasons were good, but the reasons weren't clear.
The whole discussion and the choices behind it can be found in the django wiki.
The biggest question is about instantiation. What is being instantiated? How? When? At the start, once, or for every single request? How do you pass in configuration? What's the lifespan of an instance? Can you safely assign something onto self? What are the expectations?
Note: the admin was already always class based. And it had state problems (assigning to self would leak state to other requests).
In the end, all this was what resulted in the MyView.as_view(). as_view() results in a class factory. Otherwise they'd have to change the urls.py contract. A view is currently a callable. It would have to be changed to "a callable or a class". It was a value judgment in the end.
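To make the instantiation question concrete, here is a minimal sketch in plain Python. It is not Django's actual implementation: SimpleView, HelloView and the dict-based request are made-up stand-ins. It shows the two points from the talk: the URL configuration still receives a plain callable, and that callable builds a fresh instance for every request, so anything assigned to self cannot leak into the next request.

```python
class SimpleView:
    """Hypothetical stand-in for a class based view (not Django's View)."""

    http_method_names = ["get", "post", "put", "delete", "head", "options"]

    def __init__(self, **kwargs):
        # Configuration passed to as_view() ends up as instance attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        def view(request):
            # A new instance per request: state on self dies with the request.
            instance = cls(**initkwargs)
            return instance.dispatch(request)
        return view

    def allowed_methods(self):
        return [m.upper() for m in self.http_method_names if hasattr(self, m)]

    def dispatch(self, request):
        method = request["method"].lower()
        handler = None
        if method in self.http_method_names:
            handler = getattr(self, method, None)
        if handler is None:
            return {"status": 405, "allow": self.allowed_methods()}
        return handler(request)

    def options(self, request):
        # Every subclass gets OPTIONS handling without writing any code.
        return {"status": 200, "allow": self.allowed_methods()}


class HelloView(SimpleView):
    greeting = "Hello"

    def get(self, request):
        # Safe to assign to self: this instance only lives for one request.
        self.seen = getattr(self, "seen", 0) + 1
        return {"status": 200, "body": f"{self.greeting} #{self.seen}"}


view = HelloView.as_view(greeting="Hi")  # what the URL configuration would get
print(view({"method": "GET"}))
print(view({"method": "GET"}))
print(view({"method": "OPTIONS"}))
```

Calling the view twice gives the same counter value both times, which is exactly the "no leaked state" property the as_view() design is after.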
Ravioli code. It wasn't spaghetti code, but ravioli. A package with unknown contents.
The generic class based views are made with a whole bunch of mixin classes. The edit view (UpdateView) consists of 9 (mixin) classes. See ccbv.co.uk.
Why would you go through this 9-level madness? Yes, we have a complex class hierarchy. But the reason is that you can easily customize it.
Ravioli tastes good! Maximum reuse of core logic. Extremely flexible. Easy to add your own functionality. But you need to learn it, that is the price you pay for the power you get. Learning means documentation, so...
Bad documentation. The initial documentation was bad. It is now better, but it needs to be made better still.
The biggest thing that needs fixing after the documentation is how to handle decorators like @login_required.
But... did we solve the right problems with the generic views? Modern websites have different problems. Multiple forms. Conditional forms. Continuous scrolling instead of pagination. AJAX support. PJAX (see yesterday's ajax+django talk). Multiple "actions" per page.
Call to action
In discussions, always make clear whether you mean CBV or CBGV (class based views or class based generic views).
Suggestion made later during the questions: call the latter just "generic views". The old function based generic views are gone, so...
Docs still can be improved.
Experiment with APIs. Django's admin is a useful case study. Why not do that with an API and make it easier to create your custom admin?
Get Django to play with old friends - Lynn Root
By Reinout van Rees from Planet Django. Published on May 17, 2013.
She works for Red Hat on http://freeipa.org, on identity stuff for Linux.
Note: see her website for instructions and code examples.
Say that your pointy haired boss (or customer) asks you to make an internal web app with all the buzzwords.
So you can't use regular django auth, you'll need single sign on. Luckily since Django 1.5 you can have custom user models, so it'll fit with all your external requirements. One or two pieces of MIDDLEWARE_CLASSES and AUTHENTICATION_BACKENDS later and you play nice with the external single sign on. Django can be a team player.
Webserver? You'll probably have to use apache. So the environment can be kerberos+apache. Add mod_auth_kerb for kerberos support. Add a "keytab" (making sure it is chown'ed to apache).
There's a difference between authentication and authorization. Authentication is "just" logging in, authorization is what you're allowed to do. You'll have to connect to LDAP for that to ask which group(s) the user is a member of.
Setting up your own kerberos environment (for testing) is a pain. Unless you use a ready made vagrant box for it. Instructions are on her website.
Keynote - Daniel Greenfeld
By Reinout van Rees from Planet Django. Published on May 17, 2013.
Django conferences have a tradition: there's an external luminary that gets to give a critical talk on Django. His talk won't be that. He's not external either: he wrote two scoops of Django together with Audrey (see also my review).
Being critical is sometimes easy. Just bash class based views, for instance. Bashing is easy. A rant like Zed Shaw's is fun, but he's not asked because of his rants, but because of his contributions (like books).
Similarly, Django delivers working stuff and that working stuff makes a lot of our work possible. So here are some good points about Django:
Django is everywhere. So many people and companies use it.
Django is powered by Python. Pep8, python is beautiful. And there's the import this zen of python that we use all the time to steer others and ourselves in the right direction.
Django's API wins. It is understandable. No weird names: templates, views, logging, sessions. Django projects also have understandable structures. If there's no views.py or models.py or templates/ directory, you know someone messed something up.
Fat models are great. Just put your business logic all on your models. They do get big this way, however. You can make a separate module with helper functions you call in your model. Just call it with a model attribute. This way you get a reasonably small models.py and another file with very testable small functions. Win!
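A small sketch of that split, using a plain Python class as a stand-in for a Django model so it runs on its own. The names price_with_tax, Product and gross_price are invented for illustration. The helper would live in its own module and knows nothing about models, which is what makes it easy to test.

```python
from decimal import Decimal


# --- would live in a separate helper module ---
def price_with_tax(net_price, tax_rate):
    """Pure function: no model, no database, trivially testable."""
    return (net_price * (1 + tax_rate)).quantize(Decimal("0.01"))


# --- would live in models.py (plain class standing in for a model) ---
class Product:
    def __init__(self, net_price, tax_rate):
        self.net_price = net_price
        self.tax_rate = tax_rate

    @property
    def gross_price(self):
        # The model stays thin: it only hands its own data to the helper.
        return price_with_tax(self.net_price, self.tax_rate)


product = Product(Decimal("10.00"), Decimal("0.21"))
print(product.gross_price)
```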
The API is clear and logical. We're not fighting about architecture, we are getting things done.
Django views are simple callables. (Even class based views).
Django is awesome at deprecation. Code often just keeps working fine for multiple Django versions.
Django has lots of features. For instance, Django's admin is awesome. This is what you use to sell Django to others.
Tip: don't try to use the admin with a nosql database. Just build something from scratch, that's easier than trying to get it working and especially keeping it working.
Django's full stack is awesome. Real projects are being done in unextended python+django. No third party packages. Not everyone goes to conferences and not everyone knows everything on pypi. You can get a lot done with just Django (though external packages help a lot!).
Django is also python. There are over 30000 packages! Even if only 20%
Documentation. Django set the bar for others.
Truth be told, some other projects have better documentation now. Django set the bar and others followed. They're playing by our rules :-)
Django is humble. We have a tradition of invited critical talks. They shape the community, they shape the core committers.
Here are two critical talk summaries I have:
Good criticism is good. They got a lot on their book. That was hard, but the book is much better for it.
The django community is generous. There's almost an unwritten rule: "the more you help people in the Django/Python community, the more the community helps you".
They received a lot of help with their book. They also helped others with free books in case they couldn't pay for it. But that required an email to ask for it. Note that Daniel and Audrey did ask for something in return: either buy the book later, donate to charity or help someone.
It worked! Someone gave a free guitar course to someone else. Someone bought a homeless person a dinner. People did projects for schools/churches/whatever. Contributions to open source software.
It worked! People did good work! Lots of small local positive actions. All over the world.
Call to action: be awesome. Make the world a better place.
Djangocon lightning talks day 2
By Reinout van Rees from Planet Django. Published on May 16, 2013.
Sorry if I mangled any of the names, that's the hardest part of blogging lightning talks. Many don't show their name long enough :-)
Single page web apps with django and extjs - Michał Karzyński
Single page apps: you're writing two apps. A front end one and a back end. The routing is done on the client side. The back end just spits out data (JSON api).
ExtJS has a store that handles communication with the backend. So that talks to your JSON API. Plan that API carefully, try to keep it nicely RESTful.
He showed a one minute demo. There is a longer one on his blog.
Don't trust, check - Marcin Mincer & Tomek Kopczuk
Check and question everything. Seek the best way. Not all good solutions are as good as they seem. They compared a standard view with a tastypie view and the regular view was much faster.
They also checked, for their example, whether using jinja2 would be faster than django templates. Yes, it is faster. Despite what the two scoops book says.
So: check everything for your usecase.
Lessons learned - Tom Christie
Tom maintains the Django rest framework project. He tells us a few lessons he learned.
- Be negative
- Everything someone submits to a project increases the maintenance burden for the maintainer. So suggest things that can be removed. Before submitting a bug, first fix an easy one.
- It is your fault
- You haven't yet stepped in and contributed what you want.
- Forget about DRY
- Simplicity is a design goal, DRY only follows from it
- Link everything
- Don't make me search, just provide a direct link
- A deprecation policy makes change easier
- Figuring out a formal deprecation policy actually makes making changes easier.
- There's no such thing as a core dev
- All of us have what we need to know to be a core developer. They only have the extra commit bit to actually commit the change, but almost everyone can do the work.
Community and learning - Karol Majta
Karol is a mechanical engineer who has some python knowledge. He's new to Django and provides some community feedback from that background.
If you're new to Django, you're new. You need experience. You need to learn. But because you don't know much, it is easy to learn more! And... the community is great at helping you learn. Positive feedback!
Two phase commit - Grzegorz Nosek
Two phase commit is a quite unknown database feature. Everyone knows database transactions.
In SQL (PostgreSQL) it is something like PREPARE TRANSACTION 'foo' once the changes inside a transaction are done, and COMMIT PREPARED 'foo' afterwards to make them final.
This two phase commit is not only for databases, you could also use it in regular python code, for instance when creating files.
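A rough sketch of what that could look like for files, as an assumption about the idea rather than the code from the talk. PreparedFile and write_all_or_nothing are hypothetical names. Phase one does everything that can fail (writing temporary files next to the targets); phase two is only a cheap rename per file.

```python
import os
import tempfile


class PreparedFile:
    """Hypothetical two-phase file write: prepare a temp file, then commit."""

    def __init__(self, target_path):
        self.target_path = target_path
        self.temp_path = None

    def prepare(self, data):
        # Phase 1: do the work that can fail, without touching the target.
        directory = os.path.dirname(os.path.abspath(self.target_path))
        fd, self.temp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            handle.write(data)

    def commit(self):
        # Phase 2: a rename makes the prepared content visible.
        os.replace(self.temp_path, self.target_path)
        self.temp_path = None

    def rollback(self):
        if self.temp_path and os.path.exists(self.temp_path):
            os.remove(self.temp_path)
        self.temp_path = None


def write_all_or_nothing(contents_by_path):
    """Write several files; if any prepare step fails, none are changed."""
    prepared = []
    try:
        for path, data in contents_by_path.items():
            item = PreparedFile(path)
            prepared.append(item)
            item.prepare(data)
    except Exception:
        for item in prepared:
            item.rollback()
        raise
    for item in prepared:
        item.commit()
```

This only protects against failures during the prepare phase. A real two phase commit also stores the prepared state durably so that a crash between the two phases can be recovered from, which this sketch does not attempt.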
Django-downloadview - Benoît Bryon
You manage files with django, for instance for authentication, permissions. FileFields or ImageFields, but it can be also local files, remote URLs or generated files.
django-downloadview provides class based views for almost every usecase. You can also extend and modify those views for your own use cases.
Django is not efficient for streaming files, so you need x-sendfile (apache, lighttpd) or x-accel (nginx). There's a middleware for that!
(Personal note: investigate that one; looks very useful).
PHP-like django - Markus Tomqvist
DHP is PHP in django :-) You've got a {% code %} template tag that you can write your django code in. Totally dirty and he's never going to finish the project.
It works by calling eval() on the extracted code and by copy/pasting some django wsgi stuff.
Everyone was laughing.
Django pony checkup - Erik Romijn
Last year he spoke about making secure websites.
Lots of those things are remote-checkable. So he wrote the django pony checkup website, to which you can pass the URL of a site.
He actually ran it on 3707 django websites. The score is not good.
- 7% run in debug mode.
- 97% have no clickjacking protection.
- 83% have no HTTPOnly session cookie.
Run http://ponycheckup.com/ on your sites!
What is new in Django CMS - Benjamin Wohlwend
The new release isn't out yet; he already shows what's new.
The 2011 version had front-end-editing, but it has some problems which they aim to fix in the new version.
The 3.0 goals:
- Make it beautiful.
- Keep the end user out of the /admin.
- We don't want to interfere with your markup.
- Front end editing suitable for experimenting and playing with it. It should be safe.
- It should be fun.
He showed a demo. Sure looks polished and nice.
L20n, localization 2.0
There's content localization (which l20n does not do), l20n does UI localization.
Gettext is the one used now. It is English-centric. It has limited plural-handling, for instance. All those countries have different rules. English just has single/plural, many languages have 1, 2 3/4, 5+, for instance.
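To illustrate how plural rules differ per language, a small sketch. This is neither gettext's nor L20n's API: the function names, the FORMS table and the example words are assumptions made for illustration. The Polish rule follows the 1 / 2-4 / 5+ pattern mentioned above.

```python
def english_plural_index(n):
    # Two forms: singular for exactly 1, plural for everything else.
    return 0 if n == 1 else 1


def polish_plural_index(n):
    # Three forms: exactly 1; numbers ending in 2-4 (but not 12-14); the rest.
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# Hypothetical message catalog: one rule plus the matching word forms.
FORMS = {
    "en": (english_plural_index, ["{n} file", "{n} files"]),
    "pl": (polish_plural_index, ["{n} plik", "{n} pliki", "{n} plików"]),
}


def pluralize(language, n):
    rule, forms = FORMS[language]
    return forms[rule(n)].format(n=n)


for count in (1, 3, 5, 12, 22):
    print(pluralize("en", count), "/", pluralize("pl", count))
```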
There's a lot that needs fixing. L20N attempts to fix it.
He showed a couple of examples. Wow it sure is work for many languages. But it seemed to work quite well.
See http://l20n.org/
django-mail-factory - Rémy Hubscher
Mails with django: you need html and plain text emails, attachments, etc. Then you need to check whether the mails are coming out OK. So you mail it and someone has to look at it.
What we want? Preview html and plain text emails, possibly send one as a test. And good warnings about missing context variables.
You can register all your different emails at django-mail-factory and define the context variables that they need.
Spreading Django - Markus Zapke-Gründemann
How can more people learn Django? You could give tutorials, give talks, hold workshops. For this you need someone to do it.
He prepared http://django-introduction.keimlink.de, which gives a nice introduction. This needs an English translation.
http://django-marcador.keimlink.de provides a tutorial. This one is already multilingual.
He wants to use transifex for the translations later on.
Translating models - Jef Geskens
Jef makes websites in Belgium. That is a problem. They need at least 3 versions: French, English and Dutch. So that also means translating text inside models. Some of those models are their own models, but a lot is in existing external apps and he doesn't want to modify them.
He started django-datatrans, which can handle all that without changing anything in the underlying models.
Python deployer - Jonathan Slenders
They originally used Fabric, but they missed some features. So they started python-deployer.
He showed a demo.
Setting up your django project in 60 seconds
You want to get up and running as quickly as possible. There are lots of things, though, that you need to do every time.
Take the time to put those initial steps in some sort of ready-made skeleton for new projects. Initial settings, especially if you split them up in separate files. Initial empty south migration. Fabfile, makefile, things like that.
The web of stuff - Zack Voase
By Reinout van Rees from Planet Django. Published on May 16, 2013.
A plane flew over (noisily) at the start of his presentation. He put our work in perspective by saying that that was an 80 ton plane and that we're just building websites :-)
Possibilities
Computers used to take up whole rooms, now you have a smartphone. Big data is really big data now. Moore's law works both ways, though, so you have really small computers now. An arduino for instance.
He often makes comparison to the human body. All over our body, sensors give off signals that go into the central nervous system. The brain processes it and gives signals back to muscles if necessary. Sensing, feedback, understanding, reaction.
Stuff can talk to the cloud. Like a sensor in your body talks to your mind, stuff can treat the cloud as a brain. The cloud is what allows small tools to be smart.
Stuff does often need a human to interact with it. Like a smartphone. There's all sorts of people thinking about how to "liberate the computers from their human overlords". Why cannot computers sense and act on their own account?
So how do you bridge the gap between sensing and acting of stuff? How do you use Django for it? There's a lot available online about sensing and about acting, but not the communication in between.
The communication medium itself is a bit of a problem. You don't want to have a telephone data contract for every single small piece of stuff. A physical connection isn't always handy either.
His preferred communication medium is Twilio for sending SMSs. The stuff has low memory, so the message length limit is fine.
He showed a demo with a card reader that read his London transport card and sent an SMS to his Django site. The card reader was a combination of an arduino, a 'shield' SMS sender and an RFID reader. The django app then submits it to foursquare. (The last part didn't work, probably due to a local foursquare problem, but the django app did have all the data he sent from his card reader). Nice.
SUCCESS: after the lightning talks he did it again and now it worked!
Personal development
He had never done any hardware work until four months ago. No compiling for arduino. It sounded a bit scary to him.
It is normal: if you start as a beginner and keep at something, you slowly get better. Then you automatically learn more and thus learn that there's a lot you don't know. That's the dip in the middle. Those are the people we need to keep on board so that they push through to the expert stage.
When you're in the middle, you know how bad you are (or how good you aren't yet). That's the risky phase where people quit.
Likewise documentation. Tutorials are useful for beginners. Reference material is useful for experts. There's not a lot in the middle and you're bound to be a bit frustrated in that stage.
So if you're going to start experimenting with electronics, you're bound to hit a wall, for instance when calculating complex electronic schemas. Push on anyway: the first time you make a phone call with your own device is totally worth it.
Two books he recommends to get you started:
- Getting started with Arduino.
- The art of electronics.
Apps for advanced plans, pricings, billings and payments - Krzysztof Dorosz
By Reinout van Rees from Planet Django. Published on May 16, 2013.
He runs multiple sites with a common business model: accounts with plan subscriptions. So there's an obvious need for a generic account billing application.
The app should not be too specific, as that limits your business flexibility. Also it should not be too generic: you'll end up with an architecture from hell that way. And there's the billing as such: you need to pay close attention to security and so on. Hard problem!
What he's making is django-plans for keeping track of the billing data, the plans, etc. And django-getpaid as payment processing app.
django-getpaid
Some challenges for the actual payment integration:
- He wants it to be generic and lightweight. He doesn't want to pull in half of pypi for a payment processing app.
- He wants a single API so that he can switch payment brokers if needed.
- He wants it to be asynchronous. Synchronous processing blocks too long.
- Multiple currency support.
None of the existing apps were good enough, so he made django-getpaid. It is stable and supports a lot of (Polish) payment systems and is pluggable if you need to add another one.
Pluggability is achieved with special backends you can enable in your Django settings. This way you can easily add more. Each backend can read its configuration from the settings, too (it looks a bit like the database settings).
Django-getpaid works through signals and listeners. You configure the listeners to accept the models that represent an order and to extract the necessary information from them. Yes, that means it is quite flexible. They are your models and you get to specify how to extract information. So getpaid doesn't make many assumptions.
There are template tags for rendering the forms that are needed. Easy to integrate. There are some assumptions django-getpaid makes of the backends. There should be a specially-named PaymentProcessor class, for instance.
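As an illustration of the pluggable-backend idea (settings list the enabled backends, each backend module exposes a class with a fixed name), here is a sketch. It is not django-getpaid's actual code: the setting names, the dummy_backend module and the describe() method are all invented, and the module is faked in sys.modules only so the example runs as a single file.

```python
import importlib
import sys
import types


class DummyPaymentProcessor:
    """Hypothetical backend; every backend exposes its class under one name."""

    def __init__(self, config):
        self.config = config

    def describe(self, order_id, amount, currency):
        merchant = self.config["merchant_id"]
        return f"{merchant}: order {order_id} for {amount} {currency}"


# Stand-in for a real backend module on the import path (made-up name).
_module = types.ModuleType("dummy_backend")
_module.PaymentProcessor = DummyPaymentProcessor
sys.modules["dummy_backend"] = _module

# Hypothetical settings, shaped a bit like Django's database settings.
SETTINGS = {
    "PAYMENT_BACKENDS": ["dummy_backend"],
    "PAYMENT_BACKEND_SETTINGS": {"dummy_backend": {"merchant_id": "demo"}},
}


def load_processor(backend_path, settings=SETTINGS):
    """Import an enabled backend and instantiate its PaymentProcessor."""
    if backend_path not in settings["PAYMENT_BACKENDS"]:
        raise ValueError(f"Backend {backend_path!r} is not enabled")
    module = importlib.import_module(backend_path)
    config = settings["PAYMENT_BACKEND_SETTINGS"].get(backend_path, {})
    return module.PaymentProcessor(config)


processor = load_processor("dummy_backend")
print(processor.describe(42, "9.99", "PLN"))
```

Switching payment brokers then means changing the settings, not the calling code, as long as each backend honours the same class name and interface.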
django-plans
Core concept is a pricing table. Items you can buy in the rows, kinds of customers in the columns, plans in the cells. Plans can be marked as unavailable; there's a quota system; you can price them; periods; etc.
A tricky thing: switching plans! There are so many things that can happen. Does the customer switch from cheap to more expensive? Or the other way around? Is his current period expired or is it halfway? And so on. So... it should be pluggable.
What also needs to be pluggable: taxation policy. There are lots of differences per country.
Taming ajax and django - Marc Egli and Jérémie Blaser
By Reinout van Rees from Planet Django. Published on May 16, 2013.
Jérémie is a frontend developer and Marc does the backend.
Address/state handling and content rendering are the two main challenges.
Address and state handling
Problems:
- Browser history. If you don't watch out, the back button won't be working.
- Deeplinking should stay possible.
- Crawler visibility: you want them to grab your entire site. But they don't use javascript. So you need a special URL for them.
Some solutions:
A hash like http://yoursite.com/#/some/id. Javascript will need to handle everything behind the hash.
Problem: without javascript it isn't visible. You're invisible to crawlers. It is easy to implement, though.
A hashbang like http://yoursite.com/#!/some/id. The difference? Google and others replace the URL with http://yoursite.com/?_escaped_fragment_=/some/id. You'll have to configure your website to support it. Deeplinks work this way and crawlers can access the site via links in a search engine sitemap.
It works with almost all browsers. And it covers all three mentioned problems. You have multiple URLs, however. And you'll need to maintain legacy URLs.
In django you could implement it with some middleware that detects the _escaped_fragment_ GET parameter.
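The detection step itself is small. A sketch of it in plain Python, assuming the crawler requests a URL of the form ?_escaped_fragment_=/some/id; the function name crawler_path is made up, and wiring it into a real Django middleware is left out.

```python
from urllib.parse import parse_qs, urlsplit


def crawler_path(url):
    """Return the path a crawler asked for via _escaped_fragment_, or None.

    None means: a regular request, handle it as usual.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query.get("_escaped_fragment_")
    if values is None:
        return None
    # An empty fragment stands for the front page.
    return values[0] or "/"


print(crawler_path("http://yoursite.com/?_escaped_fragment_=/some/id"))
print(crawler_path("http://yoursite.com/some/id"))
```

A middleware could call such a function on each request and, when it returns a path, render the same content that the javascript would have shown for that hashbang URL.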
Pushstate. The URL is a regular URL like http://yoursite.com/some/id. The best example is the github website.
Pro: easier to implement on the backend, good URLs, everything crawlable and deeplinkable. It degrades gracefully.
Drawbacks: no wide support. Even IE9 doesn't support it. 62% of the now-used-browser-clients support it. But... it does work, just slower, as you need to grab a whole new page. Another drawback: it is more work for the frontend developer.
Their approach
They do it with pjax: Push state ajax. A pjax link fetches the whole new page source over ajax and extracts a specified div and the title and modifies the browser history. You improve the speed this way by not needing to re-render the entire page, only one part is updated.
There are some existing implementations, like django-jax, django-easy-pjax and django-ajax-blocks, but they all had problems. So they made their own solution:
- Django: template inheritance and filters and middleware.
- Backbone.js
They have two base templates: one for the regular layout and one for the pjax responses. They build a template filter "pjax" that returns whether it is a pjax request or not and modifies the name of the template that's extended. That way you get a mostly empty page for pjax and the full one for regular requests.
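The template switch could look roughly like this. It is a guess at the mechanism, not their code: the X-PJAX header is the convention used by the jquery-pjax library and is an assumption here, as are the function names and the "_pjax" template suffix.

```python
def is_pjax(headers):
    """Assumed convention: pjax clients send an 'X-PJAX: true' header."""
    return headers.get("X-PJAX", "").lower() == "true"


def pjax(template_name, headers):
    """Pick the base template: 'base.html' becomes 'base_pjax.html' for pjax."""
    if not is_pjax(headers):
        return template_name
    stem, dot, extension = template_name.rpartition(".")
    if not dot:
        # No file extension: just append the suffix.
        return template_name + "_pjax"
    return f"{stem}_pjax{dot}{extension}"


print(pjax("base.html", {"X-PJAX": "true"}))
print(pjax("base.html", {}))
```

In a Django template the result would feed the extends tag, so the page template itself stays identical for both kinds of request.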
Backbone takes care of the pjax handling: requesting the new page, replacing divs and so on. And it keeps track of the browser history.
Some pitfalls: caching and redirection.
- You use the same URL for your regular and pjax response. So caching can trip you up. Setting a Vary header helps, but not in all browsers. So they're now using a special URL and modify it back to the original URL in middleware.
- Redirections happen transparently for ajax requests. You don't have a chance to intercept them. To work around it, they return json for pjax requests with the redirect info in there.
Content rendering
Client side templates can make your site faster. It would be nice to use the same template on the server and client side. They're using https://github.com/chrisdickinson/plate, which aims to be mostly compatible with Django's template language.
Growing open source seeds - Kenneth Reitz
By Reinout van Rees from Planet Django. Published on May 16, 2013.
He shows us three kinds of (more or less) open source projects.
Type 1: public source
Once upon a time there was an "open source project" called the facebook SDK. Basically it just stopped working one day and nobody could help, despite offers for help on the issue tracker. Hacker news got wind of it and it was on the front page for a while. Facebook's reaction? Disabling the issue tracker... (Later on they fixed it).
That's not open source, that's public source. Often it is abandoned due to lack of interest, change of focus or so. The motivation for having it as open source simply is not clear.
Type 3: dictatorship project
Kenneth is the author of requests. An open source project, very successful. But all the decisions are made by Kenneth.
That's really more of a dictatorship project. A totalitarian BDFL that owns everything. The dictator is responsible for all decisions. Requests' value lies in its extreme opinions. If he'd involve more people, the value would be diluted. There are drawbacks. A low bus factor. High risk of burnout: Kenneth is the single point of failure.
Lessons learned
Be cordial or be on your way. As a user, you need to keep all your interactions with the maintainer as respectful as possible. The maintainer put a lot of work in it and they don't owe you any of their time.
As a maintainer, you also must be cordial. Be thankful for all contributions. Feedback is the lifeblood of your project, even the negative. You'll need to ignore non-constructive comments. Be careful with the words you choose, sometimes contributors take what you say VERY personally. You might have to educate your users. And: a bit of kindness goes a long way.
Sustainability is almost the biggest challenge. Don't burn out. Try to get others to help.
He quotes Wes Beary: "open source provides a unique opportunity for the trifecta of purpose, mastery and autonomy". Pay equal attention to all of these three. Learn to do less, focus more on your purpose, for instance.
Learn to say no. People ask for crazy features. Or they submit quite sane pull requests that, if you allow them all in, make your project slow and unfocused. Kenneth wants as few lines of code as possible in his project. Negative diffs are the best diffs!
Open source makes the world a better place. Don't make it complicated!
Advanced Python through Django: metaclasses - Peter Inglesby
By Reinout van Rees from Planet Django. Published on May 16, 2013.
Metaclasses are a handy feature of Python and Django makes good use of them.
When you create certain kinds of classes in Django, a metaclass will do something to the class before it is created. For forms, the various attributes of the class are converted into a base_fields dictionary on the class.
Similarly, a subclass of Model also fires up a metaclass that does some registering. A foreignkey to another model adds a relation back on that other model, for instance.
As a recap, a class is something that can be instantiated into an object. It can have an __init__() method that does something upon instantiation. type(your_instance) will return the class.
Did you know that you can create classes dynamically? See for yourself:
>>> name = 'ExampleClass'
>>> bases = (object,)
>>> attrs = {'__init__': lambda self: print('Hello from __init__')}
>>> ExampleClass = type(name, bases, attrs)
>>> example = ExampleClass()
Hello from __init__
>>> type(example)
<class '__console__.ExampleClass'>
So... we can actually control how classes are created! You could create a create_class() method that calls type but that modifies, for instance, the name. Or we could take all the attributes and add them to a base_fields dictionary on the instance. Hey, that's what we saw in the first Django form example!
Now, what is type exactly? It is a class that creates classes.
This also means we can subclass it! The most useful thing to override in our subclass is the __new__() method. Where __init__() on a regular class sets up a new instance, __new__() on a metaclass creates the class itself. So again we can modify the name and/or the attributes.
How do you use it in practice? Normally you'd set a __metaclass__ attribute on a class. This tells python to use that metaclass for creating the class. The same for subclasses. This is how our Django form classes use the metaclass specified in Django's base Form class.
Django uses metaclasses in five places: admin media, models, forms, formfields, form widgets. Grep for metaclass in your local django source code once to get a better feel for how Django uses it.
Note on python 3: it uses a slightly different syntax for specifying metaclasses. So Django 1.5 uses six to support both ways in a single codebase.
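A sketch of such a metaclass in Python 3 syntax. It imitates the base_fields idea described above but is not Django's code: Field, DeclarativeFieldsMeta, Form and ContactForm are simplified, made-up stand-ins.

```python
class Field:
    """Hypothetical stand-in for a form field."""

    def __init__(self, label):
        self.label = label


class DeclarativeFieldsMeta(type):
    def __new__(mcs, name, bases, attrs):
        # Collect the Field attributes and take them out of the class body
        # before the class is actually created.
        fields = {key: value for key, value in attrs.items()
                  if isinstance(value, Field)}
        for key in fields:
            del attrs[key]

        new_class = super().__new__(mcs, name, bases, attrs)

        # Start from the fields of parent classes, then add our own.
        base_fields = {}
        for base in reversed(new_class.__mro__[1:]):
            base_fields.update(getattr(base, "base_fields", {}))
        base_fields.update(fields)
        new_class.base_fields = base_fields
        return new_class


# Python 3 spelling; Python 2 would set __metaclass__ in the class body.
class Form(metaclass=DeclarativeFieldsMeta):
    pass


class ContactForm(Form):
    name = Field("Your name")
    email = Field("Your email")


print(list(ContactForm.base_fields))
print(hasattr(ContactForm, "name"))
```

ContactForm never mentions the metaclass: it is inherited from Form, which is how ordinary form classes pick it up from a base class.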
Warning: don't overuse metaclasses. They can make code difficult to debug and follow. Use Django as a good example of how to use metaclasses. Django saves you a lot of work by using metaclasses in a few locations.
See https://github.com/inglesp/Metaclasses
Nice way of giving a presentation, btw. Some sort of semi-interactive python prompt. The software is online at https://github.com/inglesp/prescons
Bleed for speed - Rob Spectre
By Reinout van Rees from Planet Django. Published on May 16, 2013.
He started with a little history lesson: the sea battle of Mobile Bay. The admiral (Farragut) ordered the ships straight through the minefield (mines were called "torpedoes" at the time). "Damn the torpedoes, full speed ahead". And it worked.
What does this have to do with Django? Well, "damn the torpedoes, full speed ahead" feels a bit like how rapid prototyping feels afterwards. He's often involved with hackathons. Lots of quick coding in limited time with a lot of people. He learned a lot about his tools that way (and he often used Django).
There's a time to make a distinction between production and prototype. Sometimes it is better to just try something with a prototype. Throw-away code.
Aaargh! Throw-away code?!? We never throw code away. But it is something we must learn. It is good to let go once in a while. Let your code go. It isn't yourself, it is just some code.
The danger is that prototype code is put into action as production code. With some work, this danger can be prevented.
What about Django? Django is the best for prototyping. For rapid prototyping, Django is better than micro-frameworks like Flask that might seem better at first glance. Here are some reasons:
Django was built for rapid prototyping. It originated at a newspaper! 24 hours to build something.
It is flexible. It was built to bend. He can prepare something for the other people programming with him and get them going and still keep the code in reasonable shape.
Us. The strength of Django is the community that supports it. Stack overflow questions and answers. The django websites. Books like two scoops of Django (see my review). That's not something you have with many other frameworks!
Tip: read especially chapter 2 and 3 of the two scoops book.
One thing he'd add to the book is stuff like fabfiles and makefiles. Handy for rapid prototyping.
Use stuff that's available. For instance Django's staticfiles app for grabbing together all the css/js/png. Whether it is in one directory or split out over multiple apps. It also helps with production.
Also look at brunch for setting up your javascript app's structure. It works well with Django.
Deployment: you need to show your prototype. Heroku is very quick for prototypes. (He mentioned that they have a data center in Europe now in case you need it).
Deployment: use chef. Lots of recipes. You could also use Salt if you're more into Python. Also lots of stuff available. Both take a while to learn, but it is a very good investment.
Configuration management is an extremely useful skill. Do it well.
Tastypie is the quickest way to get a REST api out of your Django. It is the best for rapid prototyping. Another good one is django-rest-framework. It will take a little bit longer to set up, but once done you're working with actual Django views. And django-rest-framework's browsable API is very helpful when you're working with a couple of others.
Social auth connectors: everyone makes one and there are way too many half-working ones. He's got two that he can recommend. django-social-auth is very complete. The other is django-allauth for when speed is important for you.
If you don't want to play fair to others during a hackathon: use celery. It is very unfair to use celery, python and Django. The combination with Django is pretty OK to set up. You can do a lot that others cannot do easily. So use it for rapid prototyping. (Regarding setting it up: there are good chef recipes for it).
TEST. Yes, even during a hackathon. He doesn't advocate full test driven development. It is a balance. But the errors that kill you during a hackathon are the errors you make twice. So, for instance, test that all your views simply return a 200 Ok. This already helps prevent a lot of problems.
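The "every view returns 200" smoke test can be sketched without any framework. In a Django project you would normally reach for the test client from django.test; the sketch below swaps in a tiny hypothetical WSGI app and the standard library's wsgiref helpers so that it runs on its own. The app, the PAGES mapping and the status_of/smoke_test helpers are all assumptions made for illustration.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for a project's views: path -> response body.
PAGES = {"/": b"home", "/about/": b"about", "/api/items/": b"[]"}


def app(environ, start_response):
    """Tiny WSGI app: 200 for known paths, 404 otherwise."""
    body = PAGES.get(environ["PATH_INFO"])
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def status_of(wsgi_app, path):
    """Issue one GET request against the app and return the numeric status."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal GET request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    # Consume the body so the app runs to completion.
    b"".join(wsgi_app(environ, start_response))
    return int(captured["status"].split()[0])


def smoke_test(wsgi_app, paths):
    """Return (path, status) for every path that did not answer 200."""
    failures = []
    for path in paths:
        status = status_of(wsgi_app, path)
        if status != 200:
            failures.append((path, status))
    return failures
```

Running smoke_test(app, list(PAGES)) after every change is cheap and catches the "same error twice" class of problems; an empty list means every listed URL still answers.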
Look at AngularJS. Even if you don't use the framework itself. Why? It has a great javascript test runner. Good for testing while rapid prototyping.
Getting past Django ORM limitations with Postgres - Craig Kerstiens
By Reinout van Rees from Planet Django. Published on May 16, 2013.
Tip: subscribe to the postgresql weekly newsletter that Craig makes.
Why postgres? A colleague described it as "it is the emacs of databases". There's just so much available inside postgres.
The problem is Django: it treats all databases the same. It doesn't prefer one over the other. It doesn't give special treatment. Look at all the types that Christophe mentioned yesterday: Django only supports a few of 'em. Likewise indexes.
For instance postgresql's Array type. Django doesn't support it, but it'd be perfect for, say, a list of tags on a model. For many of these types, including the Array type, there are django apps that add support for them.
Great: hstore. NoSQL in your SQL. A key/value store in your SQL. They use it inside Heroku a lot: it scales fine and works fine. To use it in Django, use django-hstore. Add a data field as hstore to a model and suddenly you can do my_object.data = {'key': 'value', ...}!
Queuing: most people use celery. Postgres is a great queue. There's a celery backend called trunk for it.
Postgresql has great text search. You do need to do some setup in your models, but then it works fine. You'll have to read the docs, though.
Indexes. Many types. So it can be a bit of a mystery which one to use. Btree is the normal one. Generalized Inverted Index (GIN) is for multiple values in one column. Good for array/hstore types. Generalized Search Tree (GiST) is for full text search, shapes, postgis.
Geospatial: just use geodjango. It uses postgresql/postgis's great geospatial stuff.
Tip: look at django-db-tools, for instance for its read-only-middleware that makes your site read-only (for maintenance, for instance).
Django 1.6 has persistent connections, but the current 1.5 doesn't. It can shave a whole lot off the rendering time of your pages if you have some sort of connection pooler! If you want the 1.6 functionality now, you can use for instance django-postgrespool. This really saves a lot of time.
Summary: postgresql is great, Django's ORM is pretty good. And you can extend it.
(In response to a question: never put session data in your database, it is a good way to kill the database.)
Here's the link to his presentation.
Fractal architectures - Laurens van Houtven
By Reinout van Rees from Planet Django. Published on May 16, 2013.
He works on twisted. And twisted people tend to talk about subjects that are almost antithetical to how Django does things. The thing that he does differently from Django is that he's not using a single data source...
Once a database gets really, really too big, putting multiple database servers next to each other doesn't really work. You slowly start to get into expensive Oracle territory.
How he set it up now is what he calls a fractal architecture. The whole accepts requests. The parts of the whole accept requests. The parts of the parts accept requests. That's why he calls it fractal. You could also call it sharded, but that has a bad name: it is something you do when nothing else works.
The way he looks at the architecture is SMTP. Email. Simple.
He prefers SQLite. Simple and included in the python standard library. Sure, you can use postgres but you'll need a VM to re-create the same environment locally as on your production machine. SQLite is the same everywhere.
In fact, he uses Axiom: an object store on top of SQLite. (Note: he is trying to write documentation for it at https://github.com/lvh/axiombook).
Another advantage of sqlite: it is easy to scale down. There's not much lower you can go than import sqlite3! If you want to use postgres, remember you must install it on each and every part :-)
Important: almost nothing is as fast as a local sqlite store, especially when it is reasonably small and fits mostly in RAM. Just look at the regular comparisons of access time for L1 cache, L2 cache, RAM, SSD, LAN, spinning rust, internet and so on. So if you have a local database on an SSD with quite some RAM, it'll blow a network connection to some remote database out of the water.
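How little is needed to "scale down" to a local store can be shown with just the standard library. This is a minimal sketch, not the speaker's setup (he uses Axiom on top of SQLite); the message table and its columns are made up for the example.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; pass a file path instead
# for a persistent local store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, sender TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO message (sender, body) VALUES (?, ?)",
    [("alice", "hello"), ("bob", "hi there"), ("alice", "bye")],
)
conn.commit()

# Parameterised query against the local store; no server, no network hop.
rows = conn.execute(
    "SELECT body FROM message WHERE sender = ? ORDER BY id", ("alice",)
).fetchall()
print(rows)  # [('hello',), ('bye',)]
```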
But... some things don't fit locally. You have to search everywhere, for instance. There are three basic solutions:
- Duplication
- You could duplicate the data over all the parts, but that doesn't work if the data is big.
- Sharding
- Sharding will only work reasonably well if the data itself, by nature, is sharded. Sales data per region, for instance.
- Separation
- Separate data for separate calculations in separate (local) stores. This is what he uses.
He mentioned paxos and raft (pdf), but I don't remember what for.
Play nice with others - Honza Král
By Reinout van Rees from Planet Django. Published on May 16, 2013.
Many people think that reusable apps don't work: there's always something you need to change or modify. Honza is going to talk about his experience with ella, a django CMS.
He advocates using model inheritance. from ella.core.models import Publishable and then subclass your specific model (YoutubeVideo, for instance) from it. That Publishable has most of the basic CMS functionality. That way you get most of what the CMS needs for free and you still can extend it.
Showing the new model? You can use different templates easily. render_to_string() and friends accept a list of templates. So you can give it ['publishable.html', 'youtubevideo.html'] and so, using templates named somewhat after the model. This way you can re-use basic templates, but modify them if you want, just by providing a specially-named template. No code changes necessary.
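The template fallback can be sketched in plain Python. When given a list, Django tries the names in order and uses the first template it finds, so the sketch puts the model-specific name ahead of the generic one. The AVAILABLE set and both helper functions are hypothetical; they only mimic the selection rule and are not Django's or ella's code.

```python
# Hypothetical set of template files that exist in the project.
AVAILABLE = {"publishable.html", "article.html"}


def template_candidates(model_name, base="publishable"):
    """Most specific template name first, generic fallback last."""
    return [f"{model_name.lower()}.html", f"{base}.html"]


def select_first_existing(candidates, available):
    """Return the first candidate that exists ('first match wins')."""
    for name in candidates:
        if name in available:
            return name
    raise LookupError(f"none of {candidates} found")


# No youtubevideo.html exists, so the generic template is used.
print(select_first_existing(template_candidates("YoutubeVideo"), AVAILABLE))
# article.html exists, so it overrides the generic one.
print(select_first_existing(template_candidates("Article"), AVAILABLE))
```

Dropping a youtubevideo.html file into the project would change the first result without touching any code, which is the point made in the talk.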
They're using Redis to collect information from the Django database on publishables. This way you don't have any problem with Django's database's behaviour of focusing on a single kind of model at a time.
They also use django-appdata for storing extra data on existing Django models. From the pypi page: extandable field and related tools that enable Django apps to extend your reusable app. Through a registry you can add for instance tags to an existing model. To actually see the field in the admin, you do have to make a new ModelAdmin for that. Django-appdata is a hack, but sometimes hacks are very useful.
Through ella.core.custom_urls they even managed to add URLs (like a URL for adding "+1" functionality) to arbitrary models that support it (through django-appdata).
Warning: with great power comes great responsibility. All this is powerful, so it is easy to make a complete and utter mess out of it. Perhaps it is better to convince the customer to forgo a feature?
Warning: keep the defaults sane. Nice to use Redis to make querying quicker and simpler, but you just forced every developer to have Redis installed locally.
Warning: premature optimization is the root of all evil. Likewise with extensibility. You normally don't need to make an app extensible. But don't close the door to extensibility. Add it when you need it.
Taming multiple databases with Django - Marek Stępniowski
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Marek works at SetJam: "We came to Django for the views, but stayed for the ORM". Django's ORM is pretty much in the sweet spot. SQLalchemy in comparison is less nice, having to learn a non-sql, non-pythonic language.
At SetJam, they have what they call a backend and frontend. The backend collects data and stores it in the database, the frontend spits it out, mostly via feeds.
They started out with one single big database, but that was hard to optimize. Many backend servers would write to the same database and the frontend server would read from it. Hard to optimize.
Next they added a database slave for reading. That was before Django's multi-db support, so they had if/elses in their settings files based on environment variables.
After Django's multi-db support, they could really support two databases and refer to them in the code with 'DEFAULT' and 'SLAVE'.
Later on they split up the database even more. What goes where is handled by two custom database routers: a "MasterSlaveRouter" for the master/slave distinction and an "AppRouter" for shuffling some apps' data to certain databases.
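A Django database router is a plain Python class with methods such as db_for_read and db_for_write, so the two routers can be sketched without importing Django. This is a guess at their shape, not SetJam's code: the aliases "default", "slave" and "stats_db", the "stats" app label and the fake model objects are all assumptions.

```python
import random
from types import SimpleNamespace


class MasterSlaveRouter:
    """Send writes to the master and spread reads over the replicas."""

    master = "default"    # assumed alias of the writable database
    replicas = ["slave"]  # assumed alias(es) of the read-only replicas

    def db_for_read(self, model, **hints):
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.master

    def allow_relation(self, obj1, obj2, **hints):
        # Master and replicas hold the same data, so relations are fine.
        return True


class AppRouter:
    """Pin selected apps to their own database; no opinion on the rest."""

    app_to_db = {"stats": "stats_db"}  # hypothetical app label -> alias

    def _db_for(self, model):
        # Returning None tells Django to ask the next router in the list.
        return self.app_to_db.get(model._meta.app_label)

    def db_for_read(self, model, **hints):
        return self._db_for(model)

    def db_for_write(self, model, **hints):
        return self._db_for(model)


# Fake model classes, just enough to exercise the routers outside Django.
stats_model = SimpleNamespace(_meta=SimpleNamespace(app_label="stats"))
other_model = SimpleNamespace(_meta=SimpleNamespace(app_label="shows"))
```

In a project both classes would be listed in the DATABASE_ROUTERS setting, with AppRouter first so that it gets the first say and the master/slave router handles everything it declines.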
Tip: look at https://github.com/jbalogh/django-multidb-router, especially for the handy decorators (@use_master, for instance) it provides.
At one point they had problems with Django's transaction decorators: they only work with the default database. They had to call the actual code and pass it the right database.
Similarly, South doesn't work very automatically with multiple databases. South's ticket #370 is still open after three years. He hopes he can get a fix into the new south-in-the-django-core code.
He showed a code example that looked pretty OK. Then he showed what needs fixing to get it to work reliably with multiple databases.
Multidb is awesome, but...
- It needs more documentation.
- Full support for multidb in schema migrations.
- It needs better debugging tools (whiny transaction decorators).
- Attributes like _for_write should be more clear. They're pretty important, but the underscore looks like it is unimportant. (Comment: a core dev discussed with him during the questions; he thought this wasn't necessary).
Djangocon lightning talks day 1
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Sorry if I mangled any of the names, I took a photo of the lightning talk submission form and tried to decipher them :-)
From carrots to Django - Kamila Stephiouska
She tells about the Geek Girls Carrots community. A community for women interested in new technology. 11 cities, 4 special meetings, 1 sprint, 5 kinds of workshops.
They like to promote women working in IT.
They held a "django carrot" recently: 14 hours, 10 mentors, 23 participants. They try to get special guests. Last week Daniel and Audrey came (the writers of two scoops of Django).
They chose Django because of the community.
Don't be afraid to commit - Daniele Procida
Lots of people work with Django. Lots of people program with it. There are barriers to getting them to work on Django. They might not be effective. They might be afraid. They might not communicate effectively.
You also need to manage your code and your environment. Virtualenv fixes the environment, but you need to learn that first. Version control helps with your code, but you first need to learn version control.
Similarly, you need to learn documentation and tests.
And you need to learn to have confidence when interacting with the community.
He organizes a workshop on the first day of the sprint to help people learn this. Virtualenv, pip, git/github, python tests, sphinx, readthedocs.
After the workshop you can start working on a couple of simple tickets that he reserved for workshop attendees.
Elasticsearch - Honza Král
Elasticsearch is cool. Open source, distributed, schemaless, realtime. All the buzzwords. Originally it was for searching, which it still does.
It can also handle faceting (analytics). Aggregating data into facets.
Percolator is new. Trigger-like. A query you store in elasticsearch. When you submit something to the store, you can get an alert whether it matched a query.
Stop writing settings files - Bruno
We're django devs, so we like settings files. from local_settings import *, that sort of stuff. The problem is that you can't add to existing settings, you have to overwrite it.
You can also have multiple settings files, importing base.py and production.py and so. You end up with lots and lots of settings files this way.
http://12factor.net advocates strict separation of config from code. Which Django doesn't.
So: expose your configuration as environment variables and use that to get them into your settings.
Look at daemontools' envdir. It lets you put environment variables in files in a defined directory (one file per variable) and sets the variables from those files. You can use the same trick in your settings.py, it is only a few lines of code.
The files can be in version control. Your sysadmin will thank you. Easy to set up with salt/puppet/chef.
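The envdir trick fits in a few lines of standard-library Python. This is a simplified sketch of the idea (file name is the variable name, first line of the file is the value), not a faithful port of daemontools' envdir; the function names and the example variable names are assumptions.

```python
import os


def read_envdir(path):
    """Return {filename: first line of the file} for each regular file."""
    values = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # skip subdirectories and other non-files
        with open(full) as handle:
            values[name] = handle.readline().rstrip("\r\n \t")
    return values


def load_envdir(path, environ=os.environ):
    """Copy the envdir values into environ, overwriting existing keys."""
    environ.update(read_envdir(path))
    return environ


# In settings.py (hypothetical directory and variable names):
#   load_envdir("/etc/myproject/env")
#   SECRET_KEY = os.environ["DJANGO_SECRET_KEY"]
#   DEBUG = os.environ.get("DJANGO_DEBUG") == "1"
```

Passing a plain dict as environ keeps the loader easy to test without touching the real process environment.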
Teaching 2.0 - Krysztof Dorosz
What teaching should look like. He teaches at a university, so he knows about teaching.
You don't need to know everything better. You don't need to make one fixed PDF with fixed text and a fixed exercise.
He makes his classes in github. Everything in .rst files. Students can propose fixes and improvements. And they do!
This way you treat your students as collaborators and partners instead!
Configuring python environments with Puppet - Dmitry Trofimov
If you want to test with various python versions, you need to build them all and fit them out with their virtualenv and so. And use various django versions.
He prepared all those combinations with puppet. See https://github.com/traff/python.pp
Migrating the future - Andrew Godwin
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Andrew Godwin attempted to raise 2500 pounds for inclusion of south in Django core with kickstarter. It worked. In fact, he raised 17952 pounds!
Why does South need to be replaced by a new version inside Django itself?
- It started 5 years ago, so there's 5 years of learning done in that period. Some things that made sense at the time aren't the best decision now.
- There's poor support for VCS branching.
- The migration files are huge.
- Migration sets get too large. There are projects with 1000 steps!
The inside-django solution has two parts. The actual migration code and a separate backend. So if you want a different migration engine, you can probably reuse the backend code with its support for multiple types of databases.
The new migration format is more declarative instead of imperative like it is now. This makes them smaller. It also allows you to compute the end result in memory and apply one single migration.
Migrations will have a parent. So you won't have a problem with 0003_aaaa and 0003_bbbb migrations that halfway bite each other. If a merge can be done automatically, fine, otherwise south/django will warn you.
Squashing will be added. You can squash a set of migrations together so that you can start from one new starting point instead of needing to go through the entire list of migrations.
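Why a declarative format makes "compute the end result in memory" and squashing possible can be shown with a toy model. To be clear: this is not South's or Django's migration format. The operation tuples, the dict-based schema state and all function names are invented for this sketch.

```python
def apply_operation(state, operation):
    """Return a new schema state with one declarative operation applied."""
    kind, table, *rest = operation
    # Copy so that earlier states stay untouched.
    new_state = {name: dict(fields) for name, fields in state.items()}
    if kind == "create_table":
        new_state[table] = dict(rest[0])
    elif kind == "add_field":
        field, field_type = rest
        new_state[table][field] = field_type
    elif kind == "remove_field":
        del new_state[table][rest[0]]
    else:
        raise ValueError(f"unknown operation: {kind}")
    return new_state


def final_state(operations):
    """Replay the operations in memory, without touching a database."""
    state = {}
    for operation in operations:
        state = apply_operation(state, operation)
    return state


def squash(operations):
    """Collapse a history into one create_table per table."""
    return [
        ("create_table", table, dict(fields))
        for table, fields in final_state(operations).items()
    ]


history = [
    ("create_table", "quote", {"id": "integer", "text": "text"}),
    ("add_field", "quote", "author", "varchar"),
    ("remove_field", "quote", "text"),
]
```

Because each step is data rather than code, the three-step history and its one-step squashed form can be compared directly: both replay to the same end state.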
One thing to watch out for: the Django field API will change a bit because the migration code needs to know how to re-create a field. Watch the django developer mailinglist if you're interested.
Read his blog at http://www.aeracode.org/ if you're interested in the details of everything he encounters.
Having your pony and committing it too - Jacob Burch
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Jacob Burch hopes you can learn from him if you're new at contributing to open source. He won't cover virtualenv, git, django's core code structure. And also not what to get involved in. What's this talk about? About you if you have something you want ("a pony") to get into Django core.
You are initially probably going to be a bit afraid. Jacob showed a couple of quotes about people that were initially not quite sure/certain when committing to Django. Then he showed the names of the people those quotes came from: they're now all core committers :-)
Two balances you have to keep in mind:
- You should be both pro-active and patient. This is a tough balance to strike. If you manage it, it helps a lot.
- You should be both confident and humble. Be humble, but be convinced of your idea. How to help here? The best thing is to run all the tests. It will give you confidence that your solution works (if it does). And it'll make you humble once you realize all the end cases that Django (and thus your fix) needs to support.
There are three broad categories of contributions:
- Bug fixes
- Start with a test condition. Something works or it doesn't. A test that demonstrates an issue is worth 20 emails.
- Major contributions
- Do your homework. Search trac, search the django developer mailinglist, become familiar with the code you're proposing to change. You need a go-ahead beforehand, so discuss it on the mailinglist.
- Minor additions
- Treat it as a major contribution. Only a beforehand go-ahead isn't needed here.
(Jacob did some live coding, trying to get a push into Django. In the meantime, he continued with the presentation by showing himself on video :-) )
Some do's/dont's when mailing about something:
- Don't communicate entitlement. Don't focus only on your own needs.
- Communicate patience. Accept that this is the start of a conversation.
- State the problem clearly.
- Show confidence: propose a clear solution. This really helps the core devs, as they have a clear proposal to work from instead of having to come up with something themselves. Creative energy is expensive energy.
- Show your homework. Ticket numbers, list potential downsides/drawbacks.
- Show humility. If you're unsure of an aspect, just ask.
Code is important, but most of the effort will probably be spent in discussing it. That said, here are some code related suggestions:
- PEP8, unless it is consistently ignored on a certain point. Stay consistent locally.
- Respect existing style.
- Comments are your friend. Don't comment too heavily, but make sure that anything unusual is explained.
- Get some peer review before submitting.
Repeat to yourself: you are not your code. Your ego is not on the line. Separate yourself from your code. Humility is really important. Your patch might not get accepted. You might get negative feedback. Don't take it personally. Your code is not yourself, even though it might feel like your own baby.
If it is not getting reviewed: remember that core devs are busy and might not have had time to review it. A bit of persistence is important, but don't irritate people. Tip: get to know people that can commit at conferences or at sprints. That helps.
Once you do get feedback: iterate quickly and get back quickly on the feedback, otherwise the core dev has to load everything back into their head.
Django Sprint workshop
By DjangoCon Europe from Planet Django. Published on May 15, 2013.
We are thrilled to announce that during a DjangoCon sprint Daniele Procida will lead a workshop “Don’t be afraid to commit”. The workshop is addressed to all of you who want to contribute to open source projects but are not sure how to do it.
The workshop will take participants through the complete cycle of identifying a simple issue in a Django or Python project, writing a patch with tests and documentation, and submitting it.
The workshop will take about 3h starting at 11:00 on Saturday in the same place as Sprints. Since the number of attendees is limited (12 people), please make sure to sign up here: http://djangocon-workshops.eventbrite.com as soon as possible!
Combining Javascript and Django in a smart way - Przemek Lewandowski
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Django is a javascript-agnostic web framework. Nothing is built-in so you can be up to date all the time. Javascript development moves very quickly.
The basic approach is to include some custom inline javascript in the html pages. It quickly leads to illegible code that's hard to work on and hard to distribute.
Javascript has frameworks, too. They give your application structure and take work off your hands. This is the advanced approach. It includes several parts:
- Communication with the server (REST api, websockets).
- Application building: combining and minimizing files.
- Static files management.
- Javascript improvements: coffeescript and so.
What Przemek Lewandowski needed was a powerful javascript framework, coffeescript, testable code, js code minimization and fingerprinting for avoiding caches. And also rapid REST API development.
- Javascript framework
- They started with backbone, but it wasn't enough. They added marionette to backbone, but it still wasn't good enough. There's a lack of a binding mechanism; there are no reusable views; models are poor. AngularJS and Ember are better.
- Coffeescript
- It is controversial, but it helps to write code faster and use less code for it. It performs as well as javascript as it compiles to javascript. They used requireJS for painless coffeescript integration. Requirejs allows for modular code and gives you both a builder and an uglifier.
- Building javascript apps
- In the end they used django-require instead of django-compressor and django-pipeline.
- REST api
- Piston isn't really maintained anymore. Tastypie is reasonable, but django-rest-framework is the nicest one. It uses class based views, so it saves you a lot of work (even though still being very customizable).
- Static files management.
- Django's built-in static files management is good. And you can add extra "storages" to it to get django to store the static files in the cloud, for instance. django-require can be plugged in, too, to add a fingerprint to javascript files to ensure the latest version is always used.
There's more, like Bower, a javascript package manager. He didn't look at this yet. (Note by Reinout: look at http://blog.startifact.com/posts/overwhelmed-by-javascript-dependencies.html for a starting point)
Getting recommendations out of nothing - Ania Warzecha
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Ania Warzecha researched recommendation systems. Recommendations means estimating ratings or preferences for items a user hasn't seen yet. For example books or movies you might also like based on earlier purchases.
There are three kinds of recommendations.
Collaborative recommendations. Mostly created based on actions from other users. Which books are often bought together, for instance.
Simple to implement, but can be slow for big datasets. And doesn't work well on new items and/or new users.
Content-based recommendations. Looks for similar items.
Fast and accurate, but tends towards over-specifications regarding needed data.
Hybrid methods. Combining them.
A case study: a Polish car parts website. You normally don't log in there, you just want a part. So older purchases aren't available. They did have a lot of parts and data, so they started with content-based recommendations.
They mixed in some basic user actions. 0=didn't buy, 1=browsed, 2=bought. Later on more elaborate, like points for items found through searching or items placed on wishlists.
They used Redis for its quick addition of user actions, simply pushing an additional score to an item which then gets added in the database.
One thing they needed to do was to merge session keys after a user logs in, merging the before-login session with the logged-in user's session. They didn't want to lose data collected till that point.
Now on to figuring out similar users. Common techniques are Euclidean distance, Pearson correlation and cosine similarity. But the problem was that it was slow. So they made an intermediary cache table in Redis.
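The three similarity measures named above are short enough to write out. This is a generic textbook sketch in pure Python, not the code from the talk; the example score vectors reuse the 0/1/2 scale (didn't buy, browsed, bought) mentioned earlier, with made-up values.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; 0 means identical score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson_correlation(a, b):
    """Pearson r, computed as cosine similarity of mean-centred vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    centred_a = [x - mean_a for x in a]
    centred_b = [y - mean_b for y in b]
    return cosine_similarity(centred_a, centred_b)


# Scores of two hypothetical users for the same five items.
user_a = [2, 1, 0, 0, 2]
user_b = [2, 2, 0, 1, 2]
```

Computing any of these for every pair of users is quadratic in the number of users, which is why caching the results in an intermediate table, as described above, pays off.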
Some conclusions:
- Redis is good for fast storing and painless calculations.
- Content-based recommendations are good for big datasets.
- Keep all the data you can keep.
Advanced PostgreSQL in Django - Christophe Pettus
By Reinout van Rees from Planet Django. Published on May 15, 2013.
(See also last year's talk)
Database agnosticism: write once, run on any database. A critical selling point for Django: it runs on many databases. But for others, it is bad. You pay a performance hit for not using database-specific features. So once you have made your choice, really use that database.
Here are some examples of good special things available in postgres.
Custom types
If you like types, you'll love postgres. Many built-in types. And many are usable in Django by installing some small app.
- Do you do .lower() in python code or in your SQL? For an email address for instance? Why not use citext, a case insensitive text field provided by postgres.
- Often you want to add various key/value data to an object. Attributes. Extra table with a join? Add fields to the main table? Solution: hstore.
- Postgres has a built-in json type! No need for mongodb :-) It is validated going in. Postgres 9.3 will make it much faster.
- The UUID type is much more efficient than storing a long character string.
- IPv4 and IPv6 addresses.
You can define your own! And it is easy to integrate into Python and Django:
- You adapt it into psycopg2. This'll mean quite some regex'ing, but there are many examples.
- You write a field class for Django.
- You write a formfield and widget for use in forms and the admin.
Indexes
Django's models are great, but the index creation functionality is limited.
- Very cool: partial indexes. You can create an index that only indexes a part of the table. Filter out inactive items, for instance. It might make your index much smaller and quicker.
- Multicolumn indexes. Speeds up selection on multiple columns.
- Expression indexes.
For these things you need to get custom SQL into the database. Using South is the only sane way.
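The three kinds of index can be tried out without a postgres server, because SQLite accepts the same CREATE INDEX shapes. The sketch below uses sqlite3 purely as a stand-in; the item table and index names are made up. On postgres with a boolean column the partial index condition would simply read WHERE is_active, and in a project the statements would go into a migration as custom SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item ("
    "id INTEGER PRIMARY KEY, name TEXT, category TEXT, is_active INTEGER)"
)

# Partial index: only rows with is_active = 1 end up in the index.
conn.execute("CREATE INDEX item_active_name ON item (name) WHERE is_active = 1")

# Multicolumn index: helps queries filtering on category and name together.
conn.execute("CREATE INDEX item_category_name ON item (category, name)")

# Expression index: indexes lower(name) for case-insensitive lookups.
conn.execute("CREATE INDEX item_lower_name ON item (lower(name))")

index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```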
Custom constraints
Django does foreign key constraints in the ORM, not in the database. The only other constraint is uniqueness.
Constraints should be pushed into the database whenever possible. The database is much more efficient at it. And you remove one major path that could lead to data inconsistency.
Actually getting the constraints into the database means custom, hairy SQL. Sadly. He's working on something better.
You can use exclusion constraints, like not allowing a room booking if it overlaps with another.
Raw SQL
Christophe's rule: if you are joining more than three tables, use raw SQL. Below three, just use the ORM.
Django has raw query sets that even give you back actual Django model instances. See the django documentation.
Sometimes you just have to dig in and write some 40-line monster SQL to get some operation down from 30 seconds to 10 milliseconds.
Where to put the SQL? In the manager of the model, not directly in the view. You can also wrap it in SQL stored procedures. Again: use south to add stuff to the database if you need to.
Closing comments
- Don't limit yourself because of some hypothetical need to later switch databases.
- Postgresql has lots of advanced features: use them!
Processing payments for the paranoid - Andy McKay
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Everyone should be paranoid when processing payments. The client, the programmer, everyone.
He works on Firefox OS and more especially the marketplace ("don't call it an app store"). The marketplace is powered by Django. And of course it accepts payments. And of course it is open source (even the presentation is on github).
Btw, they have a bug bounty in place. If you find a real bug, mail them and they'll pay you a bounty!
The firefox add-on website already allows donations for firefox add-ons, handled through paypal. 500-2000 dollar per day. But the marketplace will process much larger amounts of money, so they needed to increase their paranoia level.
For online payments, you need tokens and credentials. And they need to be stored somewhere. And suddenly you're a big fat juicy target just waiting to be hacked.
XSS (cross site scripting) is an oft-occurring problem. Django has built-in protection for common cases. There's also content security policy that further limits it.
They also started navigator.mozpay.
Phishing. In-person tricks. For instance for getting your hand on a database for test usage. You do need something for debugging, so they now create an automatic anonymized debug database.
SQL injection and so. They now have a REST api (solitude) for payments. This isolation helps preventing injections. Inside the database, lots is encrypted. And several items are stored outside of the databases. At the moment, the transaction data is separated from the payments data which is separated from the payment provider credentials.
This is defence in depth: hedging against your own stupidity, basically.
Access can happen through requests and OAuth 1. Andy uses curling and slumber.
There is a list of common problem points: OWASP. After reading through it they started django-paranoia which for instance provides paranoid forms: if you submit more key/values than expected, it will be logged. Also something that watches if your user agent changes during a session... IP changes are also logged, but normally they'll be valid. But if the first IP is in Poland and 5 minutes later it is in China...
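The "paranoid form" idea, logging when a submission carries more keys than the form defines, can be sketched in a few lines. This is not django-paranoia's API; the function names, the logger name and the example field names are all hypothetical.

```python
import logging

logger = logging.getLogger("paranoia-sketch")


def unexpected_keys(submitted, expected_fields):
    """Return submitted keys the form does not define (sorted for stable logs)."""
    return sorted(set(submitted) - set(expected_fields))


def check_submission(submitted, expected_fields):
    """Log a warning for extra keys; return True if the submission is clean."""
    extra = unexpected_keys(submitted, expected_fields)
    if extra:
        logger.warning("unexpected form keys: %s", ", ".join(extra))
    return not extra


expected = ["amount", "currency"]
clean = {"amount": "10", "currency": "EUR"}
suspicious = {"amount": "10", "currency": "EUR", "is_admin": "1"}
```

The check only logs and reports; whether to reject the request outright is a separate policy decision.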
About the phone: version 1 isn't done yet, but very very nearly. Which means they'll start to have scaling problems soon which need solving :-)
Circus: process and socket manager - Tarek Ziadé
By Reinout van Rees from Planet Django. Published on May 15, 2013.
Tarek Ziadé can't believe he's giving a talk about his circus process manager in an actual circus tent :-)
A typical deployment is with nginx and gunicorn or uwsgi. But you add more and more items right next to your django process(es). Celery or haystack for instance. So you add a supervisor that starts 'em all. An often-used one is supervisord. You could use a system-level tool like upstart, but you need root access for that. You don't need it for supervisord.
Supervisord has some missing features like a powerful web console, clustering, realtime output, remote access and so. Supervisord has some of this, but not good enough. So they (mozilla) started with Circus.
They used several existing libraries, like psutil, zeroMQ, socket.io. psutil is the core of the system. Very handy for interacting with processes. It was a bit slow, but together with psutil's author they managed to make it fast.
ZeroMQ is an async library for message passing, so more or less a smart socket. They use message passing for making the various process data available to the circus tools, like 'circus-top' or 'circusd-stats'.
And because everything is nicely decoupled, it is possible to add your own plugins for custom interaction. There are already community-provided plugins available.
He showed the web interface: looks nice. Live graph per process with memory and CPU usage. A simple "+" to add an additional process.
One last thing: there are multiple levels of supervision. Supervisord or circus must be started in some way by the system. And gunicorn, launched by supervisord or circus, itself starts up django processes. They added chaussette as a wsgi runner that can also run new processes on already-opened sockets so that they can be managed by circus, too.
2013 EU Djangocon introduction
By Reinout van Rees from Planet Django. Published on May 14, 2013.
I'm at the 2013 European djangocon in Warsaw! Ready for three days of conferencing and, for me also, live blogging :-)
Russell Keith-Magee started off the conference. He remembered Malcolm Tredinnick, mentioning his code contributions, but especially his community involvement. Lots of mailing list messages. Lots of personal involvement, too, as he visited many people and local communities. Not only Django: also chess, for instance. And he built a community there, too: working on the Australian chess community.
He passed away unexpectedly a few months ago.
Make the most of the time you have. It can be over quickly. And especially: be part of communities. Make communities work. And especially make this Django community work. Make friends. Enjoy our friendly community!
Query a Random Row With Django
By Ed Menendez from Planet Django. Published on May 14, 2013.
Here's a gist for a drop-in Django manager class that allows you to return a random row.
Model.objects.random()
It can be used in your models.py like this:
class QuoteManager(RandomManager):
    def random_filter(self):
        return self.filter(is_active=True)

class Quote(models.Model):
    quote = models.TextField()
    by = models.CharField(max_length=75)
    is_active = models.BooleanField(default=True)
    objects = QuoteManager()

    def __unicode__(self):
        return self.by
The advantage over using order_by('?') is performance. A random sort at the database level seems to be extremely slow on most databases, even if the table only has a few thousand rows. Note that the count of records is cached for 5 minutes, so if the table changes often you may want to change that. A limitation is that it only returns one row.
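The gist itself isn't reproduced here, so the following is only a sketch of one common way to avoid a random sort: count the matching rows, then fetch a single row at a random offset. It uses the standard library's sqlite3 instead of the Django ORM so it runs on its own; the table, the columns and the random_row helper are made up for the illustration and are not taken from the gist.

```python
import random
import sqlite3


def random_row(conn, table="quote", where="is_active = 1"):
    """Return one random matching row using COUNT(*) plus a random OFFSET."""
    # table and where are trusted constants in this sketch, never user input.
    count = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {where}"
    ).fetchone()[0]
    if count == 0:
        return None
    offset = random.randrange(count)
    # ORDER BY id makes the offset refer to a stable ordering of the rows.
    return conn.execute(
        f"SELECT id, author FROM {table} WHERE {where} "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()


# Hypothetical data, loosely mirroring the Quote model above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quote (id INTEGER PRIMARY KEY, author TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO quote VALUES (?, ?, ?)",
    [(1, "Ada", 1), (2, "Bob", 0), (3, "Cy", 1)],
)
print(random_row(conn))
```

The count is the part worth caching, which matches the note above about the record count being cached for a few minutes.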
Two scoops of Django book review
By Reinout van Rees from Planet Django. Published on May 14, 2013.
I took the train from Utrecht (NL) to Warsaw today. I only had to change in Amersfoort (NL) and Berlin (DE), so it was a pretty direct connection. 12 hours of train time (which I enjoy). So that's enough time to read through two scoops of Django, the Django book by Daniel Greenfeld and Audrey Roy! Here's my review.
The summary: buy the book and learn a lot. For the longer version, I'll simply go through the notes I made for each chapter.
- Coding style
- The book starts off good, in my opinion, because it tells you to write good and neat code. PEP8. Good not-too-short variable names. And it got me thinking by advocating explicit relative imports ("from .models import SomeModel"). That's what's good about this book: Daniel and Audrey state preferences and tell you best practices and sometimes those best practices won't be your best practices. Or you didn't know something. Anyway, it gets you thinking; which is good and enjoyable.
- Virtualenv
- Hey, virtualenv in chapter two! Nice. Here, like in the rest of the book, I noticed they point a lot at existing documentation and don't provide much explanation. No virtualenv explanation here, for instance, just a pointer to the official docs. It is not necessarily bad, probably even good, but it is something to keep in mind. You'll have to do some work yourself (which will make you retain the knowledge you gain better anyway!).
- Project layout
- I got tickled here. Directories three levels deep? Especially having urls and settings in a subdirectory within a directory with the very same name (the name of the project)? Two chapters later, the settings seem to be moved to a settings/ directory, so mayhap I looked at an older beta version of the book :-)
- Apps
- The gospel that a Django application should do one thing only is repeated here, which is a very, very good thing. I completely agree. Advice like this is what makes it a good book. You get a good mindset out of the book.
- Settings
- The idea to have a directory settings/ with a base.py and then production.py, dev.py, reinout_dev.py and so on looks OK. I use a different setup and this proposed one looks better. Half the chapter is about environment variables as a means of keeping things like SECRET_KEY and database passwords out of the settings. Yep, that can work. My opinion is that you can also keep it in the settings, provided you keep your code non-public. If you use environment settings, you still need to store the data somewhere. You won't type it in by hand, will you? There's no real suggestion in the chapter to solve this, though the solution of course depends on your chosen setup (and there are too many different kinds of setups to provide a single right answer). A nice touch is the clean suggested ImproperlyConfigured error when an environment variable is missing; this shows the care that went into the book.
- Models
- Hey, I learned something new! I didn't know about auto_now and auto_now_add on DateTimeFields! That's what I read books like this for: getting hints like this.
- Views
- Reminder for myself that I took out of these chapters: put less code in views.py. And I ought to look at django-braces for handy Class Based View mixins.
- Forms
- Hm. I probably ought to use forms much more. Especially for those spots in my code where I just take two or three variables directly from the GET or POST and stuff it in some query... Why not use a small form, just for the form validation? Much safer that way and I'd use more of what Django gives me! Again something in the book that educates me :-)
- REST
- It took me a while to spot the difference between the two views that are shown at the start. It turns out that one is a view on the collection of items and the other a view on one single item. The first has list+add, the second view+edit+delete. The names just don't make it clear. I think this chapter is a bit too short. On the other hand, perhaps one extra paragraph and two better class names would be enough.
- Templates
- Flat is better than nested. Solid advice not to go overboard with blocks and template inheritance. Oh, and TEMPLATE_STRING_IF_INVALID is handier than I thought for template debugging as you can add a %s to the string, which shows you the failed expression. This tip is going to help a lot.
- Admin
- They say that the admin should only be used for site admins, not for end users. It is just as easy and probably better to make a couple of quick edit pages or dashboards for your customers.
- Third-party apps
- There are lots of apps you can use. Look at http://djangopackages.com. Did you know it was written by the same people that wrote this book? Now you know why you should read the book.
- Testing
- Most of my favourite/essential packages are mentioned: coverage.py, factory_boy, mock. And the tip to zap the tests.py file and replace it with a tests/ subdirectory full of test files is correct.
- Documentation
- Documentation is mandatory. Even when installation is done with a fabfile or with chef, tell it anyway in the installation docs. And describe what the goal of the app is. Etc. Documentation is mandatory.
- Performance tuning
- Debug toolbar: yes. Hey, but I didn't know yet about django-cache-panel to see what happens in the cache. Sounds handy. This chapter also whacks me on the fingers a bit as I have almost done nothing with sql/db level optimization/fixing/profiling.
- Security
- "Always use https". And lots of other good tips. And, for me, the reminder to use forms (or rather, form validation) more for better security.
- Logging
- Good that the book mentions logger.exception('Something went wrong'), as it logs the message at ERROR level and automatically includes the traceback. No more weird exc_info-like stuff, just logger.exception().
- Django utils
- A handy list of utils that Django already provides, such as slugify, strip_tags and so on.
- Deployment
- Gunicorn and mod_wsgi. Personally, I'm happy with gunicorn (when run behind supervisord). Nice isolation. Nice mostly-transparent restarts when things barf.
- Getting help
- Good thing: they tell you to do your homework before asking for help in the usual channels. Very good.
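Two of the tips from my notes above (the ImproperlyConfigured error for a missing environment variable, and logger.exception()) fit together in one small sketch. This is my own illustration, not code from the book: the ImproperlyConfigured class below is a stand-in for django.core.exceptions.ImproperlyConfigured so the example runs without Django, and get_env_variable, load_secret and EXAMPLE_SECRET_KEY are hypothetical names.

```python
import logging
import os

logger = logging.getLogger("settings_sketch")


class ImproperlyConfigured(Exception):
    """Stand-in for django.core.exceptions.ImproperlyConfigured."""


def get_env_variable(name):
    """Return the environment variable or fail with a clear message."""
    try:
        return os.environ[name]
    except KeyError:
        raise ImproperlyConfigured(f"Set the {name} environment variable")


def load_secret(name):
    """Return the value, or None after logging the full traceback."""
    try:
        return get_env_variable(name)
    except ImproperlyConfigured:
        # logger.exception() logs at ERROR level and appends the traceback,
        # so there is no need to pass exc_info around by hand.
        logger.exception("Something went wrong while reading %s", name)
        return None


# In a real settings file you would probably let the error propagate:
# SECRET_KEY = get_env_variable("SECRET_KEY")
print(load_secret("EXAMPLE_SECRET_KEY"))
```

In settings you usually want the hard failure; the logging variant is only there to show what logger.exception() adds.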
I've got one big gripe with the book. There's probably a good reason for the omission, but I'm missing setup.py. Telling people to use a certain requirements.txt in some README is a poor substitute for Python's automatic dependency handling. This is not only good for apps you want to put on PyPI, but also for your own packages.
All in all: valuable book, buy it!
Daniel and Audrey are at Djangocon in Warsaw, so if you're there, say "hi" to them and thank them for the book.
Be Nicer at DjangoCon!
By DjangoCon Europe from Planet Django. Published on May 12, 2013.
With a few days to DjangoCon, we thought it’d be nice to let you know how we are going to make your stay in Warsaw even NICER.
The Nicer app is here! Available for both Android and iOS, smartphones and tablets. By downloading this app, you will always be connected to us, the organizers. We will update you with hot news and unexpected agenda changes, and post little tips so you can have an awesome time in Warsaw.

Get the app of your choice here: http://getnicer.com/apps
Then simply follow DjangoCon Europe 2013. Make sure to turn on push notifications so you won’t miss an important event!
Other than that, make sure to follow us on twitter: @djangocon, tweet using the #DjangoCon hashtag, check out our pictures on instagram and videos on Vine to get a full coverage :)
See you in Warsaw really SOON!
Enabling CORS in Angular JS
By Torsten Engelbrecht from Planet Django. Published on May 12, 2013.
I was recently experimenting with building an API with django-tastypie and making it accessible via CORS, so it can be used from an AngularJS app on a different host.
For the Django part it was relatively straightforward. I could have written my own middleware to deal with incoming CORS requests, but decided to use django-cors-headers in the end. Following the instructions in the github repo and adding the host where AngularJS is served to the CORS_ORIGIN_WHITELIST setting did enable the Django server to handle CORS.
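For reference, the Django side boils down to a settings fragment roughly like the one below. Treat it as a sketch: the host name is made up, the surrounding apps and middleware are placeholders, and the exact setting names and the expected whitelist format depend on the django-cors-headers version, so check the README of the release you install.

```python
# settings.py fragment (sketch): enable django-cors-headers.
INSTALLED_APPS = (
    # ... your other apps ...
    'corsheaders',
)

MIDDLEWARE_CLASSES = (
    # CorsMiddleware goes early, before middleware that can generate
    # responses itself (such as CommonMiddleware).
    'corsheaders.middleware.CorsMiddleware',
    'django.middleware.common.CommonMiddleware',
    # ... your other middleware ...
)

# Hypothetical host that serves the AngularJS app.
CORS_ORIGIN_WHITELIST = (
    'angular.example.com',
)
```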
With AngularJS it was a little more tricky, mainly because information is spread all over the web. Beside the fact that I was trying to implement a service using ngResource to communicate with the API, the following did enable AngularJS to send its requests with the appropriate CORS headers globally for the whole app:
var myApp = angular.module('myApp', [
    'myAppApiService']);

myApp.config(['$httpProvider', function($httpProvider) {
        $httpProvider.defaults.useXDomain = true;
        delete $httpProvider.defaults.headers.common['X-Requested-With'];
    }
]);
Just setting useXDomain to true is not enough. AJAX requests are also sent with the X-Requested-With header, which marks them as AJAX. Removing the header is necessary so that the server does not reject the incoming request.
Meet our Platinum Sponsor: Mozilla
By DjangoCon Europe from Planet Django. Published on May 12, 2013.

Mozilla hardly requires introduction. They make Firefox. And Thunderbird. And Persona. And other web stuff. They also fight for the users, by keeping the Web open and diverse. All this while being a non-profit organization. They also help others, like, you know, sponsoring conferences :-)
What are they up to now? Firefox OS seems to be the new, hot project: a mobile operating system running on the Linux kernel, for which you can write applications in technologies you already know: HTML and JavaScript. Combine it with Firefox Marketplace and you have a complete mobile ecosystem. Check them out at Mozilla Developer Network.
We are super excited to have two speakers from Mozilla: Andy McKay will speak about processing payments in Marketplace and Tarek Ziadé will talk about… circus!
Django Facebook – 1.5 and custom user model support
By Thierry Schellenbach from Planet Django. Published on May 11, 2013.
Django Facebook now officially supports Django 1.5 and custom user models! Go try it out and upgrade to pip version 5.1.1. It’s backwards compatible and you can choose if you want to keep on using profiles, or migrate to the new custom user model. Installation instructions can be found on github.
Contributing
Thanks for all the contributions! My startup (Fashiolista) depends on a reliable Facebook integration and maintaining it would not be possible without all the pull requests from the community. Contributions are strongly appreciated. Seriously, give Github a try, fork and get started :)
About Django Facebook
Django Facebook enables your users to easily register using the Facebook API. It converts the Facebook user data and creates regular User and Profile objects. This makes it easy to integrate with your existing Django application.
I’ve built it for my startup Fashiolista.com and it’s currently used in production with thousands of signups per day. For a demo of the signup flow have a look at Fashiolista’s landing page (fashiolista.com)
After registration Django Facebook gives you access to the user’s graph, allowing for applications such as:
- Open graph/ Timeline functionality
- Seamless personalization
- Inviting friends
- Finding friends
- Posting to a user's profile
Django Facebook helps you quickly develop Facebook applications using Django.
Let me know what features or issues you are encountering!
The Easy Form Views Pattern Controversy
By Daniel-Greenfeld from Planet Django. Published on May 10, 2013.
In the summer of 2010 Frank Wiles of Revsys exposed me to what I later called the "Easy Form Views" pattern when creating Django form function views. I used this technique in a variety of places, including Django Packages and the documentation for django-uni-form (which has since been rebooted as django-crispy-forms). Miguel Araujo and I opened our Advanced Django Forms Usage talk at DjangoCon 2011 with this technique. It’s a pattern that reduces the complexity of using forms in Django function-based views by flattening the form handling code.
How the Easy Form Views pattern works
Normally, function-based views in Django that handle form processing look something like this:
def my_view(request, template_name="my_app/my_form.html"):
    if request.method == 'POST':
        form = MyForm(request.POST)
        if form.is_valid():
            do_x()  # custom logic here
            return redirect('home')
    else:
        form = MyForm()
    return render(request, template_name, {'form': form})
In contrast, the Easy Form Views pattern works like this:
def my_view(request, template_name="my_app/my_form.html"):
    form = MyForm(request.POST or None)
    if form.is_valid():
        do_x()  # custom logic here
        return redirect('home')
    return render(request, template_name, {'form': form})
The way this works is that the django.http.HttpRequest object has a POST attribute that defaults to an empty dictionary-like object, even if the request’s method is equal to "GET". Since we know that request.POST exists in every Django view, and is at least an empty dictionary-like object, we can skip the request.method == 'POST' check by doing a simple boolean check on the request.POST dictionary.
In other words:
- If request.POST dictionary evaluates as True, then instantiate the form bound with request.POST.
- If the request.POST dictionary evaluates as False, then instantiate an unbound form.
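The two cases above can be sketched without Django. SketchForm is a tiny stand-in written for this illustration (it is not a Django class); it only mimics the rule that a form is bound when it is handed data that is not None. A plain dict plays the role of request.POST.

```python
class SketchForm:
    """Tiny stand-in for a Django form: bound if it was given data."""

    def __init__(self, data=None):
        self.is_bound = data is not None
        self.data = data or {}

    def is_valid(self):
        # Unbound forms are never valid; this sketch has one required field.
        return self.is_bound and bool(self.data.get("name"))


def make_form(post):
    # An empty dict is falsy, so `post or None` turns "no data" into None.
    return SketchForm(post or None)


get_form = make_form({})                 # what a GET request looks like
post_form = make_form({"name": "Ada"})   # a POST with form data

print(get_form.is_bound, get_form.is_valid())
print(post_form.is_bound, post_form.is_valid())
```

The first form is unbound and simply gets rendered; the second one is bound and goes through validation.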
Great! Faster to write and shallower code! What could possibly be wrong with that?
The Controversy
Before you jump to convert all your function based forms to this pattern, consider the following argument raised against it by a good friend:
This is one of those things where "empty dictionary and null both evaluate as false" can bite you.
There's a difference between "There is no POST data", and "This wasn't a POST".
—by Russell Keith-Magee (paraphrased)
The problem he is talking about is that data besides multipart/form-data or application/x-www-form-urlencoded would still end up in the request.POST dictionary-like attribute.
Where is the controversy? Well, I didn't write a retraction until now. Arguably I should have done it earlier. However, since I never ran into the edge case, I didn't see the need. Yet when it comes down to it, the "Easy Forms" approach has an implicit assumption about the incoming object, which in Python terms is not a good thing.
Getting bit by the Easy Form Views method
Here's how it happens:
Before Django 1.5 HTTP methods such as DELETE or PUT would see their data placed into Django's request.POST attribute. The form would fail, but it might not be clear to the developer or user why. HTTP GET and POST methods work as expected.
For Django 1.5 (and later) if a non-POST comes in then the form fails because request.POST is empty. HTTP GET and POST methods also work as expected.
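One case where the two styles visibly diverge, whatever the Django version, is a POST that arrives with no form data at all. The sketch below only models the decision "is the form bound?"; the function names are made up for the illustration, and a plain dict again stands in for request.POST.

```python
def bound_explicit(method, post):
    """Explicit style: bind the form whenever the request is a POST."""
    return method == "POST"


def bound_easy(method, post):
    """Easy Form Views style: bind only when the POST data is non-empty."""
    return (post or None) is not None


cases = [
    ("GET", {}),                 # plain page load
    ("POST", {"name": "Ada"}),   # normal submission
    ("POST", {}),                # a POST that carried no form data
]

for method, post in cases:
    print(method, post, bound_explicit(method, post), bound_easy(method, post))
```

Only the last case differs: the explicit check binds the form, so validation errors get rendered, while the shortcut treats the empty POST like a GET and shows a fresh form. That is the gap between "there is no POST data" and "this wasn't a POST".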
Conclusion
Going forward, I prefer to use Django's class-based views or Django Rest Framework which make the issue of this pattern moot. When I do dip into function-based views handling classic HTML forms, I'm leery of using this pattern anymore. Yes, it is an edge case, but to inaccurately paraphrase Russell, "edge cases are where you get bit".
What I'm not going to do is rush to change existing views on existing projects. That's because personally I've yet to run into an actual problem with using this pattern. As they say, "If it ain't broke, don't fix it." While I'm not saying my code isn't broken, I'm also aware that 'fixing' things that aren't reporting errors is a dangerous path to tread.
Also, next time I get called on something by a person I respect, I'll respond more quickly. Nearly two years is too long a wait.
Update: Changed some of the text to be more succinct and took out the leading sentence.
Stonewall Jackson and documentation
By Reinout van Rees from Planet Django. Published on May 10, 2013.
Stonewall Jackson died 150 years ago today. Not everyone will recognize the name: he was a general in the American Civil War. And a good one at that!
Bear with me, I'll have a programming-related comment to make on documentation :-)
If you know a bit about the Second World War, you might have heard about the German general Erwin Rommel. Jackson's fame was a bit like that. If you had to fight Jackson or Rommel, it didn't really matter that you had more men and equipment: he'd beat the crap out of you anyway. At one point Jackson's 15000 men ran circles around 60000 opponents and repeatedly beat them. That's 1:4. And they won.
Both Jackson and Rommel seemed to have a Fingerspitzengefühl. They knew instinctively when to do or not do something. When to lie in wait and when to strike out despite the odds.
Both also seemed to be one-of-a-kind. I mean it in the sense that they could not teach others to do the same. It was all in their own head. It was all dependent upon them. And at least Jackson didn't tell anything to his subordinates; he was secretive. When he died, there was no one to take his place and no one who could emulate his expert handling of his army.
Here's the link with programming: document your stuff. If it isn't documented, it doesn't exist. I updated a small internal tool two days ago and had to figure out which commandline arguments to pass to it because I had not originally documented it! There was a README, but only with server installation instructions; no local test instructions. And it only described one of the two scripts. Needless to say, I've now corrected this situation.
I wrote the tool originally and I'm the only one working on it. But after I haven't touched it for a year I sure need a reasonable README to get myself back on track. So: document! Try to pass on knowledge.
Btw, Stonewall Jackson died because he was shot by friendly troops. I'm not suggesting programmers should be shot for not-documenting their stuff, but a forceful reminder here or there could be useful :-)
Party with Base!
By DjangoCon Europe from Planet Django. Published on May 09, 2013.
You probably thought one party is good enough for a conference. We’re here to prove you wrong! Awesome guys from Base are helping us organize Base Party on Wednesday, 15th May at 9pm in Klub Balsam.
Base is the only CRM built for people. They believe that by 2020, business software will be radically different. Base is paving the way by building the next generation of CRM software. Their mission is to make you and your team 10x more productive.
Most importantly, they’re building the best tech team in Europe. With a big vision and small, highly talented team, Base is creating an amazing place to work for self-driven and dynamic people. Watch this short video to know them better:
They’re looking for great python developers to join their team. Make sure to drop by Base Party at 9pm, Wednesday in Klub Balsam!
Starting Off
By Andrew Godwin from Planet Django. Published on May 09, 2013.
Welcome to the first of my Django Diaries, where I'll be detailing the progress I'm making on my Schema Alteration project.
After a very successful Kickstarter, I had the unfortunate situation of a couple of successive trips abroad, and so initial work has been a bit more delayed than I would have liked. However, thanks to securing more time to work on the project every week, progress should be faster than planned from now on.
The plan is that these diaries will contain a rough summary of the work I've been doing; they're here both to help engage you (the slightly-too-interested public) in the work I'm doing, as well as providing some transparency.
If you want to hear more about a certain issue, feel free to get in touch with me - see the About page for my contact details. I'd love to explain as much as I can to those who are interested!
Laying the Groundwork
The first task I faced was to go back to my original Django branch and get it up-to-date with the changes in trunk. The only change that affected the schema work was Aymeric Augustin's transaction changes - he's gone in and fixed a lot of the transaction API and cross-database differences with things like autocommit.
As a result, I got to simplify my code somewhat: https://github.com/andrewgodwin/django/commit/6e21a594
After that, the next step was to go in and fix the issues other core developers had with AppCache in the previous release - in particular, the way I was abusing it to make new models at runtime. But first, let me explain a little bit about how AppCache works, for the uninitiated.
AppCache
Note
Other responses may include "templates", "the URL dispatcher" or possibly just "everything"
Ask a core developer what part of Django they dislike most, and chances are good that AppCache will appear somewhere in that list. It's a very old part of Django, and responsible for both knowing what apps are available to the project as well as which models are available.
Django depends far too heavily on it - anything app-related in Django generally touches it, even if it has nothing to do with the ORM. That's a problem being solved by the app-loading branch, which has been going for quite a while but is ever so close to landing.
However, my issues lie elsewhere. The main problem is that any schema migration design is going to have to be able to make historical versions of models - if you have a data migration to run before a schema migration, that data migration needs old model classes as the tables won't yet match the schema your project currently has.
Alas, every time you make a new models.Model subclass in Django, an entry gets placed into the AppCache for that model. This is very useful - it's how ForeignKeys know how to find the other end of their relation, for example - but it means that if we're making three or four old versions of an Author model it's going to trample all over the AppCache and mess everything up.
Resistance is... fine, actually
Note
For those completely unaware, the Borg are an alien race in Star Trek who all share a single hive mind.
Even more excitingly, the AppCache class uses what's known as the "Borg Pattern" - any instance of that class will share state. That means we can't just make a second AppCache to put temporary models in!
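For anyone who hasn't met the Borg pattern before, here is a minimal sketch of the idea. It is not Django's actual AppCache code; SketchAppCache and its register method are invented purely to show why a second instance doesn't give you a clean slate.

```python
class Borg:
    """Every instance shares one __dict__, so all instances share state."""

    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state


class SketchAppCache(Borg):
    def __init__(self):
        super().__init__()
        # setdefault keeps existing registrations when a new instance is made.
        self.__dict__.setdefault("models", {})

    def register(self, name, model):
        self.models[name] = model


first = SketchAppCache()
second = SketchAppCache()
first.register("Author", object)

print(first is second)            # False: two distinct objects...
print("Author" in second.models)  # True: ...but one shared registry
```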
The work I did was in two parts: de-borgify AppCache, and allow a per-model app_cache option.
AppCache actually still uses the Borg pattern, I've just moved all the logic down into a BaseAppCache (along with a setting which means additional caches don't try and load models from every app). This means that my code can now just call:
new_app_cache = BaseAppCache()
I might tidy up the class name into something more suitable, we'll see.
The second change is an app_cache option for models:
new_app_cache = BaseAppCache()

class Author(models.Model):
    class Meta:
        app_cache = new_app_cache
This means you can now assign models to something other than the default AppCache when they're created. Obviously this isn't meant for end-users to develop against; it's so we can make models at runtime into a separate, sandboxed AppCache, with ForeignKey resolution between them still working, but no pollution of the global cache.
You can see most of the changes here: https://github.com/andrewgodwin/django/commit/104ad050 and https://github.com/andrewgodwin/django/commit/75bf394d
Graphs, Graphs Everywhere
Now the groundwork is laid and models are easily creatable at runtime, the next step is to move onto the migrator itself. This will eventually do three main jobs: parsing the available migrations into a big dependency graph, building up versioned models from those migration files, and running the migrations to change the database schema.
It's best to start at the base of all this, which is the dependency graph. This is what migration files get fed into as they're read off disk, and how we work out which migrations to apply to achieve our end goal.
Note
South just takes the filename, ASCII-sorts them, and uses that as the dependency graph for an app.
I'm making a few changes compared to South's original model of this graph; in particular, there won't be implicit dependencies between adjacent numbers (the fact that 0004 depends on 0003 will be recorded in 0004's file) and it'll be possible to "rebase" an app's migrations (throw away historical ones and start afresh).
The numbering dependency decision is so VCS merges can be handled more gracefully - rather than just trying to see a "hole" in the dependency history, it'll be possible to detect that an app has two topmost migrations and prompt the user for action (either an automated rearrange to get a linear history or a manual merge).
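To make the graph idea a bit more concrete, here's a rough sketch with explicit dependencies recorded per migration. It is not the code in my branch: the migration names, the plain-dict graph and the two helper functions are made up for illustration. find_leaves spots the "two topmost migrations" situation after a VCS merge, and plan works out what to apply, dependencies first.

```python
def find_leaves(dependencies):
    """Return migrations that no other migration depends on (the topmost ones)."""
    depended_on = {dep for deps in dependencies.values() for dep in deps}
    return sorted(name for name in dependencies if name not in depended_on)


def plan(dependencies, target):
    """Return the migrations needed to reach target, dependencies first."""
    ordered, seen = [], set()

    def visit(name, stack=()):
        if name in stack:
            raise ValueError(f"circular dependency at {name}")
        if name in seen:
            return
        for dep in sorted(dependencies[name]):
            visit(dep, stack + (name,))
        seen.add(name)
        ordered.append(name)

    visit(target)
    return ordered


# Each migration lists what it depends on; two branches both added an 0003.
graph = {
    "0001_initial": [],
    "0002_add_email": ["0001_initial"],
    "0003_add_bio": ["0002_add_email"],
    "0003_add_avatar": ["0002_add_email"],
}

print(find_leaves(graph))           # more than one leaf: ask the user what to do
print(plan(graph, "0003_add_bio"))
```

With more than one leaf, the tool can either rearrange the migrations into a linear history or ask for a manual merge, as described above.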
The "rebase" operation allows an app with a large number (say, 100) of historical migrations to get a new initial migration added at point 100 - in a way where old installations that are still below the new migration continue to run the old migrations, but any new installation just comes in straight away at migration 100 and runs the initial migration (and then perhaps continues up to 101, 102, etc.).
Note
Since publication, and some suggestions, I've settled on "squash" for the name of this command.
Confusingly, the VCS-merge-automatic-inlining mechanism I outlined above is analogous to what git rebase does, while the rebase command does nothing like it. It's probably worth thinking of a better name for "adding a new initial migration to make tests and new installs faster" - suggestions welcome to @andrewgodwin!
Work on this is going on right now - I've taken a break from it to write this diary - and so next time we'll revisit it and see how it progressed, and if any problems appeared (I'm sure some will).
Also, I'll be giving a talk at DjangoCon EU next week titled "Migrating The Future", with all this kind of detail and more - I hope to see some of you there!
Meet DjangoCon Sprints sponsor!
By DjangoCon Europe from Planet Django. Published on May 08, 2013.

Who doesn’t know yet that we’re doing DjangoCon Europe in a circus tent? But circus is no place for DjangoCon Sprints! We were looking for the kind of space that will accommodate 150 people and at the same time would be located in the heart of Warsaw (it will be weekend so it’s better to be closer to the epicenter of parties, right?).
Thankfully, we found the perfect venue! It’s called GammaFactory - a startup hub/co-working center!! It is based in the.. cheese factory :) Ok, it was a cheese factory, but now it’s a place with a vibrant community around it. If you happen to be a rock climber - the biggest bouldering center in the centre of Warsaw is next door to GammaFactory! So don’t forget your climbing shoes!
HardGAMMA Ventures, which owns GammaFactory, is one of the leading early stage VC funds in Poland.
Visit their site for more info! http://www.hardgamma.com/
They also run a startup accelerator program for technology entrepreneurs called GammaRebels. It focuses on accelerating and developing startups through mentoring, advising and sharing both business & technical knowledge.
See you soon! :)
Meet our Platinum Sponsor: New Relic!
By DjangoCon Europe from Planet Django. Published on May 07, 2013.

Let me introduce you to New Relic, our Platinum Sponsor.
New Relic is an application performance management company. They offer web and mobile application monitoring with support for many languages, including Python, of course, as well as Ruby, PHP, .NET, Java, Android and iOS.
With their great product they make life easier for more than forty thousand clients monitoring a staggering 1.4 million application instances.
Using New Relic you can easily dive into your application performance breakdown and see what parts are slow. It’s very easy to use and you can install it in a matter of minutes, as we did for our conference website!
If that sounds interesting, be sure to check out Amjith’s talk during DjangoCon!
Be sure to check out their website at http://newrelic.com/
Making Django 1.5 compatible with django-bcrypt
By David Cramer from Planet Django. Published on May 07, 2013.
Last night I took the opportunity to upgrade all of getsentry.com to Django 1.5. While most things were fairly trivial to sort out, we hit one less obvious (and pretty critical) bug during the migration surrounding django-bcrypt. This bug would only present itself if you’ve transitioned from …
In defense of <canvas>
By Adrian Holovaty from Planet Django. Published on May 06, 2013.
My friend and fellow Chicagoan Evan Miller wrote an excellent blog post over the weekend: Why I Develop For The Mac. It's full of great reasons why his software (which is also excellent, by the way) was written for the desktop, despite the fact that he's a web developer, even the creator of an Erlang web framework.
But I'm compelled to respond to it, specifically his statements about <canvas>:
large <canvas> areas seem laggy on most browsers
So I'm left with <canvas>, and <canvas> is slow.
I have become intimately familiar with <canvas> while developing Soundslice. I'd even venture to say Soundslice is one of the most advanced uses of <canvas> on the web that's not a tech demo -- i.e., it's an application that normal people use. The site uses not one, but nine <canvas> elements stacked on top of each other to make a very rich UI, sort of like Photoshop for guitar tabs. (For a flashy demo of how those canvases interact, watch the tech talk I gave at 37signals, specifically the bit starting at 10:20.)
Here's what I've learned: <canvas> is not slow. In fact, I've been continually surprised by how fast it is -- as long as you take care to do things right. Evan's article mentions the "magical" sensation of instantaneous feedback; I invite you to play with the zoom slider on any Soundslice page (example) to experience this same magic, all drawn dynamically with <canvas>.
Of course, <canvas> is certainly not as fast as the lower-level drawing routines that you can use if you develop a desktop app. No question. But it's fast enough that, unless you're doing something relatively insane, you'll be totally fine.
On Soundslice, we're drawing guitar-chord charts completely on the fly (again, see an example), which is a relatively involved drawing routine -- and it's still near-instant performance. That's across all modern browsers (Chrome, Safari, Firefox and IE 10).
Here are some specific tips I've picked up to make <canvas> performance really shine.
Use requestAnimationFrame
Above all else, do this.
It's a JavaScript API designed to fix a very specific problem: your computer screen can only be redrawn a certain number of times per second (the "refresh rate"), so any calculations that redraw more often than your refresh rate are wasteful.
For example, say you have an event such as mousemove that results in a <canvas> redraw. A mousemove might happen hundreds of times per second, but your screen might only refresh, say, 75 times per second (75 Hz). That means, if your code is naively written, it will try to redraw several times within each actual opportunity to redraw (hundreds of times per second vs. 75 actual redraw opportunities per second).
The requestAnimationFrame API solves this by letting you say, "Execute this code the next time a redraw happens." Which saves your browser from having to do unnecessary work.
When I added this to Soundslice, the site became dramatically faster and more responsive. Here's more info about how to use the API.
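A minimal sketch of the pattern, not Soundslice's actual code: many redraw requests are collapsed into at most one draw per frame. The name createRedrawScheduler and the optional schedule parameter are illustrative assumptions; in a browser the default is window.requestAnimationFrame, and the parameter exists only so the sketch can be exercised outside a browser.

```javascript
// Coalesce many redraw requests into at most one draw per frame.
// `schedule` is a hypothetical injection point; it defaults to the
// browser's requestAnimationFrame.
function createRedrawScheduler(draw, schedule) {
  schedule = schedule || function (cb) {
    return window.requestAnimationFrame(cb);
  };
  var pending = false;   // true while a frame callback is already queued
  var latestArgs = null; // arguments from the most recent request

  return function requestRedraw() {
    latestArgs = arguments;
    if (pending) {
      return; // a redraw is already queued for the next frame
    }
    pending = true;
    schedule(function () {
      pending = false;
      draw.apply(null, latestArgs);
    });
  };
}

// Browser usage (drawEverything is a placeholder for your own routine):
//   var requestRedraw = createRedrawScheduler(function (event) {
//     drawEverything(event.clientX, event.clientY);
//   });
//   canvas.addEventListener('mousemove', requestRedraw);
```

With this in place, a burst of mousemove events between two frames results in a single draw that uses the most recent event.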
Stack canvases
Above, I linked to a video of a tech talk I gave about Soundslice. In that talk, I demonstrated how Soundslice uses several different <canvas> elements, stacked on top of each other as layers, for maximum performance -- and for nice, clean code. Definitely watch the demo at around 10:20 in the video to get a sense of it.
I'm planning to write a separate blog post about this, but the Cliff's Notes version is that you can stack transparent <canvas> elements on top of each other so that you only have to redraw the ones that need to change.
For example, on Soundslice, there's a separate <canvas> for the playhead -- the vertical orange line that tracks the currently played moment of the video. That's a separate <canvas> with a z-index above the other ones, so that redrawing it doesn't require redrawing any of the other stuff. The less you have to redraw, the better.
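One way to organize stacked layers, sketched under assumptions rather than taken from Soundslice: each layer owns the 2D context of one canvas and carries a "dirty" flag, and a render pass touches only the layers that changed. The names createLayerStack, addLayer, invalidate and render are hypothetical.

```javascript
// Each layer wraps the 2D context of one <canvas> in the stack.
function createLayerStack() {
  var layers = []; // ordered bottom to top, mirroring z-index

  return {
    addLayer: function (name, ctx, draw) {
      // New layers start dirty so the first render draws them.
      layers.push({ name: name, ctx: ctx, draw: draw, dirty: true });
    },
    invalidate: function (name) {
      layers.forEach(function (layer) {
        if (layer.name === name) {
          layer.dirty = true;
        }
      });
    },
    // Redraw only the dirty layers; returns the names that were redrawn.
    render: function (state) {
      var redrawn = [];
      layers.forEach(function (layer) {
        if (!layer.dirty) {
          return;
        }
        var canvas = layer.ctx.canvas;
        layer.ctx.clearRect(0, 0, canvas.width, canvas.height);
        layer.draw(layer.ctx, state);
        layer.dirty = false;
        redrawn.push(layer.name);
      });
      return redrawn;
    }
  };
}

// Browser usage (element ids and draw functions are placeholders):
//   var stack = createLayerStack();
//   stack.addLayer('notation',
//     document.getElementById('notation').getContext('2d'), drawNotation);
//   stack.addLayer('playhead',
//     document.getElementById('playhead').getContext('2d'), drawPlayhead);
//   stack.invalidate('playhead');      // only the playhead moved
//   stack.render({ x: currentX });     // notation layer is left alone
```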
Bunch calls to fillStyle
When you draw on <canvas>, you first have to tell it which color you're using. You can do that by setting the "fillStyle." It turns out that, each time you change the fillStyle, there's a slight performance penalty. Therefore, you can squeeze out some extra performance by bunching your calls to fillStyle -- that is, rather than drawing a gray thing, then an orange thing, then a gray thing again, you should draw all the gray things, then draw all the orange things.
For example, Soundslice, which is all about annotating YouTube videos, needs to draw dozens, sometimes hundreds, of annotations on the screen at a time. Each annotation might use several different colors -- the text color, the border color, the line color, etc.
My original implementation looped over each annotation and drew each one independently, which resulted in two to five fillStyle calls for each annotation. I changed this to bunch the fillStyles across all annotations -- so that all of the light grays were drawn at the same time, then all the dark grays, etc. -- and the drawing got a few dozen milliseconds faster.
For more background, see the "Avoid unnecessary canvas state changes" section in this great HTML5 Rocks article.
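The batching idea can be sketched like this, assuming each thing to draw is a simple rectangle described by a hypothetical {color, x, y, w, h} object (the real annotations are more involved): group the shapes by color first, then set fillStyle once per group.

```javascript
// Draw rectangles while setting fillStyle only once per distinct color.
// The shape format {color, x, y, w, h} is an assumption for illustration.
function drawGroupedByColor(ctx, shapes) {
  var byColor = Object.create(null); // color -> array of shapes
  var order = [];                    // colors in first-seen order

  shapes.forEach(function (shape) {
    if (!byColor[shape.color]) {
      byColor[shape.color] = [];
      order.push(shape.color);
    }
    byColor[shape.color].push(shape);
  });

  order.forEach(function (color) {
    ctx.fillStyle = color; // one state change per color
    byColor[color].forEach(function (s) {
      ctx.fillRect(s.x, s.y, s.w, s.h);
    });
  });

  return order.length; // number of fillStyle changes performed
}
```

Note that grouping changes the drawing order, so this only gives the same picture when shapes of different colors don't overlap, or when the overlap order doesn't matter.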
Cache text rendering
In profiling, I've found that rendering text on <canvas> is my next big rendering-related bottleneck on Soundslice. I haven't done this yet, but I'm planning to come up with a way of caching the results of fillText, possibly using this technique.
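One possible shape for such a cache, offered as a sketch and not necessarily the linked technique: render each distinct label once onto a small offscreen canvas with fillText, then stamp it onto the visible canvas with drawImage. The names createTextCache and drawCachedText, the height parameter (line height in pixels) and the createCanvas injection point are all assumptions made for illustration.

```javascript
// Cache rendered text on offscreen canvases, keyed by font, color,
// height and text. `createCanvas` is a hypothetical injection point that
// defaults to document.createElement('canvas').
function createTextCache(createCanvas) {
  createCanvas = createCanvas || function () {
    return document.createElement('canvas');
  };
  var cache = Object.create(null);

  function render(text, font, color, height) {
    var canvas = createCanvas();
    var ctx = canvas.getContext('2d');
    ctx.font = font; // needed before measuring
    canvas.width = Math.max(1, Math.ceil(ctx.measureText(text).width));
    canvas.height = height;
    // Resizing a canvas resets its context state, so set it again.
    ctx.font = font;
    ctx.fillStyle = color;
    ctx.textBaseline = 'top';
    ctx.fillText(text, 0, 0);
    return canvas;
  }

  return function drawCachedText(targetCtx, text, x, y, font, color, height) {
    var key = [font, color, height, text].join('|');
    if (!cache[key]) {
      cache[key] = render(text, font, color, height);
    }
    targetCtx.drawImage(cache[key], x, y);
  };
}

// Browser usage (visibleCtx is the 2D context of the on-screen canvas):
//   var drawText = createTextCache();
//   drawText(visibleCtx, 'Am7', 10, 20, '12px sans-serif', '#333', 16);
```

Whether this is actually faster depends on how often the same strings repeat, so it is worth profiling before and after.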
Final thoughts
A decent argument in Evan's favor is: "Well, if <canvas> is only fast if you use these various hacks, it's not really fast, then, is it?"
Two thoughts on that.
First, well, sure! I'd love it if <canvas> was super fast right out of the box, without needing to use these techniques. No doubt about it. But the reality is, it is fast enough, if you put in the work.
Second, there's the bigger question -- a defining question for the current generation of web developers -- which is: web or native app? I am squarely in the web camp, both for philosophical reasons (such as openness) and practical reasons (such as the fact that Soundslice has only one developer and one designer, and we can't justify building separate apps for separate platforms).
What I love about <canvas> is that it lets us make desktop-quality apps right in the browser, so we can get the benefits of being "of the web" along with the benefits of amazing, fast graphics. Fear not, my friends: <canvas> is great.
UPDATE, May 7, 2013: Evan has posted a thoughtful follow up, reacting to this.
SendGrid Party!
By DjangoCon Europe from Planet Django. Published on May 06, 2013.
Guess who is throwing the best party of 2013? :)
We’re super excited to announce that SendGrid is a part of our DjangoCircus family!! :) Moreover, they were one of the first companies to back us, making all this possible! Kudos to all SendGriders, especially Swift :)

We’re planning a BBQ with Polish KIEŁBASA ;), a lot of great beer and... the DjangoCon FlipCup tournament! The party will take place in the circus and, if the weather is good, hopefully outside too. Join us on Friday night at 7pm.
SendGrid is the leader in email deliverability. SendGrid’s cloud-based platform increases email deliverability, provides actionable insight and scales to meet any volume of email, relieving businesses of the cost and complexity of maintaining custom email infrastructures.
For more information, visit www.sendgrid.com.
See you at the SendGrid Party!
Invitation to the Django-UserGroup Hamburg on May 8
By Arne Brodowski from Planet Django. Published on May 05, 2013.
The next meeting of the Django-UserGroup Hamburg takes place on Wednesday, 08.05.2013 at 19:30. This time we are meeting again at the offices of intosite GmbH, Poßmoorweg 1 (3rd floor, "3. OG"), 22301 Hamburg.
From now on, the Django-UserGroup Hamburg is organized via Meetup. To be notified automatically about future meetings, please join our Meetup group: http://www.meetup.com/django-hh
Since a projector is available on the premises, every participant has the opportunity to give a short talk (format: lightning talk or somewhat longer). In our experience, the actual talks tend to come together on the spot.
As always, everyone interested in exchanging ideas with other Djangonauts is invited. Registration is not required, but it helps with planning.
More information about the user group is available on our website, www.dughh.de.
Warsaw Survival Guide
By DjangoCon Europe from Planet Django. Published on May 05, 2013.
DjangoCon is in less than two weeks. Are you excited already? We definitely are!
Since a lot of you will be in Poland for the first time in your life, we have prepared some tips for you. We really want to make sure you enjoy being in Warsaw this year!
We already covered some basics here: http://djangocircus.com/getaround/ (taxis, buses, trains, sim cards, internet).
Here’s some more.
Currency
The Polish currency is the złoty (PLN). 1 PLN is around €0.24 or $0.32. You will find a lot of ATMs (bankomat) in Warsaw, so you don’t need to carry a lot of cash. You can pay with debit or credit cards in most shops (MasterCard, Visa, Visa Electron, Maestro, PolCard etc. are widely used), but some places accept only cash, or accept cards only when you spend more than 10 or 20 PLN.
You can also exchange money in a bank or in a kantor (you will find a lot of kantors in the city center). If possible, ask for banknotes in lower denominations (10, 20, 50).
Power plugs
In Poland we use Type E power plugs. If you use a different kind of power plug, remember to buy an adapter. We also have an extra tip for you: if you bring a power strip, you’ll be able to charge a lot of your devices using only one adapter :).
Tap water
If you ask Poles whether Polish tap water is good enough to drink, they will tell you not to drink it. However, that is no longer true. According to the newest water tests, tap water in Warsaw is safe to drink (we cannot promise that this is the case elsewhere in Poland). It is probably not the tastiest water in the world, though, so in our opinion bottled still water, which you can buy in many shops around Warsaw, is a better option.
Food
Make sure to try a little bit of Polish cuisine while you are in Warsaw. Some Polish specialities are: bigos, pierogi, gołąbki, barszcz, żurek, zupa ogórkowa and many, many others.
Language
Young people in Warsaw speak English quite well. People over 50 may have some trouble understanding English, though that’s not a rule; in that case try German or Russian if you know one of those languages. In most restaurants and pubs (and in many shops) people speak English, so we are sure you’ll be fine.
If you want to know how to pronounce Polish words (for example “Służewiec”) here are some useful pages you can read: Polish alphabet, Polish phonology :).
Safety
Be sure to watch your belongings while you are in a crowded place. There are pickpockets in many places. Losing your wallet and documents won’t make for a perfect memory of Poland.
New Committers for Tastypie & Haystack
By Daniel Lindsley from Planet Django. Published on May 04, 2013.
New Committers for Tastypie & Haystack
Dynamic Fixtures
By Inka Labs from Planet Django. Published on May 03, 2013.
The Django unit test framework is great: it allows you to test everything you code in Python. The "TestCase" class also supports fixtures. Most people write fixtures as XML or JSON and load them in every test. This can make running the test suite a very, very slow process. Consider that every time you run a test, the fixtures are loaded into the database, then deleted, and then loaded again by the next test.
We had a similar problem some weeks ago. We had all our fixtures in JSON files (some of them were huge), so we decided to avoid those fixtures and create our objects (per test) in pure Python instead. Something like:
obj = SomeModel.objects.create(
    attr1=1,
    attr2=2,....)
related = OtherModel.objects.create(
    fk=obj,
    other_field=...)
# use related and obj
But, let's be realistic, this is very boring and tedious. We don't want to spend so much time creating database objects. Also consider what happens when you add a required field to a model: you have to go through all the tests that use that model and add a value for that field, which makes your tests difficult to maintain.
This is where django dynamic fixture comes to the rescue. This great application lets you create database objects in Python very easily.
from django_dynamic_fixture import G

# create a database object with random data
obj = G(SomeModel)
# give some field values
# and fill the remaining fields with random data
obj = G(SomeModel, attr1="field1", attr2="Field2")
# ignore some fields
obj = G(SomeModel, ignore_fields=['field1'])
This is just a very small sample of the features of this great application. If you want your tests to run fast and be maintainable, you should give it a try. Check the docs here.
Tools we used to write Two Scoops of Django
By Daniel-Greenfeld from Planet Django. Published on May 03, 2013.
Because of the ubiquitousness of reStructuredText in the lives of Python developers and the advocacy of it, it's not uncommon for people to assume we used it to write our book. However, that's not really the case.
The short answer is we used:
- reStructuredText (RST)
- Google Documents
- Apple Pages
- LaTeX
The long answer is the rest of this posting. Since writing the book was broken up into three major stages, 'alpha', 'beta', and 'final', I have broken up this blog article the same way.
Alpha Days
Some of the original alpha material was written in rough draft form as RST since it was what we were used to using. Unfortunately, the PDF generation wasn't to our liking, so we immediately began looking at other options. Since she enjoyed using it at MIT and because it gave us greater individual control, Audrey wanted to switch to LaTeX. I was worried about the challenges of learning LaTeX, so we compromised and moved to Google Documents.
For the most part, Google Documents was great in the early stages. The real-time collaborative nature was handy, but the gem was the comment system. It gave us the ability to have line-by-line written dialogues with our technical reviewers. However, Google Documents made it nigh-impossible to use WYSIWYG editor styles or add better print fonts, it forced us to cut-and-paste code examples, and the PDF export system was flaky on our massive document.
Our original thought was to convert the Google Document output to PDF and then modify it with Adobe InDesign. Upon trying it, we found it had a lackluster user interface that had a steep learning curve and was prohibitively expensive ($550-$700). Our friend and reviewer, Kenneth Love of Getting Started with Django fame, offered to do the conversion work, but we wanted to be able to update our work at will. Awesome as Kenneth might be, we couldn't expect him to drop what he was doing to update the final output of our work whenever we wanted.
Therefore, what we did in the week of January 10th-16th was convert the book to Apple Pages, which is the word processor in Apple iWork. This was as painful as it sounds. We also discovered the day before launch that Apple Pages doesn't create a sidebar PDF table of contents, which a lot of people enjoy (including ourselves). Exhausted from weeks of 16-hour days, we launched anyway on January 17th with the book weighing in at 5.1 MB.
Beta Experiences
People were so positive it really gave us a boost. Hundreds of people sent us feedback and we were delighted beyond words, with a significant portion sending us commentary/corrections about our writing and code. I'll admit I did get tired of a certain 'moat' mistake, since I got corrected on it over 50 times. However, the number of code corrections we were getting was higher than expected. It was clear we needed to be able to import the code modules from testable chunks of real code. We had so many Kindle/ePub requests that we also needed the ability to render the text attractively across multiple formats.
After stumbling through RST, Google Documents, and Apple Pages, I finally agreed with Audrey that the challenges of learning LaTeX were worth it. While we could have used RST, we would have had to use LaTeX anyway for our customizations, since when RST is converted to PDF it actually uses an interim step of LaTeX!
So while I handled the corrections and feedback from thousands, Audrey built the fundamentals of the LaTeX file structure. Audrey really got her hands dirty by teaching me LaTeX, since my brain is slow and thick. Here's a sample of what I've learned how to do, taken from Chapter 6, Section 1, Subsection 5 (6.1.5):
\subsection{Model Inheritance in Practice: The TimeStampedModel}
It's very common in Django projects to include a \inlinecode{created} and \inlinecode{modified} timestamp field on all your models. We could manually add those fields to each and every model, but that's a lot of work and adds the risk of human error. A better solution is to write a \inlinecode{TimeStampedModel} \index{TimeStampedModel} to do the work for us:
\goodcodefile{chapter_06/myapp/core/timestampedmodel.py}
Take careful note of the very last two lines in the example, which turn our example into an abstract base class: \index{abstract base classes}
\goodcodefile{chapter_06/myapp/core/class_meta.py}
By defining \inlinecode{TimeStampedModel} as an abstract base class \index{abstract base classes} when we define a new class that inherits from it, Django doesn't create a \inlinecode{model\_utils.time\_stamped\_model} table when syncdb is run.
Once I got the hang of LaTeX, the hard work of converting the book's current content from Apple Pages began. That was a couple of weeks of grueling effort on my part. Daily I would request new LaTeX customizations, which Audrey would address. However, as she was literally rewriting the content of a dozen chapters, including templates, testing, admin, and logging, my interruptions became an issue. So we enlisted the help of Italian economist and LaTeX expert Laura Gelsomino. Thanks to her the desired text formatting was achieved.
During the conversion process we also rewrote every single code example, putting them into easily testable projects and pulling them in via custom LaTeX commands called \goodcodefile{} and \badcodefile{}.
Eventually I joined Audrey on rewriting and reviewing chapters and on February 28th, the beta was launched. LaTeX generates lean PDFs so the book came in at just 1.6 MB while adding a whopping 50 pages (25% more) of content.
Final Efforts
The final effort was focused on cleanup, new formats, presentation, and art.
For cleanup, our amazing readers gave us so much feedback we could barely keep up. We fought to keep our dialogues with them personal yet brief. With reader oversight we corrected many of the 'quirks' of my writing style (Audrey is a stickler for Strunk and White, I am not). We also made numerous corrections based on feedback and our own observations.
With the guidance of fellow Python author Matt Harrison I wrote scripts that took the archaic HTML generated by the LaTeX module tex4ht and rendered it into something that Kindlegen could use to generate Kindle .mobi files. At first the results looked awesome on modern Kindles and other new ebook readers, but were terrible on older devices. So I toned down the fancy stuff to what you see today. Getting technical books to look nice on all readers is really, really hard - and unfortunately some publishers take shortcuts that hurt the efforts of the authors. If you have a problem with an e-book's format, please consider that before writing a negative review about the final output.
Speaking of mobile editions, we also wrote a second version of each Python example to deal with the smaller format. While libraries exist to do the work for you, since I did a lot of it from scratch (albeit coached by Matt) I had to dig into the lackluster .mobi/.epub documentation to figure out things like .ncx files.
note: If you want to be the self-published author of a technical book I strongly recommend you read Matt's Ebook Formatting: KF8, Mobi & EPUB. Also check out his rst2epub2 library for converting RST files to various formats.
While I worked on the mobile editions, Audrey focused on the print version, adding more art and a tiny bit of new content. She focused on clarity and flow, and the result is that the book feels even lighter to read and yet is dense with useful information. To test how the book launched, she would order a copy from the printer and wait several days for it to arrive. Then she would inspect the cover and interior with her incredibly exacting eye. It's a slow process, but Audrey wanted to make absolutely certain our readers would enjoy and use the print edition.
On April 10th we launched the final in PDF, Kindle, and ePub form. The PDF weighs in at 2.7 MB, and the Kindle file is a bit heavier. At some point we'll do the work to reduce file size, but for now we're working on other things.
A week later we announced the launch of the print version of the book. People seem to really like the design and feel of the physical book, and we've even had requests for t-shirts.
Thoughts
Writing a technical book was really hard. Crazy hard. Also very satisfying. We could have made more money doing just client work, but this was a dream come true. Sometimes money doesn't matter.
Whither Two Scoops of Django?
Two Scoops of Django: Best Practices for Django 1.5 will still receive periodic corrections, but won't see new content unless it's security related for Django 1.5. Don't worry though, for when Django 1.6 comes nigh, we'll commence work on Two Scoops of Django: Best Practices for Django 1.6 (TSD 1.6). The plan is to update practices as needed and hopefully add more content on testing, logging, continuous integration, and more. Like its predecessor, TSD 1.6 will be written using LaTeX.
That said, if I ever fulfill my dream of writing fiction I'll just use Matt Harrison's rst2epub2 library.
Concerns about Open Sourcing
We've considered open sourcing our current book generation system, but installation is rather challenging and requires serious Audrey/Laura-level LaTeX knowledge combined with my experience with Python. Unfortunately, from our experience on managing other open source projects, dealing with requests for documentation and assistance would take up a prohibitive amount of our time. Honestly, we would rather write another book or sling code.
Book Generation as a Service?
Another option is turning our system into a service, which would convert existing RST or even Markdown to LaTeX so it could generate books in the Two Scoops format. Doing this would require at least a month of full-time work from both of us, and we have no idea what the interest level would be. We think interest would be low, but then again, hasn't Leanpub done pretty well using this business model?
In any case we're working on other projects. Maybe even a new technical book...
Python Advent Calendar 2012 Topic
By chrism from plope. Published on Dec 24, 2012.
An entry for the 2012 Japanese advent calendar at http://connpass.com/event/1439/
Mom Goes Nuts
By chrism from plope. Published on Jun 04, 2012.
My mom has gone a little loopy.
Why I Like ZODB
By chrism from plope. Published on May 15, 2012.
Why I like ZODB better than other persistence systems for writing real-world web applications.
A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code
By chrism from plope. Published on Mar 03, 2012.
A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.
In Praise of Complaining
By chrism from plope. Published on Jan 01, 2012.
In praise of complaining, even when the complaints are absurd.
2012 Python Meme
By chrism from plope. Published on Dec 24, 2011.
My "Python meme" replies.
In Defense of Zope Libraries
By chrism from plope. Published on Dec 19, 2011.
A much too long defense of Pyramid's use of Zope libraries.
Plone Conference 2011 Pyramid Sprint
By chrism from plope. Published on Nov 10, 2011.
An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.
Jobs-Ification of Software Development
By chrism from plope. Published on Oct 17, 2011.
Try not to Jobs-ify the task of software development.
WebOb Now on Python 3
By chrism from plope. Published on Oct 15, 2011.
Report about porting to Python 3.
Open Source Project Maintainer Sarcastic Response Cheat Sheet
By chrism from plope. Published on Jun 12, 2011.
Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!
Pylons Miniconference #0 Wrapup
By chrism from plope. Published on May 04, 2011.
Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.
Pylons Project Meetup / Minicon
By chrism from plope. Published on Apr 14, 2011.
In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.
PyCon 2011 Report
By chrism from plope. Published on Mar 19, 2011.
My personal PyCon 2011 Report.
FLOSS Weekly Interviews the Pylons Project
By chrism from plope. Published on Feb 02, 2011.
FLOSS Weekly (Free, Libre, Open Source Software) interviews Mark Ramm and me about the Pylons Project and the Pyramid web framework.







