By convention, all times are stored in the database as UTC. ActiveRecord abstracts this by converting timestamps from UTC to Time.zone when they are read from the database, and doing the opposite when writing them. That is, as long as you remember to annotate every Time object that enters the system with the correct zone.
The local time zone is the one you currently happen to live in; it should not matter at all in the application. Yet, by default, Time objects in ruby use the local time zone: Time.now returns the current local time, which is completely useless and even harmful in development.
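A quick illustration of the difference, assuming a Rails application where Time.zone is configured:

Time.now       # the machine's local time, ignoring Time.zone
Time.zone.now  # the current time in the configured Time.zone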
I live in Belgium (CET), and most of our clients do too. For years, I have been testing with Time.zone = 'Brussels'. From now on, my testing Time.zone becomes 'Seoul'.
The choice of a testing time zone is somewhat arbitrary, but it should not be equal to the local time zone. When UTC, Time.zone and the local time zone all differ, it is easier to spot a bug where local time is used instead of the current Time.zone.
I also think it is important that the time zone offset is far away from my local time zone: it makes me think more about time zones instead of just assuming that ActiveRecord will do its job.
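A minimal sketch of this setup in a Rails test environment (the file path assumes a standard Rails application):

# config/environments/test.rb
config.time_zone = 'Seoul'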
I opened Powermail and discovered that it did not receive my new iCloud mail. Apparently, iCloud does not provide POP access anymore. So I had two options: abandon Powermail, or bring POP access back myself.
Naturally, I chose the second option and devised a cunning plan.
The idea is to periodically fetch emails from the iCloud IMAP server and store them in a local inbox, which can then be accessed over POP through a local mail server. Software of choice: getmail for fetching, cron for periodic execution, dovecot as the mail server.
In the remainder of this article, I briefly explain how to configure this. Familiarity with unix and the terminal is required.
It is not my intention to describe the software installation in detail, as this may depend on your specific system and preference. I installed getmail as described on their website and dovecot using macports. Cron is part of Mac OS X and any half-decent unix distribution.
The configuration file for getmail is .getmail/getmailrc in your home directory. Replace the example values (server, username, password and path) with your own settings.
[retriever]
type=SimpleIMAPSSLRetriever
server=imap.mail.me.com
port=993
username=user@me.com
password=password
mailboxes=("INBOX",)

[destination]
type=Mboxrd
path=/Users/username/.getmail/mbox

[options]
read_all=false
Once this configuration file is created, we can run the getmail command to check if it’s working. (This is a test run on my system, your output may vary.)
$ getmail
getmail version 4.25.0
Copyright (C) 1998-2009 Charles Cazabon. Licensed under the GNU GPL version 2.
SimpleIMAPSSLRetriever:user@me.com@imap.mail.me.com:993:
  0 messages (0 bytes) retrieved, 168 skipped
To run the getmail command every 10 minutes, we need to install a cron job. Run crontab -e in the terminal and add the following line.
*/10 * * * * /usr/local/bin/getmail > /dev/null 2>&1
The dovecot configuration file resides in /opt/local/etc/dovecot/dovecot.conf when installed through macports. Mine looks as follows:
protocols = pop3
disable_plaintext_auth = yes
ssl = no
mail_location = mbox:/Users/username/.getmail:INBOX=/Users/username/.getmail/mbox

protocol pop3 {
  listen = 127.0.0.1:11000
}

auth default {
  mechanisms = plain
  passdb passwd-file {
    args = /Users/username/.getmail/dovecot-passwd
  }
  passdb pam {
  }
  userdb passwd {
  }
  user = root
}
Dovecot also needs a password file; we configured it to be located at .getmail/dovecot-passwd in your home directory. You can choose the local password freely: it is the password local clients will need to access your local mailbox.
username:{plain}local-password::::/Users/username/.getmail::userdb_mail=mbox:~/mbox
Dovecot should be restarted after the configuration change. You can now point your favorite POP3 email client at 127.0.0.1, port 11000, with the username and local password from the password file, to access your local inbox.
function fail() {
  var p = {};
  var c = p.c = [];
  c.push({r: (c = [])});
  c.push('lala');
  alert(JSON.stringify(p.c)); // should be [{"r":["lala"]}], but is [] in IE9
}
fail();
Looks like a bug to me. Just putting it out here, since I did not find a place on the Microsoft site to report bugs for IE9.
A revision control system (RCS) is built for infinite data retention, as writing source code is expensive and storage is cheap. Backups lose their value with age; nobody cares about the daily backups of five years ago.
An RCS is built for tracking a large number of small, interdependent files; backup files are large, and backups from multiple applications are unrelated. Git does not handle large files well, and repositories can become unusably slow or use an insane amount of memory when pushing, pulling or checking out.
When an RCS repository gets corrupted, chances are that all backups stored inside become inaccessible. Svn stores revisions as deltas relative to the previous revision; when one delta file becomes unreadable, all later revisions are affected. Git does not just store deltas and is more defensive against corruption thanks to built-in SHA-1 hashing. Furthermore, git repositories can easily be replicated with full history, so the chances of corruption are slim. But even with the best RCS tools, there is an extra, non-trivial layer between the filesystem and the data, and that layer is a liability.
Every machine that uses backups requires the RCS tool to be installed. This is only a minor inconvenience, but why use yet another tool when the standard unix tools work just fine?
My advice for storing database backups is simple: create timestamped sql dumps periodically and compress them with bzip2. Keep them around as long as your data retention policy requires it or until you run out of space, which will be a long, long time in an era where hard disk space is measured in terabytes.
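As a sketch, the whole policy fits in a single cron job; the database name, credentials and backup path below are placeholders, and the % signs are escaped because cron treats them specially:

0 3 * * * mysqldump --user=backup --password=secret mydb | bzip2 > /backups/mydb-$(date +\%Y\%m\%d).sql.bz2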
This query turned out to be slow:

SELECT companies.id, count(people.id)
FROM companies
LEFT JOIN people ON companies.id = people.company_id
GROUP BY companies.id
ORDER BY companies.name
Explain revealed “Using index; Using temporary”. It turns out that mysql can only use a single index for grouping and sorting; when sorting and grouping on different columns, a temporary table needs to be created and sorted after grouping.
The solution? GROUP BY companies.name, companies.id. The query now takes under 10 ms.
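For completeness, the full corrected query:

SELECT companies.id, count(people.id)
FROM companies
LEFT JOIN people ON companies.id = people.company_id
GROUP BY companies.name, companies.id
ORDER BY companies.name

Because the sort column now leads the grouping, mysql can reuse the grouping order for the sort and skip the temporary table.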
A logged-in user is a prerequisite for many integration tests. Performing this step in the browser requires going to the login page, filling out the form, submitting it and waiting for the next page. Testing the login procedure is of course crucial, but it should not be tested a thousand times.
When using the cookie session store in rails, a logged-in user has a signed cookie containing his user id. On every page load, this cookie is sent back to the web application, where the signature is verified and the user is considered logged in. Integration tests can be optimized by storing the contents of that cookie and putting it back in the browser whenever a logged-in user is required.
We implemented this optimization in our cucumber integration tests which use selenium. The code is below, hope it helps.
Given /^I am authenticated$/ do
  u = Factory(:user, :id => 1, :login => "uname", :password => "pass")
  c_name = Rails.configuration.action_controller.session[:key]
  if cookie = Thread.current[:selenium_cookie]
    # Reuse the session cookie captured during an earlier login.
    selenium.create_cookie("#{c_name}=#{cookie}", :path => '/')
  else
    # First run: log in through the browser and capture the session cookie.
    visit "/"
    selenium.wait_for_page 5
    fill_in("login", :with => "uname")
    fill_in("password", :with => "pass")
    click_button("Inloggen") # the (Dutch) login button
    selenium.wait_for_page 5
    response.should contain("Uitloggen") # the "log out" link proves we are logged in
    Thread.current[:selenium_cookie] = selenium.cookie(c_name)
  end
end
The database toolkit Sequel can use JDBC data sources, but only when running on JRuby. Although JRuby is compatible with ruby 1.8.7, not every application can be run on it, especially when it depends on gems that define C extensions.
Fortunately, distributed ruby exists. It allows a server to expose an object which its clients can use like any local ruby object. A JRuby server can expose the Sequel object, which other ruby clients can use to access JDBC data sources.
Both the server-side and the client-side code are pretty straightforward. The client should also depend on the sequel gem, since only objects, not their class definitions, can be marshaled over distributed ruby.
# server side
require 'drb'
require 'java'
require 'rubygems'
require 'sequel'

DRb.start_service 'druby://localhost:20000', Sequel
# It might be needed to instantiate the driver here,
# so it is available when a connection string is given.
DRb.thread.join
# client side
require 'drb'
require 'rubygems'
require 'sequel'

DRb.start_service
sequel = DRbObject.new nil, 'druby://localhost:20000'
db = sequel.connect("jdbc connection string here")
db.from('table').each { |r| puts r[:id] } # play!
Inside the JDBC driver, native java exceptions can be raised. JRuby wraps these exceptions in a NativeException class, so ruby can rescue them and provide a stack trace. Distributed ruby provides stack traces for exceptions raised in a remote ruby, but it cannot handle NativeException because the class does not exist in MRI ruby. In short, when an exception is raised by java, the following cryptic error message will appear.
DRb::DRbUnknownError: NativeException
To fix this and get a full stack trace of NativeExceptions, the class needs to be defined in the client.
class ::NativeException < RuntimeError; end
There is one flaw in arel: it depends on ActiveSupport 3 and has a hidden dependency on ActiveRecord 3 for the database drivers. Although the latter will be resolved by releasing the database drivers as a separate gem, or inside arel itself, the ActiveSupport dependency prevents using arel in earlier versions of rails. It turns out this dependency is really artificial: arel plays nice with rails 2.3.4 and up. I forked the arel project and added some minor modifications so it can be installed and used in these older versions of rails.
# config/environment.rb
config.gem 'arel-compat', :lib => 'arel'
Now, to integrate arel with the models at a low level, add an initializer.
# config/initializers/arel_integration.rb
class ActiveRecord::Base
  class << self
    delegate :[], :to => :arel_table

    def arel_table
      @arel_table ||= Arel::Table.new(table_name, :engine => arel_engine)
    end

    def arel_engine
      # Not correct when working with multiple connections.
      @@arel_engine ||= Arel::Sql::Engine.new(ActiveRecord::Base)
    end
  end
end
After that, the fun starts. Note that arel is low level: executing an arel query returns an arel result instead of model objects. However, it is easy to use the sql generated by arel to select model objects.
arel = Person.arel_table.where(
  Person[:first_name].matches("test%").and(
    Person[:last_name].eq(nil)))
Person.find_by_sql(arel.to_sql)
One spec fails on the mysql driver: offset without a limit.
Person.arel_table.skip(10)
# => SELECT `people`.* FROM `people`
When a limit is specified, it will behave correctly.
Person.arel_table.skip(10).take(5)
# => SELECT `people`.* FROM `people` LIMIT 10, 5
Since offset is seldom used without a limit, we did not bother to patch our arel fork.
With the rails ecosystem growing, dependencies become an important issue: a medium-sized application can easily depend on 50 gems. Bundler solves the gem resolution problem so that an application gets a compatible set of gems. However, bundler can only resolve gems whose declared dependencies are compatible; when adding arel to a rails 2.3.4 project, bundler fails.
The other problem with gem dependencies lies with the gem developers themselves. Arel should not depend on ActiveSupport, period. When presented as a “framework to build ORM frameworks”, it should not pull in a massive dependency that is incompatible with some environments.
Since the cool kids are using devise, we did not want to be left out and ported our main application from restful_authentication. Aside from being cool, the switch will make porting to rails 3 easier, as the latest devise is compatible with it.
The application uses basic http authentication for private RSS feeds and ical subscriptions. This is pretty common at the service level of an application; machines do not like login forms. Devise supports basic authentication out of the box, but only when the authentication headers are already present in the request. When they are not, devise returns a 302 redirect to the login form, and the RSS reader gives up.
The solution is to create a new devise strategy in config/initializers/devise.rb:
class HttpAuthenticatableNonHtml < Devise::Strategies::HttpAuthenticatable
  def valid?
    not request_format.html? or super
  end

  def http_authentication
    super or ''
  end
end

Warden::Strategies.add(:http_auth_non_html, HttpAuthenticatableNonHtml)
Warden needs to be instructed to use the strategy, inside the Devise.config block.
config.warden do |manager|
  manager.default_strategies.unshift :http_auth_non_html
end
The strategy will return a 401 with an authentication realm when a protected resource is accessed in a non-html format.
Figuring out this solution required diving into the warden and devise code, which is quite intimidating at first. I created a diagram that hopefully makes it easier to understand the basic workings of the authentication stack.
The core implementation problem was merging the company records together. Even though the client talked about removing duplicates, he did not want to lose any related information from a duplicate. We had a feeling the client would soon ask to do the same for other record types, and therefore decided to implement it generically. The code is on github.
Since a table row can only have one value for each column, some attributes of duplicate objects will need to be discarded. To control the attribute values on the merged object, the objects to merge need to be ordered: the first object gets priority, and where it has blank attributes, the values are looked up on the remaining objects in order.
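As an illustration of this priority rule, here is a hypothetical helper, not the actual gem code, that merges attribute hashes in order:

# Later objects only fill in attributes the earlier ones left blank.
def merged_attributes(objects)
  objects.reverse.inject({}) do |merged, object|
    merged.merge(object.attributes.reject { |_, value| value.blank? })
  end
end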
Belongs_to associations are backed by a foreign key attribute, so they are already covered by merging the attributes.
Suppose we are merging company A and company B, and Company has_one :financial_info. Following the same priority rule as for attributes, company A keeps its own financial info; only when it has none does the financial info of company B move over to the merged company.
When merging company A with 2 phone numbers and company B with 1 phone number, the resulting company should have all 3 phone numbers, unless the phone number of company B is already present on company A as well. Associated objects should be compared and duplicates merged recursively. Comparison may differ between models: phone numbers are best compared by flattening them into a string containing only digits, so that the separators do not mess up the comparison.
Has_many :through associations can be left alone, since they depend on another has_many association that is merged already.
company.merge!(duplicate1, duplicate2, ...)
The object on which merge! is called becomes the master object; duplicates are merged into this object and destroyed afterwards. The order in which the duplicates are passed to merge! matters, since it determines the priority for merging attributes. When the attribute alpha_code is nil on the master, it gets the value from duplicate1; when it is not present on duplicate1 either, it gets the value from duplicate2, and so on.
merge_equal?(object)
Compares self with the given object and returns true if they can be considered the same. When records with a has_many association are merged, the associated objects are compared and duplicates are destroyed.
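For example, the phone number comparison mentioned earlier could override merge_equal? like this; the number attribute name is an assumption for the sketch:

class PhoneNumber < ActiveRecord::Base
  def merge_equal?(other)
    # Compare digits only, so "02/123.45.67" and "021234567" match.
    number.to_s.gsub(/\D/, '') == other.number.to_s.gsub(/\D/, '')
  end
end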
merge_attribute_names
The names of the attributes that should be merged. Defaults to all attributes minus id, timestamps and other metadata.
merge_exclude_associations
The names of associations that should not be merged. Can be used to exclude irrelevant or duplicate associations.
Currently, the merge algorithm does not take cycles in the associations into account. Since the reverse belongs_to associations are never considered, this should not be a problem for most ActiveRecord models. An infinite loop may occur when Company has_many :companies and a company points to itself.
To make sure the merge does not leave invalid foreign keys behind, referential integrity can be enforced in the database.
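For instance, a foreign key constraint can be added with raw SQL in a migration; the table and column names here are just examples:

# Hypothetical migration snippet guarding the phone_numbers association.
execute <<-SQL
  ALTER TABLE phone_numbers
    ADD CONSTRAINT fk_phone_numbers_company
    FOREIGN KEY (company_id) REFERENCES companies (id)
SQL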