[tahoe-dev] [tahoe-lafs] #731: what to do with filenames that are illegal on some systems

tahoe-lafs trac at allmydata.org
Sun Jun 14 09:49:38 PDT 2009


#731: what to do with filenames that are illegal on some systems
-----------------------------------+----------------------------------------
 Reporter:  zooko                  |           Owner:       
     Type:  defect                 |          Status:  new  
 Priority:  major                  |       Milestone:  1.5.0
Component:  code-dirnodes          |         Version:  1.4.1
 Keywords:  forward-compatibility  |   Launchpad_bug:       
-----------------------------------+----------------------------------------

Comment(by swillden):

 Replying to [comment:3 bewst]:
 > It seems to me that tahoe probably has enough flexibility to store
 > ''any'' filename, and many people will only be using it to store and
 > retrieve files to/from the same system, so it should "just work" for
 > that use case.

 This is my thought as well, at least for backup use cases.  Tahoe in
 general has a broader usage model, and so solutions appropriate for backup
 may not be adequate for those other use cases, but for backups, I think
 the top priority is ensuring that backups succeed reliably and don't lose
 any data -- including file name data.

 That's why the approach I've chosen for GridBackup (which, BTW, is finally
 starting to write to a grid, Yay!) is to make sure that:

 1.  ALL names can be backed up, regardless of whether or not they make any
 sense on any filesystem in existence.

 2.  When restoring to a system that uses the same encoding as the backup
 source, all names are restored byte-for-byte identically to what was read
 from the file system during backup.

 3.  When restoring to a system that uses a different encoding, I try to
 transcode the names, but simply fail with an error if transcoding doesn't
 work.  Eventually my plan is to give the user a list of the paths that
 broke and let them decide what to name each of them, with some suggestions
 based on attempts to decode the name with all Python-supported codecs.
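
 To make the three rules above concrete, here is a minimal sketch in
 modern Python.  It is not GridBackup's actual code: the function names,
 the dict layout and the short CANDIDATES list are all hypothetical,
 chosen only for illustration.  It assumes the backup records each name as
 raw bytes together with the encoding the source system was using.

```python
import codecs

# Hypothetical shortlist; a real tool might try every codec Python ships.
CANDIDATES = ("utf-8", "latin-1", "cp1252", "shift_jis", "koi8-r")


def record_name(raw_name: bytes, source_encoding: str) -> dict:
    """Backup side (rule 1): store the exact bytes, never decode or reject."""
    return {"raw": raw_name, "encoding": source_encoding}


def same_codec(a: str, b: str) -> bool:
    """True if two encoding names are aliases of the same codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def restore_name(entry: dict, target_encoding: str) -> bytes:
    """Restore side: return the bytes to use for the name on the target."""
    if same_codec(entry["encoding"], target_encoding):
        # Rule 2: same encoding, so hand back the original bytes untouched.
        return entry["raw"]
    # Rule 3: strict transcode; raises UnicodeError if either step fails.
    text = entry["raw"].decode(entry["encoding"])
    return text.encode(target_encoding)


def suggest_decodings(raw_name: bytes, candidates=CANDIDATES) -> dict:
    """Map each candidate codec that decodes the name cleanly to its result."""
    suggestions = {}
    for codec in candidates:
        try:
            suggestions[codec] = raw_name.decode(codec)
        except UnicodeError:
            continue  # this codec cannot interpret the bytes; skip it
    return suggestions
```

 With this sketch, a name recorded as b"caf\xe9" under latin-1 comes back
 unchanged when restored to a latin-1 system, is transcoded when restored
 to a utf-8 system, and raises UnicodeError when restored to an ascii
 system.  A name that is not valid in its own recorded encoding is still
 backed up, and only becomes a problem if a restore has to transcode it.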

 During a restore, there's room for human intervention to address naming
 problems, but during backup, I just want to get the data.  I'm taking a
 similar approach to other metadata.  Extended attributes, ACLs, resource
 forks, even POSIX permissions -- there are destination systems to which
 none of these things will make sense, but that's okay.  The backup will
 grab everything and we can deal with how to make use of the data, if
 possible, during restore.
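
 The same "grab everything now, interpret it later" idea might look like
 the following for metadata.  Again this is only a sketch under
 assumptions: capture_metadata and its dict keys are hypothetical names,
 and only POSIX permissions, ownership, mtime and (where the platform
 offers them) extended attributes are covered.  ACLs and resource forks
 would need platform-specific calls that are not shown.

```python
import os
import stat


def capture_metadata(path) -> dict:
    """Record whatever metadata the source system exposes for one path."""
    st = os.lstat(path)  # lstat: describe a symlink itself, not its target
    meta = {
        "mode": stat.S_IMODE(st.st_mode),  # POSIX permission bits
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime_ns": st.st_mtime_ns,
        "xattrs": {},
    }
    # os.listxattr / os.getxattr only exist on some platforms (e.g. Linux).
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(path, follow_symlinks=False):
                meta["xattrs"][name] = os.getxattr(
                    path, name, follow_symlinks=False
                )
        except OSError:
            # Reading xattrs failed (e.g. unsupported file system); keep
            # whatever was collected instead of failing the backup.
            pass
    return meta
```

 Nothing here decides whether the destination can use the values; that
 decision is left to the restore step.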

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/731#comment:4>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

