{"id":54,"date":"2009-11-03T13:10:30","date_gmt":"2009-11-03T13:10:30","guid":{"rendered":"http:\/\/www.eriugena.org\/blog\/?p=54"},"modified":"2009-11-03T13:17:29","modified_gmt":"2009-11-03T13:17:29","slug":"using-rdiff-to-optimise-upload-of-similar-files","status":"publish","type":"post","link":"http:\/\/www.eriugena.org\/blog\/?p=54","title":{"rendered":"using rdiff to optimise upload of similar files"},"content":{"rendered":"<p>If you have several similar large files to upload over a slow link, you can use &#8216;rdiff&#8217; to optimise the transfer. &#8216;rdiff&#8217; compares files block by block and produces a delta file that contains only the blocks that differ.<br \/>\nIt is therefore most useful when the two files are largely identical and differ only in parts, such as PowerPoint product presentations tailored for two different customers.<\/p>\n<p>In this example we have two similar but different documents:<\/p>\n<p>$ dir *.doc<\/p>\n<p>643,584 document-edition-two.doc<br \/>\n634,880 document-original.doc<\/p>\n<p>Generate an MD5 hash, to be used later to verify the file&#8217;s integrity:<br \/>\n$ md5 document-edition-two.doc<br \/>\n8280AEAFABC0833D5FEC64CE5FEF6237\u00a0 document-edition-two.doc<\/p>\n<p>Prepare a &#8220;signature&#8221; file that contains the hash codes of each block in the base file.<\/p>\n<p>$ rdiff signature document-original.doc document.sig<\/p>\n<p>Next, I use that signature file to see which blocks are different in the second file and extract them to a delta file.<\/p>\n<p>$ rdiff delta document.sig document-edition-two.doc document.delta<\/p>\n<p>$ dir document.delta<\/p>\n<p>78,504 document.delta<\/p>\n<p>Note that the &#8220;delta&#8221; file is only 12% of the size of &#8220;document-edition-two.doc&#8221;; the relative size depends on how similar the two documents are.<\/p>\n<p>Now I upload the files &#8220;document-original.doc&#8221; and &#8220;document.delta&#8221;.<\/p>\n<p>On the server I (or the recipients of 
the files) run &#8216;rdiff&#8217; to generate the second document from the first and the delta.<br \/>\n$ rdiff patch document-original.doc document.delta document-edition-two-reconstructed.doc<\/p>\n<p>Check the MD5 hash to confirm that the second document has been faithfully reproduced.<\/p>\n<p>$ md5 document-edition-two-reconstructed.doc<br \/>\n8280AEAFABC0833D5FEC64CE5FEF6237\u00a0 document-edition-two-reconstructed.doc<\/p>\n<p>Download <a href=\"http:\/\/www.eriugena.org\/blog\/wp-content\/uploads\/2009\/11\/rdiff.zip\" title=\"rdiff\">rdiff<\/a> for Windows, compiled with Cygwin.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you have several similar large files to upload over a slow link, you can use &#8216;rdiff&#8217; to optimise the transfer. &#8216;rdiff&#8217; compares files block by block and produces a delta file that contains only the blocks that differ. 
It is therefore of most use when the two files are mostly similar but some parts [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-54","post","type-post","status-publish","format-standard","hentry","category-tools"],"_links":{"self":[{"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/54","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=54"}],"version-history":[{"count":0,"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/54\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=54"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=54"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.eriugena.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=54"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}