Tuesday, November 8, 2011

S3 and the 5GB limit!

s3cmd, as of this writing, still does not support multi-part uploads to break the 5GB barrier.

But http://sprightlysoft.com/S3Upload/ does!

Now the problem with that is the ETag, which used to be the file's MD5, is NOT the MD5 anymore once you start using multi-part uploads. It looks like one, but it isn't, and you can tell because it has a dash in it.
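If you want to sanity-check a download yourself, here's a minimal sketch of that dash check (just an illustration, not part of s3cmd):

import hashlib

def md5_matches_etag(local_path, etag):
    # A multipart ETag contains a dash and is NOT a plain MD5, so it can't be compared.
    etag = etag.strip('"')
    if "-" in etag:
        return False
    md5 = hashlib.md5()
    with open(local_path, "rb") as f:
        while True:
            chunk = f.read(1024 * 1024)
            if not chunk:
                break
            md5.update(chunk)
    return md5.hexdigest() == etag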

So, a slight modification to s3cmd:

@@ -633,7 +633,10 @@
         output(u"   File size: %s" % info['headers']['content-length'])
         output(u"   Last mod:  %s" % info['headers']['last-modified'])
         output(u"   MIME type: %s" % info['headers']['content-type'])
-        output(u"   MD5 sum:   %s" % info['headers']['etag'].strip('"'))
+        if info["headers"].has_key("x-amz-meta-multipart-etag"):
+            output(u"   MD5 sum:   %s" % info['headers']['x-amz-meta-multipart-etag'])
+        else:
+            output(u"   MD5 sum:   %s" % info['headers']['etag'].strip('"'))
     else:
         info = s3.bucket_info(uri)
         output(u"%s (bucket):" % uri.uri())

and bam, you're back in business with S3Upload and s3cmd.

Executable version built with standalone Python 2.5 and cx_Freeze: Here


Tuesday, November 1, 2011

New Google Reader interface

Are you kidding me? Why all the wasted space? It's TERRIBLE...

This would be so much better.

Friday, October 28, 2011

Thread management

Interesting article on the Bulldozer architecture today : http://techreport.com/articles.x/21865

So I thought, why not try this on an i7 as well?

So I tested it with the Euler3D benchmark, like they do, with 4 different configurations.



Pretty interesting results.
8 cores is fastest, but not by much; diminishing returns, probably from contention or the algorithm itself.
0F is the slowest, definitely because of contention.
The 55 affinity is faster than the Windows scheduler. I noticed that no single core ever hit 100% while the Windows scheduler was scheduling things, so the time loss could be from threads switching cores.
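For reference, an affinity mask is just a bitfield over logical CPUs: 0F (00001111) is the first four logical CPUs, which, with hyperthreading siblings numbered next to each other, is only two physical cores, while 55 (01010101) is every other logical CPU, i.e. one thread per physical core. If you want to set a mask from code instead of through Task Manager, here's a rough Windows-only Python sketch (an illustration, not what the benchmark itself uses):

import ctypes

AFFINITY_MASK = 0x55  # one logical CPU per physical core on a 4-core/8-thread i7

kernel32 = ctypes.windll.kernel32
process = kernel32.GetCurrentProcess()  # pseudo-handle for the current process
if not kernel32.SetProcessAffinityMask(process, AFFINITY_MASK):
    raise ctypes.WinError()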

Tuesday, October 25, 2011

Arrrgh.. I always make my life difficult

A long, long time ago, before Subversion Edge and WANdisco uberSVN, there was a time when we had to set up our own svn servers.

I started with the standard Apache distro, added SSL, then SVN on top of that. It took me a few days the first time, but it was good.

Then eventually I got tired of it, and moved to XAMPP + SVN. And this was good for a long time.

And then... XAMPP started using VC9 as their compiler, while win32svn stayed with VC6. And things were not GOOD.

So now I have to use XAMPP, download WANdisco's vanilla svn, rip out their .so's and dump them into XAMPP. Now all is well.

Also, XP seems to barf on SSL connections with the win32svn distro. So use WANdisco's or SlikSVN's binaries if you're still on XP and need SSL.

Friday, October 7, 2011

wtMake

This is an internal tool for configuration management that I decided would be great to put out in the world. I like build tools. Make, SCons, CMake, Qmake, ANT. I love what they do for you and what great programmers can do with them. I also like Visual Studio. A lot.

Most people tend to pick the build tool they want to use, then generate a vcproj or sln from it. This is great and all when you wanna target your software to build on almost any system with GCC, cl, Xcode, whatever.

For small projects it works great. When a project is large, however, programmers like to put stuff in folders and get all organized. Then your configuration changes, and you need to regenerate that vcproj. BAM! It's all flat again. Unless I'm totally wrong and Qmake or CMake can actually generate vcprojs with filters, I haven't found that documentation.

Truth be told, I work with several proprietary VS plugins for platforms that like to store settings and configurations within the vcproj. What happens when other build tools generate vcprojs? Those settings kinda go missing as well. I could of course go figure out what they're adding and add all those options to the vcproj generators in the bu.... NOPE.

I'm not gonna go and remake all the work that the fantastic engineers at Microsoft and Sony have done to make it easy for programmers to add their platforms as targets in Visual Studio. It's good work. Let's keep it! Programmers are accustomed to being able to look at the vcproj properties in VS and double-click toggle boxes and drop-down lists. This is good. Let's keep most of that! I'm also not going to make my own build tool from scratch. Not when I want to make it easy for myself to, let's say, use IncrediBuild or another distributed build plugin for VS.

So how do I configure include directories and library directories and additional libs and all these fancy things in the vcproj in an easy and quick manner?! You probably think this is an easy task, but it's kinda tedious when you have 4 libraries and 3 platforms, with 3 configurations each. That's... a lot of places to add things to. You could use VS property sheets, but I actually found them kind of a pain to use... 'cause you're still buried in a ton of different VS menus, and then you have to add them to a billion project configurations. Thanks, but no thanks.

The vcproj format is pretty simple, and below is a very simplified version of it. Basically, each configuration lives in a Configuration element with its name as an attribute. Then every tool is a Tool element, with its options stored as attributes. In the example below, the VCCLCompilerTool stores preprocessor definitions in the PreprocessorDefinitions attribute. What wtMake does is look through every Tool element under the right configuration and replace the current attribute value with what you set.

<Configuration Name="Release|Win32" >
	<Tool PreprocessorDefinitions="DEFINITION" Name="VCCLCompilerTool"/>
</Configuration>
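To make that concrete, here's a hypothetical sketch of the replacement step in Python (this is not the actual wtMake source, just the gist):

import xml.etree.ElementTree as ET

def set_tool_attribute(vcproj_path, config_name, attribute, value):
    # Overwrite an existing Tool attribute for one configuration.
    # Attributes that don't already exist are left alone, which matches
    # wtMake's behaviour of only modifying what's there.
    tree = ET.parse(vcproj_path)
    for config in tree.iter("Configuration"):
        if config.get("Name") != config_name:
            continue
        for tool in config.iter("Tool"):
            if attribute in tool.attrib:
                tool.set(attribute, value)
    tree.write(vcproj_path)

# e.g. set_tool_attribute("mygame.vcproj", "Release|Win32",
#                         "PreprocessorDefinitions", "DEFINITION")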

So I wrote a little tool that can set almost any Tool attribute in a vcproj to what I need it to be. The syntax is simple and fast.

//comment
:rule
	#vsattribute
		< text you want to go in there.
	!forceattribute ? NameAttribute
		< text you want to go in there.

So let's say we wanted to add freetype to the vcproj. This is the set of rules that I would need to cover everything for freetype and PC.

:pc
	#PreprocessorDefinitions
		< WF_PC32
		< WIN32
		< _CRT_SECURE_NO_WARNINGS
	#AdditionalDependencies
		< Ws2_32.lib xinput.lib libpad.lib
	!AdditionalOptions ? VCCLCompilerTool | VCCLX360CompilerTool
		< /MP
	!MinimalRebuild ? VCCLCompilerTool | VCCLX360CompilerTool
		< false

// Freetype
:freetypeinclude
	#AdditionalIncludeDirectories
		< $(ProjectDir)..\..\..\3rdparty\freetype\include
:freetype360
	#AdditionalLibraryDirectories
		< 3rdparty\freetype\objs\xbox360\vc2008
:freetypepc
	#AdditionalLibraryDirectories
		< 3rdparty\freetype\objs\win32\vc2008
:freetypeps3
	#AdditionalLibraryDirectories
		< 3rdparty\freetype\objs\ps3\vc2008
:freetypedebug
	#AdditionalDependencies
		< freetype239MT_D.lib
:freetyperelease
	#AdditionalDependencies
		< freetype239MT.lib
:freetypedebugps3
	#AdditionalDependencies
		< freetype239MT_D.a
:freetypereleaseps3
	#AdditionalDependencies
		< freetype239MT.a

So now that's fine and dandy. How do we use the rules? That's where project templates come in. The format is something like this.

$templatename
	#all
		< rule
	#Configuration|Platform
		< rule

#all is a special tag whose rules are added to every configuration.

Here is an example of just the PC portion of a game project template.

$mygame
	#all
		< bulletincludes
		< raknetcompiler
		< wfEngineSource
	#Debug|Win32
		< debug
		< pc
		< directxCompiler
		< directxdebug
		< freetypepc
		< freetypedebug
		< physxpcCompiler
		< physxpc
		< bulletpclibs
		< force
	#Release|Win32
		< release
		< pc
		< directxCompiler
		< directxrelease
		< freetypepc
		< freetyperelease
		< physxpcCompiler
		< physxpc
		< bulletpclibs
		< force
	#Final|Win32
		< final
		< pc
		< directxCompiler
		< directxrelease
		< freetypepc
		< freetyperelease
		< physxpcCompiler
		< physxpc
		< bulletpclibs
		< force

So how do you actually use the tool? Simple.

wtmake example.vcproj -t templatename

If you'd like to use it with the example above:

wtmake mygame.vcproj -t mygame

GOTCHAS!

The tool will completely replace all text in that attribute! So if you have a preexisting project you'd like to modify with it, make sure you have a backup in case you don't get all the configurations the first time!

The tool only modifies attributes that already exist; if it doesn't find one, it doesn't add it. I wrote it this way to simplify the rule syntax, so you don't have to worry about the hundreds of other Tool elements.

If you need to force an attribute, like AdditionalOptions to add the all-important /MP switch to cl, you can do that by using a ! for a forceattribute. It will create the attribute if the element has a Name attribute that matches. This is the method I used to add /MP to only the VCCLCompilerTool and not to VCCustomBuildTool, VCMIDLTool and all the other ones.

If you are interested, you can download this tool from here. There is an example PC project with a bunch of libraries already preconfigured in the file called wtMakeConfig.

If you want source, please feel free to ask me. I may release it eventually, after I clean out some dependencies on other libs. Released under the zlib license.

A perpetual SVN backup method to the cloud

I came up with this method a long time ago to provide redundancy and archival to the svn servers that I operate. I don't think I've found a similar solution online so I thought I would share because it works pretty well for me.

Basic Redundancy

The first step of course is to make sure that you provide a good amount of local redundancy. This is fairly standard procedure: RAID the drives the svn repo is on. Now, some people might stop there and say a mirror is good enough, like one company I remember that lost all its data because a disgruntled someone just did a DROP TABLE in MySQL. For those of us who are more paranoid... we do not stop here.

The next step in my redundancy strategy for svn is to have a mirror server that uses svnsync to maintain a second copy of the repo. If the main server fails, a simple DNS switch and everybody is back in business. The mirror server runs a perpetual loop of syncing, waiting a minute, then syncing again. I know this is probably a little wasteful because it pings the main server so often, but in practice I've found it's much more stable than using post-commit hooks to send a message to the mirror server. Some online examples of using svnsync tell you to use a post-commit hook to call svnsync on the main server. I've found that this presents two problems.

The first problem I encountered is that svn seems to be designed not to send the commit-done message to the client until the entire post-commit hook is done. So on particularly large commits, waiting for a sync to the mirror wastes time on the client side.

The other problem is that syncing by sending data from one Apache server to another Apache server is extremely slow. Which is why I always run svnsync on the mirror server against a UNC file path and not over https.
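For completeness, the perpetual loop on the mirror is nothing fancy. Something along these lines (the repo path is hypothetical, and a batch file would do just as well):

import subprocess
import time

MIRROR_URL = "file:///E:/svnmirror"  # local/UNC path to the mirror repo, not https

while True:
    subprocess.call(["svnsync", "sync", MIRROR_URL])  # pull any new revisions
    time.sleep(60)  # wait a minute, then sync again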

Archiving to the Cloud

So now I have a redundant main server and a redundant mirror that lags the latest revision by at most a minute. Perfect, right? Not yet. Any disaster plan should always have an offsite backup solution. I could use an offsite mirror, but that is cost prohibitive based on the size of my repository. It's VERY large. So instead, I archive to the cloud.

The post-commit hook is fairly simple. All I do is write the revision number of the commit as a file on the server. This is so the post-commit hook can exit as quickly as possible, because of the aforementioned first problem. Later at night, the server loops through the directory where the day's commits have been recorded and runs a simple batch file on each one.
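The hook itself can be as dumb as this sketch (paths are hypothetical; on Windows the real hook is a post-commit.bat that calls a script like this with the arguments svn passes in):

import os
import sys

# svn invokes the post-commit hook with: <repository-path> <revision>
repo_path, revision = sys.argv[1], sys.argv[2]

pending_dir = r"E:\svnbackup\pending"  # hypothetical queue directory
open(os.path.join(pending_dir, revision), "w").close()  # touch a file named after the rev

And that nightly batch file looks like this: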

@echo off
call setPath.bat
svnadmin dump E:\svndb -r %1 --incremental | bzip2 > svnarchive.%1.dump.bz 

:UPLOAD
echo "Upload dump!"
s3cmd put svnarchive.%1.dump.bz s3://svnarchive/svnarchive.%1.dump.bz

for /f "delims=" %%v in ('md5sum svnarchive.%1.dump.bz') do set TOOLOUTPUT=%%v
SET LOCAL_MD5=%TOOLOUTPUT:~0,32%
echo "Local md5 = %LOCAL_MD5%"

s3cmd info s3://svnarchive/svnarchive.%1.dump.bz | grep MD5 > rev%1.txt
set /p TOOLOUTPUT= < rev%1.txt
del rev%1.txt
SET REMOTE_MD5=%TOOLOUTPUT:~14,47%
echo "Remote md5 = %REMOTE_MD5%"
if "%LOCAL_MD5%" == "%REMOTE_MD5%" (
	echo "MATCH!"
	goto END
) else (
	echo "DOES NOT MATCH!"
	goto UPLOAD
)

:END
del svnarchive.%1.dump.bz

Simply put, this batch file takes the rev number as an argument, does an incremental dump and bzip2s the output. Then it uploads it to S3, compares S3's calculated MD5 with the locally calculated MD5 and, if they match, voila! Delete the dump and move on to the next revision. If not, it tries again and again until it succeeds. Using the S3-calculated MD5 is a simple way to keep bandwidth moving in and out of S3 as low as possible. Another advantage of this method: it is not always safe to make a copy of an svn repo while it is running, so a simple file-copy backup of a repo is insufficient. You could do an svnadmin hotcopy before running regular backup software on it, but hotcopying a large repo could take a long time. This per-transaction backup is quick and cheap in comparison.


Additional Steps

In addition to this, regularly running integrity checks is always a good idea. I can run an integrity check on any revision by re-dumping the revision and checking the MD5 against the one in S3. I re-verify the MD5 of every revision committed in any given week at the end of the week. Yearly, I also do a full verification from 1 to N. If differences are discovered, then it's time to check the mirror and see which one is right, because the corruption could have come from the main repo or from S3! In the event of a catastrophic failure that takes out both the main and the mirror server, I also have a recovery script that grabs every single revision from the bucket and does an svnadmin load into a new repo. Something like:

for /L %%V in (%1,1,%2) do (
	s3 get svnarchive/svnarchive.%%V.dump.bz /nogui
	bunzip2 svnarchive.%%V.dump.bz
	svnadmin load svndb < svnarchive.%%V.dump
	del svnarchive.%%V.dump
)
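The verification pass mentioned above follows the same pattern as the archive script. A rough sketch, using the same paths and bucket as above, and assuming the recompressed dump comes out byte-identical to the one that was archived:

import bz2
import hashlib
import subprocess

def verify_revision(rev):
    # Re-dump one revision, recompress it, and compare against the MD5 that S3 reports.
    dump = subprocess.check_output(
        ["svnadmin", "dump", r"E:\svndb", "-r", str(rev), "--incremental"])
    local_md5 = hashlib.md5(bz2.compress(dump)).hexdigest()

    info = subprocess.check_output(
        ["s3cmd", "info", "s3://svnarchive/svnarchive.%d.dump.bz" % rev]).decode()
    remote_md5 = [line.split()[-1] for line in info.splitlines()
                  if line.strip().startswith("MD5")][0]
    return local_md5 == remote_md5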


Limitations

Amazon allows an unlimited number of files per S3 bucket, so theoretically this archival system could go on forever, but S3 has a 5GB file size limit. Anyone who commits 5GB at a time to SVN needs to be shot.

The cloud is not always safe; there have been a couple of reports of corruption and failures with the S3 service, but for the most part it has been a reliable service for me.

Restoring from S3 could take a while, depending on my downstream, but if BOTH the mirror and the main server went down, I think I have bigger problems, like the big one finally hitting Southern California or someone burning down the building.

This has worked fairly well for me for quite a while now, and I hope it gives someone else ideas on cost-effective backups, redundancy and archival. I do know that people recommend having multiple repositories, but the way we have our projects set up, it's much easier to keep them in one repo.

Friday, January 7, 2011

VSPTree and Nehalems

VSPTree uses the Visual Studio 2008 Team System Profiler for profiling, and on Nehalems it will cause bluescreens due to a bug in Windows. There is a hotfix; unfortunately, the installer will only install it if you have VSTS installed! Not the standalone profiler... which is what VSPTree uses. :(

http://code.msdn.microsoft.com/KB958842

Alternatives are :

AMD CodeAnalyst

Very Sleepy

Both are simple enough to use, but CodeAnalyst has the advantage of having a VS plugin... however, that VS plugin crashes VS when it tries to show the results... So they're equal in my mind.

CodeAnalyst does have the ability to filter the results better, though.