Permutations with repetition using Excel

The back story

Recently I have been trying to limit my use of VBA. It’s not that I was addicted and needed to go to VBA-rehab, on the contrary, I still love VBA till death do us part. But the thing is, every time someone opens an Excel sheet with VBA macros they are reminded of Excel’s vulnerability and the risks of macros.

So I set out to make workbooks that do the same thing, but without VBA. Not always is this possible, or efficient to do so. But when it’s possible, it also comes with great performance and great stability. No code needs to be changed, ever. Of course there are also downsides. It’s not as flexible as VBA, so you’re stuck in a rigid framework that solves one thing and one thing only. But it does it so well, oh my.

My latest endeavor was with permutations. I needed something that would generate all permutations of the tokens F,C,R,A (don’t ask) with repetition. As some know I am an avid speed cuber, that is solving the Rubik’s cube for speed. And the Rubik’s cube is a permutation puzzle. So I have dealt with permutations quite a lot. For those who haven’t paid attention in math class: you should know that permutations come in two flavors: with repetition (Pr) and without repetition (P). The number of permutations (Pr) in these four tokens F,C,R,A is 4^4, or 256. That number is exponential, so it grows so fast that at 5 tokens you are at 3,125 permutations and at 6 tokens at 46,656 permutations. At a set size of 10 tokens you are at 10^10 or 10,000,000,000 (ten billion) permutations. VBA would surely choke on that number of statements to follow. Excel can handle 1 million rows, although I wouldn’t put it to the test with that.

A clean slate

I started out writing a VBA program, that generates all permutations (Pr). The funny thing is if you go online and expect to find a bunch of worked out examples of algorithms, you don’t. Almost all examples you find are about permutations without repetition, which is like working with real objects, since you can’t duplicate real objects. The lottery is a good example of this. The program worked, but it was slow, and cumbersome. So let’s drop VBA and try it without.

CaptureFirst, I set out with some settings (sheet ‘settings’). We define the pattern, and calculate the size of the set (using LEN), and the number of possible permutations (using Length^Length). These values will be used extensively in the functions to generate permutations.

The maximum length of the pattern is set to 5, which totals 3,125 permutations, an amount which Excel can handle in the blink of an eye. You could extend the grid to a width of 7, which would come to 823,543 permutations. I’d be interested to know how fast Excel would generate the output, and how big the file would become. If you try it out please let me know.

Formula frenzy

CaptureGo to the output sheet and look at the grid on the right. There you see in the top row a couple of simple formulas. They are to set the repeat cycle of that column. In the first column you see a token repeated once, in the second column 4 times, in the third column 16 times. We’re multiplying by Length to set the repeat cycle. This is an easy way to generate permutations of any set. Think of how it works with regular counting. You start with 0,1,2,3,4,5,6,7,8,9, and then go one digit to the left, and you repeat, but now in cycles of 10, so that 10,11,12,13,14,15,16,17,18,19 has exactly 10 times a 1. We’re using the same principle.

Our basic generating formula is as follows:


=MID(Pattern,MOD($D4/E$3,Length)+1,1)

If you don’t know MOD, this is a function for modulo, also called the remainder after performing division. The MID function gets a character from a specified position in the string. The column with N is simply to count and use the MOD function properly.

By using absolute referencing we are now able to copy the cell E4 to all other cells in the grid, while keeping a properly working formula. With a pattern of length 4 we can ignore the last column, which is only needed for a pattern of length 5.

In the settings sheet you will see a width and a height. By selecting the range starting at E4 with that width and height, we get exactly all permutations in the set.

Alternatively, on the left, there is a table with all tokens concatenated in one string, for ease of use. The formula for this is:


=IF(D4>=NumItems,"",LEFT(CONCATENATE(E4,F4,G4,H4,I4),Length))

Using FCRA as a pattern we can now see all 256 permutations in column A!

Perfect permutations

CaptureWell, permutations without repetition are actually a subset of permutations with repetition (P < Pr). In a permutation without repetition you don’t have any duplicates. So for the tokens F,C,R,A a valid Pr would be FFFF, but it’s not a member of P. You can only get a member of P by swapping original tokens. So e.g. FRCA is a member of P. That’s why the number of items in P doesn’t grow as fast as in Pr. Four tokens gets to 4*3*2*1=24 permutations. This is called a factorial.

If a token set does not contain duplicates we can easily filter out the permutations we need. E.g. in the set A,A,B,B we still get duplicates in the list, and so we can’t filter. But in F,C,R,A it’s quite possible using Excel. The formula used is a but difficult though, and requires some thought:


=SUM(IF(FREQUENCY(
  MATCH(OFFSET(E4,0,0,1,Length),OFFSET(E4,0,0,1,Length),0),
  MATCH(OFFSET(E4,0,0,1,Length),OFFSET(E4,0,0,1,Length),0))
  >0,1))=Length

Here, OFFSET gives us a dynamically defined range, which is handy, because we don’t know how long the pattern is beforehand. In this formula, Length is the size of the set (aka the length of the string). Both OFFSET and MATCH return multiple values, so it’s impossible to split the formula into more cells, but just for clarity, let’s view it in condensed form:


=SUM(IF(FREQUENCY({FFFA}, {FA})>0,1))=Length

What it does is it totals the frequencies of each character in the set, so in this case it returns 2, and then checks to see if it matches the length (4). If it matches we have a permutation. Note this only works for patterns that have no repeating tokens, like FCRA.

Using named formulas we can simplify the long formula to:


=SUM(IF(FREQUENCY(
  MATCH(tokens,tokens,0),
  MATCH(tokens,tokens,0))>0,1))=Length

…where tokens equals OFFSET(E4,0,0,1,Length)

Now we have a formula to detect permutations. Unfortunately we still have duplicates, because our table always has 5 tokens and we might have a shorter pattern, like our example FCRA. So we use an IF to detect empty cells in column A and we can now use Excel’s filter (On the ribbon choose Data, then Filter) to get all permutations.

Download

permutations with repetition (320KB)

Advertisements

Using named ranges and worksheet functions in Excel VBA

Bold Brackets

A couple of days ago I saw something in an article on StackOverflow, that blew my mind. I can’t find the article anymore, but I do remember what this one neat trick was, that will for ever change your VBA. It’s called a named range, and I found out I had always been doing it wrong. So have you, most likely.

Have you ever written something like this?


s = Application.WorksheetFunction.Sum(Range("A1:A10"))

You thought you were quite smart, using SUM to add some values together that would have taken a loop in VBA. You petted yourself on the back, took a beer, and applauded yourself for you being awesome. Well, you’re not awesome. This is lame. You suck. Ok, maybe not, but watch this:


s = [Sum(A1:A10)]

It’s incredible! This gives the exact same result.  You may wish to prepend with a sheet name, so it’s an exact reference. You can use any kind of name inside the square brackets. So, if A1:A10 is named ‘records’ in Excel you could rewrite this code to


s = [Sum(records)]

Note that with this notation we don’t use double quotes around the name of the range.

Vanishing Variables

CaptureA quite mighty use for this, is that we can now write code with a lot less variables, if we let Excel do the work for us.

  1. make a new sheet
  2. name it ‘variables’
  3. make three columns: name, value, description

Now you can fill the table you just made with all kinds of settings, constants or calculations that you want to use in your elaborate VBA program.

I am a lazy teacher, and I have a lot of students, so I work a lot with short macros that can help me get more spare time. Here’s an example:

    For Each subfolder In FSfolder.SubFolders
        If subfolder Like "Student *" Then
            [studentNr] = Right(subfolder, 6)
            If Not FileExists([sClass]) Then
                FS.createFolder [sClass]
            End If
            FS.MoveFolder subfolder, [newFolder]
        End If
    Next subfolder

In line 3 I fill the named range ‘studentNr’ with a value taken from a folder, which contains a student number. In the next line, I check if a folder for that students’ class has already been made, and if not, I make the folder. The variable [sClass] does not exist in my code. It only exists in my variable table in Excel. The cell contains a VLookup function to find in which class this student is currently enrolled. Similarly I have a [Teacher] variable, also with a VLookup function. NewFolder is simply a concatenation and formatting, which is also easily done in Excel.

As you can see, the values for [sClass], [Teacher] and [newFolder] are filled automatically by Excel, and I don’t have to process anything.

Programming like this in Excel is a new paradigm. You don’t churn out all your code top-to-bottom as you used to. You create sheets with lots of calculations, lookups etc, and then you make a tiny program that links all this together. Excel can do some crazy fast, complex stuff, and you should never have to program those anymore!

Crazy Caveats

Well wasn’t that incredible? You may not be used to programming like this, and I recommend this method only for experts. You should be in full control of the worksheets, or otherwise someone will mess with your program. Also, when you are part of a team, you should make sure this ‘magic’ is elaborately documented in the code (e.g. in a header of the function mention which Excel named ranges are used).

Good luck.

Learn more: FastExcel Blog

 

The selection contains multiple data values… Merging cells in Excel.

This is a super powerfeature I have always missed in Excel: the ability to join the content of cells without losing any data. Sorry, what’s that?? Yes, I know about the ‘merge’ option. It’s lame! Try merging two cells that both have content. Excel answers, delightfully happy, that you will only keep data in the first cell and lose all of the data of the other cells, and go deal with it.

multiple-data
multiple-data

So, I decided to ‘deal with it’ and I present here several macros for your pleasure and entertainment that will solve this once and for all.

Merging, Joining and Combining

Well, that’s all the same of course. Let’s start with a simple macro to join a bunch of cells and put the result in the first cell.

Sub Join()
   Dim out As String
   Dim c As Range

   out = ""
   For Each c In Selection.Cells
      out = out & c.Value
      c.ClearContents
   Next
   Selection.Cells(1, 1).Value = out
End Sub

How does it work? First, we define the variable out which will contain all content. Then we loop over all cells in the current selection and we combine their values using the concatenation operator “&”. Finally we put the value of out in the first cell of the selection.

If you have never created a macro before, you can add them by pressing ALT+F11. After that, make sure to copy this one to the personal macro workbook.

In some occasions I needed to merge, but also keep the data separated by either commas or newlines. So I made a slightly modified version with arguments. Note for noobs: you can’t run these like normal macros, you’d have to go into the VB editor and start them from the immediate window. Or, you can make a short calling-macro similar to the JoinWithBreaks below.

' join the contents of a selection of cells
' and put all of these together in one single cell
Sub Join(Optional s As String = "", Optional wrap As Boolean = True)
    Dim out As String
    Dim c As Range
    Dim n As Integer

    out = ""
    For Each c In Selection.Cells
        n = n + 1
        out = out & c.Value
        If n < Selection.Cells.Count Then
            out = out & s
        End If
        c.ClearContents
    Next
    Selection.Cells(1, 1).Value = out
    Selection.Cells(1, 1).WrapText = wrap
End Sub

Sub JoinWithBreaks()
    Join vbNewLine, True
End Sub

Fillerup

Another problem closely related to the merge problem is when you have a sheet looking like this:

excel fill problem
excel fill problem

And let’s say it extends a long long way to the bottom for your archive of ten years. To create a usable Excel table and e.g. to create a pivot table your table needs to have values in every cell, not just the first one. Humans can reason that cell A3 also belongs to January, but computers and thus Excel cannot. Clicking on the autofill handle in cell A2 fixes this, but doing so for 100 cells is still very tedious work. For that, we can use the following macro, which will automagically fill every empty cell with the value right above it.

Sub Filler()
    Dim c As Range

    For Each c In Selection.Cells
        If IsEmpty(c) And c.Row <> 1 Then
            c.FormulaR1C1 = c.Offset(-1, 0).FormulaR1C1
        End If
    Next
End Sub

Adventures in XML land: combining Excel, XML, HTML and JavaScript

the process
using Excel to generate XML, jQuery and Ajax for animations

In a past long ago I ventured into this barren land, where bugs crept up my sleeves and pants. In most cases after a short scary trip I’d be happy to be back home in PHP or plain JavaScript land.

What am I talking about? Client side “apps”, by which I mean browser based, server-less HTML pages, where everything is done on the client. Examples of such applications are CD Rom viewers, Touch Screen Console applications, and Information Display (like the train station screens).

Recently I got a request to make an Information Display. I started happily to look at how this would work using modern browser based techniques. After a couple of days of experimentation I must say that it was a wonderful journey, and I am convinced this technology will have a great future.

Ten years ago CD Roms with an HTML viewer were simple pages with little interactivity, or they came with a web server to do interesting stuff. Now, with the advent of powerful JavaScript libraries and Office 2007 we can now integrate these into a cool looking almost server-centric interactive display.

I have used the following technologies, and will explain hereafter how:

Excel 2007 and XML

Excel serves as my database. It consists of worksheets of tabular data. The data is exported to an XML file using the XMLTOOLS addin, which you can find on the microsoft site. It is very loosely designed: when I need an extra column, I insert it and start typing. I will have to re-generate an XML mapping then, and export the contents to an XML file. The advantage: the person working with this ‘database’ only needs to know Excel, and how to click on a few buttons.

jQuery and Cycle

I added jQuery most and for all for the Cycle plugin. It allowed me to create stunning visual transitions for ‘slides’. The slides are actually simple divs in an HTML page.

CSS3

I used CSS3 for creating nice looking gradients. Also I used CSS3 for zebra tables and drop shadows on images. Overall it means there is no need whatsoever for images to enhance the visuals. I think that’s how future web development will and should occur.

SVG and/or Canvas

In fact you hardly ever need both, given that these technologies mostly overlap in capability. An advantage of Canvas is that it is actually a program (javascript) instead of a declarative markup. Of course the same argument makes SVG more attractive for its simplicity.

Ultimately I decided to use SVG for both my static vector based images (floor plan) and my animation (a clock). The SVG animation actually looked better than the canvas one and I wasn’t very interested in modifying the default look.

For the floor plan I hunted for a good SVG or Canvas editor. What I found was Google Docs (!). Recently they added a diagram editor, that can export to SVG. The resulting code unfortunately looks like hexadecimal soup, but the good thing is you can easily modify the diagram on Google Docs and export again.

Right now my app only works in FireFox 3.6, and that’s just fine, baby!!

Note: I cannot share the application since it’s made for our organization, but if you need help in setting up one yourself just add to the comments…

Rubik’s cube Average formula in Excel

stopwatchAny speedcuber knows how to calculate his or her average: total all, remove fastest and slowest, and divide. Here’s how you do that in Excel

=( SUM(A1:A12) – MIN(A1:A12) – MAX(A1:A12) ) / (COUNT(A1:A12) – 2)

But that is tedious of course, and your constantly changing the range to find the average of e.g. 5, or a running average. So instead I wrote a little VBA function you can put in a module.

  • Press ALT+F11 to go to the editor
  • Choose Insert > Module
  • Choose Insert > Procedure
  • Type CubeAVG
  • Click Function
  • Click OK

Now edit so it’s like the code below

' cubeavg : calculate speedcubing average
Public Function cubeavg(r)
    Dim total As Double, fastest As Double, slowest As Double, n As Integer
    total = WorksheetFunction.Sum(r)
    fastest = WorksheetFunction.Min(r)
    slowest = WorksheetFunction.Max(r)
    n = WorksheetFunction.Count(r) - 2
    cubeavg = (total - slowest - fastest) / n
End Function

Don’t forget to save the file…

Convert Excel date value to SQL date

When creating SQL statements you’ll often need a date in the ISO 8601 standard format (e.g. 2010-03-26 12:34).

Of course you can change the format in Excel to show it as such, but that doesn’t give you the string you need, e.g. in an insert or update statement.

Here’s an Excel function to make an SQL date value, presuming the date value is in cell A1:

=TEXT(A1,”yyyy-mm-dd hh:MM:ss”)

This circumvents the use of complicated IF and date/time functions. Append a “Z” if you need to indicate the timezone as UTC (i.e. GMT) time.

Here’s a short VBA function to create this type of date

Function SQLDate(d)
SQLDate = WorksheetFunction.Text(d, "yyyy-mm-dd hh:MM:ss")
End Function

Put this code in a new module in your workbook to instantly start using the function in Excel like this: “=SQLDate(A1)”

SQL date in Excel
SQL date in Excel

More information:

http://en.wikipedia.org/wiki/ISO_8601

How to remove duplicates from a list

There are tons of ways to remove duplicates from a list of items, most of which are way too complicated and technical for a noob (if you don’t know what this is, then that’s you) to perform.

Let’s look at some of them, and let me know in the comments if you think these were useful. Here they come, in order of increasing difficulty (geekness)

TextPad

removing duplicates with textpad
textpad sort & remove

TextPad is a free (well, nagware) text editor with so many built in tools I cannot talk about it without crying…

  • open the file in TextPad
  • select Tools > Sort
  • check the box at ‘remove duplicate lines’
  • click OK

Excel

Another program abused by millions (billions?) to do stuff that could be done with a 10 year old cellphone. What the bozos at Microsoft tell you is the dumbest way to do it, because you’re overwriting your original list. Here’s the smart way

  • Excel Pivot
    Excel Pivot

    Select the data

  • Click Data > PivotTable… (Office 2003) or Insert > PivotTable (Office 2007)
  • Click ‘Finish’ to create a new sheet with an empty pivot table
  • Drag the column for which you need to remove duplicates into the left part of the pivottable
  • adding flavor: you can now sort, filter, group, analyze, you name it.

SQL

If your data is in a database, and you have access to SQL, perform the following query:

  • SELECT DISTINCT(column) FROM table
  • in place of ‘column’ you type what you need to be unique
  • in place of ‘table’ you type the table name
  • tip: you can combine more columns by typing (column1, column2, …)

Linux

If you have the file on a linux or unix system, from the terminal (command line) type

  • sort -u file > output
  • where ‘file’ is the name of your file and ‘output’ is the name of the output file.

I’d like to know if you can come up with new and maybe even faster ways!