Permutations with repetition using Excel

The back story

Recently I have been trying to limit my use of VBA. It’s not that I was addicted and needed to go to VBA rehab; on the contrary, I still love VBA, till death do us part. But the thing is, every time someone opens an Excel sheet with VBA macros they are reminded of Excel’s vulnerability and the risks of macros.

So I set out to make workbooks that do the same thing, but without VBA. This is not always possible, or efficient. But when it is possible, it also comes with great performance and great stability. No code needs to be changed, ever. Of course there are also downsides. It’s not as flexible as VBA, so you’re stuck in a rigid framework that solves one thing and one thing only. But it does it so well, oh my.

My latest endeavor was with permutations. I needed something that would generate all permutations of the tokens F,C,R,A (don’t ask) with repetition. As some know I am an avid speed cuber, that is, someone who solves the Rubik’s cube for speed. And the Rubik’s cube is a permutation puzzle, so I have dealt with permutations quite a lot. For those who didn’t pay attention in math class: permutations come in two flavors, with repetition (Pr) and without repetition (P). The number of permutations (Pr) of these four tokens F,C,R,A is 4^4, or 256. That number grows even faster than exponentially: at 5 tokens you are at 5^5 = 3,125 permutations and at 6 tokens at 6^6 = 46,656. At a set size of 10 tokens you are at 10^10 or 10,000,000,000 (ten billion) permutations. VBA would surely choke on that number of statements to follow. Excel can handle a little over 1 million rows (1,048,576), although I wouldn’t put it to the test with that.
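To double-check the arithmetic outside Excel, here is a small Python sketch (Python is my choice for the cross-check, not something the workbook uses). It assumes the tokens are all different and that every permutation has the same length as the token set.

```python
from itertools import product

def count_with_repetition(n):
    """Number of strings of length n built from n distinct tokens: n**n."""
    return n ** n

tokens = "FCRA"
# Brute force: every way to fill 4 positions with any of the 4 tokens
all_pr = ["".join(p) for p in product(tokens, repeat=len(tokens))]

print(len(all_pr))  # 256
print([count_with_repetition(n) for n in (4, 5, 6, 7, 10)])
```

The brute-force list and the formula agree for four tokens, and the counts for larger sets show why the grid quickly outgrows a worksheet.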

A clean slate

I started out writing a VBA program that generates all permutations (Pr). The program worked, but it was slow and cumbersome. The funny thing is, if you go online expecting to find a bunch of worked-out algorithms, you won’t. Almost all examples you find are about permutations without repetition, which is like working with real objects, since you can’t duplicate real objects. The lottery is a good example of this. So let’s drop VBA and try it without.

First, I set out with some settings (sheet ‘settings’). We define the pattern, and calculate the size of the set (using LEN), and the number of possible permutations (using Length^Length). These values will be used extensively in the functions to generate permutations.

The maximum length of the pattern is set to 5, which totals 3,125 permutations, an amount which Excel can handle in the blink of an eye. You could extend the grid to a width of 7, which would come to 823,543 permutations. I’d be interested to know how fast Excel would generate the output, and how big the file would become. If you try it out please let me know.

Formula frenzy

Go to the output sheet and look at the grid on the right. There you see in the top row a couple of simple formulas. They are to set the repeat cycle of that column. In the first column you see a token repeated once, in the second column 4 times, in the third column 16 times. We’re multiplying by Length to set the repeat cycle. This is an easy way to generate permutations of any set. Think of how it works with regular counting. You start with 0,1,2,3,4,5,6,7,8,9, and then go one digit to the left, and you repeat, but now in cycles of 10, so that 10,11,12,13,14,15,16,17,18,19 has exactly 10 times a 1. We’re using the same principle.

Our basic generating formula is as follows:


=MID(Pattern,MOD($D4/E$3,Length)+1,1)

If you don’t know MOD, this is a function for modulo, also called the remainder after performing division. The MID function gets a character from a specified position in the string. The column with N is simply to count and use the MOD function properly.

By using absolute referencing we are now able to copy the cell E4 to all other cells in the grid, while keeping a properly working formula. With a pattern of length 4 we can ignore the last column, which is only needed for a pattern of length 5.
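The counting principle behind the grid can be sketched in a few lines of Python. This is only an illustration of the idea, not the workbook itself: the function name grid_row is made up, and I assume the counter N starts at 0 and that MID drops the fractional part of its start position, so MOD(N/cycle, Length) acts like integer division followed by modulo.

```python
def grid_row(pattern, n, width):
    """Emulate one row of the output grid for counter value n (0-based)."""
    length = len(pattern)
    row = []
    for k in range(width):
        cycle = length ** k  # repeat cycle of column k: 1, 4, 16, ... for length 4
        # MOD(n / cycle, length) keeps a fractional part that MID drops,
        # so the net effect is integer division followed by modulo.
        index = (n // cycle) % length
        row.append(pattern[index])
    return row

pattern = "FCRA"
total = len(pattern) ** len(pattern)
rows = ["".join(grid_row(pattern, n, len(pattern))) for n in range(total)]

print(rows[:6])        # ['FFFF', 'CFFF', 'RFFF', 'AFFF', 'FCFF', 'CCFF']
print(len(set(rows)))  # 256
```

Each column is one "digit" in base Length, which is why the first column changes on every row and the next one only every 4 rows.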

In the settings sheet you will see a width and a height. By selecting the range starting at E4 with that width and height, we get exactly all permutations in the set.

Alternatively, on the left, there is a table with all tokens concatenated in one string, for ease of use. The formula for this is:


=IF(D4>=NumItems,"",LEFT(CONCATENATE(E4,F4,G4,H4,I4),Length))

Using FCRA as a pattern we can now see all 256 permutations in column A!

Perfect permutations

Well, permutations without repetition are actually a subset of permutations with repetition (P ⊂ Pr). In a permutation without repetition no token appears more than once. So for the tokens F,C,R,A the string FFFF is a valid member of Pr, but it’s not a member of P. You can only get a member of P by rearranging the original tokens, so e.g. FRCA is a member of P. That’s why the number of items in P doesn’t grow as fast as in Pr. Four tokens get you to 4*3*2*1=24 permutations. This is called a factorial.
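If you want to convince yourself of the subset relation, this Python sketch builds both sets for F,C,R,A with the standard itertools module and compares them. The variable names are mine and only serve the illustration.

```python
from itertools import permutations, product
from math import factorial

tokens = "FCRA"
with_rep = {"".join(t) for t in product(tokens, repeat=len(tokens))}  # Pr
without_rep = {"".join(t) for t in permutations(tokens)}              # P

print(len(with_rep), len(without_rep))  # 256 24
print(without_rep <= with_rep)          # True: every member of P is also in Pr
print("FFFF" in with_rep, "FFFF" in without_rep)  # True False
```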

If a token set does not contain duplicates we can easily filter out the permutations we need. In F,C,R,A that’s quite possible using Excel. But in a set like A,A,B,B the tokens themselves repeat, so this filter can’t be used. The formula used is a bit difficult though, and requires some thought:


=SUM(IF(FREQUENCY(
  MATCH(OFFSET(E4,0,0,1,Length),OFFSET(E4,0,0,1,Length),0),
  MATCH(OFFSET(E4,0,0,1,Length),OFFSET(E4,0,0,1,Length),0))
  >0,1))=Length

Here, OFFSET gives us a dynamically defined range, which is handy, because we don’t know how long the pattern is beforehand. In this formula, Length is the size of the set (aka the length of the string). Both OFFSET and MATCH return multiple values, so it’s impossible to split the formula into more cells, but just for clarity, let’s view it in condensed form:


=SUM(IF(FREQUENCY({FFFA}, {FA})>0,1))=Length

What it does is count how many distinct characters occur in the row (in this case 2, namely F and A) and then check whether that count equals the length (4). If it does, every token appears exactly once and we have a permutation. Note this only works for patterns that have no repeating tokens, like FCRA.

Using named formulas we can simplify the long formula to:


=SUM(IF(FREQUENCY(
  MATCH(tokens,tokens,0),
  MATCH(tokens,tokens,0))>0,1))=Length

…where tokens equals OFFSET(E4,0,0,1,Length)

Now we have a formula to detect permutations. Unfortunately we still have duplicates, because our table always has 5 tokens and we might have a shorter pattern, like our example FCRA. So we use an IF to detect empty cells in column A and we can now use Excel’s filter (On the ribbon choose Data, then Filter) to get all permutations.
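Here is a hedged Python sketch of what the FREQUENCY/MATCH test boils down to. The helper is_permutation is a made-up name, and the sketch mirrors the logic only. It assumes, like the formula, a pattern without repeating tokens.

```python
from itertools import product

def is_permutation(row):
    """True when every character in row occurs exactly once."""
    # MATCH(tokens, tokens, 0): 1-based position of each token's first occurrence
    first_pos = [row.index(ch) + 1 for ch in row]
    # SUM(IF(FREQUENCY(...) > 0, 1)): number of distinct first positions
    distinct = len(set(first_pos))
    return distinct == len(row)

candidates = ["".join(t) for t in product("FCRA", repeat=4)]
kept = [c for c in candidates if is_permutation(c)]

print(is_permutation("FFFA"), is_permutation("FRCA"))  # False True
print(len(kept))  # 24
```

Filtering the 256 rows this way leaves exactly the 24 permutations without repetition.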

Download

permutations with repetition (320KB)

Using named ranges and worksheet functions in Excel VBA

Bold Brackets

A couple of days ago I saw something in a post on Stack Overflow that blew my mind. I can’t find the post anymore, but I do remember what this one neat trick was, and it will forever change your VBA. It’s about how you refer to named ranges (and worksheet functions) from VBA, and I found out I had always been doing it wrong. So have you, most likely.

Have you ever written something like this?


s = Application.WorksheetFunction.Sum(Range("A1:A10"))

You thought you were quite smart, using SUM to add some values together that would have taken a loop in VBA. You patted yourself on the back, grabbed a beer, and applauded yourself for being awesome. Well, you’re not awesome. This is lame. You suck. Ok, maybe not, but watch this:


s = [Sum(A1:A10)]

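It’s incredible! This gives the exact same result. The square brackets are VBA’s shorthand for Application.Evaluate, so whatever is between them is evaluated the way Excel would evaluate it. You may wish to prefix the range with a sheet name, so it’s an exact reference. You can use any kind of name inside the square brackets. So, if A1:A10 is named ‘records’ in Excel you could rewrite this code to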


s = [Sum(records)]

Note that with this notation we don’t use double quotes around the name of the range.

Vanishing Variables

A quite mighty use for this is that we can now write code with far fewer variables, if we let Excel do the work for us.

  1. make a new sheet
  2. name it ‘variables’
  3. make three columns: name, value, description

Now you can fill the table you just made with all kinds of settings, constants or calculations that you want to use in your elaborate VBA program.

I am a lazy teacher, and I have a lot of students, so I work a lot with short macros that can help me get more spare time. Here’s an example:

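    For Each subfolder In FSfolder.SubFolders
        If subfolder.Name Like "Student *" Then    ' match on the folder name, not the full path
            [studentNr] = Right(subfolder.Name, 6) ' the last 6 characters hold the student number
            If Not FS.FolderExists([sClass]) Then  ' [sClass] is calculated by Excel
                FS.CreateFolder [sClass]
            End If
            FS.MoveFolder subfolder.Path, [newFolder]
        End If
    Next subfolder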

In line 3 I fill the named range ‘studentNr’ with a value taken from a folder, which contains a student number. In the next line, I check if a folder for that student’s class has already been made, and if not, I make the folder. The variable [sClass] does not exist in my code. It only exists in my variable table in Excel. The cell contains a VLookup function to find in which class this student is currently enrolled. Similarly I have a [Teacher] variable, also with a VLookup function. NewFolder is simply a concatenation and formatting, which is also easily done in Excel.

As you can see, the values for [sClass], [Teacher] and [newFolder] are filled automatically by Excel, and I don’t have to process anything.

Programming like this in Excel is a new paradigm. You don’t churn out all your code top-to-bottom as you used to. You create sheets with lots of calculations, lookups etc, and then you make a tiny program that links all this together. Excel can do some crazy fast, complex stuff, and you should never have to program those anymore!

Crazy Caveats

Well wasn’t that incredible? You may not be used to programming like this, and I recommend this method only for experts. You should be in full control of the worksheets, or otherwise someone will mess with your program. Also, when you are part of a team, you should make sure this ‘magic’ is elaborately documented in the code (e.g. in a header of the function mention which Excel named ranges are used).

Good luck.

Learn more: FastExcel Blog

 

Convert Excel date value to SQL date

When creating SQL statements you’ll often need a date in the ISO 8601 standard format (e.g. 2010-03-26 12:34).

Of course you can change the format in Excel to show it as such, but that doesn’t give you the string you need, e.g. in an insert or update statement.

Here’s an Excel function to make an SQL date value, presuming the date value is in cell A1:

=TEXT(A1,"yyyy-mm-dd hh:mm:ss")

This circumvents the use of complicated IF and date/time functions. Append a “Z” if you need to indicate the timezone as UTC (i.e. GMT) time.

Here’s a short VBA function to create this type of date

Function SQLDate(d As Variant) As String
    ' Format an Excel date/time value as yyyy-mm-dd hh:mm:ss
    SQLDate = WorksheetFunction.Text(d, "yyyy-mm-dd hh:mm:ss")
End Function

Put this code in a new module in your workbook to instantly start using the function in Excel like this: “=SQLDate(A1)”
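If the date leaves Excel as a raw serial number (for instance through an export), the same string can be built elsewhere. The Python sketch below is an illustration under assumptions: the workbook uses the default 1900 date system, the date falls on or after 1 March 1900, and the function name excel_serial_to_sql is made up.

```python
from datetime import datetime, timedelta

# Day 0 of Excel's 1900 date system, for serials on or after 1900-03-01
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_sql(serial):
    """Turn an Excel date serial (days, with the time as a fraction) into an SQL date string."""
    moment = EXCEL_EPOCH + timedelta(days=serial)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

print(excel_serial_to_sql(40263.5))  # 2010-03-26 12:00:00
```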

SQL date in Excel

More information:

http://en.wikipedia.org/wiki/ISO_8601

Convert Excel Tables To Lists

note: please see the update on the bottom of this article for an even quicker way to convert a table

In a dark past I was an Excel instructor (among other things). I have trained countless people in the art of Excel number wizardry. I then became a certified Excel VBA specialist, and I must say, in my years as a professional programmer, this is the skill that has set me apart from all other programmers around me. Sure, people can do Regular Expressions… So can I. Sure, people can do Object Orientation. So can I. But what programmer fancies dumb jumbling of data, and programming in a language that has the word ‘basic’ in it? Right. Most programmers I knew were Linux shell companies (pun intended) that had no idea something good was hidden up the sleeve of Microsoft. But ok, sometimes I was able to show them some awesome things, and they would instantly recognize that their world view (Excel is for end-users) was a grave mistake.

What fun it was to teach people Excel, and more so, VBA! To me, it’s the tool of all tools, and it can greatly help anyone who ever works with data (ehm, anyone). So in this first part of a long, long series (I hope) I will show you, the humble ignorant user, how to convert an Excel table to a list.

Why? Many times I have gone to companies and helped them with some particular problem. Usually it started out with an analyst/marketer/ceo showing me a bunch of data. The data was always presented as a table, with column headings and row headings, with the data in the middle. That seems like a nice way to present data, yes, it is in fact. The first thing I would do is then convert this table to an ugly list.

So why would you convert it to a list?

Because Excel is in love with lists. Excel craves lists, it’s like Access’ little brother, but it can speak five languages and juggle 4 balls. It’s no database tool (maximum of 65,536 rows, 1,048,576 since Excel 2007)… but it can transform any list into a deep, deep analysis.

The way this analysis is done later on is with Pivot Tables. I wrote those capitals on purpose. Pivot Tables are so powerful that you can basically give it any data list and it can tell you what’s missing, what’s wrong, what’s unique, what’s the total, what’s the average, you name it. But more on that later on.

Let’s convert!

Warning: code ahead…

A table consists of three parts:

  1. The row headings (left)
  2. The column headings (top)
  3. The data (center)

We will loop through the data cell by cell, and create a row in a new list for each. That’s the basic idea (pun intended).

Before we start, we check some preconditions. We have to make sure that we are inside a set of data, formed into a table. All we do is just check if we have at least two rows and two columns (not the ultimate, but it works).

Sub TableToList()
    ' Stop unless the active cell sits in a region of at least 2 rows and 2 columns
    If ActiveCell.CurrentRegion.Rows.Count < 2 Then
        Exit Sub
    End If
    If ActiveCell.CurrentRegion.Columns.Count < 2 Then
        Exit Sub
    End If

Then we will need some variables to refer to the various sections of the table

    Dim table As Range
    Dim rngColHead As Range
    Dim rngRowHead As Range
    Dim rngData As Range
    Dim rngCurrCell As Range

Next, we will need some variables for the data itself

    Dim rowVal As Variant
    Dim colVal As Variant
    Dim val As Variant

Now, we will start pointing our variables to the data, row headings and column headings, like so

    Set table = ActiveCell.CurrentRegion
    Set rngColHead = table.Rows(1)
    Set rngRowHead = table.Columns(1)
    Set rngData = table.Offset(1, 1)
    Set rngData = rngData.Resize(rngData.Rows.Count - 1, rngData.Columns.Count - 1)

Note that CurrentRegion is a handy tool that expands any cell into the surrounding block of non-empty cells. This way your selected cell can be anywhere inside the table when you run the macro. The data part is a bit harder: lines 4 and 5 together shift and resize the original table to form the bottom-right part, where all the data resides.

    ActiveWorkbook.Worksheets.Add

Next, we create a new sheet in the workbook, to hold the list.

    ActiveCell.Value = "Row#"
    ActiveCell.Offset(0, 1).Value = "RowValue"
    ActiveCell.Offset(0, 2).Value = "ColValue"
    ActiveCell.Offset(0, 3).Value = "Data"
    ActiveCell.Offset(1, 0).Select

In this sheet, we create a first row, “manually”, where we name the column headings for our list. These column headings are very important for sorting, analysis, pivot tables, export and such. The last statement instantly moves the current cell selection one row down. Notice we’re inserting a special column for Row Number. This is not always necessary, but it doesn’t hurt, and it helps you to always be able to restore the original order of the list.

Now it’s time for the actual grunt work, looping through the table

    Dim n As Long
    For Each rngCurrCell In rngData
        colVal = rngColHead.Cells(rngCurrCell.Column - table.Column + 1)
        rowVal = rngRowHead.Cells(rngCurrCell.Row - table.Row + 1)

The “For Each rngCurrCell In” is a real beauty in VBA. It just runs through any range, without worries about overflows, row and column numbers, or calculations. In the loop, we look up the column heading and row heading that belong to the current cell. Note that rngCurrCell.Column and rngCurrCell.Row are not relative; they are the actual column and row numbers on the sheet. So if the table starts at C3, its top-left cell has Column = 3 and Row = 3, which is why we subtract table.Column and table.Row and add 1 to get a position inside the heading range.

        n = n + 1
        ActiveCell.Value = n

Here, we upped counter ‘n’ and put it in the list.

        ActiveCell.Offset(0, 1).Value = rowVal
        ActiveCell.Offset(0, 2).Value = colVal
        ActiveCell.Offset(0, 3).Value = rngCurrCell.Value
        ActiveCell.Offset(1, 0).Select

We do the same trick again to put a new row in the data list on our new sheet. As you can see this part of code is repeated from the part where we created the header. A small improvement would be to create a function named ‘newRow(n, rv, cv, dv)’ to insert a new row with these values.

If, instead of actual values, you prefer to link to the original cell, you can use

        ' The single quotes keep the link working when the sheet name contains spaces
        ActiveCell.Offset(0, 3).Formula = "='" & rngCurrCell.Worksheet.Name & "'!" & rngCurrCell.Address

Finish the loop with:

    Next rngCurrCell
End Sub

Well, that’s it!
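For readers who want to see the whole idea at a glance, the loop can be sketched outside Excel in a few lines of Python. This only mirrors the logic of the macro, it does not replace it. The function name table_to_list and the sample sales figures are invented for the illustration, and the table is assumed to be a list of rows with headings in the first row and first column.

```python
def table_to_list(table):
    """Turn a cross table (first row and first column are headings) into a list."""
    col_heads = table[0][1:]  # column headings, skipping the corner cell
    rows = [("Row#", "RowValue", "ColValue", "Data")]
    n = 0
    for line in table[1:]:
        row_val = line[0]     # row heading
        for col_val, value in zip(col_heads, line[1:]):
            n += 1            # running row number, to restore the original order later
            rows.append((n, row_val, col_val, value))
    return rows

# Made-up sample table
sales = [
    ["",      "Q1", "Q2"],
    ["North", 10,   20],
    ["South", 30,   40],
]
for record in table_to_list(sales):
    print(record)
```

Just like the macro, it walks the data area row by row and writes one list row per data cell.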

Running your code

Make sure you have a table set up in Excel, and click anywhere inside it. The macro will detect the whole table automatically.

  • Press ALT+F8
  • Select TableToList
  • Click Run

In Office 2003 you can add a shape to your worksheet, right click, and choose assign macro.

  1. Choose Tools > Customize
  2. Choose Macros > Custom menu item -> drag to toolbar
  3. Right click item
  4. Choose Assign macro…
  5. Choose TableToList

Since Office 2007 this option is no longer available, but you can still right-click the ribbon and choose ‘Customize Quick Access Toolbar’. From there you can pick the Macros category and add the macro.

A new sheet will be created. Take a look at the list. You can try sorting, filtering, analyzing, totalling, and… pivot tables. A pivot table is a dynamically updating table which automatically totals values from a list, and presents them in… a table. Here’s how to re-create the original table from the list:

  1. Choose Data > Pivot Table
  2. Choose Finish
  3. Drag ColValue to the ‘column fields’
  4. Drag RowValue to the ‘row fields’
  5. Drag Data into ‘Drop data items here’

Voila, the table is back. That is, if it was a table of numbers. Pivot Tables are for numeric operations; if you had text in there, it won’t show anything (anything good).
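What the pivot table does here is essentially group and total. The Python sketch below is a rough illustration of that step, not of how Excel implements it. The name pivot_sum and the sample records are made up.

```python
from collections import defaultdict

def pivot_sum(records):
    """Total the values per (row heading, column heading) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for row_val, col_val, value in records:
        totals[row_val][col_val] += value
    # Convert to plain dicts for easy printing and comparison
    return {row: dict(cols) for row, cols in totals.items()}

# Made-up list rows: (RowValue, ColValue, Data)
records = [
    ("North", "Q1", 10),
    ("North", "Q2", 20),
    ("South", "Q1", 30),
    ("North", "Q1", 5),
]
print(pivot_sum(records))
# {'North': {'Q1': 15, 'Q2': 20}, 'South': {'Q1': 30}}
```

Because the values are added up, this only makes sense for numbers, which matches the remark about text above.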

Download

Download the file here:

Table2List.xla

How to install:

  1. Office 2003: to install an XLA you need to go to Tools > AddIns and select the file with the browse button. Make sure the checkbox is enabled.
  2. Office 2007: Click on the Office Button in the top left, then Excel Options, then Add-Ins. Now select Manage… Excel Add-ins and click Go. Again browse and enable the add-in.

You may wish to copy the file to the suggested Add-Ins folder. If you are on a network and wish to share the add-in with others, make sure to keep it on a network drive.

Once the Add-in is enabled you will see a new button that runs the macro. In Office 2007 the button is under the Add-Ins ribbon tab. You can also press ALT+F8, then type ‘Table2List.xla!TableToList’. The macro will be hidden in the XLA file, so you cannot select it.

An even quicker solution

Abu Yahya (see comments) gave me an even quicker solution. All kudos go to him for this.

First, start the pivot table wizard. Now in Excel 2007 and up you may have a hard time finding it! So, right click the quick access toolbar (it’s the bar with tiny icons on the top). Then choose Customize… and select Choose commands from: Commands not in the ribbon. Now find “pivottable and pivotchart wizard” in the list, and add it to the list on the right. You will see a tiny pivottable icon in the toolbar now.

Go on and click that icon, and then:

Step 1. Choose multiple consolidation ranges
Step 2. Choose I will create the page fields
Step 2b. Select Range of the table then Add to
Step 3. Choose New Worksheet
Step 4. Click Finish
Step 5. on the new sheet – Pivot table field list –> uncheck [ ] Row and [ ] Column
Step 6. There will be one value exactly in your pivottable. Double click it.

You will now see a new sheet with a list built up of the columns Row, Column and Value.

Step 7 (optional). Create a pivottable from this list to analyze your data.

One important note: you can have exactly 1 field that will show up in your data next to your row and column fields. That field has to be in the far left column of the data you select before consolidating. If you need more data in your final analysis you can combine fields with the “&” operator, using a formula like e.g. in cell C2: =A2 & B2.