# Spreadsheets & Programming Think Like a Programmer

hello again it’s me V Anton sprawl talking about how you can learn to think like a programmer as I’ve said before I’m trying to make videos that are useful both for people who have purchased my book and those that haven’t so I keep looking for ways to explain these ideas that are different from what I’ve already said in the book and the other day I thought of something pretty good spreadsheets I really enjoy creating spreadsheets I don’t know what that says about me actually I think I do know what it says it says I’m a bit of a geek but that’s okay everybody already knows that anyway when I say I like making spreadsheets I mean complicated sheets that involve chains of calculations from one cell to another I don’t mean writing macros writing macros can be fun but that’s normal programming fun solving a problem using cell formulas is a different kind of problem solving but I think it’s an excellent exercise for stretching problem solving abilities also you can often directly apply the logic used in a spreadsheet to solve the same problem in a program using array so let’s take a sample problem and solve it several different ways using spreadsheets and then we’ll take the spreadsheets and see what kind of programs are suggested by them we’ve got a list of numbers ministration I’ve got 20 numbers here that are all less than 100 but let’s say the problem definition allows for an unlimited count in any range of numbers the goal is to determine how many numbers in the list are duplicates as I always say depending on your experience level this problem may seem difficult or trivial it’s all about how we arrive at a solution not the particular problem one thing I always like to try is reducing the problem that is changing the problem definition to make it easier to solve by solving the easier variant I often come up with good ideas for solving the original problem in this case I ask myself what if we limited the numbers in the list to a small range like say 1 to 20 that leads to this solution this is a histogram account of the number of times particularly list as you can see I’ve got the numbers one through twenty running across the header of this grid each of the cells in this grid is going to be one where ever the number in the header matches the number in that row in the data list so this cell is one because this is the three column and the data in that row is also three by summing these columns at the bottom I can count how many we’ve got of each number then in the last row I count the duplicates anytime we’ve got a count that’s more than one those are all duplicates so for example if I’ve got three appearances of the number for all three of those or duplicate I just add up the total that works great but of course it’s a simplification of the original problem if we allow the data to be any valid integer it’s going to require more columns than we can possibly create then I had another idea what if the data was sorted if we sort the data it makes it easy to identify the duplicates a number is a duplicate if it’s the same as the number that appears previously or next in the list I wrote a rule for a cell using that idea and that again allows me to identify the duplicates summing this column gives me the total number of duplicates that works and it doesn’t require limiting the range of data however it does require sorting the data and I can’t just sort the data using cell formulas I have to copy the data and use the spreadsheet sort command also thinking ahead to writing this as a program I don’t want to have to sort the data if it’s not necessary that leads to this solution the idea here is to have each number in the list compared to every other number in the list the top row of the grid I have the numbers from the list again the logic in these cells is exactly the same as the histogram version that is wherever the number in the list and the number in the header match we get a one the bottom I some the columns just as before the logic in this bottom row is a little different though each column now represents one position in the array so either that position is a duplicate or it’s not I can identify the duplicates because the count is greater than one note that the count will always be at least one because it will match itself you can see all the values here along the diagonal or one I thought about removing the formulas from the cells along the diagonal but then that gave me another idea it occurred to me that each position in the list could determine if it was a duplicate just by looking for a match somewhere among the numbers that appeared previously in the list the only numbers this wouldn’t work for would be the first appearance of each value so for example there are three occurrences of 11 in the list the second and third appearances can determine that they are duplicates by looking at previous value but the first appearance of 11 can’t here’s where this logic led me this grid is doing less than half of the work of the previous grid the trick to making this work is the logic of this last row the idea is that each position is trying to identify itself as a duplicate by Counting matching numbers that appeared previously in the list so if the sum for a column is two or more that means that the number represented by this column is a duplicate and we should put a 1 in the last row if the count is exactly 1 though this is the trick we record that as a 2 in the last row that’s because this is the second appearance of this particular value in the list so we make it responsible not only for counting itself but also the first appearance of this particular value which was unable to count itself that logic is a little tricky and it’s this sort of thing that makes spreadsheets useful for me for a program logic I don’t know how readily I would have come up with that idea dealing with the problem directly in code so let’s take a look at the code I created from the ideas and the spreadsheets first here’s the setup code in the main function I’m using the data from the histogram example and here’s the function for the histogram solution to the problem I create an array dynamically which is why I have to pass in the maximum permissible value for a datum item then I make the histogram if you’re new to C++ this may look a little funky but it just means that I’m going to go to the position in the histogram array specified by the next data value and add one to it so if the next data value is four I would increment position four in the value counts array then I use the histogram to count the number of duplicates using the same logic as was in the spreadsheet now here is the version that sorts the data to make finding the duplicates easy I’m using the C++ library function Q sort again if you’re not into C++ all you need to know is that this call will in fact sort the array which is a copy of the original data then we count the duplicates using the same idea as in the spreadsheet a duplicate will have the same value as the next or previous value in the array we just have to remember that we can’t compare the first element in the array to a previous value or the last element in the array to a next value so we have to handle the first and last values in the array as separate cases outside of our Lu finally here is the half grid comparisons idea because this is a clear improvement over the full grid comparison I didn’t even implement that version as you can see this implementation requires a nested loop the outside loop variable is the position for which we’re trying to determine if it’s a duplicate the inside loop increments over all the previous positions and the array when the values in the two positions are the same we get a match and we and we’ve compared a particular value to all the previous values we look at the number of matches we found and then update the count of duplicates appropriately using the same logic as in the spreadsheet that is when the number of matches is exactly 1 we have to add 2 to the duplicate count because the second occurrence of a number is responsible for including the first occurrence in the duplicate count all three of these versions work which of them is the best sort of depends on the context obviously the histogram version can only be used when we have a relatively small range of values in the array when we can use the histogram version it’s going to be the fastest choice because there are no nested loot if we were doing normal analysis we would say it runs in linear time the other two used nested loops you don’t really see it in the sorting version because the nested loops would occur in the sort which we’re doing via a library call the sorting version would obviously be a great choice if we needed to sort the data anyway and the half grid comparison would be a good choice if we didn’t want to sort the data but all of these are useful techniques that were worth exploring because the techniques could be used in other problems anytime you come up with a solution that works it adds something to your toolkit even if you end up going with a different approach in the end and for that reason I’d like to give a couple of suggestions for other problems you can try solving using this spreadsheet first method problem 1 termen if a list has more odd numbers or even numbers problem number two find the two values in a list of numbers that are closest to each other to come up with multiple ways to solve these using spreadsheets then try turning those spreadsheets into code so that’s it I hope this helps you out thanks as always for watching and please do hit the like button and subscribe if you’d like to see more videos like this feel free to offer your comments on youtube or head over to my site if you’d want to contact me