Architectural typology classification based on images

Brief Overview

The appearance of a building gives us clues about its function and typology. In the 21st century, however, the boundaries of visual interpretation have blurred: an apartment no longer necessarily looks like an apartment, nor a hospital like a hospital. The two may even share many visual similarities.
This project is work-in-progress research on architectural typology classification. It uses state-of-the-art deep learning to find visual similarities and differences between buildings of several typologies from a machine's point of view. I web-scraped close to 6,000 images spread across 14 categories (classes) and used a deep convolutional neural network to classify them, which gives a quantifiable measure of the differences and similarities between typologies, with an accuracy of 82%. The results were then analyzed and visualized to see which features the network relied on to classify the images. The code is available on my website.

Data Collection

Each category has a minimum of 250 and a maximum of 600 images. The next version should have an equal number of images per category to prevent bias. The current dataset is available to the public through my Google Drive link:
The various categories are 'Airport', 'Apartment interiors', 'Apartments', 'Bridge', 'Church', 'Hotels', 'Kindergarten', 'Museum', 'Museum interior', 'Office', 'Office interior', 'Stadiums', 'Store', 'University'.
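Until the categories are balanced, one common way to reduce the bias from unequal class sizes is to weight the training loss by inverse class frequency. The sketch below is an illustration, not what this project did: the folder layout (one sub-folder per class under a root directory) and the weighting formula w_c = N / (K * n_c) are assumptions.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images(root: str) -> dict[str, int]:
    """Count images per class, assuming a hypothetical layout root/<class name>/<image>."""
    counts = Counter(
        path.parent.name
        for path in Path(root).glob("*/*")
        if path.suffix.lower() in IMAGE_SUFFIXES
    )
    return dict(counts)

def class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights w_c = N / (K * n_c).

    Rare classes get weights above 1, common classes below 1,
    and a perfectly balanced dataset gives every class exactly 1.0.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}

# Hypothetical counts at the two extremes mentioned above (250 and 600 images).
weights = class_weights({"Bridge": 250, "Office": 600})
print(weights)
```

The weights can then be passed to a weighted cross-entropy loss so that a 250-image class counts as much as a 600-image one during training.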

Snapshot of various classes

System Architecture, Transfer Learning and Iterations

In a deep convolutional neural network, the features the network learns are not hand-engineered; they depend on the training examples. For image classification we use a particular deep convolutional architecture called ResNet50. Rather than training the model from scratch, I initially used weights pretrained on ImageNet (transfer learning), but the results weren't satisfactory, so I switched to a different set of pretrained weights.
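Transfer learning keeps a pretrained backbone and trains a new classification head for the target classes (14 typologies here). The NumPy sketch below shows only the simplest variant, a linear probe: it assumes 2048-dimensional features (the size of ResNet50's pooled output) have already been extracted with the frozen backbone, and trains a softmax head on them by gradient descent. Frameworks such as PyTorch or fastai usually also fine-tune the backbone; this is an illustration of the idea, not the project's training code, and the data in it is synthetic.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability (does not change the result).
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_linear_head(features, labels, num_classes, lr=0.1, epochs=500, seed=0):
    """Train a softmax classification head on frozen backbone features.

    features: (N, D) array; assumed precomputed by the frozen backbone (not shown),
              e.g. D = 2048 for ResNet50.
    labels:   (N,) integer class ids.
    Returns weights W (D, C) and bias b (C,).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        grad = (probs - onehot) / n      # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    return np.argmax(features @ W + b, axis=1)

# Synthetic stand-in for backbone features: 3 well-separated classes in 16 dimensions.
rng = np.random.default_rng(1)
centers = rng.normal(size=(3, 16))
labels = np.repeat(np.arange(3), 50)
feats = centers[labels] + 0.3 * rng.normal(size=(150, 16))
W, b = train_linear_head(feats, labels, num_classes=3)
accuracy = (predict(feats, W, b) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Changing the pretrained weights (for example from ImageNet to Places365) changes only the features fed into this head, which is why the choice of source dataset can matter so much.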

Actual Typology vs Predicted Typology

55.6 % accuracy with ResNet50 using ImageNet weights after 10 epochs - Model 1
I started with a default set of pretrained weights to train my neural network using transfer learning. The resulting accuracy didn't improve significantly. The results were quite intuitive: just as we humans find it hard to differentiate between several typologies in the real world, the neural network was clearly struggling with them. After analysis, the underlying reason turned out to be that the ImageNet weights were learned on general object photographs (many of them animals such as dogs and cats) rather than on buildings and scenes.
Most confused/misclassified pairs:
('Office', 'Apartments', 37)
('Office', 'University', 24)
('Apartment interiors', 'Office interior', 22)
('Hotels', 'Apartments', 21)

Least confused/misclassified pairs:
('Kindergarten', 'Church', 2)
('Kindergarten', 'Museum interior', 2)
('Museum interior', 'Airport', 2)
('Museum interior', 'Church', 2)
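The tuple lists above have the shape of a "most confused" report, where each entry appears to read (actual class, predicted class, number of images), as in fastai's `ClassificationInterpretation.most_confused`; the page does not say which library produced them. A plain-Python sketch of how such a list can be computed from true and predicted labels, using made-up labels:

```python
from collections import Counter

def most_confused(actual, predicted, min_count=1):
    """Return (actual, predicted, count) tuples for the off-diagonal cells of the
    confusion matrix, sorted from most to least frequent. Correct predictions are ignored."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    ranked = [(a, p, n) for (a, p), n in pairs.items() if n >= min_count]
    return sorted(ranked, key=lambda t: t[2], reverse=True)

# Hypothetical labels, only to show the output format.
actual = ["Office", "Office", "Office", "Hotels", "Church", "Church"]
predicted = ["Apartments", "Apartments", "University", "Apartments", "Church", "Bridge"]
report = most_confused(actual, predicted)
print(report)
```

Raising `min_count` hides rare confusions, which is a simple way to keep a report like the ones above to a few lines.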

82.78 % accuracy with ResNet50 using Places365 weights after just 2 epochs - Model 12
After many attempts, the accuracy would not rise above 60% and the model started to overfit. So I switched to the weights of a model pretrained on Places365, a scene-recognition dataset that is much closer to my data than ImageNet. The accuracy increased by 18% and, after a few more epochs, reached 80%. The batch size needed to be small so that the network could pick up correlations between individual images.
Most confused/misclassified pairs:
('Office', 'Apartments', 10)
('Hotels', 'Apartments', 9)
('Office interior', 'Apartment interiors', 8)
('University', 'Office', 8)

Least confused/misclassified pairs:
('Church', 'Bridge', 2)
('Hotels', 'Airport', 2)
('Kindergarten', 'Church', 2)
('Stadiums', 'Museum', 2)


Top losses (images that were most confusing for the machine to classify) - Model 1
In the very first model, the neural network found it very hard to assign nearly 50 percent of the images to the right class. On the other hand, this also tells us how close those typologies are to each other. In my experience, a human looking at the images above cannot tell which one is which either. For example, the machine thought the second image was a kindergarten, but in reality it is a university. This makes sense in real life too, because it is hard to tell a university from a kindergarten at a glance. Similarly, in the sixth example the network had learnt that tall spires are a visual feature of churches, so when it sees a university building with spires, it misclassifies it.
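Top losses are the images on which the model's loss is highest, i.e. where it gave the lowest probability to the true class. A minimal NumPy sketch, assuming the model's softmax outputs on the validation set are already available; the probabilities below are made up:

```python
import numpy as np

def top_losses(probs, labels, k=9):
    """Indices and values of the k largest per-image cross-entropy losses.

    probs:  (N, C) predicted class probabilities (each row sums to 1).
    labels: (N,) true class ids.
    A high loss means the model assigned a low probability to the true class.
    """
    eps = 1e-12                                            # avoid log(0)
    losses = -np.log(probs[np.arange(len(labels)), labels] + eps)
    order = np.argsort(losses)[::-1][:k]                   # largest losses first
    return order, losses[order]

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 0, 1])
idx, vals = top_losses(probs, labels, k=2)
print(idx, vals)
```

Image 1 ranks first because the model gave its true class only 0.2, even though image 2 is also misclassified.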

Top losses (images that were most confusing for the machine to classify) - Model 12
Here we used a very different way of analysing the losses. The pale yellow patches show where the machine was actually looking when it classified each image. Again, going through some examples gives us a much more in-depth understanding of architectural typology. In example one, the machine is looking at the tall tower in the background. During training it presumably learnt that tall towers are associated only with airports, and hence it misclassifies this hotel. Now look at the fourth image. I provided a lot of store images showing the front facade. In this image it is easy for a human to tell that it is a church because one can read the lettering, but the machine cannot, so it took it for a store rather than a church.
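The page does not say how these heatmaps were produced. One common technique for this kind of visualization is Grad-CAM, which weights each feature map of a convolutional layer by the average gradient of the class score with respect to that map. The NumPy sketch below assumes the activations and gradients have already been captured from the network (for example with framework hooks); the toy arrays are illustrative.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image and one class.

    activations: (K, H, W) feature maps A_k of the chosen layer (e.g. the last conv block).
    gradients:   (K, H, W) gradients of the class score y_c w.r.t. those maps.
    Returns an (H, W) map scaled to [0, 1]; high values mark regions that raised the score.
    """
    weights = gradients.mean(axis=(1, 2))               # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)    # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0)                            # keep only positive evidence (ReLU)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: map 0 fires at (1, 1) and helps the class; map 1 fires at (3, 3) and hurts it.
A = np.zeros((2, 4, 4))
A[0, 1, 1] = 5.0
A[1, 3, 3] = 5.0
G = np.zeros((2, 4, 4))
G[0] = 1.0
G[1] = -1.0
heat = grad_cam(A, G)
print(heat)
```

To overlay the result on the photo, the small map is upsampled to the image size (7 x 7 to 224 x 224 for ResNet50's last block) and drawn as a semi-transparent colour map.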

Interpreting the correctly classified images

The buildings that had high probability
In the correctly classified images above, one can clearly see that the machine's eye looks at the same features we humans look at. Going from top to bottom, take the first image: the context around the bridge is hard to understand unless someone has experienced it before, yet the machine looks only at that feature and correctly classifies it with a probability of 0.9. I was particularly baffled by example two, because the machine looks at only some parts of the roof, focusing mostly on the fireworks, and still guesses with a probability of 0.7 that it is a stadium.
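The probabilities quoted here (0.9 for the bridge, 0.7 for the stadium) are presumably the network's softmax output for the predicted class. A small sketch of that last step; the class subset and the logit values are invented for illustration:

```python
import numpy as np

def class_probabilities(logits):
    """Softmax over one image's final-layer scores -> probabilities that sum to 1."""
    z = logits - np.max(logits)     # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

classes = ["Bridge", "Church", "Stadiums"]      # hypothetical subset of the 14 classes
logits = np.array([3.0, 0.5, 0.8])              # made-up network outputs
p = class_probabilities(logits)
predicted = classes[int(p.argmax())]
print(predicted, round(float(p.max()), 2))
```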

The buildings that had low probability but were classified correctly
If I showed the images above to an architect, it would be very difficult for him or her to guess the building's typology correctly on the first try. Yet just by looking at and analysing the same part of the facade, the machine was able to classify them correctly. From an architectural point of view, form has clearly played a very important role in identifying building typology, but in the last decade or two that boundary has been blurred by cross-country influences in design and architecture. This whole research is a step towards quantifying the classification process.
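Galleries like these two (correct with high probability, correct with low probability) can be built by ranking the correctly classified images by the probability of the predicted class. The page does not say how its examples were selected, so this NumPy sketch is an assumption; the probabilities are made up:

```python
import numpy as np

def confident_correct(probs, labels, k=4):
    """Split correctly classified images by model confidence.

    Returns (high, low): indices of the k correct predictions with the highest
    and the lowest predicted-class probability.
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    correct = np.flatnonzero(preds == labels)
    ranked = correct[np.argsort(conf[correct])]    # ascending confidence
    return ranked[::-1][:k], ranked[:k]

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.30, 0.70], [0.10, 0.90], [0.80, 0.20]])
labels = np.array([0, 0, 1, 0, 0])                 # image 3 is misclassified and excluded
high, low = confident_correct(probs, labels, k=2)
print(high, low)
```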

Varied features

The buildings that had high probability
In some of the examples above, we also see that the network looks at two very different features and predicts two different classes, which may or may not be correct. Take the first, very famous example: the obelisk behind is taken as a church spire, while when the machine looks at the buildings, it thinks of a museum. The same happens in the third example, where the curvy roof confuses the neural network between an airport and a museum. In the final example, the gabled roof suggests a kindergarten while the background suggests an apartment.
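Because ResNet50 ends in global average pooling followed by a single linear layer, a separate heatmap can be drawn for each candidate class with class activation mapping (CAM): each class's map is the sum of the last feature maps weighted by that class's final-layer weights. Comparing the maps of the top two classes shows which region argues for which typology. A NumPy sketch with toy arrays; it is one possible way to produce such views, not necessarily the one used here:

```python
import numpy as np

def class_activation_maps(activations, fc_weights, top=2):
    """CAM for the top-scoring classes of a GAP + linear classifier (as in ResNet50).

    activations: (K, H, W) last-conv feature maps for one image.
    fc_weights:  (C, K) weights of the final linear layer.
    Returns a list of (class_id, (H, W) map) for the `top` highest-scoring classes.
    """
    pooled = activations.mean(axis=(1, 2))     # global average pooling -> (K,)
    scores = fc_weights @ pooled               # class scores, bias omitted -> (C,)
    best = np.argsort(scores)[::-1][:top]
    maps = []
    for c in best:
        cam = np.tensordot(fc_weights[c], activations, axes=1)   # sum_k w_ck * A_k
        maps.append((int(c), cam))
    return maps

# Toy example: feature 0 covers the left column, feature 1 the right column.
A = np.zeros((2, 2, 2))
A[0, :, 0] = 1.0
A[1, :, 1] = 0.8
fc = np.array([[1.0, 0.0],      # class 0 uses feature 0
               [0.0, 1.0],      # class 1 uses feature 1
               [-1.0, -1.0]])   # class 2 is suppressed by both
maps = class_activation_maps(A, fc, top=2)
for c, cam in maps:
    print(c, cam)
```

In the toy output the two top classes light up different halves of the image, which mirrors the obelisk-versus-buildings split described above.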


Neural Network Visualization
The visualizations below are particularly fascinating. They show several layers of the neural network and what is being extracted from the image at each one. The image above is of a stadium, and as it moves deeper into the network, only the bare minimum of essential features is extracted.
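The shrinking described here can be made concrete with the standard output-size formula, out = floor((n + 2p - k) / s) + 1. The sketch below traces a 224 x 224 input through the main stages of ResNet50 as laid out in torchvision; the input size is an assumption, since the page does not state the resolution used.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (stage, kernel, stride, padding, output channels) for a torchvision-style ResNet50.
# Within each residual stage only the first block's strided 3x3 conv changes resolution.
STAGES = [
    ("conv1", 7, 2, 3, 64),
    ("maxpool", 3, 2, 1, 64),
    ("layer1", 3, 1, 1, 256),
    ("layer2", 3, 2, 1, 512),
    ("layer3", 3, 2, 1, 1024),
    ("layer4", 3, 2, 1, 2048),
]

def feature_shapes(input_size=224):
    size, shapes = input_size, []
    for name, k, s, p, channels in STAGES:
        size = conv_out(size, k, s, p)
        shapes.append((name, channels, size, size))
    return shapes

for name, c, h, w in feature_shapes():
    print(f"{name:8s} {c:5d} x {h:3d} x {w:3d}")
```

The spatial grid shrinks from 224 x 224 to 7 x 7 while the number of channels grows to 2,048, so the deepest layers hold many abstract features over very coarse locations.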

t-SNE Visualization
t-SNE is an algorithm that displays the various typologies (classes) in a lower dimension, here two dimensions. It also shows which typologies are close to one another and which are far apart or different. Looking at the results, we can infer that Apartment, Museum, University and Office are very close to one another and are sometimes hard to differentiate. On the other hand, Kindergarten and Stadium are two very distinct entities, lying opposite each other.
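Statements like "Apartment and Office are close" can be checked numerically once the images have been embedded in two dimensions (for example with scikit-learn's `TSNE` applied to the network's penultimate-layer features, an assumption about the pipeline). The NumPy sketch below, with toy coordinates, measures the distances between class centroids in such an embedding:

```python
import numpy as np

def centroid_distances(embedding, labels, class_names):
    """Pairwise Euclidean distances between class centroids in a 2-D embedding.

    embedding:   (N, 2) coordinates, e.g. the output of a t-SNE run.
    labels:      (N,) class ids; class_names maps id -> name.
    Returns (names, D) where D[i, j] is the distance between centroids i and j.
    """
    ids = np.unique(labels)
    centroids = np.stack([embedding[labels == i].mean(axis=0) for i in ids])
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return [class_names[i] for i in ids], dist

def closest_pairs(names, dist, k=3):
    """The k closest class pairs (each pair listed once), nearest first."""
    i, j = np.triu_indices(len(names), k=1)
    order = np.argsort(dist[i, j])[:k]
    return [(names[i[o]], names[j[o]], float(dist[i[o], j[o]])) for o in order]

# Toy 2-D coordinates for three hypothetical clusters.
emb = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 0], [10, 1]], dtype=float)
lab = np.array([0, 0, 1, 1, 2, 2])
names, dist = centroid_distances(emb, lab, ["Apartments", "Office", "Stadiums"])
print(closest_pairs(names, dist, k=2))
```

t-SNE preserves local neighbourhoods much better than global distances, so the distance between well-separated clusters (such as Kindergarten and Stadium) is a rough indication rather than a precise measure.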


The Jupyter notebook (.ipynb) can be viewed here. The PDF below is for reference.


If you like my work and this website, or would like to discuss any project further, feel free to connect with me. Thank you so much for your time.