Business Applications

Mobile Apps
Product Reviews
Employee Attrition

Mobile Apps: Partial Environmental Analysis

Contents

  1. Summary of Results
  2. Data Source and Discussion
  3. Data Preprocessing
  4. Most Common Apps by Genre
  5. Most Popular Apps by Genre: App Store
  6. Most Popular Apps by Genre: Google Play

<h2>Summary of Results</h2>

From the perspective of gaps in market offerings that a new development company could leverage to launch a successful initial product, we aim in this project to understand trends in popular mobile apps. We will focus on behavior in the Google Play store and Apple iOS App store, and we will limit our attention to apps that are free to download and install. Broadly speaking, we will not account for political, economic, social, or legal factors, hence only performing part of the environmental analysis that such a company should complete.

We will build our work on the assumption that visibility in the respective app stores is essential to success. And, moreover, that visibility is a byproduct of algorithms that favor already popular apps. We will utilize included data on app genre and categorization so that our hypthetical development company may focus on some subset of the app store that is popular with users but also possesses potential for small companies to get their footing. For example, we will find that some categories are entirely dominated by large corporations, against which a small company would never be able to outspend nor outcompute.

Ultimately, we will identify three areas in the Apple store (reference, health and beauty, and catalogs) and five areas in the Google store (health and fitness, house and home, art and design, personalization, and tools) in which a novel idea has the room to rise to a prominent position in the “leaderboards.” The reasoning for including these and exluding other popular categories is included in the course of our discussion.

note: I acknowledge that my hypothetical premise of a new company looking to chase a trend to build an app is almost certainly not a very good recipe for the longevity of the company. If you prefer, please replace this framework for one where an established company is trying to understand the structure of the marketplace and why their existing app is not being recommended often enough.

<h2>Data Source and Discussion</h2>

The data come from two sources: data on the Google Play Store includes 10841 observations and data on the Apple iOS App Store includes 7197 observations.

Each data set possesses its own schema, as can be seen by the given columns:

#setup our standard environment and a dataframe of the dataset
import pandas as pd
import numpy as np
google_raw = pd.read_csv('googleplaystore.csv')
apple_raw = pd.read_csv('AppleStore.csv')
google_raw.columns
Index(['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
       'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
       'Android Ver'],
      dtype='object')
apple_raw.columns
Index(['id', 'track_name', 'size_bytes', 'currency', 'price',
       'rating_count_tot', 'rating_count_ver', 'user_rating',
       'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
       'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'],
      dtype='object')

As such, our first task is to clean any problem points and designate those columns that will be useful for our present purposes.

<h2>Data Preprocessing</h2>

First, if we inspect the App variable of the Google dataset, we find that there are numerous duplicate apps. More precisely, the number of duplicate apps is:

len(google_raw['App'])-len(google_raw['App'].unique())
1181

If we inspect some of these duplicates:

google_raw[google_raw.duplicated(subset=['App'])]
App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
229 Quick PDF Scanner + OCR FREE BUSINESS 4.2 80805 Varies with device 5,000,000+ Free 0 Everyone Business February 26, 2018 Varies with device 4.0.3 and up
236 Box BUSINESS 4.2 159872 Varies with device 10,000,000+ Free 0 Everyone Business July 31, 2018 Varies with device Varies with device
239 Google My Business BUSINESS 4.4 70991 Varies with device 5,000,000+ Free 0 Everyone Business July 24, 2018 2.19.0.204537701 4.4 and up
256 ZOOM Cloud Meetings BUSINESS 4.4 31614 37M 10,000,000+ Free 0 Everyone Business July 20, 2018 4.1.28165.0716 4.0 and up
261 join.me - Simple Meetings BUSINESS 4.0 6989 Varies with device 1,000,000+ Free 0 Everyone Business July 16, 2018 4.3.0.508 4.4 and up
... ... ... ... ... ... ... ... ... ... ... ... ... ...
10715 FarmersOnly Dating DATING 3.0 1145 1.4M 100,000+ Free 0 Mature 17+ Dating February 25, 2016 2.2 4.0 and up
10720 Firefox Focus: The privacy browser COMMUNICATION 4.4 36981 4.0M 1,000,000+ Free 0 Everyone Communication July 6, 2018 5.2 5.0 and up
10730 FP Notebook MEDICAL 4.5 410 60M 50,000+ Free 0 Everyone Medical March 24, 2018 2.1.0.372 4.4 and up
10753 Slickdeals: Coupons & Shopping SHOPPING 4.5 33599 12M 1,000,000+ Free 0 Everyone Shopping July 30, 2018 3.9 4.4 and up
10768 AAFP MEDICAL 3.8 63 24M 10,000+ Free 0 Everyone Medical June 22, 2018 2.3.1 5.0 and up

1181 rows × 13 columns

google_raw[google_raw['App'] == '8 Ball Pool']
App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
1675 8 Ball Pool GAME 4.5 14198297 52M 100,000,000+ Free 0 Everyone Sports July 31, 2018 4.0.0 4.0.3 and up
1703 8 Ball Pool GAME 4.5 14198602 52M 100,000,000+ Free 0 Everyone Sports July 31, 2018 4.0.0 4.0.3 and up
1755 8 Ball Pool GAME 4.5 14200344 52M 100,000,000+ Free 0 Everyone Sports July 31, 2018 4.0.0 4.0.3 and up
1844 8 Ball Pool GAME 4.5 14200550 52M 100,000,000+ Free 0 Everyone Sports July 31, 2018 4.0.0 4.0.3 and up
1871 8 Ball Pool GAME 4.5 14201891 52M 100,000,000+ Free 0 Everyone Sports July 31, 2018 4.0.0 4.0.3 and up
1970 8 Ball Pool GAME 4.5 14201604 52M 100,000,000+ Free 0 Everyone Sports July 31, 2018 4.0.0 4.0.3 and up
3953 8 Ball Pool SPORTS 4.5 14184910 52M 100,000,000+ Free 0 Everyone Sports July 31, 2018 4.0.0 4.0.3 and up
google_raw[google_raw['App'] == 'Instagram']
App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
2545 Instagram SOCIAL 4.5 66577313 Varies with device 1,000,000,000+ Free 0 Teen Social July 31, 2018 Varies with device Varies with device
2604 Instagram SOCIAL 4.5 66577446 Varies with device 1,000,000,000+ Free 0 Teen Social July 31, 2018 Varies with device Varies with device
2611 Instagram SOCIAL 4.5 66577313 Varies with device 1,000,000,000+ Free 0 Teen Social July 31, 2018 Varies with device Varies with device
3909 Instagram SOCIAL 4.5 66509917 Varies with device 1,000,000,000+ Free 0 Teen Social July 31, 2018 Varies with device Varies with device

We find that it is predominantly the number of reviews and just simple duplicate entry that accounts for the duplication. As such, we will remove the duplicate observations through the built-in function:

google_raw = google_raw.drop_duplicates(subset=['App'])
#quick sanity check
len(google_raw['App'])-len(google_raw['App'].unique())
0

Next, we will limit our attention to apps in English. We will do so through a function that inspects the frequency of non-ASCII characters and then apply this to our App variable. Note: ultimately, this will remove 30 apps.

def isAscii(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True
google_raw = google_raw[google_raw['App'].apply(isAscii)]

Our final piece of clean-up for the Google data is to filter out the paid apps, of which there are 753.

google_clean = google_raw[google_raw['Type']== 'Free']

We complete these same steps for the Apple data, without additional comment.

len(apple_raw['track_name'])-len(apple_raw['track_name'].unique())
2
apple_raw = apple_raw.drop_duplicates(subset=['track_name'])
apple_raw = apple_raw[apple_raw['track_name'].apply(isAscii)]
apple_clean = apple_raw[apple_raw['price']== 0]

<h2>Most Common Apps by Genre</h2>

We are interested in identifying the most successful apps on both markets. With our data now in a clean and usable format, we start this analysis by identifying the most common genres for each market. We accomplish this by looking at the frequency occurence of each genre.

#first we can inspect the counts
apple_clean['prime_genre'].value_counts()
Games                1872
Entertainment         254
Photo & Video         160
Education             118
Social Networking     106
Shopping               84
Utilities              81
Sports                 69
Music                  66
Health & Fitness       65
Productivity           56
Lifestyle              51
News                   43
Travel                 40
Finance                36
Weather                28
Food & Drink           26
Reference              18
Business               17
Book                   14
Medical                 6
Navigation              6
Catalogs                4
Name: prime_genre, dtype: int64
#a simple adjustment returns these as percentages of the total
apple_clean['prime_genre'].value_counts(normalize=True)
Games                0.581366
Entertainment        0.078882
Photo & Video        0.049689
Education            0.036646
Social Networking    0.032919
Shopping             0.026087
Utilities            0.025155
Sports               0.021429
Music                0.020497
Health & Fitness     0.020186
Productivity         0.017391
Lifestyle            0.015839
News                 0.013354
Travel               0.012422
Finance              0.011180
Weather              0.008696
Food & Drink         0.008075
Reference            0.005590
Business             0.005280
Book                 0.004348
Medical              0.001863
Navigation           0.001863
Catalogs             0.001242
Name: prime_genre, dtype: float64

Our results indicate that, amongst free English-langauge apps on the App store, games consititute more than half of the total. More generally, apps oriented towards amusement (e.g., games, entertainment, social networking, sports) comprise about 80% of the market, while apps with practical utility (e.g., education, shopping, utilities, productivity, lifestyle) comprise the remaining 20%. However, more work will be needed below as these counts do not indicate the number of users.

For the Google Play store, we find:

google_clean['Category'].value_counts(normalize=True)
FAMILY                 0.184404
GAME                   0.098747
TOOLS                  0.084415
BUSINESS               0.045932
LIFESTYLE              0.039048
PRODUCTIVITY           0.038935
FINANCE                0.037016
MEDICAL                0.035210
SPORTS                 0.033969
PERSONALIZATION        0.033179
COMMUNICATION          0.032389
HEALTH_AND_FITNESS     0.030809
PHOTOGRAPHY            0.029455
NEWS_AND_MAGAZINES     0.027988
SOCIAL                 0.026634
TRAVEL_AND_LOCAL       0.023361
SHOPPING               0.022458
BOOKS_AND_REFERENCE    0.021442
DATING                 0.018621
VIDEO_PLAYERS          0.017831
MAPS_AND_NAVIGATION    0.013994
EDUCATION              0.012865
FOOD_AND_DRINK         0.012414
ENTERTAINMENT          0.011285
LIBRARIES_AND_DEMO     0.009367
AUTO_AND_VEHICLES      0.009254
HOUSE_AND_HOME         0.008351
WEATHER                0.008013
EVENTS                 0.007110
ART_AND_DESIGN         0.006771
PARENTING              0.006546
COMICS                 0.006207
BEAUTY                 0.005981
Name: Category, dtype: float64

indicating a drastically different marketplace. There is a significantly greater emphasis on labelling apps intended for children (i.e., family). Games are significantly less represented. Amusement apps, more generally, occupy either about 22% or 40%, depending on whether we include the family category. And, practical utility apps occupy about 60% of the market.

Overall, because the markets utilize different schemas and the number of apps does not indicate the number of users, we need to extend this analysis.

<h2>Most Popular Apps by Genre: App Store</h2>

One measure of popularity would be the average number of installs for each app genre. Unfortunately, this information is not available in the Apple data, but we can use the number of user ratings as a proxy. Doing so does have a number of hidden variables (e.g.,user enthusiasm, user happiness, user expectations, genre variation), so our results should be taken with some hesitation.

#group by genre, limit results to rating count, generate median
apple_clean.groupby(by="prime_genre")['rating_count_tot'].median().sort_values(ascending=False)
prime_genre
Productivity         8737.5
Navigation           8196.5
Reference            6614.0
Shopping             5936.0
Social Networking    4199.0
Music                3850.0
Health & Fitness     2459.0
Photo & Video        2206.0
Finance              1931.0
Sports               1628.0
Food & Drink         1490.5
Catalogs             1229.0
Entertainment        1197.5
Business             1150.0
Lifestyle            1111.0
Utilities            1110.0
Games                 904.0
Travel                798.5
Education             606.5
Medical               566.5
Book                  421.5
News                  373.0
Weather               289.0
Name: rating_count_tot, dtype: float64
apple_clean.groupby(by="prime_genre")['rating_count_tot'].describe()
count mean std min 25% 50% 75% max
prime_genre
Book 14.0 39758.500000 71324.552101 0.0 2.25 421.5 61044.75 252076.0
Business 17.0 7491.117647 10937.996228 11.0 392.00 1150.0 8623.00 38681.0
Catalogs 4.0 4004.000000 6279.618354 213.0 376.50 1229.0 4856.50 13345.0
Education 118.0 7003.983051 21768.784453 0.0 72.25 606.5 3929.25 162701.0
Entertainment 254.0 14029.830709 38978.940379 0.0 190.50 1197.5 7534.00 308844.0
Finance 36.0 31467.944444 59291.260242 0.0 72.00 1931.0 26411.75 233270.0
Food & Drink 26.0 33333.923077 79117.344084 1.0 60.25 1490.5 9406.25 303856.0
Games 1872.0 22812.924679 94903.404955 0.0 127.75 904.0 7424.50 2130805.0
Health & Fitness 65.0 23298.015385 79689.893021 0.0 115.00 2459.0 7754.00 507706.0
Lifestyle 51.0 16485.764706 53313.761068 0.0 261.00 1111.0 6171.00 342969.0
Medical 6.0 612.000000 664.210509 0.0 7.25 566.5 1174.50 1341.0
Music 66.0 57326.530303 182381.869800 0.0 569.25 3850.0 23445.25 1126879.0
Navigation 6.0 86090.333333 140543.142497 5.0 1035.75 8196.5 119386.00 345046.0
News 43.0 21248.023256 59704.048306 0.0 113.50 373.0 9933.50 354058.0
Photo & Video 160.0 28441.543750 174228.470086 0.0 249.75 2206.0 11960.50 2161558.0
Productivity 56.0 21028.410714 35409.954611 0.0 647.75 8737.5 22278.25 161065.0
Reference 18.0 74942.111111 232099.622506 0.0 729.00 6614.0 18210.50 985920.0
Shopping 84.0 26919.690476 60002.319755 0.0 582.25 5936.0 19590.25 417779.0
Social Networking 106.0 71548.349057 310475.511080 0.0 434.75 4199.0 28539.75 2974676.0
Sports 69.0 23008.898551 50891.091415 0.0 131.00 1628.0 16979.00 290996.0
Travel 40.0 28243.800000 80065.894662 0.0 185.25 798.5 13284.00 446185.0
Utilities 81.0 18684.456790 58816.596859 0.0 83.00 1110.0 8873.00 479440.0
Weather 28.0 52279.892857 107252.562636 0.0 24.75 289.0 46570.00 495626.0

A box and whisker plot is a little easier to see the above (because of the difference in scale I won’t plot all of them)

apple_clean[apple_clean['prime_genre']== 'Catalogs'].boxplot(grid=False,column='rating_count_tot')
<AxesSubplot:>

png

But, if we focus on one of the genres with a large standard deviation, it becomes rather ineffective.

apple_clean[apple_clean['prime_genre']== 'Music'].boxplot(grid=False,column='rating_count_tot')
<AxesSubplot:>

png

This is because there are substantial outliers in 6 genres (Music, Navigation, Photo & Video, Reference, Social Networking, and Weather). It would be helpful to identify which apps are the outliers for these genres.

#filter to chosen genre, limit results to two columns, sort the values, and limit results to the top five
apple_clean[apple_clean['prime_genre'] == 'Music'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head()
track_name rating_count_tot
4 Pandora - Music & Radio 1126879
8 Spotify Music 878563
35 Shazam - Discover music, artists, videos & lyrics 402925
60 iHeartRadio – Free Music & Radio Stations 293228
151 SoundCloud - Music & Audio 135744
apple_clean[apple_clean['prime_genre'] == 'Navigation'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head()
track_name rating_count_tot
49 Waze - GPS Navigation, Maps & Real-time Traffic 345046
130 Google Maps - Navigation & Transit 154911
881 Geocaching® 12811
1633 CoPilot GPS – Car Navigation & Offline Maps 3582
3987 ImmobilienScout24: Real Estate Search in Germany 187
apple_clean[apple_clean['prime_genre'] == 'Photo & Video'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head()
track_name rating_count_tot
1 Instagram 2161558
54 Snapchat 323905
65 YouTube - Watch Videos, Music, and Live Streams 278166
166 Pic Collage - Picture Editor & Photo Collage M... 123433
167 Funimate video editor: add cool effects to videos 123268
apple_clean[apple_clean['prime_genre'] == 'Reference'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head()
track_name rating_count_tot
6 Bible 985920
90 Dictionary.com Dictionary & Thesaurus 200047
335 Dictionary.com Dictionary & Thesaurus for iPad 54175
551 Google Translate 26786
715 Muslim Pro: Ramadan 2017 Prayer Times, Azan, Q... 18418
apple_clean[apple_clean['prime_genre'] == 'Social Networking'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head()
track_name rating_count_tot
0 Facebook 2974676
5 Pinterest 1061624
43 Skype for iPhone 373519
48 Messenger 351466
51 Tumblr 334293
apple_clean[apple_clean['prime_genre'] == 'Weather'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head()
track_name rating_count_tot
22 The Weather Channel: Forecast, Radar & Alerts 495626
89 The Weather Channel App for iPad – best local ... 208648
95 WeatherBug - Local Weather, Radar, Maps, Alerts 188583
133 MyRadar NOAA Weather Radar Forecast 150158
138 AccuWeather - Weather for Life 144214

We can infer from these results that new applications trying to enter the marketplace via one of these genres will have a difficult time pulling attention away from the juggernauts, with the possible exception of reference, which we discuss below. The genres besides these six may provide a more reasonable path to notoriety via recommentation algorithms that leverage “most popular” type metrics, and we inspect them below.

If we inspect the reference genre:

apple_clean[apple_clean['prime_genre'] == 'Reference'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head(15)
track_name rating_count_tot
6 Bible 985920
90 Dictionary.com Dictionary & Thesaurus 200047
335 Dictionary.com Dictionary & Thesaurus for iPad 54175
551 Google Translate 26786
715 Muslim Pro: Ramadan 2017 Prayer Times, Azan, Q... 18418
738 New Furniture Mods - Pocket Wiki & Game Tools ... 17588
757 Merriam-Webster Dictionary 16849
913 Night Sky 12122
1106 City Maps for Minecraft PE - The Best Maps for... 8535
1451 LUCKY BLOCK MOD ™ for Minecraft PC Edition - T... 4693
2280 GUNS MODS for Minecraft PC Edition - Mods Tools 1497
2766 Guides for Pokémon GO - Pokemon GO News and Ch... 826
2844 WWDC 762
2895 Horror Maps for Minecraft PE - Download The Sc... 718
5721 VPN Express 14

We see that, while this genre does contain outliers, they are unique in the service they provide and do not serve the broader needs of the category. For the most part, once one moves beyond the top five apps, the remaining applications focus on reference features for popular media. This creates some potential for a new app to exist in this space without worrying about pulling users away from the christian bible or dictionaries. Secondly, based on the title offerings, there seems to be heavy correlation here with the popularity of the family genre, giving some indication of what topics have the potential to be successful within this genre. Finally, it is reasonable to assume that such apps allow for more on-screen time and there are reasonable methods that can be employed to increase user engagement (e.g., tip of the day, rare tidbits, top 10 secrets). All of the above in conjunction with our initial results in this section establishing the popularity of reference may result in an advantage for new developers to enter this space. Contrast this to, say, weather, which is popular but is also dominated by outliers, has very little latitude for a unique value proposition, and lends itself to quick checks.

Next, we see that productivity is rather evenly distributed and completely dominated by large tech companies, making it a very poor choice for a small developement company.

apple_clean[apple_clean['prime_genre'] == 'Productivity'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head(15)
track_name rating_count_tot
123 Evernote - stay organized 161065
150 Gmail - email by Google: secure, fast & organized 135962
168 iTranslate - Language Translator & Dictionary 123215
186 Yahoo Mail - Keeps You Organized! 113709
291 Google Docs 64259
306 Google Drive - free online storage 59255
352 Dropbox 49578
361 Microsoft Word 47999
412 Microsoft OneNote 39638
475 Microsoft Outlook - email and calendar 32807
481 Hotspot Shield Free VPN Proxy & Wi-Fi Privacy 32499
514 Documents 6 - File manager, PDF reader and bro... 29110
585 Google Sheets 24602
590 Microsoft Excel 24430
644 Inbox by Gmail 21561

In shopping and food & drink, we observe that popular apps are predominantly tethered to an external store, making them also poor choices for our developers.

apple_clean[apple_clean['prime_genre'] == 'Shopping'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head(15)
track_name rating_count_tot
30 Groupon - Deals, Coupons & Discount Shopping App 417779
71 eBay: Best App to Buy, Sell, Save! Online Shop... 262241
142 Wish - Shopping Made Fun 141960
157 shopkick - Shopping Rewards & Discounts 130823
162 Amazon App: shop, scan, compare, and read reviews 126312
198 Target 108131
212 Zappos: shop shoes & clothes, fast free shipping 103655
235 Walgreens – Pharmacy, Photo, Coupons and Shopping 88885
251 Best Buy 80424
274 Walmart: Free 2-Day Shipping,* Easy Store Shop... 70286
314 OfferUp - Buy. Sell. Simple. 57348
326 Apple Store 55171
329 Shop Savvy Barcode Scanner - Price Compare & D... 54630
382 Ibotta: Cash Back App, Grocery Coupons & Shopping 44313
422 letgo: Buy & Sell Second Hand Stuff 38424

Sports is also, as is to be expected, dominated by the large brands. It is very unlikely that an individual developer would be able to keep pace with the data collection and analysis that these organizations are able to perform, making this a poor choice.

apple_clean[apple_clean['prime_genre'] == 'Sports'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head(15)
track_name rating_count_tot
62 ESPN: Get scores, news, alerts & watch live sp... 290996
94 Yahoo Fantasy Sports 190670
125 WatchESPN 159735
135 The Masters Tournament 148160
147 Yahoo Sports - Teams, Scores, News & Highlights 137951
289 ESPN Fantasy Football Baseball Basketball Hockey 64925
305 CBS Sports App - Sports Scores, News, Stats, W... 59639
313 FOX Sports Mobile 57500
334 2016 U.S. Open Golf Championship 54192
368 NBC Sports 47172
386 NBA 43682
411 ESPN Tournament Challenge 39642
428 2016 US Open Tennis Championships 37522
547 NFL 27317
637 MLB.com At Bat 21830

Health & Fitness does provide reasonable opportunity. It is fairly evenly distributed at the top and provides a wide array of functionality to the user. There is potential for a unique idea to establish itself within such a context.

apple_clean[apple_clean['prime_genre'] == 'Health & Fitness'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head(15)
track_name rating_count_tot
20 Calorie Counter & Diet Tracker by MyFitnessPal 507706
42 Lose It! – Weight Loss Program and Calorie Cou... 373835
149 Weight Watchers 136833
209 Sleep Cycle alarm clock 104539
231 Fitbit 90496
338 Period Tracker Lite 53620
463 Nike+ Training Club - Workouts & Fitness Plans 33969
545 Plant Nanny - Water Reminder with Cute Plants 27421
758 Sworkit - Custom Workouts for Exercise & Fitness 16819
860 Clue Period Tracker: Period & Ovulation Tracker 13436
880 Headspace 12819
929 Fooducate - Lose Weight, Eat Healthy,Get Motiv... 11875
1000 Runtastic Running, Jogging and Walking Tracker 10298
1059 WebMD for iPad 9142
1089 8fit - Workouts, meal plans and personal trainer 8730

Finally, catalogs might provide some opportunity, under the right circumstances. Because there are so few apps, despite it being a less popular genre, there is some room for a free app to be noteworthy. However, such apps seem to require either substantial technical implemenation or a large offering of options, both of which will substantially increase the development time.

apple_clean[apple_clean['prime_genre'] == 'Catalogs'][['track_name','rating_count_tot']].sort_values(by=['rating_count_tot'], ascending=False).head(15)
track_name rating_count_tot
862 CPlus for Craigslist app - mobile classifieds 13345
2028 DRAGONS MODS FREE for Minecraft PC Game Edition 2027
3276 Face Swap and Copy Free – Switch & Fusion Face... 431
3891 Ringtone Remixes - Marimba Remix Ringtones 213

Ultimately, we have identified three areas (Reference, Health & Beauty, and Catalogs) where new developers may aim to gain visibility within the marketplace.

Closing Caveat: there is, of course, the idea that any developer is able to establish themselves with a succificiently unique value proposition, and the above analysis does not prohibit such a possibility. We are predominantly concerned with how they can establish visibility within each genre.

<h2>Most Popular Apps by Genre: Google Play</h2>

The Google data contain information on the number of installs (which seems like a reasonable metric on popularity, but could be improved if we knew the rate of duplicate installation by users), but we can see that it doesn’t provide fine detail:

google_clean['Installs'].unique()
array(['10,000+', '500,000+', '5,000,000+', '50,000,000+', '100,000+',
       '50,000+', '1,000,000+', '10,000,000+', '5,000+', '100,000,000+',
       '1,000,000,000+', '1,000+', '500,000,000+', '500+', '100+', '50+',
       '10+', '1+', '5+', '0+'], dtype=object)

Because we aren’t able to establish ordinality there will be room for improvement with our results, but these results will allow us to identify any trends within the most popular apps as a single category. For such purposes, we will identify each category with its minimum (e.g., associate [10k,50k) to 10k) and convert to a numeric data type to allow for computations.

google_clean.iloc[:,5] = google_clean.iloc[:,5].apply(lambda x: x.replace(',', '').replace('+', ''))
google_clean.iloc[:,5] = pd.to_numeric(google_clean.iloc[:,5], downcast="float")

Now we can continue as we did with the Apple data.

google_clean.groupby(by="Category")['Installs'].median().sort_values(ascending=False)
Category
ENTERTAINMENT          3000000.0
WEATHER                1000000.0
VIDEO_PLAYERS          1000000.0
EDUCATION              1000000.0
SHOPPING               1000000.0
GAME                   1000000.0
PHOTOGRAPHY            1000000.0
COMMUNICATION           500000.0
FOOD_AND_DRINK          500000.0
HEALTH_AND_FITNESS      500000.0
HOUSE_AND_HOME          500000.0
ART_AND_DESIGN          100000.0
TRAVEL_AND_LOCAL        100000.0
MAPS_AND_NAVIGATION     100000.0
AUTO_AND_VEHICLES       100000.0
PARENTING               100000.0
PERSONALIZATION         100000.0
PRODUCTIVITY            100000.0
SOCIAL                  100000.0
FAMILY                  100000.0
COMICS                  100000.0
SPORTS                  100000.0
TOOLS                   100000.0
BEAUTY                   50000.0
BOOKS_AND_REFERENCE      50000.0
NEWS_AND_MAGAZINES       50000.0
DATING                   10000.0
FINANCE                  10000.0
LIBRARIES_AND_DEMO       10000.0
LIFESTYLE                10000.0
MEDICAL                   1000.0
BUSINESS                  1000.0
EVENTS                    1000.0
Name: Installs, dtype: float32

Based on some of the reasoning we did with Apple data, there are a few categories we can quickly exclude: Weather (as discussed above), video players (technical competition, as discussed above), education (domain expertise), shopping and food/drink (brick-and-mortar requirement, as discussed above), photography (large tech domination, domain expertise), sports (as discussed above).

While Entertainment is very popular, it is dominated by very large streaming services (which would be far too difficult for a small company to try and tackle) and apps that are oriented towards children. It is theoretically possible for a small company to get a foothold in the latter consideration, but it would require substantial investments in art, design, and domain expertise in children’s entertainment. As such, this would only be a fruitful path for our hypothetical company if they wanted to focus on media for children and build out the relevant departments.

google_clean[google_clean['Category'] == 'ENTERTAINMENT'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
865 Google Play Games 1.000000e+09
855 Netflix 1.000000e+08
888 IMDb Movies & TV 1.000000e+08
893 Talking Ben the Dog 1.000000e+08
874 Talking Angela 1.000000e+08
866 Hotstar 1.000000e+08
889 Twitch: Livestream Multiplayer Games & Esports 5.000000e+07
879 Talking Ginger 2 5.000000e+07
892 PlayStation App 5.000000e+07
886 Amazon Prime Video 5.000000e+07
859 YouTube Kids 5.000000e+07
953 HBO GO: Stream with TV Package 1.000000e+07
955 PlayKids - Educational cartoons and games for ... 1.000000e+07
1002 SketchBook - draw and paint 1.000000e+07
1000 Imgur: Find funny GIFs, memes & watch viral vi... 1.000000e+07

We can see that the game category has a rather parallel difficulty to the children’s entertainment apps. Unless our hypothetical development company seeks to work in exclusivly game development and populate functional art and design departments, opportunities are limited here.

google_clean[google_clean['Category'] == 'GAME'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
1654 Subway Surfers 1.000000e+09
1655 Candy Crush Saga 5.000000e+08
1722 My Talking Tom 5.000000e+08
1661 Temple Run 2 5.000000e+08
1662 Pou 5.000000e+08
1758 Hungry Shark Evolution 1.000000e+08
5950 Banana Kong 1.000000e+08
6554 Skater Boy 1.000000e+08
9166 Modern Combat 5: eSports FPS 1.000000e+08
1781 Trivia Crack 1.000000e+08
1773 Extreme Car Driving Simulator 1.000000e+08
1764 Pokémon GO 1.000000e+08
1763 Piano Tiles 2™ 1.000000e+08
1653 ROBLOX 1.000000e+08
1743 Hill Climb Racing 2 1.000000e+08

Health and Fitness, in contrast to the Apple data, is rather homogeneous, consisting of various iterations of tracking applications. This may indicate a gap that could be filled, or it could point to a substantial difference in how the two markets operate. We will mark this as a potential category, but we will need to do future work that inspects if the difference in categorization accounts for the difference in app variety between the two data sets.

google_clean[google_clean['Category'] == 'HEALTH_AND_FITNESS'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
5596 Samsung Health 500000000.0
1360 Period Tracker - Period Calendar Ovulation Tra... 100000000.0
1286 Calorie Counter - MyFitnessPal 50000000.0
1256 Home Workout - No Equipment 10000000.0
1344 Headspace: Meditation & Mindfulness 10000000.0
1283 Garmin Connect™ 10000000.0
1361 Period Tracker Clue: Period and Ovulation Tracker 10000000.0
1357 Period Tracker 10000000.0
1289 Endomondo - Running & Walking 10000000.0
1292 Runkeeper - GPS Track Run Walk 10000000.0
1277 Runtastic Running App & Mile Tracker 10000000.0
1296 8fit Workouts & Meal Planner 10000000.0
1317 Google Fit - Fitness Tracking 10000000.0
1316 Daily Workouts - Exercise Fitness Routine Trainer 10000000.0
1312 Nike Training Club - Workouts & Fitness Plans 10000000.0

House and Home contains some potential in the design aspects (I assume rental/sales aggregators would be difficult to gain traction against). If we combine this with our observations from the reference category in the apple data, there may be room here to focus on or partner with the correct brand to create a focused app with a pre-existing user base.

google_clean[google_clean['Category'] == 'HOUSE_AND_HOME'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
1446 Zillow: Find Houses for Sale & Apartments for ... 10000000.0
1454 Trulia Real Estate & Rentals 10000000.0
1449 Realtor.com Real Estate: Homes for Sale and Rent 10000000.0
4205 tinyCam Monitor FREE 10000000.0
1456 Houzz Interior Design Ideas 10000000.0
1470 DaBang - Rental Homes in Korea 5000000.0
1465 Trulia Rent Apartments & Homes 5000000.0
1482 Alfred Home Security Camera 5000000.0
1450 Real Estate sale & rent Trovit 5000000.0
1453 CYANOGEN. Rent, buy an apartment, a room, a co... 1000000.0
9984 Room Creator Interior Design 1000000.0
1479 Apartments.com Rental Search 1000000.0
1491 ColorSnap® Visualizer 1000000.0
1476 Rent.com Apartment Homes 1000000.0
1494 Room Painting Ideas 1000000.0

Art and Design certainly is dominated by technical implementations that would be hard for a small company to compete with. But, like House and Home above, apps like Tattoo Name and Garden Coloring indicate that niche subjects may be able to get a foothold in this category. Of course, how to identify the correct niche so that the dice roll is guaranteed to be a success is beyond my expertise.

google_clean[google_clean['Category'] == 'ART_AND_DESIGN'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
3 Sketch - Draw & Paint 50000000.0
42 Textgram - write on photos 10000000.0
45 Canva: Poster, banner, card maker & graphic de... 10000000.0
19 ibis Paint X 10000000.0
12 Tattoo Name On My Photo Editor 10000000.0
2 U Launcher Lite – FREE Live Cool Themes, Hide ... 5000000.0
18 FlipaClip - Cartoon animation 5000000.0
37 Floor Plan Creator 5000000.0
7 Infinite Painter 1000000.0
8 Garden Coloring Book 1000000.0
10 Text on Photo - Fonteee 1000000.0
11 Name Art Photo Editor - Focus n Filters 1000000.0
22 Superheroes Wallpapers | 4K Backgrounds 500000.0
16 Photo Designer - Write your name with shapes 500000.0
1 Coloring book moana 500000.0

Travel and local would be very difficult for a small company to break into. Certainly, there may be local apps that could gain traction in a focused locality. But, by definition, these could never be large, break-out applications.

google_clean[google_clean['Category'] == 'TRAVEL_AND_LOCAL'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
3127 Google Street View 1.000000e+09
3117 Maps - Navigate & Explore 1.000000e+09
3121 Google Earth 1.000000e+08
3112 Booking.com Travel Deals 1.000000e+08
3115 TripAdvisor Hotels Flights Restaurants Attract... 1.000000e+08
3103 trivago: Hotels & Travel 5.000000e+07
3125 VZ Navigator 5.000000e+07
9833 MAPS.ME – Offline Map and Travel Navigation 5.000000e+07
3151 2GIS: directory & navigator 5.000000e+07
3130 Goibibo - Flight Hotel Bus Car IRCTC Booking App 1.000000e+07
3142 Foursquare Swarm: Check In 1.000000e+07
3143 PagesJaunes - local search 1.000000e+07
3144 Flightradar24 Flight Tracker 1.000000e+07
3145 Yatra - Flights, Hotels, Bus, Trains & Cabs 1.000000e+07
3149 Despegar.com Hotels and Flights 1.000000e+07

The peronalization and tools categories appear to be rather haphazard grab-bags of add-on features. Some of these would require large catalogs to compete against (e.g., Backgrounds HD). The antivirus applications would provide the same technical domination that we have seen numerous times above. But, given the correct novel add-on feature, there does seem to be room here to gain some traction. Hence, we will mark them as possibilities.

google_clean[google_clean['Category'] == 'PERSONALIZATION'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
3446 GO Keyboard - Emoticon keyboard, Free Theme, GIF 100000000.0
4474 Parallel Space - Multiple accounts & Two face 100000000.0
3354 ZEDGE™ Ringtones & Wallpapers 100000000.0
3425 Backgrounds HD (Wallpapers) 100000000.0
3385 Hola Launcher- Theme,Wallpaper 100000000.0
3360 CM Launcher 3D - Theme, Wallpapers, Efficient 100000000.0
3374 APUS Launcher - Theme, Wallpaper, Hide Apps 100000000.0
4812 GO Launcher - 3D parallax Themes & HD Wallpapers 100000000.0
3400 Ringtone Maker 50000000.0
3428 Koi Free Live Wallpaper 50000000.0
3382 Yandex Browser with Protect 50000000.0
3365 ZenUI Launcher 50000000.0
3352 Nova Launcher 50000000.0
3443 iKeyboard - emoji, emoticons 10000000.0
3444 Simple Neon Blue Future Tech Keyboard Theme 10000000.0
google_clean[google_clean['Category'] == 'TOOLS'][['App','Installs']].sort_values(by=['Installs'], ascending=False).head(15)
App Installs
3234 Google 1.000000e+09
3265 Gboard - the Google Keyboard 5.000000e+08
7536 Security Master - Antivirus, VPN, AppLock, Boo... 5.000000e+08
3235 Google Translate 5.000000e+08
4005 Clean Master- Space Cleaner & Antivirus 5.000000e+08
3255 SHAREit - Transfer & Share 5.000000e+08
4808 Avast Mobile Security 2018 - Antivirus & App Lock 1.000000e+08
3266 Google Korean Input 1.000000e+08
3333 Speedtest by Ookla 1.000000e+08
4151 Google Now Launcher 1.000000e+08
3272 Share Music & Transfer Files - Xender 1.000000e+08
5077 AppLock 1.000000e+08
8896 DU Battery Saver - Battery Charger & Battery Life 1.000000e+08
4080 Lookout Security & Antivirus 1.000000e+08
5695 AVG AntiVirus 2018 for Android Security 1.000000e+08

Thus, we have identified five categories in which a novel idea may be able to gain visibility: health and fitness, house and home, art and design, personlization, and tools