Topic: Native habitats spreadsheet

KarenRei

  • Arctic Member
  • Hero Member
  • *****
  • Posts: 1806
    • Reykjavík, Iceland
Native habitats spreadsheet
« on: April 30, 2018, 07:55:55 AM »
So, I made a thing  ;)

https://docs.google.com/spreadsheets/d/1P-AoOGdl2ZHcm1Es0p5VLGsLWp3ffKbpYQuhpkcXCSg/edit?usp=sharing

Still very much a work in progress, but I thought I'd share: as a tiny portion of the database project I've been working on, I wrote a program that parses plant habitat descriptions and combines them with climate data to determine what sort of native environments the plants grow in (this is combined with some curated data concerning what conditions the plants are known to like growing in). It looks at the plant's native altitude range and only includes points within the habitat areas which are within that altitude range. If it can't find any locations that match the stated altitude in the stated range (e.g. the resolution is too coarse), it uses what data it did find and adjusts temperature, etc. for the altitude difference.

Now, some caveats.

1) It's a computer without a brain having to read text.  Subtlety will pass it by.  I've tried to handle common edge cases. For example, given "Located in X, not found in Y", it'll see Y but not parse it.  Likewise, when it sees "Found in X (Y)", where either X or Y is a subregion of the other, it tries to parse only the subregion, not the whole.  But expect some mistakes.

2) A lot of the mistakes are in the habitat description itself. For example: Artocarpus lakucha comes up with an average wintertime low in its range of 6.1°C.  Now, we know that it's not native to such cold climates. How did it come up with that?  Well, the range description is "E. Asia - Malaysia, Sumatra, China, India, Nepal, Sikkim, Myanmar, Laos, Vietnam, Indonesia, Philippines".  So it's looking at the climate of all of those - including all of China.  We know in reality that you'd only find it in the warmer parts of China, but that's not what the description says.  So, vagueness = bad  ;)

Feel free to improve habitat descriptions to be more accurate!  You can use countries, subregions, cities, etc.  Just try to make sure that you don't name a place that shares its name with a more major (population, significance, etc.) area!  It understands adjectives such as cardinal directions (including e.g. "northeast", but not "north-northeast") as well as "central".  Scattered city listings are just fine.  Let me know when you make any changes of significance and I can re-run it.

3) It does not understand the common wording "through" - e.g. "Portugal through Greece".  It will only look at the endpoints.  Again, feel free to improve this by being more specific.

4) The data behind it is the same data behind this site:

http://climatemaps.romgens.com/

But some of that data doesn't match other sites.  For example, they show a much dimmer winter in Manaus than you get when you punch Manaus into PVWatts.  I'm still trying to reconcile that.  Another example is humidity: the average humidity figures seem to match the averages you'll see reported for cities on Wikipedia, but when you go to daily weather histories on Weather Underground, the figures feel a bit off, particularly for daytime humidity.  Again, not sure how to reconcile that; this is just coming from the data I'm given.

5) There's still a good number of duplicate / synonym species that I need to work out, and a LOT more curating that needs to be done.  A known bug is that sometimes it'll list "kill temperatures" as "minimum acceptable temperatures", although it's generally very obvious when it does that.

6) It ignores everything up to the first dash (if an "early first dash" is present), to avoid parsing e.g. whole continents.
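
A minimal Python sketch of the text clean-up described in caveats 1 and 6, not the actual program. The function names, the `max_prefix` cut-off for what counts as an "early" dash, and the exact "not found in" wording are assumptions made for illustration.

```python
import re


def strip_leading_region(description: str, max_prefix: int = 25) -> str:
    """Drop everything up to an 'early' first dash, e.g. 'E. Asia - '.

    max_prefix is an assumed cut-off for what counts as 'early'.
    """
    match = re.search(r"\s[-–]\s", description)
    if match and match.start() <= max_prefix:
        return description[match.end():]
    return description


def drop_negated_places(description: str) -> str:
    """Remove 'not found in Y' clauses so Y is never treated as habitat."""
    pattern = r",?\s*(?:but\s+)?not\s+(?:found|present|native)\s+in\s+[^,;.]+"
    return re.sub(pattern, "", description, flags=re.IGNORECASE)


def candidate_places(description: str) -> list[str]:
    """Split a cleaned description into candidate place phrases."""
    cleaned = drop_negated_places(strip_leading_region(description))
    parts = re.split(r"[,;]|\band\b", cleaned)
    return [p.strip(" .") for p in parts if p.strip(" .")]


if __name__ == "__main__":
    print(candidate_places("E. Asia - Malaysia, Sumatra, China, India"))
    print(candidate_places("Kenya, Tanzania, not found in Uganda"))
```

The dash pattern requires whitespace on both sides, so hyphenated place names such as "Guinea-Bissau" are left alone.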

All that said... enjoy  :)  Don't be too hard on me about errors, this is a first draft (there might even be some alignment errors from pasting into Google Docs!  I haven't had much time to go over it). Just point them out and/or, where you can, improve the data yourself (anything at or to the right of where it says "Known preferred climate"), and let me know what you changed (otherwise it might accidentally get wiped out when I do future runs!)  I see one issue I'm going to check into this evening, where an acacia is getting a lower temperature rating than I'd expect.  I want to make sure that at least the algorithm is doing everything right, even if some of the habitats are poorly described  :)

(I have a LOT more data I'm collecting, but I'm still going through it)
« Last Edit: April 30, 2018, 08:49:02 AM by KarenRei »
Yes, I'm growing tropical plants in Iceland. No, I'm not crazy. Well, maybe...

Future

  • The Future
  • Hero Member
  • *****
  • Posts: 2030
Re: Native habitats spreadsheet
« Reply #1 on: April 30, 2018, 09:05:58 AM »
...and what do you do for a living?

KarenRei

  • Arctic Member
  • Hero Member
  • *****
  • Posts: 1806
    • Reykjavík, Iceland
Re: Native habitats spreadsheet
« Reply #2 on: April 30, 2018, 09:14:42 AM »
Professional nerd.  ;)   (computer programmer)

HibachiDrama

  • Member
  • ***
  • Posts: 88
    • Jacksonville
Re: Native habitats spreadsheet
« Reply #3 on: April 30, 2018, 10:35:18 AM »
Where did you aggregate all this info? Is the min/max data based on converting known habitats aka Range to discrete points / well defined geographical areas and further translating those areas to min/max values? Or just a composite reported from various sources?

Speaking of synonyms, how are you deduping? Are you tracking common names as well?

Interesting project!

KarenRei

  • Arctic Member
  • Hero Member
  • *****
  • Posts: 1806
    • Reykjavík, Iceland
Re: Native habitats spreadsheet
« Reply #4 on: April 30, 2018, 11:08:43 AM »
Common names are tracked elsewhere in the DB.  This is just one tiny part of it.  :)  Honestly, if I had to choose the hardest part of the DB, it's finding pollination data for different species. There's no central collection, and some info is much easier to find than other info. E.g. whether a plant has hermaphroditic flowers may be comparatively common info; whether it's functionally dioecious or not (and if so, what kind of dioecy) is less common; whether the plant will self-fertilize or is parthenocarpic is less common still; and probably the least common is whether a plant that isn't self-fertile is nonetheless self-compatible.  For far too many species I'm having to flag data as "best guesses based on relatives", which I don't like doing.

Habitat and range info is originally from PFAF and the sister site Useful Tropical Plants; I made donations to each of them for use of their data (of course, they just took their habitat info from various books and papers, so it's not really "their" data to begin with - but I also wanted to support their good aggregation work, and plan to support them more in the future - and encourage others to as well, as it makes projects like this possible  :)   ).  However, as I've been going through curating species, some of the descriptions have been changed to be more accurate.  Altitudes come from a variety of sources, including habitat descriptions, but also other sources.  One weakness I forgot to mention above is that sometimes a plant has different altitude ranges in different areas; my code is only equipped to deal with a single altitude range.

The hardest part in the above program was defining geographic boundaries; I found a couple hundred boundary files (and use them), but boundaries for most places aren't so readily available. So I use the Geonames hierarchy; if an object in the database (with its own lat, lon, and alt coordinates) says it belongs to some higher-order region, then it adds its lat/lon/alt to that region.  Lat/lon/alt entries are grouped into grid points, 360 in latitude by 720 in longitude (i.e. half-degree cells), in order to map to the IPCC climate data (I use climate data from the past 20 years, although I could see arguments for using older data instead).
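
A rough Python sketch of the gridding and hierarchy roll-up just described, under stated assumptions: row 0 is taken to be the south pole and column 0 the antimeridian, and `places` / `parent_of` are hypothetical in-memory stand-ins for the Geonames data, not the real schema.

```python
from collections import defaultdict


def grid_index(lat: float, lon: float) -> tuple[int, int]:
    """Map degrees to (row, col) on a 360 x 720 half-degree grid.

    Orientation (row 0 = 90 S, col 0 = 180 W) is an assumption.
    """
    row = min(int((lat + 90.0) * 2), 359)
    col = min(int((lon + 180.0) * 2), 719)
    return row, col


def collect_region_cells(places, parent_of):
    """Credit each place's grid cell and altitude to itself and all its ancestors.

    places:    name -> (lat, lon, alt_m)
    parent_of: name -> parent region name (absent or None at the top)
    Returns region -> {(row, col): [altitudes seen in that cell]}.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for name, (lat, lon, alt) in places.items():
        cell = grid_index(lat, lon)
        node = name
        while node is not None:
            cells[node][cell].append(alt)
            node = parent_of.get(node)
    return cells


if __name__ == "__main__":
    places = {
        "Reykjavík": (64.1466, -21.9426, 15.0),
        "Akureyri": (65.6835, -18.1105, 20.0),
    }
    parent_of = {"Reykjavík": "Iceland", "Akureyri": "Iceland", "Iceland": "Europe"}
    regions = collect_region_cells(places, parent_of)
    print(sorted(regions["Iceland"]))
```

A region with no boundary file ends up represented by whichever cells its known child places fall into, which is why sparsely catalogued regions get coarse coverage.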

Once I have grid points for each place - be it an entire country, state/province, county, city, other place, etc. - the program parses the text and attempts to match up direction adjectives (north, northwest, west,... etc.) with nouns (aka placenames) - e.g. "northern and central Kenya" maps to "north Kenya" and "central Kenya".  If there's no adjective it uses the entire locale stated.  Northwest / southwest / northeast / southeast are interpreted as being anywhere to one side of the place's centre on both axes (aka, quadrants).  North / south / east / west have to be at least a certain distance away from the centre on their defining axis (aka, rather than each taking up about half of the locale, they take up more like a quarter of it). The adjective "central" describes about a third of the place in question.  The exact amounts depend on the shape.  Of course, if a species' range description says "China", but they really mean "southeast China"... hey, the program isn't psychic!  ;)
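
The adjective handling above might look roughly like this in Python. This is a sketch only: the two fraction constants are guesses chosen to give "about a quarter" and "about a third" on a square region, the centre is simply the mean of the points, and regions that straddle the antimeridian are ignored.

```python
from statistics import mean

ADJECTIVES = {"north", "south", "east", "west", "northeast", "northwest",
              "southeast", "southwest", "central"}
CARDINAL_FRACTION = 0.5   # assumed: outer quarter of the extent on one axis
CENTRAL_FRACTION = 0.58   # assumed: roughly a third of the area of a square


def normalise(word):
    """'Northern' -> 'north', 'central' -> 'central', anything else -> None."""
    w = word.lower().strip(".,")
    if w.endswith("ern"):
        w = w[:-3]
    return w if w in ADJECTIVES else None


def split_adjectives(phrase):
    """'northern and central Kenya' -> [('north', 'Kenya'), ('central', 'Kenya')]."""
    words = phrase.split()
    adjectives = []
    i = 0
    while i < len(words):
        adjective = normalise(words[i])
        if adjective:
            adjectives.append(adjective)
        elif words[i].lower() != "and":
            break
        i += 1
    place = " ".join(words[i:])
    return [(a, place) for a in adjectives] or [(None, place)]


def select_cells(cells, adjective=None):
    """Keep the (lat, lon) points of a region that match a direction adjective."""
    if adjective is None:
        return list(cells)
    if adjective not in ADJECTIVES:
        raise ValueError(f"unknown adjective: {adjective}")
    lat0 = mean(lat for lat, _ in cells)
    lon0 = mean(lon for _, lon in cells)
    half_lat = max(abs(lat - lat0) for lat, _ in cells) or 1.0
    half_lon = max(abs(lon - lon0) for _, lon in cells) or 1.0
    ns = next((d for d in ("north", "south") if adjective.startswith(d)), None)
    ew = next((d for d in ("east", "west") if adjective.endswith(d)), None)
    # Quadrants only need the right side of the centre; plain cardinals
    # must clear a minimum distance from it.
    threshold = 0.0 if (ns and ew) else CARDINAL_FRACTION

    def keep(lat, lon):
        dy = (lat - lat0) / half_lat   # -1 (south edge) .. +1 (north edge)
        dx = (lon - lon0) / half_lon   # -1 (west edge)  .. +1 (east edge)
        if adjective == "central":
            return abs(dy) <= CENTRAL_FRACTION and abs(dx) <= CENTRAL_FRACTION
        if ns == "north" and dy <= threshold:
            return False
        if ns == "south" and dy >= -threshold:
            return False
        if ew == "east" and dx <= threshold:
            return False
        if ew == "west" and dx >= -threshold:
            return False
        return True

    return [(lat, lon) for lat, lon in cells if keep(lat, lon)]
```

Names that begin with a direction word ("South Africa", "Central African Republic") would be split incorrectly by `split_adjectives`, so a real parser would need to check the full phrase against known placenames first.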

By mapping the species to locales, we now have a list of gridpoints for each species.  The program then averages the data from the gridpoints (in each category, and across all 12 months), for all of them that fall within the species' altitude range (with the aforementioned exception that if none fall within the altitude range, it uses all datapoints present and applies a lapse rate to adjust for altitude). It then returns, for each category, the average for the lowest-value month (so for example, for rainfall, average per-day rainfall in the driest month), the average of the whole year (in the aforementioned example, average per-day rainfall across the year), and the average for the highest-value month (e.g., the per-day rainfall for the wettest month).
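
As an illustration of that aggregation step for one category (temperature), here is a small Python sketch. The lapse rate of 6.5 °C per km and the choice to adjust towards the midpoint of the altitude range are assumptions; the post does not say which values the real program uses.

```python
from statistics import mean

LAPSE_RATE_C_PER_M = 0.0065  # assumed standard lapse rate, 6.5 °C per km


def summarise_temperature(points, alt_min, alt_max):
    """Return (coldest-month mean, annual mean, warmest-month mean) in °C.

    points: list of (altitude_m, [12 monthly mean temperatures in °C]).
    """
    if not points:
        raise ValueError("no gridpoints for this species")
    in_range = [temps for alt, temps in points if alt_min <= alt <= alt_max]
    if in_range:
        series = in_range
    else:
        # Fallback: nothing lies inside the altitude range, so use every
        # point and shift it to the (assumed) midpoint of the range.
        target = (alt_min + alt_max) / 2
        series = [
            [t - LAPSE_RATE_C_PER_M * (target - alt) for t in temps]
            for alt, temps in points
        ]
    monthly = [mean(s[m] for s in series) for m in range(12)]
    return min(monthly), mean(monthly), max(monthly)


if __name__ == "__main__":
    lowland = (100.0, [float(m) for m in range(12)])
    highland = (2000.0, [0.0] * 12)
    print(summarise_temperature([lowland, highland], 0.0, 500.0))
```

Other categories such as rainfall would use the same min / mean / max summary, but would need their own altitude adjustment (or none) in the fallback branch.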

Oh, lastly species duplications: I use two authoritative species list registries (unfortunately, I don't remember which ones off the top of my head), which come with synonym lists.  Unfortunately, the synonym lists are only within each given genus, so for example you'll see some rheedias in there and things like that.  But I'll work out the glaring examples of that eventually  :)  The policy is to always use current names, even if they're annoying and nobody uses them - for example, Rosenbergiodendron formosum rather than Randia formosa.

My purpose for making the DB is to serve my needs, and it hasn't really been designed with the general public in mind, as there's some data that I don't have the rights to share (and so I've been writing a number of things in Icelandic  ;)  ).  But I figured that this aspect of it might be nice to share with people  :)
« Last Edit: April 30, 2018, 11:26:20 AM by KarenRei »

HibachiDrama

  • Member
  • ***
  • Posts: 88
    • Jacksonville
Re: Native habitats spreadsheet
« Reply #5 on: April 30, 2018, 12:13:59 PM »
How are you applying/utilizing what you've created?

I'd been thinking about doing something similar, but mostly from the perspective of "What doesn't grow here, but could?", and also to index "time from germination to fruiting" (I think you asked about quick fruiting plants previously?), productivity, usefulness.

I'm a SQL Server consultant with a background in data services, so this kind of thing is interesting to me.

KarenRei

  • Arctic Member
  • Hero Member
  • *****
  • Posts: 1806
    • Reykjavík, Iceland
Re: Native habitats spreadsheet
« Reply #6 on: April 30, 2018, 12:51:15 PM »
I'm involved at the design phase in a project to build a group of Biodomes in Reykjavík, designed to incorporate cultivation of exotics into commercial / retail space with wellness centre, tourism, etc. potential. Aka, lots of glass, with plants around the periphery filtering away ~85% of the light, and shade plants / vines around the middle.  So we want to maximize production, beauty, fruit quality, "interestingness", and many other parameters in each of several different environmental zones (I'm pushing for four main zones - ~85% mediterranean / 15% desert in the main dome to maintain comfort, ~75% ultratropical / 25% cerrado in the secondary dome). And of course since plants grow and take various lengths of time to reach productivity with various lifespans, we need a succession plan for each area. And since no plant is guaranteed (survival, productivity, quality, excess males, etc.), we need to cultivate more than we have space for, with more "surplus" available the further we go into the future. Every area likewise has to have a pruning / training plan to maximize floor space - and plants that can't be pruned (e.g. palms) need to have a relocation or end-of-life plan.  Large numbers of plants to be grown, huge number of possibilities for them, many different parameters... a database was clearly called for to help manage it all!

The main thing I'm glad about is that the project isn't trying to "skimp" (if I were in charge I probably would be because I'm cheap, lol  ;)  ).  The value of the commercial space is so high that if you're going to pay for all of that, there's no point to skimping on care of the plants that are the attraction.  So for example, when I showed Hjördis the financials for running several hundred kilowatts of lights, enough to recreate full sun levels in Manaus, she hardly blinked; she had already penciled in a similar cost for grow lighting.  Maybe we'll even have the budget to outdo the sun in some places  ;) She wants quality fixtures, too, which would be lovely to work with vs. the cheap Chinese junk I'm used to. With a lot of these systems you can alter the spectrum over the course of the day. Since red is more efficient, but white is more comfortable, I'm pushing for "warm white" during the day, with extended "sunrise" / "sunset" periods that are more heavily red - and then of course red after closing time.  Of course, no supplement is needed in our summer's 24/7 lighting   ;)   The budget also means that we don't have to do everything from seed, or even small plants; we can afford to ship in most at ~1m tall (for species that are available from sellers under phytosanitary control) and a few select plants at several meters high.  So as you can imagine, we'll initially flush the place green with fast growing vines, weedy trees like moringa, edible bamboos, etc, plus early producers like bananas, papaya, bushes, annuals, etc - and then slowly phase them out for plants that need a few years to get established... and then those that need many years to get established.  I'm pushing to have at least one "showpiece tree" in the highest point in the main dome, allowed to get up to ~15m or so (but they'll need to be fairly narrow). Something to really impress.
« Last Edit: April 30, 2018, 12:57:10 PM by KarenRei »

KarenRei

  • Arctic Member
  • Hero Member
  • *****
  • Posts: 1806
    • Reykjavík, Iceland
Re: Native habitats spreadsheet
« Reply #7 on: April 30, 2018, 03:32:08 PM »
Edit: Just figured out why that acacia was yielding such cold climate figures: its description stated "naturalized in S. Europe".... and Russia had been included in Europe (!)... and so it was including south Russia in the list!  You know, all of those Russian acacia trees  ;)

I've gone and custom-defined not just Europe, but the specific regions of Europe, so that should fix the problem.  Re-running the data will (as always) take several hours, however.

 
