I've seen a number of critiques and write-ups recently about how monolithic repositories are intrinsically better for developing large projects than using a multi-repository approach. In the past year, we went the other direction, splitting our monolithic repository into individual component repositories, each with its own history, tests, and documentation. This is a summary of our experience.
After we split into component repositories, one thing we noticed almost immediately was how many dependencies each component had. Surprisingly, most of these were listed as optional dependencies, or development dependencies, but it was not unusual for a component to pull in a dozen others. As such, one big effort has been identifying which components are truly required, and we've been able to reduce overall dependencies enormously. In one extreme case, zend-mvc, we went from almost three dozen dependencies down to fewer than a dozen!
When developing within a monolithic framework, it's easy to see that a piece of code duplicates something from elsewhere in the framework, so you just re-use that existing code. The problem is that in many cases this is not really repetition: the solution may look the same, but it should be considered part of the current domain. As an example, zend-filter provides a number of "inflector" filters, for transforming things like CamelCase words to dash-separated ones (and vice versa). If a component needed to do something like this, it would pull in zend-filter, when the transformation could have been inlined into the component as a single regex.
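To give a sense of how small such an inlined transformation can be, here is a minimal sketch in Python (used here only for illustration; the framework itself is PHP). The function names are hypothetical and this is not the zend-filter implementation. It assumes simple ASCII CamelCase input and does not try to handle runs of capitals such as acronyms.

```python
import re


def camel_to_dash(word: str) -> str:
    """Convert a CamelCase word to lowercase dash-separated form."""
    # Insert a dash at every boundary between a lowercase letter or digit
    # and an uppercase letter, then lowercase the whole result.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", word).lower()


def dash_to_camel(word: str) -> str:
    """Convert a dash-separated word back to CamelCase."""
    # Capitalize each dash-separated segment and join them together.
    return "".join(part.capitalize() for part in word.split("-"))


print(camel_to_dash("CamelCase"))   # camel-case
print(dash_to_camel("camel-case"))  # CamelCase
```

A few lines like these, owned by the component that needs them, replace an entire package dependency.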
Once we started seeing the pattern, we realized that one way to reduce dependencies was to identify these cases of code re-use and consider if DRY really applied, or if the logic was something that should be considered part of the domain.
Remember how I mentioned a lot of dependencies were listed as suggestions? Well, it turns out that we have a lot of optional functionality such as adapter implementations that may have specific requirements if used, but are not necessarily a core feature.
We've discovered that if we separate those into their own packages, we make it clearer to end-users what functionality a package really requires. We are also able to do some cool optimizations, such as making the new packages depend on specific PHP extensions or third-party libraries, so that it is obvious what is required and why.
Due to the size of our documentation, we split it out into its own repository when we started development on Zend Framework 2. I'd argue this was one of the biggest mistakes we have made in the project's lifespan. Because the documentation was in an entirely different repository, it was logistically difficult to require that new feature contributions be documented; we'd merge the feature, and request the author write up the docs for it… and essentially forget about it entirely from that point forward. This has led to out-of-date documentation in the best situations, and huge feature sets completely undocumented in the worst.
With the split into individual component repositories, documentation becomes smaller, and trivial to include directly in the repository. This then allows us to block new features until the author submits documentation. It also makes finding documentation simpler (it's where the code is!), as well as contributing documentation simpler (again, it's where the code is, and follows the same process as any other contribution).
We also switched our format once again. In the Zend Framework 1 days, we used DocBook XML, because of the rich toolset. But it was insanely difficult to build. For ZF2, we switched to reStructuredText, which was simpler, but, again, didn't have a toolset that made rendering easy, and even GitHub doesn't render every aspect of it correctly. In the vein of simplification, we now use Markdown, which has fostered more contributions, as it's essentially the lingua franca of the development world.
I was often approached by would-be contributors with questions of "where do I fix this?" With a large repository, finding the affected code can often resemble looking for a needle in a haystack, even when the code is segregated in subdirectories. Additionally, we'd see a lot of contributions that spanned multiple components, not because they needed to, but because, as the contributor was tracing calls, they'd notice something and change it.
Switching to individual component repositories has made the contributions laser-focused. Developers know exactly where a change needs to be made, and they cannot and do not change code across multiple repositories unnecessarily.
With each repository serving a very specific purpose, developers are able to better grasp the component in its entirety, without the distraction of everything else. We've seen more and better contributions, as developers are not intimidated by contributing to THE FRAMEWORK, but are instead focused on the specific component in which they see a need. With everything they need right in front of them, they make quality changes, and get immediate Continuous Integration feedback (tests take seconds, not minutes!), which tells them whether their approach is correct.
Several people have also stepped forward to maintain these individual repositories, allowing them to advance in ways they would not have under a monolithic repo.
One thing I neglected to mention was that having the repositories split has also meant that we're able to advance much more quickly. Components that are getting attention can get immediate versions, instead of waiting for an accumulation of changes that warrant a new framework release. As such, we now have some components that are barely beyond their 2.4 (our LTS release) counterparts, and others that are versioned in the 2.7 or 2.8 series.
In other words, separating the components has allowed us to get fixes and new features into users' hands faster, which makes everyone happy.
There have been some definite issues: coordinating the refactors around zend-servicemanager and zend-eventmanager forward compatibility was difficult due to dependency issues. But that also helped us identify when we had too many components intermingling dependencies, allowing us to untangle those and simplify.
Educating users that they can specify a minimum or a maximum version in their composer.json files has also been interesting. But those conversations also help raise the overall awareness of these tools, and increase their reach in the PHP ecosystem.
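To illustrate what a minimum-plus-maximum constraint means, here is a rough sketch in Python. It assumes a Composer range constraint such as `>=2.4 <3.0` on a component, and the helper names are hypothetical. It only mimics the numeric comparison for plain dotted versions; it is not how Composer's resolver works and it ignores stability suffixes such as `-beta1`.

```python
def parse(version: str) -> tuple:
    """Turn a dotted version such as '2.7.1' into a comparable tuple."""
    parts = [int(piece) for piece in version.split(".")]
    # Pad to three components so that '2.4' compares equal to '2.4.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(version: str, minimum: str, maximum: str) -> bool:
    """Check minimum <= version < maximum, like '>=minimum <maximum'."""
    return parse(minimum) <= parse(version) < parse(maximum)


# A component in the 2.7 series satisfies '>=2.4 <3.0'; a 3.0 release does not.
print(satisfies("2.7.1", "2.4", "3.0"))  # True
print(satisfies("3.0.0", "2.4", "3.0"))  # False
```

With a range like this, a project can pick up new minor releases of an individual component without being pulled onto the next major version.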